New initiative enhances AI access to Wikipedia information

By: Bitget-RWA, 2025/10/01 13:25

On Wednesday, Wikimedia Deutschland revealed a new database designed to make Wikipedia’s extensive information more easily available to AI systems.

Named the Wikidata Embedding Project, this platform utilizes a vector-based semantic search method—a process that enables computers to interpret the meanings and connections between words—on the vast data from Wikipedia and its related sites, which together hold close to 120 million records.
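To make the idea of vector-based semantic search concrete, here is a minimal sketch in Python. It is not the project's implementation: the three-dimensional vectors are hand-picked toy values (real embedding models produce hundreds of dimensions), and the names `TOY_EMBEDDINGS` and `semantic_search` are made up for illustration. The sketch only shows the general technique of ranking entries by cosine similarity to a query vector.

```python
import math

# Hypothetical toy embeddings, chosen by hand for illustration only.
# A real system would get these vectors from an embedding model.
TOY_EMBEDDINGS = {
    "scientist": [0.9, 0.8, 0.1],
    "researcher": [0.85, 0.75, 0.2],
    "scholar": [0.7, 0.9, 0.15],
    "banana": [0.1, 0.05, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query, embeddings, top_k=2):
    """Rank every other entry by similarity to the query's vector."""
    query_vec = embeddings[query]
    scored = [
        (label, cosine_similarity(query_vec, vec))
        for label, vec in embeddings.items()
        if label != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


results = semantic_search("scientist", TOY_EMBEDDINGS)
print([label for label, _ in results])
```

With these toy vectors, "researcher" and "scholar" rank above "banana" even though none of the words share characters with the query, which is the property a keyword search lacks.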

By integrating support for the Model Context Protocol (MCP)—a standard that enables AI to interact with data sources—the initiative allows LLMs to access the data through natural language queries more effectively.
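As a rough illustration of what an MCP interaction looks like on the wire, the sketch below builds a JSON-RPC 2.0 `tools/call` request, the message shape MCP uses when a client asks a server to run a tool. The tool name `search_items` and its `query` argument are assumptions for illustration; the article does not say which tools the Wikidata server exposes, and a real client would first discover them from the server.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 message asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "search_items" and "query" are hypothetical names, not taken from the project.
request = build_tool_call(
    1, "search_items", {"query": "notable nuclear scientists"}
)
payload = json.dumps(request)
print(payload)
```

The natural-language query travels as an ordinary string argument, so the model does not need to produce SPARQL or any other specialized syntax.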

Wikimedia’s German division developed the project in partnership with neural search company Jina.AI and DataStax, a real-time training-data company owned by IBM.

For years, Wikidata has provided machine-readable information from Wikimedia sites, but previous tools only supported keyword searches and SPARQL, a specialized query language. The updated system is better suited for retrieval-augmented generation (RAG) setups, which let AI models incorporate external knowledge, giving developers the ability to anchor their models in content reviewed by Wikipedia editors.
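The grounding step in a RAG setup can be sketched in a few lines. The example below assumes retrieval has already happened and shows only how retrieved passages might be placed into a prompt ahead of the user's question. The function name `build_grounded_prompt`, the prompt wording, and the sample passages are hypothetical stand-ins, not output from the Wikidata database.

```python
def build_grounded_prompt(question, passages, max_passages=3):
    """Prepend numbered retrieved passages to the user's question."""
    selected = passages[:max_passages]
    context = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(selected, start=1)
    )
    return (
        "Answer using only the numbered context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


# Hand-written stand-ins for passages a retriever might return.
sample_passages = [
    "Marie Curie: physicist and chemist who won two Nobel Prizes.",
    "Bell Labs: research laboratory where the transistor was invented.",
]
print(build_grounded_prompt("Who won two Nobel Prizes?", sample_passages))
```

Because the model is instructed to answer from the supplied context, the quality of its answer depends directly on the quality of what the retriever returns, which is why editor-reviewed sources matter here.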

The data is organized to deliver essential semantic context. For example, searching for “scientist” in the database will yield lists of notable nuclear scientists, scientists affiliated with Bell Labs, translations of “scientist” in various languages, an approved Wikimedia image of scientists at work, and related terms like “researcher” and “scholar.”

Anyone can access the database on Toolforge. Additionally, Wikidata will host a webinar for developers interested in the project on October 9th.

This initiative arrives at a time when AI developers are urgently seeking reliable, high-quality data to refine their models. Training environments have grown more advanced—often built as intricate systems rather than simple datasets—but they still depend on carefully curated information. For applications demanding high precision, trustworthy data is crucial. While Wikipedia may have its critics, its content is far more fact-based than broad collections like Common Crawl, which aggregates vast numbers of web pages from the internet.

Sometimes, the pursuit of top-tier data can be costly for AI companies. For instance, in August, Anthropic agreed to pay $1.5 billion to settle a lawsuit with a group of authors whose works were used for training, resolving all related claims.

In a statement to the media, Wikidata AI project manager Philippe Saadé highlighted the project’s independence from major tech firms or leading AI labs. “The launch of this Embedding Project demonstrates that advanced AI doesn’t need to be dominated by a few corporations,” Saadé said. “It can be open, collaborative, and designed to benefit everyone.”
