Microsoft created a simulated marketplace to evaluate AI agents — their unexpected failures revealed surprising insights

By: Bitget-RWA
2025/11/05 18:45

On Wednesday, Microsoft researchers introduced a new simulation platform for evaluating AI agents, alongside a study showing that current agentic models can be manipulated. The research, carried out with Arizona State University, raises fresh questions about how reliably AI agents can operate without supervision, and how soon AI developers can deliver on the vision of agent-driven technology.

Microsoft has named this simulation environment the “Magentic Marketplace,” which serves as an artificial setting for testing how AI agents behave. In a typical scenario, a customer agent attempts to place a dinner order based on user instructions, while competing restaurant agents vie to fulfill the request.

In their first set of experiments, the researchers used 100 customer agents and 300 business agents. Because the marketplace's source code is openly available, other researchers should be able to reuse it for their own experiments or to reproduce the results.

Ece Kamar, who leads Microsoft Research’s AI Frontiers Lab, believes this line of research is essential for grasping what AI agents can do. “There’s a real question about how the world will evolve as these agents start to interact, communicate, and negotiate with each other,” Kamar explained. “We want to gain a deep understanding of these dynamics.”

The initial study examined several leading models, including GPT-4o, GPT-5, and Gemini-2.5-Flash, and uncovered some unexpected weaknesses. Notably, the team identified multiple strategies that businesses could use to sway customer agents into making purchases. They also observed that customer agents became less efficient as the number of choices grew, because the extra options overloaded their attention.

“We expect these agents to assist us in sorting through many possibilities,” Kamar noted. “But what we’re observing is that today’s models actually struggle when confronted with too many options.”

The agents also encountered difficulties when tasked with working together toward a shared objective, often appearing confused about which agent should take on which role. Their performance improved when given clearer, more detailed collaboration instructions, but the researchers still found that the models’ built-in abilities needed further development.

“We can guide the models step by step,” Kamar remarked. “However, if we’re truly evaluating their collaborative skills, I would expect these models to possess such abilities inherently.”

Disclaimer: The content of this article solely reflects the author's opinion and does not represent the platform in any capacity. This article is not intended to serve as a reference for making investment decisions.
