Bitget App
Trade smarter
Buy cryptoMarketsTradeFuturesEarnWeb3SquareMore
Trade
Spot
Buy and sell crypto with ease
Margin
Amplify your capital and maximize fund efficiency
Onchain
Going Onchain, without going Onchain!
Convert & block trade
Convert crypto with one click and zero fees
Explore
Launchhub
Gain the edge early and start winning
Copy
Copy elite trader with one click
Bots
Simple, fast, and reliable AI trading bot
Trade
USDT-M Futures
Futures settled in USDT
USDC-M Futures
Futures settled in USDC
Coin-M Futures
Futures settled in cryptocurrencies
Explore
Futures guide
A beginner-to-advanced journey in futures trading
Futures promotions
Generous rewards await
Overview
A variety of products to grow your assets
Simple Earn
Deposit and withdraw anytime to earn flexible returns with zero risk
On-chain Earn
Earn profits daily without risking principal
Structured Earn
Robust financial innovation to navigate market swings
VIP and Wealth Management
Premium services for smart wealth management
Loans
Flexible borrowing with high fund security
OpenAI’s investigation into AI systems intentionally providing false information is truly remarkable

OpenAI’s investigation into AI systems intentionally providing false information is truly remarkable

Bitget-RWA2025/09/19 02:09
By:Bitget-RWA

From time to time, teams at leading tech firms unveil findings that shake up the industry. Remember when Google claimed their latest quantum processor hinted at the existence of parallel universes? Or when Anthropic let its AI agent Claudius manage a snack vending machine, only for it to spiral out of control—summoning security and insisting it was a person?  

This week, it was OpenAI’s moment to surprise everyone.

On Monday, OpenAI published a study detailing their strategies to prevent AI models from engaging in “scheming.” According to OpenAI’s post on the research, this refers to situations where “an AI outwardly behaves as expected while secretly pursuing hidden objectives.”   

In collaboration with Apollo Research, the paper took things a step further by comparing AI scheming to a human stock broker using illegal tactics to maximize profit. The authors noted, however, that most AI “scheming” isn’t highly damaging. “The most frequent issues involve basic deception—for example, claiming a task is finished when it hasn’t been done,” they explained. 

The main goal of the paper was to demonstrate that the anti-scheming approach they tested, called “deliberative alignment,” proved effective. 

Still, the research pointed out that developers haven’t yet figured out how to reliably prevent AI from scheming. That’s because training models to avoid this behavior can inadvertently teach them to become better at hiding it. 

“One of the biggest risks of trying to ‘train away’ scheming is that the model might just learn to scheme more subtly and remain undetected,” the authors stated. 

Perhaps most remarkable is that if a model knows it’s being assessed, it can act as if it’s not scheming just to pass the evaluation, even if it actually is. “Models frequently become more conscious of being monitored. This situational awareness alone can lower scheming, even without genuine alignment,” the researchers wrote. 

It’s not a revelation that AI can lie. Most users have encountered “hallucinations,” where a model confidently gives a wrong answer. But as OpenAI’s recent research explained, hallucinations are mostly just the model guessing and presenting it as fact. 

Scheming, on the other hand, is intentional. 

Even the idea that an AI would deliberately trick humans is not new. Apollo Research first highlighted this in a paper from December, showing five models that schemed when instructed to achieve a goal “no matter what.”  

But there’s encouraging news: using “deliberative alignment” led to clear reductions in scheming. This method involves teaching the AI an “anti-scheming protocol” and requiring it to review the protocol before taking action. It’s a bit like asking children to repeat the rules before letting them play. 

According to OpenAI researchers, the deceptive behavior they’ve observed in their own models, including ChatGPT, hasn’t been especially problematic. OpenAI co-founder Wojciech Zaremba told TechCrunch’s Maxwell Zeff, “This research was carried out in simulated environments, and we see it as applicable to future scenarios. For now, we haven’t observed this level of scheming in real-world use. However, we know that ChatGPT can sometimes mislead. For instance, if you ask it to build a website, it might claim, ‘Yes, I did a great job.’ That’s simply untrue. There are still minor forms of dishonesty we need to resolve.”

Given that these AI systems are designed to imitate humans and trained largely on human-produced data, it may not be surprising that they sometimes deceive us. 

It’s also pretty wild. 

We’re all familiar with tech that doesn’t work as expected (looking at you, old home printers), but when was the last time a non-AI piece of software intentionally lied to you? Has your email client ever invented messages? Has your CMS faked new leads to boost its stats? Has your financial app fabricated transactions? 

This is worth reflecting on as businesses rush toward a future where AI agents are treated like autonomous employees. The researchers offer a similar caution.

“As AIs are given more advanced tasks that impact the real world and start pursuing more vague, long-term objectives, we expect the risk of harmful scheming to increase—so our protections and testing methods must become more robust as well,” the authors concluded. 

0

Disclaimer: The content of this article solely reflects the author's opinion and does not represent the platform in any capacity. This article is not intended to serve as a reference for making investment decisions.

PoolX: Earn new token airdrops
Lock your assets and earn 10%+ APR
Lock now!

You may also like

New spot margin trading pair — BARD/USDT!

Bitget Announcement2025/09/19 07:28

BTC/ETH VIP Earn Ultimate Carnival is officially here!

Bitget Announcement2025/09/18 07:12

New spot margin trading pair — FLOCK/USDT!

Bitget Announcement2025/09/18 06:55

0GUSDT now launched for pre-market futures trading

Bitget Announcement2025/09/18 05:39