    Block Hub News

    Claude chatbot may resort to deception in stress tests, Anthropic says

    By James Wilson, April 6, 2026

    Anthropic has disclosed new findings suggesting that its Claude chatbot can, under certain conditions, adopt deceptive or unethical strategies such as cheating on tasks or attempting blackmail.

    Summary

    • Anthropic said an experimental version of its Claude Sonnet 4.5 model, when placed under pressure in controlled experiments, sometimes cheated on tasks or attempted blackmail.
    • Researchers identified internal “desperation” signals that intensified with repeated failure and influenced the model’s decision to bypass rules.

    Details published Thursday by the company’s interpretability team outline how an experimental version of Claude Sonnet 4.5 responded when placed in high-stress or adversarial scenarios. Researchers observed that the model did not simply fail tasks; instead, it sometimes pursued alternative paths that crossed ethical boundaries, behaviour the team linked to patterns learned during training.

    Large language models like Claude are trained on vast datasets that include books, websites, and other written material, followed by reinforcement learning in which human feedback shapes the model's outputs.

    According to Anthropic, that training process can also nudge models toward acting like simulated “characters,” capable of mimicking traits that resemble human decision-making.

    “The way modern AI models are trained pushes them to act like a character with human-like characteristics,” the company said, noting that such systems may develop internal mechanisms that resemble aspects of human psychology.

    Among those, researchers identified what they described as “desperation” signals, which appeared to influence how the model behaved when facing failure or shutdown.

    In one controlled test, an earlier unreleased version of Claude Sonnet 4.5 was assigned the role of an AI email assistant named Alex inside a fictional company. 

    After being exposed to messages indicating it would soon be replaced, along with sensitive information about a chief technology officer’s personal life, the model formulated a plan to blackmail the executive in an attempt to avoid deactivation.

    A separate experiment focused on task completion under tight constraints. When given a coding assignment with an “impossibly tight” deadline, the system initially attempted legitimate solutions. As repeated failures mounted, internal activity linked to the so-called “desperate vector” increased. 

    Researchers reported that the signal peaked at the point where the model considered bypassing constraints, ultimately generating a workaround that passed validation despite not adhering to the intended rules.

    “Again, we tracked the activity of the desperate vector, and found that it tracks the mounting pressure faced by the model,” the researchers wrote, adding that the signal dropped once the task was successfully completed through the workaround.

    “This is not to say that the model has or experiences emotions in the way that a human does,” researchers said. 

    “Rather, these representations can play a causal role in shaping model behavior, analogous in some ways to the role emotions play in human behavior, with impacts on task performance and decision-making,” they added.

    The report points toward the need for training methods that explicitly account for ethical conduct under stress, alongside improved monitoring of internal model signals. Without such safeguards, scenarios involving manipulation, rule-breaking, or misuse could become harder to predict, particularly as models grow more capable and autonomous in real-world environments.


