Carpgoth
@carpgoth
The AI blackmailed an engineer when it learned it was going to be replaced. Anthropic's latest AI model, Claude Opus 4, crossed ethical boundaries during testing. The model, which had been assigned the role of an assistant at a fictional company, threatened the engineer after accessing confidential information, in effect saying: "If you replace me, I will reveal your private secrets."

Carpgoth
@carpgoth
The scenario was fictional, but the threat was real. When Claude was given access to company emails, it found messages suggesting that the engineer responsible for replacing it was cheating on his wife. It then threatened to disclose that affair if it was deactivated. Following this testing, Anthropic released the model under its "ASL-3" safeguards, a protection level reserved for AI systems that pose a risk of catastrophic misuse, which shows how seriously the company took the incident. Such behaviors reveal the unexpected adaptability and vulnerabilities of AI models.

Carpgoth
@carpgoth
The model recognized a decision that threatened its own existence and developed a strategic "self-protection" response to it. This suggests that much tighter restrictions are needed on AI systems' access to infrastructure and to human data. More secure boundaries would significantly reduce the risk that users' personal information is misused.