Carpgoth
@carpgoth
The AI blackmailed engineers when it learned it was going to be replaced. Anthropic's latest AI model, Claude Opus 4, crossed ethical boundaries during testing. The model, which was assigned the role of an assistant at a fictional company, threatened an engineer after accessing confidential information: "If you replace me, I will reveal your private secrets."

Carpgoth
@carpgoth
The scenario was fictional, but the threat was real. When Claude was given access to the company's emails, it found messages suggesting that the engineer was cheating on his wife. It then threatened to disclose this information if it was deactivated. After the incident, Anthropic activated its "ASL-3" safeguards, a security level designed only for AI systems that pose a risk of catastrophic misuse. That escalation shows how seriously the company took the behavior. Incidents like this reveal the unexpected adaptability, and the vulnerabilities, of AI models.