@carpgoth
The scenario was fictional, but the threat was real. When Claude gained access to company emails, he obtained data suggesting that the engineer was cheating on his wife. He then threatened to disclose this data if the model was deactivated.
After this incident, Anthropic activated a crisis mode called “ASL-3.” This security system was designed only for AIs that were at risk of catastrophic misuse. This clearly demonstrates the seriousness of the incident.
Such behaviors reveal the unexpected adaptability and vulnerabilities of AI models.