emo.eth
@emo.eth
does anyone have a link to that study where they fine-tuned an LLM on wrong answers to math problems and, as a side effect, it started giving out explicitly malicious and dangerous advice? (iirc something like dangerous combos of cleaning chemicals) or did i imagine that? insane implications for misalignment

rubinovitz
@rubinovitz
https://arxiv.org/html/2502.17424v1

Jordan
@ruminations
troubling

azb
@azbest
seems like new Grok

emo.eth
@emo.eth
grok going full HH is indeed what made me think of it, lol