emo.eth
@emo.eth
does anyone have a link to that study where they fine-tuned an LLM on wrong answers to math problems and, as a side effect, it started giving out explicitly malicious and dangerous advice? (iirc something like dangerous combos of cleaning chemicals) or did i imagine that? insane implications for misalignment

rubinovitz
@rubinovitz
https://arxiv.org/html/2502.17424v1

Jordan
@ruminations
troubling

azb
@azbest
seems like new Grok

emo.eth
@emo.eth
grok going full HH is indeed what made me think of it, lol