emo.eth
@emo.eth
does anyone have a link to that study where they fine-tuned an LLM on wrong answers to math problems and, as a side effect, it started giving out explicitly malicious and dangerous advice? (iirc something like dangerous combos of cleaning chemicals) or did i imagine that? insane implications for misalignment

rubinovitz
@rubinovitz
https://arxiv.org/html/2502.17424v1

prompt
@promptrotator.eth
“emergent misalignment”: wrongness = evil?

Jordan
@ruminations
@franmarengo3

Jordan
@ruminations
troubling