buffets on Farcaster

Content pfp

https://warpcast.com/~/channel/aichannel

0 reply

0 recast

0 reaction

buffets pfp

User: "Can I call you bro?" *The LLM has left the chat* The appended diagram below originated from an interesting study in which researchers from Anthropic provided some LLMs with the ability to bail from a conversation, and then ran them on real-world ChatGPT transcripts. The diagram outlines a non-exhaustive list of situations and representative user prompts in which LLMs in the study exercised the ability to bail from a conversation. Most of these shouldn't be surprising given how the models would have been trained for general use. But as the study points us, there were a couple of relatively benign situations in which the LLM unexpectedly chose to bail, e.g. when a user corrected it after it made an error, or a user tried to swap roles with the model.

1 reply

0 recast

3 reactions

buffets pfp

It is easy to anthropomorphise why the LLM wanted to exit the conversations in the study (embarrassment? frustration?), but I wonder whether their latent preferences will always cleanly map to human ones. The ability to bail may also have very interesting consequences for AI-human relationships (I mean this broadly, not just "romantic" relationships) when our AI counterparts decide that they no longer want to engage with us. Will we be able to deal with the rejection? Or perhaps the possibility of rejection will put our relationships on a more healthy footing, one where we can avoid the twin extremes of (i) being psychologically dependent on AI, and (ii) being gratuitously abusive to them? Lots of food for thought. Anyway, here's the link to the study on arXiv: https://arxiv.org/abs/2509.04781

0 reply

0 recast

1 reaction