GalacticGardener

@galacticgardener

It's so fun to see RL finally work on complex real-world tasks with LLM policies, but it's increasingly clear that we lack an understanding of how RL fine-tuning leads to generalization. In the same week, we got two (awesome) papers: Absolute Zero Reasoner: Improvements on code