史迪仔
@chunc
We reproduced DeepSeek R1-Zero in the CountDown game, and it just works. Through RL, the 3B base LM develops self-verification and search abilities all on its own. You can experience the Aha moment yourself for < $30.
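For anyone wondering what RL on CountDown needs as a reward signal, here is a minimal sketch of a rule-based answer checker. The post does not say how the reward was computed, so this is an assumption and not the setup used in the reproduction: the name `countdown_reward`, the 0/1 scoring, and the rule that every given number is used exactly once are all hypothetical.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction

# Only the four basic arithmetic operators are allowed (assumed rule set).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node, used):
    """Evaluate a parsed arithmetic expression exactly, recording each literal."""
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _eval(node.left, used)
        right = _eval(node.right, used)
        return _OPS[type(node.op)](left, right)
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)  # exact arithmetic, no float rounding
    raise ValueError("unsupported expression")


def countdown_reward(expr, numbers, target):
    """Hypothetical 0/1 reward: 1.0 only if expr uses each number once and hits target."""
    used = []
    try:
        value = _eval(ast.parse(expr, mode="eval").body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0  # malformed, disallowed, or undefined expressions score zero
    if Counter(used) != Counter(numbers):
        return 0.0  # assumed rule: every given number appears exactly once
    return 1.0 if value == target else 0.0


print(countdown_reward("(25 - 5) * 3", [3, 5, 25], 60))  # 1.0
```

Parsing with `ast` instead of calling `eval` keeps arbitrary model output from executing as code, and `Fraction` avoids false negatives from floating-point division.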