wiz
@wiz
funny how there is still so much low-hanging fruit: "The results, as shown in Figure 1, clearly demonstrate that removing the easiest samples leads to consistent performance improvements. In contrast, both the unfiltered dataset (which lacks sufficient challenge) and the aggressively filtered dataset (which is overly saturated with difficult problems) hinder training progress. These findings confirm that optimal RL training requires a balanced difficulty distribution, one that provides enough challenging samples to drive learning while avoiding both trivial problems and overwhelming difficulty."
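For context, here is a minimal sketch of the kind of difficulty filtering the quote describes, assuming each sample carries an empirical solve rate measured by rolling out the current policy. The `Sample` fields and threshold values are illustrative assumptions, not taken from the quoted paper.

```python
# Hypothetical sketch of difficulty-based dataset filtering for RL training.
# Assumes each sample has an empirical solve_rate in [0, 1], estimated by
# sampling the current policy; thresholds are illustrative, not from the paper.
from dataclasses import dataclass

@dataclass
class Sample:
    prompt: str
    solve_rate: float  # fraction of rollouts that solved this problem

def filter_by_difficulty(samples, max_solve_rate=0.9, min_solve_rate=0.0):
    """Drop trivial problems (almost always solved) and hopeless ones
    (never solved), keeping the balanced difficulty band in between."""
    return [
        s for s in samples
        if min_solve_rate < s.solve_rate <= max_solve_rate
    ]

# Usage: keep problems the policy solves sometimes, but not almost always.
data = [Sample("easy add", 0.98), Sample("hard proof", 0.05), Sample("medium", 0.5)]
print([s.prompt for s in filter_by_difficulty(data)])  # ['hard proof', 'medium']
```

Removing only the easiest samples (high solve rate) matches the quote's finding, while the lower bound guards against the over-filtered regime saturated with problems the policy never solves.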