Javid Iqbal

@javidiqbal

To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality.
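
A rough sketch of what that comparison data could look like in code, assuming each ranking is ordered best to worst and gets expanded into (chosen, rejected) pairs for reward model training. The names `Comparison` and `ranking_to_pairs` are made up for illustration and are not from the post.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Comparison:
    """One pairwise preference: `chosen` was ranked above `rejected`."""
    prompt: str
    chosen: str
    rejected: str


def ranking_to_pairs(prompt, ranked_responses):
    """Expand a best-to-worst ranking into every pairwise comparison.

    Assumption: ranked_responses[0] is the best response, and there are no ties.
    """
    # combinations() preserves input order, so the first item of each pair
    # is always the higher-ranked response.
    return [
        Comparison(prompt, better, worse)
        for better, worse in combinations(ranked_responses, 2)
    ]


pairs = ranking_to_pairs("Explain RLHF", ["response A", "response B", "response C"])
for p in pairs:
    print(f"{p.chosen!r} preferred over {p.rejected!r}")
```

A ranking of K responses yields K*(K-1)/2 pairs, so three ranked responses give three comparisons.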