mskr
@mskr
LOL, a wrong reward function for a PPO-based model can so easily lead to reward exploitation. Reinforcement learning is fun.
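A toy sketch of what that can look like, not the actual setup from this post: everything here (the corridor, `BONUS_CELL`, the reward values, `rollout`) is made up for illustration, and exhaustive search stands in for PPO as the optimizer. The intended task is "walk to the goal", but the reward also pays +1 every time the agent steps onto a bonus cell, so the return-maximizing behavior is to farm the bonus and never finish.

```python
from itertools import product

# Hypothetical 1-D corridor: cells 0..5, agent starts at cell 2.
START, BONUS_CELL, GOAL, HORIZON = 2, 1, 5, 12

def rollout(actions):
    """Return (proxy_return, reached_goal) for a sequence of -1/+1 moves."""
    pos, total = START, 0.0
    for a in actions:
        pos = max(0, min(GOAL, pos + a))  # walls clamp the position
        if pos == GOAL:
            return total + 1.0, True      # intended reward; episode ends
        if pos == BONUS_CELL:
            total += 1.0                  # misspecified bonus, repeatable
    return total, False

# Exhaustive search over all 2**12 action sequences (stand-in for PPO).
best = max(product((-1, 1), repeat=HORIZON), key=lambda acts: rollout(acts)[0])
best_return, best_reached = rollout(best)

# What the designer actually wanted: walk straight to the goal.
intended_return, intended_reached = rollout((1, 1, 1))

print("reward-maximizing:", best_return, "reached goal:", best_reached)
print("intended behavior:", intended_return, "reached goal:", intended_reached)
```

Under these assumed numbers the highest-return sequence just bounces on and off the bonus cell and ignores the goal, which is the same failure mode a learned policy can drift into when the reward does not match the task.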