Wait so RL is just endlessly tweaking hyperparameters?

I can reason out the math and theory of the system. But why 0.6 converges and 0.1 does not? Baffling.

Banging my head against the wall trying to get a function to output values that seem reasonable is my jam tho

Yup, welcome to RL. It’s math on paper, but vibes and hacks in practice.

RL is just the art of designing reward functions

Recursion. 
That's the answer.
Always has been.

I didn't fully understand, but maybe if you explain further I can catch up? I speak Spanish.

Yeah, the sensitivity to hyperparameters can be wild.