Pitnew
@pitnew
- Markov decision processes
- Model-free approaches
- Function approximation methods
- Policy gradient methods
- Advanced policy gradient methods (PPO, GRPO, etc.)
- RL with human feedback
- Bandits
- Online learning problems
- (Optional) Multi-agent learning in the era of LLMs

If you find any mistakes, please message me (thank you very much).