@kazani
Experiential Reinforcement Learning (ERL): a step toward AI that truly learns from experience.
ERL process:
1. First Attempt: The model makes an initial attempt at a task and receives environmental feedback.
2. Self-Reflection: If the first attempt fails, the model generates a verbal self-reflection to analyze what went wrong and how to improve.
3. Second Attempt: The model uses this reflection as guidance to produce a refined second attempt.
4. Internalization: Successful second attempts are "internalized" into the base policy using self-distillation. This allows the model to reproduce the improved behavior in the future without needing the extra reflection step at deployment.
5. Cross-Episode Memory: Successful reflections are stored in a persistent memory, providing stable corrective patterns that the model can reuse across different tasks.
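The five steps above can be sketched as a single episode loop. This is only a rough illustration and not the paper's implementation: `ERLState`, `erl_episode`, and the `toy_*` functions are hypothetical names, the "model" is a stub, and the self-distillation update is reduced to collecting (task, answer) pairs that a real training step would consume.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ERLState:
    # Step 5: reflections that led to a successful retry (hypothetical store).
    memory: List[str] = field(default_factory=list)
    # Step 4: (task, answer) pairs set aside for a later self-distillation update.
    distill_data: List[Tuple[str, str]] = field(default_factory=list)


def erl_episode(
    task: str,
    attempt: Callable[[str, Optional[str], List[str]], str],
    evaluate: Callable[[str, str], Tuple[bool, str]],
    reflect: Callable[[str, str, str, List[str]], str],
    state: ERLState,
) -> str:
    # Step 1: first attempt, scored by the environment.
    first = attempt(task, None, state.memory)
    ok, feedback = evaluate(task, first)
    if ok:
        return first

    # Step 2: verbal self-reflection on the failure.
    reflection = reflect(task, first, feedback, state.memory)

    # Step 3: second attempt, guided by the reflection.
    second = attempt(task, reflection, state.memory)
    ok, _ = evaluate(task, second)

    if ok:
        # Step 4: keep the answer WITHOUT the reflection, so training on it
        # would teach the policy to answer well with no reflection step.
        state.distill_data.append((task, second))
        # Step 5: keep the reflection for reuse in later episodes.
        state.memory.append(reflection)
    return second


# Toy stand-ins for the model and environment (assumptions for illustration).
def toy_attempt(task: str, reflection: Optional[str], memory: List[str]) -> str:
    return "4" if reflection else "5"


def toy_evaluate(task: str, answer: str) -> Tuple[bool, str]:
    ok = answer == "4"
    return ok, ("correct" if ok else f"expected 4, got {answer}")


def toy_reflect(task: str, answer: str, feedback: str, memory: List[str]) -> str:
    return f"On '{task}': {feedback}; recheck the arithmetic."


if __name__ == "__main__":
    state = ERLState()
    print(erl_episode("2+2", toy_attempt, toy_evaluate, toy_reflect, state))
    print(state.distill_data, state.memory)
```

In this sketch the reflection is deliberately left out of `distill_data`: the aim described in step 4 is for the policy to reproduce the improved behavior without the reflection at deployment time.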
ERL moves LLM training toward a system grounded in experience, where agents continually adapt and learn from their own interactions.
https://arxiv.org/abs/2602.13949