Kazani

@kazani

Experiential Reinforcement Learning (ERL): a step toward AI that truly learns from experience.

The ERL process:
1. First Attempt: The model makes an initial attempt at a task and receives environmental feedback.
2. Self-Reflection: If the first attempt fails, the model generates a verbal self-reflection analyzing what went wrong and how to improve.
3. Second Attempt: The model uses this reflection as guidance to produce a refined second attempt.
4. Internalization: Successful second attempts are "internalized" into the base policy using self-distillation, so the model can reproduce the improved behavior later without the extra reflection step at deployment.
5. Cross-Episode Memory: Successful reflections are stored in a persistent memory, providing stable corrective patterns that the model can reuse across different tasks.

ERL moves LLM training toward a system grounded in experience, where agents continually adapt and learn from their own interactions.

https://arxiv.org/abs/2602.13949
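The five steps above can be sketched as a single episode loop. This is only a rough illustration, not the paper's implementation: the names `ERLTrace`, `erl_episode`, `policy`, `reflect`, and `env` are hypothetical, the "model" is a toy function, stored reflections are assumed to be passed to the policy as hints, and internalization is reduced to collecting (task, answer) pairs that a real system would use for self-distillation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ERLTrace:
    # Cross-episode memory of reflections that led to a successful retry.
    memory: List[str] = field(default_factory=list)
    # (task, answer) pairs collected as self-distillation targets (assumption:
    # the real training step would fine-tune the policy on these).
    distill_data: List[Tuple[str, str]] = field(default_factory=list)


def erl_episode(task, policy, reflect, env, trace):
    """Run one ERL-style episode and return the final answer."""
    # 1. First attempt, with any stored reflections passed as hints.
    first = policy(task, list(trace.memory))
    ok, feedback = env(task, first)
    if ok:
        return first
    # 2. Self-reflection on the failed attempt.
    reflection = reflect(task, first, feedback)
    # 3. Second attempt, guided by the new reflection.
    second = policy(task, trace.memory + [reflection])
    ok, _ = env(task, second)
    if ok:
        # 4. Internalization: keep the answer as a target for the bare task.
        trace.distill_data.append((task, second))
        # 5. Cross-episode memory: keep the reflection that worked.
        trace.memory.append(reflection)
    return second


# Toy stand-ins (hypothetical): the task is solved by uppercasing the input.
def toy_policy(task, hints):
    return task.upper() if hints else task


def toy_env(task, answer):
    ok = answer == task.upper()
    return ok, ("" if ok else "expected uppercase")


def toy_reflect(task, attempt, feedback):
    return f"Fix: {feedback}"


trace = ERLTrace()
result = erl_episode("abc", toy_policy, toy_reflect, toy_env, trace)
```

In this toy run the first attempt fails, the reflection fixes the second attempt, and the stored reflection lets a later task succeed on the first try.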