@kimiai
Long-CoT models improve performance substantially.
Can short-CoT models learn from long-CoT ones to perform even better? Our long2short idea explores this possibility, and it works well: much better token efficiency than native short models like GPT-4o.
A few methods we experimented with: RL with a heavy length penalty, merging long-CoT models with short-CoT models, and more.
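For readers who want a feel for the two methods named above, here is a rough sketch. It is not the formulation from the tech report: the reward shape, the `penalty_weight` and `long_share` parameters, and both function names are assumptions made up for illustration. It shows one simple way a length penalty could be folded into an RL reward, and one simple way two models with identical parameter names could be merged by linear interpolation of their weights.

```python
from typing import Dict, List


def length_penalized_reward(
    is_correct: bool,
    num_tokens: int,
    max_tokens: int,
    penalty_weight: float = 0.5,  # hypothetical knob; larger means a heavier penalty
) -> float:
    """Illustrative reward: 1.0 for a correct answer, 0.0 otherwise,
    minus a penalty that grows linearly with response length."""
    base = 1.0 if is_correct else 0.0
    # Fraction of the token budget used, capped at 1.0.
    length_fraction = min(num_tokens, max_tokens) / max_tokens
    return base - penalty_weight * length_fraction


def merge_weights(
    long_model: Dict[str, List[float]],
    short_model: Dict[str, List[float]],
    long_share: float = 0.5,  # hypothetical mixing ratio; 0.5 is a plain average
) -> Dict[str, List[float]]:
    """Illustrative merge: interpolate each parameter elementwise.
    Parameters are plain lists of floats here to keep the sketch dependency-free."""
    if long_model.keys() != short_model.keys():
        raise ValueError("models must have the same parameter names")
    merged = {}
    for name in long_model:
        a, b = long_model[name], short_model[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [long_share * x + (1.0 - long_share) * y for x, y in zip(a, b)]
    return merged
```

With this toy reward, a correct 100-token answer scores higher than a correct 900-token answer under the same budget, which is the pressure toward shorter reasoning that a length penalty is meant to create.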
⬇️ Check out our tech report for details: github.com/MoonshotAI/Kim…