aichannel

DeepSeek built an OCR system that compresses long text into vision tokens literally turning paragraphs into pixels.

Their model, DeepSeek-OCR, achieves 97% decoding precision at 10X compression and still manages 60% accuracy even at 20X. That means one image can represent entire documents using a fraction of the tokens an LLM would need.

Even crazier? 

It beats GOT-OCR2.0 and MinerU2.0 while using up to 60× fewer tokens and can process 200K+ pages/day on a single A100.

This could solve one of AI's biggest problems: long-context inefficiency.

Instead of paying more for longer sequences, models might soon see text instead of reading it.

The future of context compression might not be textual at all. It might be optical

DeepSeek built an OCR system that compresses long text into vision tokens literally turning paragraphs into pixels.

Their model, DeepSeek-OCR, achieves 97% decoding precision at 10X compression and still manages 60% accuracy even at 20X. That means one image can represent entire documents using a fraction of the tokens an LLM would need.

Even crazier? 

It beats GOT-OCR2.0 and MinerU2.0 while using up to 60× fewer tokens and can process 200K+ pages/day on a single A100.

This could solve one of AI's biggest problems: long-context inefficiency.

Instead of paying more for longer sequences, models might soon see text instead of reading it.

The future of context compression might not be textual at all. It might be optical 

https://github.com/deepseek-ai/DeepSeek-OCR