Intellectually Curious

Context Optical Compression: DeepSeek OCR and the New Frontier of Long-Context AI



We explore DeepSeek AI's groundbreaking idea of compressing long documents into dense visual tokens to sidestep transformer context limits. DeepSeek-OCR pairs a two-path encoder (an 80M-parameter SAM-based local reader and a 300M-parameter CLIP-based global model), joined by a 16x convolutional compressor, with a 3B MoE decoder that activates roughly 570M parameters per token. At 10x–20x compression it maintains high OCR accuracy on the Fox benchmark, outperforms rivals while using far fewer tokens, and scales to industrial volumes (around 200k pages per day on a single A100). We discuss the implications for model memory and for potentially unlimited-context architectures, and note that the project is open-sourced for researchers and educators alike.
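The token arithmetic behind the compression claim can be sketched in a few lines. This is a minimal illustration, not the model's actual pipeline: it assumes a 1024x1024 page image, 16x16 patches, and the 16x convolutional compressor mentioned above, then asks how many text tokens each visual token stands in for.

```python
# Illustrative sketch of optical context compression arithmetic.
# Assumptions (not from the episode): 1024x1024 input, 16x16 patches,
# and a 16x convolutional token compressor.

def vision_token_count(image_size=1024, patch_size=16, compression=16):
    """Visual tokens left after patchifying and 16x compression."""
    patches = (image_size // patch_size) ** 2   # 64 * 64 = 4096 patch tokens
    return patches // compression               # 4096 / 16 = 256 visual tokens

def compression_ratio(text_tokens, image_size=1024):
    """How many text tokens each visual token represents."""
    return text_tokens / vision_token_count(image_size)

# A page holding ~2,560 text tokens, rendered as one 1024x1024 image,
# is represented by 256 visual tokens -- a 10x compression.
print(vision_token_count())       # 256
print(compression_ratio(2560))    # 10.0
```

At these ratios, a decoder that reads the 256 compressed visual tokens effectively "sees" thousands of words per page, which is the core of the long-context argument discussed in the episode.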


Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC


Intellectually Curious, by Mike Breault