AI Post Transformers

DeepSeek-OCR: Contexts Optical Compression



The October 21, 2025 DeepSeek paper introduces DeepSeek-OCR, a Vision-Language Model (VLM) designed to investigate the feasibility of contexts optical compression for managing long contexts in Large Language Models (LLMs). The two-component model pairs DeepEncoder, which efficiently converts high-resolution text images into a small number of vision tokens, with a DeepSeek3B-MoE decoder that reconstructs the text (Optical Character Recognition, or OCR). Experiments on the Fox benchmark show that DeepSeek-OCR achieves approximately 97% decoding precision at a 10× text compression ratio, indicating that the visual modality offers a promising avenue for efficiently compressing large amounts of text. Beyond serving as a research tool for exploring vision-text compression and memory-forgetting mechanisms, the model also performs strongly in practice, achieving state-of-the-art results on OmniDocBench while requiring fewer vision tokens than comparable models. The architecture and training methodology are detailed, highlighting potential applications such as high-throughput data generation for LLMs and VLMs. Source: https://arxiv.org/pdf/2510.18234
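As a rough illustration of the numbers above, the compression ratio can be read as text tokens represented per vision token. The sketch below is illustrative only; the function name and token counts are assumptions, not taken from the paper:

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many original text tokens each vision token stands in for.

    Hypothetical helper; the paper reports ~97% decoding precision
    around the 10x regime of this ratio.
    """
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


# Example: a page holding 1000 text tokens, encoded by the vision
# encoder into 100 vision tokens, sits at the 10x compression regime.
print(compression_ratio(1000, 100))  # -> 10.0
```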

AI Post Transformers, by mcgrof