Steven AI Talk

DeepSeek-OCR: Contexts Optical Compression



This episode covers the technical paper "DeepSeek-OCR: Contexts Optical Compression," which introduces a vision-language model (VLM) for Optical Character Recognition (OCR) developed by DeepSeek-AI. The model explores optical 2D mapping as a way to compress long textual contexts efficiently, addressing the computational challenges that Large Language Models (LLMs) face when processing long sequences. DeepSeek-OCR pairs a DeepEncoder with a DeepSeek3B-MoE decoder and is engineered to maintain high precision even at high compression ratios, reporting roughly 97% decoding precision when text tokens are compressed about 10× into vision tokens. The paper presents quantitative results showing state-of-the-art performance on benchmarks such as OmniDocBench while using significantly fewer vision tokens than competing models, establishing the feasibility of using the visual modality for effective text compression. It also discusses DeepSeek-OCR's practical utility, including its ability to generate large volumes of training data and its potential to simulate memory-forgetting mechanisms in LLMs through progressive image downsampling.
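The two core ideas summarized above can be sketched with some back-of-the-envelope arithmetic. This is a minimal illustration, not code from the paper: the token counts and the decay schedule below are hypothetical numbers chosen only to convey the intuition of 10× optical compression and of allocating fewer vision tokens to older context.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Ratio of original text tokens to the vision tokens that
    represent the same content after rendering it as an image."""
    return text_tokens / vision_tokens


def tokens_after_downsampling(base_tokens: int, age: int, decay: float = 0.5) -> int:
    """Hypothetical forgetting schedule: older context is rendered at
    progressively lower resolution, so it costs fewer vision tokens.
    `decay` is an illustrative parameter, not one from the paper."""
    return max(1, int(base_tokens * decay ** age))


# A 1,000-token document represented by 100 vision tokens gives the
# 10x compression regime where the paper reports ~97% precision.
print(compression_ratio(1000, 100))   # 10.0

# Recent context keeps full resolution; older context shrinks.
print(tokens_after_downsampling(100, 0))  # 100
print(tokens_after_downsampling(100, 2))  # 25
```

The second function is only a toy model of the "memory forgetting" idea: in the paper this is done by downsampling the rendered image itself, not by an explicit token formula.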


By Steven