Mechanical Dreams

DeepSeek OCR Paper



In this episode:
• A Picture is Worth a Thousand Tokens: The hosts introduce the challenge of long context in LLMs and present the paper's radical idea: compressing text by taking a picture of it.
• Compressing Text into Pixels: A deep dive into the main concept of optical compression, exploring how a page of text can be represented with far fewer vision tokens than text tokens.
• The Secret Sauce: DeepEncoder: An explanation of the novel 'DeepEncoder' architecture, which efficiently processes high-resolution images into a small number of vision tokens for the language model to read.
• The Proof is in the Pixels: Discussion of the experimental results, focusing on the impressive OCR decoding accuracy of roughly 97% at a compression ratio of about 10x and the model's superior efficiency on industry benchmarks.
• Forgetting, The Smart Way: Exploring the broader implications of optical compression, particularly the paper's proposal to use it as a 'forgetting mechanism' for ultra-long contexts that mimics human memory.

Mechanical Dreams, by Mechanical Dirk