
AI in Tokenization: Byte‑Latent Transformers, H‑Net & Bolmo
Hosted by Nathan Rigoni | Guest: Jordan Conragan, Research Engineer at 11 Labs (formerly a Lockheed Martin colleague)
How can we make language models operate directly on raw bytes as efficiently as today's token-based transformers, and what would that mean for the future of AI-driven computation? Could dynamic patching and entropy-based chunking finally solve the tokenization bottleneck that limits model size, speed, and math reasoning?
Why this episode matters
Tokenization is the hidden cost driver behind today’s trillion‑parameter models. By moving from fixed sub‑word vocabularies to entropy‑aware, dynamically sized patches, we can dramatically reduce sequence length, lower compute budgets, and improve numerical reasoning—key steps toward making large language models more accessible, faster, and better at math. The discussion also surfaces the trade‑offs of maintaining a separate tokenizer versus learning chunking jointly with the model, a design choice that will shape the next generation of efficient AI systems.
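For listeners who want a concrete picture of what entropy-based patching means, here is a minimal sketch: a small byte-level model scores how predictable each next byte is, and a new patch begins wherever that entropy spikes, so the large latent model spends compute on hard-to-predict regions. The `next_byte_probs` callback, the `byte_entropy` helper, and the threshold value are illustrative assumptions, not the actual Byte-Latent Transformer implementation.

```python
import math

def byte_entropy(probs):
    """Shannon entropy (in bits) of a next-byte probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_patches(data: bytes, next_byte_probs, threshold: float = 2.0):
    """Split a byte stream into variable-length patches.

    next_byte_probs(prefix) -> 256-way probability distribution over the
    next byte given the prefix (in practice, a small byte-level LM).
    A patch is closed whenever the model's next-byte entropy exceeds
    the threshold, i.e. where the text becomes hard to predict.
    """
    patches, current = [], bytearray()
    for i, b in enumerate(data):
        current.append(b)
        h = byte_entropy(next_byte_probs(data[: i + 1]))
        if h > threshold:
            patches.append(bytes(current))
            current = bytearray()
    if current:
        patches.append(bytes(current))
    return patches

# Toy usage with a uniform next-byte model (entropy is always 8 bits,
# so a 7.5-bit threshold makes every byte its own patch):
uniform = lambda prefix: [1 / 256] * 256
print(entropy_patches(b"hello world", uniform, threshold=7.5))
```

With a real byte-level language model in place of `uniform`, predictable spans (common words, whitespace runs) collapse into long patches and unpredictable spans break into short ones, which is how dynamic patching shortens sequences without a fixed tokenizer vocabulary.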
Subscribe for more deep dives, visit www.phronesis-analytics.com, or email [email protected].
Keywords: byte‑latent transformer, byte‑level tokenization, entropy‑based patching, dynamic chunking, H‑Net, Bolmo, model compression, bits‑per‑parameter, AI efficiency, mathematical reasoning.