Daily Tech Feed: From the Labs

Saguaro: The Algorithm That Doesn't Wait


Listen Later

Episode 0023: Making the Wait Do Work

Why it matters. Links to arXiv:2603.03251. Explains Saguaro / SSD — the second speculation layer that keeps the draft model productive during verifier execution. 2× faster than optimized SD, 5× autoregressive, lossless.

Stanford / Together AI. Links to the paper, GitHub (tanishqkumar/ssd), and both model pages (Llama 3.1 70B target, Llama 3.2 1B draft).

The Researchers. Three authors with confirmed Google Scholar IDs:

- Tanishq Kumar — Stanford CS PhD
- Tri Dao — Princeton / Together AI, FlashAttention
- Avner May — Staff Research Scientist, Together AI

Key Technical Concepts. Links to: original speculative decoding (arXiv:2211.17192), speculative sampling (arXiv:2302.01318), FlashAttention (arXiv:2205.14135), FlashAttention-2 (arXiv:2307.08691), Llama 3.1 paper. Covers the three core challenges, the 90% bonus token prediction result, cache hit/miss fallback, and the CPU branch prediction analogy from the paper.

~20 verified links total. All arXiv IDs pulled from search results, no fabricated URLs.

...more
View all episodesView all episodes
Download on the App Store

Daily Tech Feed: From the LabsBy Daily Tech Feed