
Sign up to save your podcasts
Or


We unpack the SSD (Speculative Speculative Decoding) approach to speculative decoding—precomputing multiple token paths while the giant model validates the first guesses. Learn how Saguaro, geometric fanout, and Saguaro sampling cut idle compute, enable up to 5x speedups on models like Llama 3 and Qan3, and why smart fallbacks keep the pipeline humming. Plus, explore the broader implications for self-optimizing systems and future AI hardware.
Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.
Sponsored by Embersilk LLC
By Mike BreaultWe unpack the SSD (Speculative Speculative Decoding) approach to speculative decoding—precomputing multiple token paths while the giant model validates the first guesses. Learn how Saguaro, geometric fanout, and Saguaro sampling cut idle compute, enable up to 5x speedups on models like Llama 3 and Qan3, and why smart fallbacks keep the pipeline humming. Plus, explore the broader implications for self-optimizing systems and future AI hardware.
Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.
Sponsored by Embersilk LLC