
In this tech talk, we dive deep into the technical specifics around LLM inference.
The big questions: Why are LLMs slow? How can they be made faster? And might slow inference affect UX in the next generation of AI-powered software?
We jump into: