<p>Kimi's serving architecture, mooncake to offload GPU memory to other chipsets, the ubiquity of vllm, and the growing standard LLM stack</p>

Kimi's serving architecture, mooncake to offload GPU memory to other chipsets, the ubiquity of vllm, and the growing standard LLM stack

Eating some mooncake

10 years after studying at Stanford, two friends have somehow become AI experts. One builds startups, the other studies at Cambridge - together they break down LLMs and machine learning with zero BS and maximum banter.

Share Eating some mooncake

Sign up to save your podcasts

Eating some mooncake

Eating some mooncake