


Kimi's serving architecture, Mooncake to offload GPU memory to other chipsets, the ubiquity of vLLM, and the growing standard LLM stack
By Pierce Freeman & Richard Diehl Martinez