Pretrained

Eating some mooncake



Kimi's serving architecture, using Mooncake to offload GPU memory to other chipsets, the ubiquity of vLLM, and the growing standard LLM stack.


Pretrained, by Pierce Freeman & Richard Diehl Martinez