
In this episode of The CTO Show with Mehmet, Mehmet sits down with Eugene Cheah, CEO of Featherless AI. The AI bottleneck is no longer just GPU access. Power, memory, inference cost, and model reliability are becoming the real constraints.
Eugene reframes the AI infrastructure debate away from a simple race for bigger models and more chips. The conversation connects energy capacity, HBM shortages, open source model adoption, linear attention architectures, and the enterprise need for predictable AI systems. It also challenges the assumption that the best AI strategy is always to use the largest available model.
If you are building, investing in, or operating AI infrastructure, this conversation gives a clearer view of where AI economics, hardware constraints, and production reliability are heading.
About the Guest
Eugene Cheah is the CEO of Featherless AI, an AI startup making open source AI models accessible through a single platform.
Featherless AI started from AI research and optimization work around the RWKV architecture, with a focus on reducing inference cost and making AI models more accessible. Eugene’s work sits at the intersection of open source AI, model efficiency, GPU infrastructure, HBM constraints, and inference optimization.
He is well positioned to frame this shift because Featherless AI works directly on the infrastructure layer between developers, open models, and production inference.
LinkedIn: https://www.linkedin.com/in/eugene-cheah-a47791126/
Website: https://featherless.ai
Key Takeaways
What You Will Learn
Episode Highlights
00:00 — AI infrastructure moves beyond the GPU race
03:30 — Nvidia, AMD, and Huawei follow different hardware strategies
07:30 — Power becomes the first AI infrastructure bottleneck
08:30 — HBM pressure exposes the memory constraint
12:00 — AI follows the same pluralism as databases
15:00 — Developers start with big models, then specialize
18:30 — Transformer memory scaling becomes an economic problem
23:30 — Hardware choice starts weakening platform lock-in
29:30 — Reliability matters more than raw intelligence
36:00 — Open source gives enterprises model control
41:30 — Small models can now build real applications
Resources Mentioned
Listen Now
Available on all major podcast platforms and YouTube.
Connect with the Show
Follow The CTO Show with Mehmet for more conversations at the intersection of technology, startups, and venture capital.
By Mehmet Gonullu