Model sizes are crazy these days, with billions and billions of parameters. As Mark Kurtz explains in this episode, this makes inference slow and expensive, even though as many as 90% or more of the parameters may have no influence on the outputs at all. Mark helps us understand the practicalities of, and the progress being made in, model optimization and CPU inference, including the growing opportunities to run LLMs and other generative AI models on commodity hardware.

Large models on CPUs

Making artificial intelligence practical, productive & accessible to everyone. Practical AI is a show in which technology professionals, business people, students, enthusiasts, and expert guests engage in lively discussions about Artificial Intelligence and related topics (Machine Learning, Deep Learning, Neural Networks, GANs, MLOps, AIOps, LLMs & more). The focus is on productive implementations and real-world scenarios that are accessible to everyone. If you want to keep up with the latest advances in AI while keeping one foot in the real world, then this is the show for you!


We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

Private, open source chat UIs

First there was Mamba… now there is Jamba from AI21. This model combines the best non-transformer goodness of Mamba with good ol’ attention layers, resulting in a highly performant and efficient model that AI21 has open sourced! We hear all about it (along with a variety of other LLM topics) from Yoav, AI21’s co-founder.

Mamba & Jamba

2024 promises to be the year of multi-modal AI, and we are already seeing some amazing things. In this “fully connected” episode, Chris and Daniel explore the new Udio product/service for generating music. Then they dig into the differences between recent multi-modal efforts and more “traditional” ways of combining data modalities.

Udio & the age of multi-modal AI

Daniel & Chris delight in conversation with “the funniest guy in AI”, Demetrios Brinkmann. Together they explore the results of the MLOps Community’s latest survey. They also preview the upcoming AI Quality Conference.

RAG continues to rise

In this “fully connected” episode, Daniel & Chris discuss comments from NVIDIA CEO Jensen Huang’s GTC keynote about teaching kids to code. Then they dive into the notion of “community” in the AI world, before discussing the challenges non-technical people face in adopting generative AI. They finish by addressing the evolving balance between generative AI interfaces and search engines.

