MLOps.community

Cost/Performance Optimization with LLMs [Panel]


Sign up for the next LLM in production conference here: https://go.mlops.community/LLMinprod

Watch all the talks from the first conference: https://go.mlops.community/llmconfpart1

// Abstract

In this panel discussion, the topic of the cost of running large language models (LLMs) is explored, along with potential solutions. The benefits of bringing LLMs in-house, such as latency optimization and greater control, are also discussed. The panelists explore methods such as structured pruning and knowledge distillation for optimizing LLMs. OctoML's platform is mentioned as a tool for the automatic deployment of custom models and for selecting the most appropriate hardware for them. Overall, the discussion provides insights into the challenges of managing LLMs and potential strategies for overcoming them.

// Bio

Lina Weichbrodt
Lina is a pragmatic freelancer and machine learning consultant who likes to solve business problems end-to-end and make machine learning, or a simple, fast heuristic, work in the real world.
In her spare time, Lina likes to exchange ideas with other people on how they can implement best practices in machine learning. Talk to her at the Machine Learning Ops Slack: shorturl.at/swxIN.
Luis Ceze
Luis Ceze is Co-Founder and CEO of OctoML, which enables businesses to seamlessly deploy ML models to production while making the most of the hardware. OctoML is backed by Tiger Global, Addition, Amplify Partners, and Madrona Venture Group. Ceze is the Lazowska Professor in the Paul G. Allen School of Computer Science and Engineering at the University of Washington, where he has taught for 15 years.
Luis co-directs the Systems and Architectures for Machine Learning lab (sampl.ai), which co-authored Apache TVM, a leading open-source ML stack for performance and portability that is used in widely deployed AI applications.
Luis is also co-director of the Molecular Information Systems Lab (misl.bio), which led pioneering research at the intersection of computing and biology for IT applications such as DNA data storage. His research has been featured prominently in the media, including The New York Times, Popular Science, MIT Technology Review, and The Wall Street Journal. Ceze is a Venture Partner at Madrona Venture Group and leads their technical advisory board.
Jared Zoneraich
Co-Founder of PromptLayer, enabling data-driven prompt engineering. Compulsive builder. Jersey native, with a brief stint in California (UC Berkeley '20) and now residing in NYC.
Daniel Campos
Hailing from Mexico, Daniel started his NLP journey with his BS in CS from RPI. He then worked at Microsoft on ranking at Bing with LLMs (back when they had 2 commas) and helped build out popular datasets like MSMARCO and TREC Deep Learning. While at Microsoft, he got his MS in Computational Linguistics from the University of Washington with a focus on curriculum learning for language models. Most recently, he has been pursuing his Ph.D. at the University of Illinois Urbana-Champaign, focusing on efficient inference for LLMs and robust dense retrieval. During his Ph.D., he worked for companies like Neural Magic, Walmart, Qualtrics, and Mendel.AI, and he now works on bringing LLMs to search at Neeva.
Mario Kostelac
Currently building AI-powered products at Intercom in a small, highly effective team. I roam between practical research and engineering but lean more towards engineering and the challenges around running reliable, safe, and predictable ML systems. You can imagine how fun it is in the LLM era :).
Generally interested in the intersection of product and tech, and in building differentiation by solving hard challenges (technical or non-technical).
Software engineer turned machine learning engineer 5 years ago.


MLOps.community by Demetrios

