Two Voice Devs

Episode 227 - LLM Evaluation: Choosing the RIGHT Model



Are you overwhelmed by the sheer number of Large Language Models (LLMs) available? Choosing the right LLM for your project isn't about picking the most popular one – it's about understanding your specific needs and rigorously evaluating your options.


In this episode of Two Voice Devs, Allen Firstenberg and guest host Brad Nemer, a seasoned product manager, dive deep into the world of LLM evaluation. They go beyond the marketing buzz and explore practical tools and strategies for making informed decisions.


Whether you're a developer, a product manager, or just curious about the practical applications of LLMs, this episode provides invaluable insights into making the right choices for your projects. Don't get caught up in the hype – learn how to evaluate LLMs effectively!


More Info:

https://www.udacity.com/blog/2025/01/how-to-choose-the-right-ai-model-for-your-product.html


[00:00:00] Introduction: Meet Brad Nemer

[00:00:38] Brad's Journey to Product Management & AI

[00:03:12] Collaboration with Noble Ackerson and the LLM Evaluation Challenge

[00:05:23] The Role of a Product Manager

[00:07:43] The Product Manager's Relationship with Engineering

[00:13:46] Exploring Evaluation Tools: Hugging Face

[00:16:58] Exploring Evaluation Tools: Chatbot Arena (Human Evaluation)

[00:20:30] Chatbot Arena: Code Generation Evaluation

[00:24:43] Evaluating LLMs: Beyond Chatbots and Truth

[00:26:11] Exploring Evaluation Tools: Artificial Analysis (Quality, Speed, Price)

[00:28:47] Exploring Evaluation Tools: Galileo (Hallucination Report)

[00:31:16] Case Study: DeepSeek and the Importance of Contextual Evaluation

[00:34:53] The Future of LLM Testing and Quality Assurance

[00:37:49] Wrap-Up and Contact Information


#LLM #LargeLanguageModels #AIEvaluation #ProductManagement #TechTalk #TwoVoiceDevs #HuggingFace #GenAI #GenerativeAI #ChatbotArena #ArtificialAnalysis #Galileo #DeepSeek #ChatGPT #Gemini #Mistral #Claude #ModelSelection #AIdevelopment #SoftwareDevelopment #Testing #QA #RAG #MachineLearning #NLP #Coding #TechPodcast #YouTubeTech #Developers


Two Voice Devs, by Mark and Allen
