The MAD Podcast with Matt Turck

Humanloop: LLM Collaboration and Optimization with CEO Raza Habib



Today, we have the pleasure of chatting with Raza Habib, CEO of Humanloop, the platform for LLM collaboration and evaluation. Matt and Raza discuss how to understand and optimize model performance, share lessons learned about model evaluation and feedback, and explore the future of model fine-tuning.


twitter.com/RazRazcle

humanloop.com


Data Driven NYC YouTube Channel

twitter.com/mattturck

linktr.ee/mattturck


Shownotes:

[00:00:47] How Humanloop helps product and engineering teams build reliable applications on top of large language models by providing tools to find, manage, and version prompts;

[00:03:05] Where Humanloop fits into the MAD landscape as LM / LLM Ops;

[00:02:40] The challenges of evaluating and monitoring LLMs;

[00:03:40] Why evaluating LLMs and generative AI is subjective, given the stochastic nature of their outputs;

[00:04:40] Why evaluation is important during both the development and production stages of LLM applications for making informed design decisions, and how, in production, that challenge evolves into monitoring system behavior;

[00:05:40] The need for regression testing with LLMs;

[00:06:10] How Humanloop makes it easy for users to capture feedback, including implicit signals of user satisfaction such as post-interaction actions and edits to generated content;

[00:07:40] Why and how Humanloop uses guardrails in the app to ensure effective LLM use and implementation;

[00:08:38] Why using an LLM as part of the evaluation process can introduce additional uncertainty and noise (it's turtles all the way down);

[00:09:40] How evaluators on Humanloop are restricted to binary yes-or-no questions or numerical scores to maintain reliability with LLMs in production;

[00:10:40] Why a new set of tools was needed to monitor and observe LLM performance;

[00:11:40] How Humanloop’s interactive environment lets users find and fix bugs in a prompt, using logs to identify issues and then running what-if analyses by changing the prompt or the information retrieval system, so that interventions turn around in minutes to hours instead of days or weeks;

[00:12:40] Why having evaluation and observability closely connected to prompt engineering tools is critical for speed;

[00:13:40] How prompt engineering is like writing a software specification for the model, which gives domain experts a more direct impact on product development, democratizes access, and reduces reliance on engineers to implement the desired features;

[00:15:40] The key differences between popular LLMs on the market today;

[00:18:40] How the quality of open-source models has been rapidly improving, and how LLMs use tools or function calling to access APIs to go beyond simple text-based interactions;

[00:21:22] How Humanloop empowers non-technical experts;

[00:22:40] Where Humanloop fits within the AI ecosystem as a collaborative tool for enterprises building with language models, where collaboration and robust evaluation are crucial;

[00:25:40] How Humanloop customers are often problem-aware, and how the go-to-market motion is mainly inbound but sales-led;

[00:27:48] How Humanloop serves as a central place for storing prompts and sharing learnings across teams;

[00:28:24] Raza’s thoughts on open-source vs. closed-source models in the AI community;

[00:30:40] The potential consequences of restricting access to models and Raza’s case for regulating end use cases and punishing malicious use rather than banning the technology altogether;

[00:33:40] Next steps for Humanloop;


The MAD Podcast with Matt Turck, by Matt Turck

Rating: 4.9 (17 ratings)


More shows like The MAD Podcast with Matt Turck

This Week in Startups by Jason Calacanis (1,272 listeners)

a16z Podcast by Andreessen Horowitz (1,022 listeners)

The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch by Harry Stebbings (512 listeners)

Y Combinator Startup Podcast by Y Combinator (213 listeners)

All-In with Chamath, Jason, Sacks & Friedberg by All-In Podcast, LLC (8,902 listeners)

Dwarkesh Podcast by Dwarkesh Patel (379 listeners)

The Logan Bartlett Show by Redpoint Ventures (188 listeners)

No Priors: Artificial Intelligence | Technology | Startups by Conviction (122 listeners)

Latent Space: The AI Engineer Podcast by swyx + Alessio (77 listeners)

More or Less by Dave Morin, Jessica Lessin, Brit Morin, and Sam Lessin (85 listeners)

Crucible Moments by Sequoia Capital (91 listeners)

BG2Pod with Brad Gerstner and Bill Gurley by BG2Pod (454 listeners)

AI + a16z by a16z (30 listeners)

Lightcone Podcast by Y Combinator (21 listeners)

Training Data by Sequoia Capital (40 listeners)