
This episode discusses the critical role of evaluation in the development and deployment of AI systems, especially foundation models. It highlights the challenges of assessing these open-ended models, noting the inadequacy of simple methods and the growing need for systematic, automated approaches. The episode explores various evaluation methodologies, including language modeling metrics, exact evaluation using functional correctness and similarity measurements, and subjective evaluation with AI as a judge. Finally, it examines the complexities of model selection, considering factors such as open-source versus proprietary models, the limitations of public benchmarks, and the necessity of designing custom evaluation pipelines aligned with specific application needs.
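As a rough illustration of the "exact evaluation with similarity measurements" idea mentioned in the summary, the sketch below compares a model's output against a reference answer using exact match and a simple token-overlap F1 score. The function names and toy data are illustrative assumptions, not taken from the episode.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> bool:
    """True if the normalized prediction equals the normalized reference."""
    return prediction.strip().lower() == reference.strip().lower()


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1, a simple lexical similarity measurement."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    overlap = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(overlap.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Toy evaluation loop over (model output, reference) pairs.
examples = [
    ("Paris is the capital of France.", "Paris is the capital of France."),
    ("The answer is 42.", "42"),
]
for output, reference in examples:
    print(exact_match(output, reference), round(token_f1(output, reference), 2))
```

In practice, such deterministic checks are typically complemented by the subjective methods the episode covers, such as using another model as a judge, since open-ended outputs rarely match a single reference string.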