Jay Shah Podcast

The Hidden Flaws in AI Safety & Evaluation Benchmarks | Prof. Jackie Chi Kit Cheung



Dr. Jackie Cheung is an Associate Professor at McGill University, where he co-directs the Reasoning and Learning Lab. He is also an Associate Scientific Director at Mila-Quebec Artificial Intelligence Institute. He and his team are developing computational models to improve the reliability, pragmatics, and evaluation of large language models, so that these models are contextually appropriate and factually grounded.

Jackie has worked as a consultant researcher with Microsoft Research. Before his current appointments, he earned his PhD and MSc in Computer Science from the University of Toronto, focusing on computational linguistics, and his BSc from the University of British Columbia.

00:00:00 Highlight & Introduction
00:02:04 Entry point into AI & NLP
00:04:47 Academia vs. Industry: Career choices
00:09:48 Language Revitalization using AI
00:12:24 Addressing Biases & Data Sovereignty in language revitalization
00:15:49 Evaluating LLMs as Judges
00:17:14 Validity and reliability in LLM evaluation
00:25:11 Evidence-centered benchmark design (ECBD) framework
00:30:38 Gaps in LLM benchmarks and the meaning of "general purpose" AI
00:35:24 General purpose intelligence vs. reasoning
00:40:16 Safety as an undefined bundle in LLMs
00:51:45 Stochastic chameleons: how LLMs generalize and hallucinate
01:03:02 Potential & Biases of agentic frameworks for research
01:05:52 Evaluating LLMs for summarization
01:11:43 Scaling large language models
01:16:33 Advice to beginners entering AI in 2026
01:20:33 Pitfalls to avoid in AI research & development

More about Jackie & his research: https://www.cs.mcgill.ca/~jcheung/

About the Host:
Jay is a Machine Learning Engineer III at PathAI, working on improving AI for medical diagnosis and prognosis.
LinkedIn: https://www.linkedin.com/in/shahjay22/
Twitter: https://twitter.com/jaygshah22
Homepage: https://jaygshah.github.io/ (for any queries)

Stay tuned for upcoming webinars!

***Disclaimer: The information in this video represents the views and opinions of the speaker and does not necessarily represent the views or opinions of any institution. It does not constitute an endorsement by any Institution or its affiliates of such video content.***


Jay Shah Podcast by Jay Shah

Rating: 5 (14 ratings)