AI Papers Podcast Daily

FACTS Grounding Leaderboard: Benchmarking LLMs' Factuality



This episode covers FACTS Grounding, a new benchmark that tests how well large language models (LLMs) can give accurate answers grounded in long documents. FACTS Grounding pairs human-written documents with prompts that challenge LLMs to answer using only the supplied text. A panel of LLM judges then decides whether each answer is factually accurate and whether it follows the instructions in the prompt. The goal is to measure how well LLMs can understand and use information from long texts without fabricating content or ignoring what the prompt asked. The researchers found that using multiple LLM judges is important because a judge model tends to favor its own answers. The FACTS Grounding leaderboard will be continuously updated with new models, helping researchers improve the factuality and reliability of LLMs.

https://storage.googleapis.com/deepmind-media/FACTS/FACTS_grounding_paper.pdf
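The multi-judge scoring described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual pipeline: the judge names and the binary 1/0 verdict scheme are assumptions made for the example. The idea is that averaging verdicts across several judge models keeps any single judge's self-preference bias from dominating the final score.

```python
# Hypothetical sketch of multi-judge aggregation (names and scoring
# scheme are illustrative, not the FACTS Grounding implementation).

def aggregate_grounding_score(judge_verdicts: dict[str, list[int]]) -> float:
    """Average binary verdicts (1 = grounded and accurate, 0 = not)
    per judge, then average across judges so no one judge dominates."""
    per_judge = [sum(v) / len(v) for v in judge_verdicts.values()]
    return sum(per_judge) / len(per_judge)

# Each list holds one verdict per model response being evaluated.
verdicts = {
    "judge_a": [1, 1, 0, 1],   # 0.75
    "judge_b": [1, 0, 0, 1],   # 0.50
    "judge_c": [1, 1, 1, 1],   # 1.00
}
print(f"{aggregate_grounding_score(verdicts):.3f}")  # → 0.750
```

A single biased judge (here, a lenient `judge_c`) moves the aggregate far less than it would move its own standalone score, which is the motivation the episode describes.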


By AIPPD