Best AI papers explained

γ-Bench: Evaluating LLMs in Multi-Agent Games


This paper introduces γ-Bench, a novel framework for evaluating the gaming ability of large language models (LLMs) in complex, multi-agent environments. The benchmark comprises eight classical game-theory scenarios with a dynamic scoring scheme and adjustable game parameters, designed to assess LLMs' robustness, generalizability, and strategic reasoning. Evaluating thirteen LLMs from six model families, the study finds that Gemini-1.5-Pro currently achieves the top overall score. It also examines how prompt engineering and different game settings affect LLM decision-making.
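For a concrete sense of the setup, one of γ-Bench's classical scenarios is "Guess 2/3 of the Average," played repeatedly by multiple agents. Below is a minimal Python sketch of such a multi-agent round loop; the `agent_guess` function is a hypothetical stand-in for an LLM call, and the imitation heuristic and simplified winner rule are illustrative assumptions, not the paper's implementation or scoring scheme.

```python
import random
from statistics import mean

def agent_guess(agent_id: int, history: list[float]) -> float:
    """Hypothetical stand-in for an LLM call: returns a guess in [0, 100].

    In gamma-Bench the guess would come from prompting an LLM with the
    game rules and the history of previous rounds; here we simulate it.
    """
    if not history:
        return random.uniform(0, 100)
    # Crude imitation of iterated reasoning: shade last round's target downward.
    return max(0.0, history[-1] * random.uniform(0.5, 0.8))

def play_guess_two_thirds(n_agents: int = 10, n_rounds: int = 20) -> list[float]:
    """Play repeated rounds of 'Guess 2/3 of the Average'; return each round's target."""
    targets: list[float] = []
    for _ in range(n_rounds):
        guesses = [agent_guess(i, targets) for i in range(n_agents)]
        target = (2 / 3) * mean(guesses)  # the winner is whoever guessed closest to this
        targets.append(target)
    return targets

if __name__ == "__main__":
    trajectory = play_guess_two_thirds()
    # With rational (or imitative) agents, the target drifts toward 0,
    # the game's Nash equilibrium.
    print([round(t, 2) for t in trajectory])
```

A benchmark built on games like this can measure strategic depth by how quickly an agent's guesses track the shrinking target across rounds, which is the kind of dynamic, multi-round signal γ-Bench's scoring is designed to capture.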


By Enoch H. Kang