Stash Talk

S02E06 - Can LLMs (Large Language Models) really reason?



In this episode, Anna and Aiden discuss whether LLMs (Large Language Models) are genuinely good at reasoning, or whether they are force-fit to pass certain well-known benchmarks.


The material for this episode comes from two research studies:

1. GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models by Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar, working at Apple

2. Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap by Annarose M B, Anto P V, Shashank Menon, Ajay Sukumar, Adwaith Samod T, Alan Philipose, Stevin Prince, and Sooraj Thomas from Consequent AI


Stash Talk, by stashtalk