


In this episode, Anna and Aiden discuss whether LLMs (Large Language Models) are genuinely good at reasoning, or whether they are force-fit to pass certain well-known benchmarks.
The material for this episode comes from two research studies. They are:
1. GSM-Symbolic: Understanding the Limitations of
2. Functional Benchmarks for Robust Evaluation of
By stashtalk