This research paper investigates how large language models (LLMs) can improve their ability to reason over long contexts. The authors propose a self-improvement method called SEALONG that samples multiple reasoning outputs from an LLM, scores them with Minimum Bayes Risk (MBR), which rewards each output according to how well it agrees with the other sampled outputs, and then fine-tunes the model either on the highest-scoring outputs or by contrasting high-scoring and low-scoring outputs for preference optimization. Extensive experiments on several leading LLMs demonstrate that SEALONG effectively improves their long-context reasoning capabilities without relying on human annotations or outputs from stronger models. The paper further analyzes how various prompting strategies, scoring methods, and training parameters affect SEALONG's performance.
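
To make the MBR scoring step concrete, below is a minimal sketch of how sampled reasoning outputs could be scored by their mutual agreement. It assumes a sentence-embedding similarity as the utility function and uses the `sentence-transformers` library; the specific model name and the sampling loop in the usage comment are illustrative assumptions, not SEALONG's exact implementation.

```python
from sentence_transformers import SentenceTransformer, util


def mbr_scores(outputs: list[str], embedder: SentenceTransformer) -> list[float]:
    """Score each sampled output by its mean similarity to the other samples.

    Under MBR, outputs that agree with the rest of the sample pool receive
    higher scores; the top-scoring output can be kept for supervised
    fine-tuning, and high/low-scoring pairs can form preference-optimization data.
    """
    embeddings = embedder.encode(
        outputs, convert_to_tensor=True, normalize_embeddings=True
    )
    sims = util.cos_sim(embeddings, embeddings)  # pairwise cosine similarities
    n = len(outputs)
    scores = []
    for i in range(n):
        # Average similarity to every other sample (exclude self-similarity).
        scores.append((sims[i].sum().item() - sims[i][i].item()) / (n - 1))
    return scores


# Illustrative usage (hypothetical names): sample several reasoning traces for
# one long-context question, then pick the best and worst by MBR score.
# embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works
# samples = [llm.generate(prompt, temperature=0.7) for _ in range(8)]
# scores = mbr_scores(samples, embedder)
# best = samples[max(range(len(samples)), key=scores.__getitem__)]
# worst = samples[min(range(len(samples)), key=scores.__getitem__)]
```

The design intuition is that, without ground-truth answers, agreement among independently sampled reasoning traces serves as a proxy for correctness, which is what allows the method to avoid human annotations or a stronger teacher model.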