Learning GenAI via SOTA Papers

EP164: [LACONIC] Teaching AI to stop overthinking



The paper introduces LACONIC (Length-Aware Constrained Policy Optimization), a novel reinforcement learning (RL) framework designed to reduce the verbosity of Large Language Model (LLM) outputs during fine-tuning. While RL-tuning typically enhances reasoning skills, it often leads to excessively long responses that increase inference latency and computational overhead.

Unlike previous methods that rely on fixed heuristic penalties, LACONIC treats length control as a constrained optimization problem. Its core features include:

  • Primal-Dual Algorithm: It maximizes task rewards (like accuracy) while enforcing a target token budget.
  • Clipped Cost Function: To prevent the model from collapsing into overly short, degenerate outputs, LACONIC uses a "clipped cost" that only penalizes responses exceeding the specified budget.
  • Adaptive Multiplier (λ): A dual variable, λ, is adjusted automatically throughout training. λ increases the length penalty when the model exceeds the budget and decreases it when the model is compliant, so the penalty weight does not need to be hand-tuned and training stays robust.
  • Performance and Efficiency: On mathematical reasoning tasks, LACONIC reduces output length by over 50% while preserving or even improving task accuracy (pass@1).
  • Resource Savings: Compared to standard RL-tuning (GRPO), LACONIC is 19% faster and consumes 22% less GPU memory because it generates fewer tokens during the training process.
  • Generalization: The method maintains strong performance on out-of-domain benchmarks, such as general knowledge and logical reasoning, while using 44% fewer tokens.
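The interaction between the clipped cost and the adaptive multiplier can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's implementation: normalizing the overshoot by the budget, driving the dual step with the batch's mean length, the step size `lr`, and the helper names (`clipped_cost`, `shaped_rewards`, `dual_update`, `group_advantages`) are all hypothetical. Feeding the shaped reward into a GRPO-style group-normalized advantage follows the standard GRPO recipe and is an assumption about how the pieces connect, not a detail confirmed by the episode.

```python
import numpy as np

def clipped_cost(lengths, budget):
    """Hypothetical clipped length cost: zero at or under the budget,
    proportional to the overshoot (normalized by the budget) above it."""
    lengths = np.asarray(lengths, dtype=float)
    return np.maximum(0.0, lengths - budget) / budget

def shaped_rewards(task_rewards, lengths, budget, lam):
    """Lagrangian-style reward (assumed form): task reward minus lambda * cost."""
    return np.asarray(task_rewards, dtype=float) - lam * clipped_cost(lengths, budget)

def dual_update(lam, lengths, budget, lr=0.1):
    """Projected dual ascent on lambda (assumed form).

    The violation signal is the batch's mean length relative to the budget:
    positive when over budget (lambda grows), negative when under budget
    (lambda shrinks). Lambda is projected back onto lambda >= 0.
    """
    violation = np.mean(lengths) / budget - 1.0
    return max(0.0, lam + lr * violation)

def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize rewards within one sampled group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Illustrative group of four sampled responses to one prompt (made-up numbers).
budget = 1000                       # hypothetical target token budget
lam = 0.5                           # current multiplier value
task = [1.0, 1.0, 0.0, 1.0]         # e.g. 1.0 for a correct answer
lengths = [800, 1500, 600, 2000]    # response lengths in tokens

rewards = shaped_rewards(task, lengths, budget, lam)
advantages = group_advantages(rewards)   # would drive the policy-gradient step
lam = dual_update(lam, lengths, budget)  # then adjust lambda for the next batch
```

Because the cost is clipped at zero, responses at or under the budget are never pushed shorter; λ alone decides how hard overshoots are punished. In this example batch the mean length (1,225 tokens) exceeds the 1,000-token budget, so λ rises slightly, from 0.5 to 0.5225; a batch averaging under the budget would lower it instead.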

Overall, LACONIC provides a stable and reliable method for developers to enforce specific deployment targets, such as latency or token limits, without sacrificing the model's reasoning capabilities.


By Yun Wu