AI Post Transformers

Program Synthesis with Large Language Models


This episode explores a 2021 Google Research paper on whether large language models can synthesize short Python programs directly from natural-language descriptions, moving beyond code autocomplete into true program synthesis. It explains why this is difficult in general-purpose languages, contrasts classical search-based synthesis with transformer-based generation, and highlights the paper’s emphasis on execution-based evaluation, where code must actually run and pass tests rather than merely resemble reference solutions. The discussion covers the MBPP and MathQA-Python benchmarks, the effects of model scale from 244 million to 137 billion parameters, and the finding that larger models improve substantially, with the biggest model solving 59.6% of MBPP in a few-shot setting and fine-tuning on just 374 examples adding roughly 10 points. Listeners would find it interesting for its clear look at an early turning point when code LLMs began to show measurable, testable synthesis ability rather than just fluent code-like text.
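The execution-based evaluation the episode highlights can be sketched in a few lines of Python: a candidate program counts as correct only if it actually runs and passes all of a task's test assertions, rather than textually matching a reference solution. The sample task, candidate program, and tests below are illustrative stand-ins, not actual MBPP entries.

```python
# Sketch of execution-based evaluation: define the candidate program by
# executing its source, then run each test assertion against it. Any
# exception (syntax error, runtime error, failed assert) means failure.

candidate = """
def first_repeated_char(s):
    seen = set()
    for ch in s:
        if ch in seen:
            return ch
        seen.add(ch)
    return None
"""

tests = [
    "assert first_repeated_char('abba') == 'b'",
    "assert first_repeated_char('abc') is None",
]

def passes_all_tests(program: str, tests: list) -> bool:
    """Return True only if the program defines code that passes every test."""
    namespace = {}
    try:
        exec(program, namespace)   # define the candidate function
        for t in tests:
            exec(t, namespace)     # run each assertion against it
    except Exception:
        return False
    return True

print(passes_all_tests(candidate, tests))  # True for this candidate
```

In the paper's setup, many samples are drawn per task and the task counts as solved if any sample passes its tests; a production harness would also sandbox and time-limit execution, which this sketch omits.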
Sources:
1. Program Synthesis with Large Language Models — Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, Charles Sutton, 2021
http://arxiv.org/abs/2108.07732
2. Program Synthesis — Sumit Gulwani, Oleksandr Polozov, Rishabh Singh, 2017
https://scholar.google.com/scholar?q=Program+Synthesis
3. Neural Program Synthesis: A Survey — Michele Vallecorsa, Luca Quartana, Luca Pasquale, et al., 2022
https://scholar.google.com/scholar?q=Neural+Program+Synthesis:+A+Survey
4. A Survey on Neural Code Intelligence: From Program Representation to Program Synthesis — Uri Alon, Miltiadis Allamanis, Marc Brockschmidt, et al., 2024
https://scholar.google.com/scholar?q=A+Survey+on+Neural+Code+Intelligence:+From+Program+Representation+to+Program+Synthesis
5. Evaluating Large Language Models Trained on Code — Mark Chen, Jerry Tworek, Heewoo Jun, et al., 2021
https://scholar.google.com/scholar?q=Evaluating+Large+Language+Models+Trained+on+Code
6. Language Models are Few-Shot Learners — Tom B. Brown, Benjamin Mann, Nick Ryder, et al., 2020
https://scholar.google.com/scholar?q=Language+Models+are+Few-Shot+Learners
7. CuBERT: BERT Models for Python Source Code Understanding — Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi, 2020
https://scholar.google.com/scholar?q=CuBERT:+BERT+Models+for+Python+Source+Code+Understanding
8. CodeBERT: A Pre-Trained Model for Programming and Natural Languages — Zhangyin Feng, Daya Guo, Duyu Tang, et al., 2020
https://scholar.google.com/scholar?q=CodeBERT:+A+Pre-Trained+Model+for+Programming+and+Natural+Languages
9. PyMT5: Multi-mode Translation of Natural Language and Python Code with Transformers — Colin Clement, Dawn Drain, et al., 2020
https://scholar.google.com/scholar?q=PyMT5:+Multi-mode+Translation+of+Natural+Language+and+Python+Code+with+Transformers
10. DeepCoder: Learning to Write Programs — Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, et al., 2017
https://scholar.google.com/scholar?q=DeepCoder:+Learning+to+Write+Programs
11. RobustFill: Neural Program Learning under Noisy I/O — Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, et al., 2017
https://scholar.google.com/scholar?q=RobustFill:+Neural+Program+Learning+under+Noisy+I/O
12. DreamCoder: Bootstrapping Inductive Program Synthesis with Wake-Sleep Library Learning — Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sablé-Meyer, Lucas Morales, Luke Hewitt, Josh Tenenbaum, Armando Solar-Lezama, 2021
https://scholar.google.com/scholar?q=DreamCoder:+Bootstrapping+Inductive+Program+Synthesis+with+Wake-Sleep+Library+Learning
13. Learning to Infer Graphics Programs from Hand-Drawn Images — Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, Josh Tenenbaum, 2018
https://scholar.google.com/scholar?q=Learning+to+Infer+Graphics+Programs+from+Hand-Drawn+Images
14. MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms — Aida Amini, Saadia Gabriel, et al., 2019
https://scholar.google.com/scholar?q=MathQA:+Towards+Interpretable+Math+Word+Problem+Solving+with+Operation-Based+Formalisms
15. A Survey of Machine Learning for Big Code and Naturalness — Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, Charles Sutton, 2018
https://scholar.google.com/scholar?q=A+Survey+of+Machine+Learning+for+Big+Code+and+Naturalness
16. Chain of Code: Reasoning with a Language Model-Augmented Code Emulator — Chengshu Li, et al., 2024
https://scholar.google.com/scholar?q=Chain-of-Code:+Reasoning+with+a+Language+Model-Augmented+Code+Emulator
17. OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement — Zhang et al. (attribution approximate), 2024
https://scholar.google.com/scholar?q=OpenCodeInterpreter:+Integrating+Code+Generation+with+Execution+and+Refinement
18. CodePRM: Execution Feedback-Enhanced Process Reward Model for Code Generation — Wang et al. (attribution approximate), 2024
https://scholar.google.com/scholar?q=CodePRM:+Execution+Feedback-Enhanced+Process+Reward+Model+for+Code+Generation
19. CodeMonkeys: Scaling Test-Time Compute for Software Engineering — author attribution uncertain, 2024 or 2025
https://scholar.google.com/scholar?q=CodeMonkeys:+Scaling+Test-Time+Compute+for+Software+Engineering
20. AI Post Transformers: CODEGEN: Open Language Model for Code Synthesis — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/codegen-open-language-model-for-code-synthesis/
21. AI Post Transformers: Simple Self-Distillation for Better Code Generation — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/2026-04-02-simple-self-distillation-for-better-code-cc88e0.mp3
22. AI Post Transformers: CWM: Code Generation with World Models — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/cwm-code-generation-with-world-models/
23. AI Post Transformers: CodeI/O: Reasoning Patterns Through Code Input-Output Prediction — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/codeio-reasoning-patterns-through-code-input-output-prediction/
Interactive Visualization: Program Synthesis with Large Language Models
AI Post Transformers, by mcgrof