AI Post Transformers

Program Synthesis with Large Language Models


This episode explores a 2021 Google Research paper on whether large language models can synthesize short Python programs directly from natural-language descriptions, moving beyond code autocomplete into true program synthesis. It explains why this is difficult in general-purpose languages, contrasts classical search-based synthesis with transformer-based generation, and highlights the paper’s emphasis on execution-based evaluation, where code must actually run and pass tests rather than merely resemble reference solutions. The discussion covers the MBPP and MathQA-Python benchmarks, the effects of model scale from 244 million to 137 billion parameters, and the finding that larger models improve substantially, with the biggest model solving 59.6% of MBPP in a few-shot setting and fine-tuning on just 374 examples adding roughly 10 points. Listeners would find it interesting for its clear look at an early turning point when code LLMs began to show measurable, testable synthesis ability rather than just fluent code-like text.
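The execution-based evaluation the episode highlights can be sketched in a few lines of Python: a candidate program counts as correct only if it actually runs and passes all of a task's test assertions, rather than textually matching a reference solution. The sample task, candidate program, and tests below are illustrative stand-ins, not actual MBPP entries.

```python
# Sketch of execution-based evaluation: define the candidate program by
# executing its source, then run each test assertion against it. Any
# exception (syntax error, runtime error, failed assert) means failure.

candidate = """
def first_repeated_char(s):
    seen = set()
    for ch in s:
        if ch in seen:
            return ch
        seen.add(ch)
    return None
"""

tests = [
    "assert first_repeated_char('abba') == 'b'",
    "assert first_repeated_char('abc') is None",
]

def passes_all_tests(program: str, tests: list) -> bool:
    """Return True only if the program defines code that passes every test."""
    namespace = {}
    try:
        exec(program, namespace)   # define the candidate function
        for t in tests:
            exec(t, namespace)     # run each assertion against it
    except Exception:
        return False
    return True

print(passes_all_tests(candidate, tests))  # True for this candidate
```

In the paper's setup, many samples are drawn per task and the task counts as solved if any sample passes its tests; a production harness would also sandbox and time-limit execution, which this sketch omits.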
Sources:
1. Program Synthesis with Large Language Models — Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, Charles Sutton, 2021
http://arxiv.org/abs/2108.07732
2. Program Synthesis — Sumit Gulwani, Oleksandr Polozov, Rishabh Singh, 2017
https://scholar.google.com/scholar?q=Program+Synthesis
3. Neural Program Synthesis: A Survey — Michele Vallecorsa, Luca Quartana, Luca Pasquale, et al., 2022
https://scholar.google.com/scholar?q=Neural+Program+Synthesis:+A+Survey
4. A Survey on Neural Code Intelligence: From Program Representation to Program Synthesis — Uri Alon, Miltiadis Allamanis, Marc Brockschmidt, et al., 2024
https://scholar.google.com/scholar?q=A+Survey+on+Neural+Code+Intelligence:+From+Program+Representation+to+Program+Synthesis
5. Evaluating Large Language Models Trained on Code — Mark Chen, Jerry Tworek, Heewoo Jun, et al., 2021
https://scholar.google.com/scholar?q=Evaluating+Large+Language+Models+Trained+on+Code
6. Language Models are Few-Shot Learners — Tom B. Brown, Benjamin Mann, Nick Ryder, et al., 2020
https://scholar.google.com/scholar?q=Language+Models+are+Few-Shot+Learners
7. CuBERT: BERT Models for Python Source Code Understanding — Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi, 2020
https://scholar.google.com/scholar?q=CuBERT:+BERT+Models+for+Python+Source+Code+Understanding
8. CodeBERT: A Pre-Trained Model for Programming and Natural Languages — Zhangyin Feng, Daya Guo, Duyu Tang, et al., 2020
https://scholar.google.com/scholar?q=CodeBERT:+A+Pre-Trained+Model+for+Programming+and+Natural+Languages
9. PyMT5: Multi-mode Translation of Natural Language and Python Code with Transformers — Colin Clement, Dawn Drain, et al., 2020
https://scholar.google.com/scholar?q=PyMT5:+Multi-mode+Translation+of+Natural+Language+and+Python+Code+with+Transformers
10. DeepCoder: Learning to Write Programs — Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, et al., 2017
https://scholar.google.com/scholar?q=DeepCoder:+Learning+to+Write+Programs
11. RobustFill: Neural Program Learning under Noisy I/O — Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, et al., 2017
https://scholar.google.com/scholar?q=RobustFill:+Neural+Program+Learning+under+Noisy+I/O
12. DreamCoder: Bootstrapping Inductive Program Synthesis with Wake-Sleep Library Learning — Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sablé-Meyer, Lucas Morales, Luke Hewitt, Josh Tenenbaum, Armando Solar-Lezama, 2021
https://scholar.google.com/scholar?q=DreamCoder:+Bootstrapping+Inductive+Program+Synthesis+with+Wake-Sleep+Library+Learning
13. Learning to Infer Graphics Programs from Hand-Drawn Images — Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, Josh Tenenbaum, 2018
https://scholar.google.com/scholar?q=Learning+to+Infer+Graphics+Programs+from+Hand-Drawn+Images
14. MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms — Aida Amini, Saadia Gabriel, et al., 2019
https://scholar.google.com/scholar?q=MathQA:+Towards+Interpretable+Math+Word+Problem+Solving+with+Operation-Based+Formalisms
15. A Survey of Machine Learning for Big Code and Naturalness — Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, Charles Sutton, 2018
https://scholar.google.com/scholar?q=A+Survey+of+Machine+Learning+for+Big+Code+and+Naturalness
16. Chain of Code: Reasoning with a Language Model-Augmented Code Emulator — Chengshu Li, et al., 2024
https://scholar.google.com/scholar?q=Chain-of-Code:+Reasoning+with+a+Language+Model-Augmented+Code+Emulator
17. OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement — Zhang et al. (attribution approximate), 2024
https://scholar.google.com/scholar?q=OpenCodeInterpreter:+Integrating+Code+Generation+with+Execution+and+Refinement
18. CodePRM: Execution Feedback-Enhanced Process Reward Model for Code Generation — Wang et al. (attribution approximate), 2024
https://scholar.google.com/scholar?q=CodePRM:+Execution+Feedback-Enhanced+Process+Reward+Model+for+Code+Generation
19. CodeMonkeys: Scaling Test-Time Compute for Software Engineering — author attribution uncertain, 2024 or 2025
https://scholar.google.com/scholar?q=CodeMonkeys:+Scaling+Test-Time+Compute+for+Software+Engineering
20. AI Post Transformers: CODEGEN: Open Language Model for Code Synthesis — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/codegen-open-language-model-for-code-synthesis/
21. AI Post Transformers: Simple Self-Distillation for Better Code Generation — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/2026-04-02-simple-self-distillation-for-better-code-cc88e0.mp3
22. AI Post Transformers: CWM: Code Generation with World Models — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/cwm-code-generation-with-world-models/
23. AI Post Transformers: CodeI/O: Reasoning Patterns Through Code Input-Output Prediction — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/codeio-reasoning-patterns-through-code-input-output-prediction/
Interactive Visualization: Program Synthesis with Large Language Models
AI Post Transformers, by mcgrof