This episode explores NVIDIA's Nemotron 3 white paper, which introduces a three-tier model family (Nano, Super, Ultra) built on a hybrid architecture combining Mamba-2 structured state space layers with Mixture-of-Experts routing. The family simultaneously targets state-of-the-art accuracy, a one-million-token context window, and substantially higher inference throughput than Transformer-based models of comparable size.

The discussion traces how Mamba-2's fixed-size recurrent state eliminates the KV cache's linear memory growth, the central bottleneck for long-context and agentic workloads, and explains NVIDIA's novel LatentMoE extension, which projects tokens into a reduced latent dimension before expert routing, cutting communication costs while activating more experts per token. Multi-Token Prediction, introduced by Meta FAIR, appears as a training accelerant: predicting multiple future tokens simultaneously improves both training efficiency and generation speed. NVFP4, the 4-bit floating-point training format NVIDIA used for Super and Ultra, raises open questions about numerical stability during post-training.

Listeners interested in how recent theoretical work on state space duality translates into a production-scale model family, or in the architectural tradeoffs that enable practical multi-agent pipelines at scale, will find the episode a concrete and technically grounded case study.
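To make the KV-cache argument concrete, here is a back-of-the-envelope comparison in Python. Every dimension below (layer count, KV heads, head size, SSM state size) is an illustrative placeholder rather than Nemotron 3's published configuration; the point is only the linear-versus-constant scaling.

```python
# Rough memory comparison: a Transformer's KV cache grows linearly with
# sequence length, while a Mamba-2-style layer carries a fixed-size
# recurrent state. All sizes here are illustrative assumptions.

def kv_cache_bytes(seq_len, n_layers=48, n_kv_heads=8, head_dim=128, bytes_per=2):
    # 2x for the separate K and V tensors, stored per layer and per token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per

def ssm_state_bytes(n_layers=48, d_inner=8192, d_state=128, bytes_per=2):
    # One fixed (d_inner x d_state) state per layer, independent of length.
    return n_layers * d_inner * d_state * bytes_per

for n in (8_192, 131_072, 1_048_576):  # up to the one-million-token context
    print(f"{n:>9} tokens | KV cache {kv_cache_bytes(n) / 2**30:7.1f} GiB"
          f" | SSM state {ssm_state_bytes() / 2**30:5.2f} GiB")
```

At a million tokens this hypothetical cache reaches roughly 192 GiB per sequence while the recurrent state stays under 0.1 GiB; that gap is what the hybrid design exploits for long-context and multi-agent workloads.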
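The episode describes LatentMoE as projecting tokens into a smaller latent dimension before routing. A minimal sketch of that idea, assuming a standard top-k softmax router, follows; the class name, sizes, and the loop-based dispatch are simplifications for readability, not NVIDIA's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentMoESketch(nn.Module):
    """Route and run experts in a reduced latent space (hypothetical sketch)."""

    def __init__(self, d_model=4096, d_latent=1024, n_experts=64, top_k=8):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)   # token -> latent
        self.up = nn.Linear(d_latent, d_model, bias=False)     # latent -> token
        self.router = nn.Linear(d_latent, n_experts, bias=False)
        # Each expert is a small MLP operating in the cheap latent space.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_latent, 4 * d_latent),
                          nn.SiLU(),
                          nn.Linear(4 * d_latent, d_latent))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                       # x: (n_tokens, d_model)
        z = self.down(x)                        # project before routing
        weights, idx = self.router(z).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(z)
        for k in range(self.top_k):             # plain loops for readability
            for e in idx[:, k].unique():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k, None] * self.experts[int(e)](z[mask])
        return self.up(out)                     # back to the model dimension
```

Because dispatch happens on d_latent-sized activations instead of d_model-sized ones, the all-to-all payload in an expert-parallel deployment shrinks by roughly d_model / d_latent (4x in this sketch), which is the budget that allows more experts to be activated per token.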
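Multi-Token Prediction (source 9) can likewise be sketched compactly: several output heads share one trunk, and head i predicts the token at offset i+1. The head design and the uniform loss weighting below are illustrative assumptions, not the paper's or Nemotron 3's exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTPHeadsSketch(nn.Module):
    """n_future linear heads over a shared trunk; head i predicts token t+1+i."""

    def __init__(self, d_model=1024, vocab_size=32000, n_future=4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in range(n_future))

    def forward(self, hidden):                  # hidden: (batch, seq, d_model)
        return [head(hidden) for head in self.heads]

def mtp_loss(logits_per_head, tokens):
    # tokens: (batch, seq); head i is scored against tokens shifted by i+1.
    losses = []
    for i, logits in enumerate(logits_per_head):
        shift = i + 1
        pred = logits[:, :-shift].reshape(-1, logits.size(-1))
        target = tokens[:, shift:].reshape(-1)
        losses.append(F.cross_entropy(pred, target))
    return sum(losses) / len(losses)
```

The extra heads densify the training signal per forward pass, and at inference the same heads can serve as cheap draft predictors for speculative decoding (source 17), which is where the generation-speed benefit comes from.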
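Finally, the NVFP4 discussion is easier to follow with the mechanics of 4-bit floating point in view. The sketch below fake-quantizes values through the signed E2M1 grid with one scale per small block, which is the general shape of block-scaled FP4 schemes; the block size and the simple max-based scaling are assumptions, not NVIDIA's training kernels.

```python
import numpy as np

# All values representable by a signed E2M1 4-bit float.
E2M1_GRID = np.array([-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
                       0.5,  1.0,  1.5,  2.0,  3.0,  4.0,  6.0])

def fake_quantize_fp4(x, block=16):
    """Quantize-dequantize through 4-bit floats with one scale per block
    (a simplified, hypothetical stand-in for NVFP4)."""
    x = np.asarray(x, dtype=np.float64).reshape(-1, block)
    # Scale each block so its largest magnitude lands on the FP4 maximum, 6.0.
    scale = np.abs(x).max(axis=1, keepdims=True) / 6.0
    scale = np.where(scale == 0.0, 1.0, scale)
    scaled = x / scale                                   # (n_blocks, block)
    # Snap every value to the nearest representable FP4 value.
    idx = np.abs(scaled[..., None] - E2M1_GRID).argmin(axis=-1)
    return (E2M1_GRID[idx] * scale).reshape(-1)

w = np.random.randn(64)
print(f"mean abs error: {np.abs(w - fake_quantize_fp4(w)).mean():.4f}")
```

With only fifteen distinct representable values per block, the quantization step is coarse; whether per-block rescaling preserves enough precision for stable post-training (RLHF, long-context fine-tuning) is exactly the open question the episode flags.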
Sources:
1. NVIDIA Nemotron 3: Efficient and Open Intelligence — NVIDIA (Aaron Blakeman, Aaron Grattafiori, Aarti Basant, et al.), 2025
http://arxiv.org/abs/2512.20856
2. Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Model Parameters — Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar (Google DeepMind), 2024
https://scholar.google.com/scholar?q=Scaling+LLM+Test-Time+Compute+Optimally+Can+be+More+Effective+than+Scaling+Model+Parameters
3. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning — DeepSeek-AI (Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, and many others), 2025
https://scholar.google.com/scholar?q=DeepSeek-R1:+Incentivizing+Reasoning+Capability+in+LLMs+via+Reinforcement+Learning
4. Thinking Tokens for Language Modeling — David Herel, Tomas Mikolov (Czech Institute of Informatics / FAIR), 2023
https://scholar.google.com/scholar?q=Thinking+Tokens+for+Language+Modeling
5. Training Large Language Models to Reason in a Continuous Latent Space (Coconut) — Shibo Hao, Sainbayar Sukhbaatar, Jason Weston, Yuandong Tian, Zhiting Hu (Meta FAIR), 2024
https://scholar.google.com/scholar?q=Training+Large+Language+Models+to+Reason+in+a+Continuous+Latent+Space+(Coconut)
6. Don't Think Too Much: Reasoning Budget Control for Large Reasoning Models — Xuan He, Zhenyu He, Qingyu Meng, Yonghao Zhong, Zhonglin Shi, Fandong Meng, Jie Zhou (Tencent AI Lab / WeChat AI), 2025
https://scholar.google.com/scholar?q=Don't+Think+Too+Much:+Reasoning+Budget+Control+for+Large+Reasoning+Models
7. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality — Dao, T. and Gu, A., 2024
https://scholar.google.com/scholar?q=Transformers+are+SSMs:+Generalized+Models+and+Efficient+Algorithms+Through+Structured+State+Space+Duality
8. Jamba: A Hybrid Transformer-Mamba Language Model — Lieber et al., AI21 Labs, 2024
https://scholar.google.com/scholar?q=Jamba:+A+Hybrid+Transformer-Mamba+Language+Model
9. Better & Faster Large Language Models via Multi-Token Prediction — Gloeckle et al., 2024
https://scholar.google.com/scholar?q=Better+&+Faster+Large+Language+Models+via+Multi-Token+Prediction
10. RULER: What's the Real Context Size of Your Long-Context Language Models? — Hsieh et al., 2024
https://scholar.google.com/scholar?q=RULER:+What's+the+Real+Context+Size+of+Your+Long-Context+Language+Models?
11. DeepSeek-V3 Technical Report — DeepSeek-AI, 2025
https://scholar.google.com/scholar?q=DeepSeek-V3+Technical+Report
12. Mixtral of Experts — Jiang et al., 2024
https://scholar.google.com/scholar?q=Mixtral+of+Experts
13. In-Context Learning with Long-Context Models: An In-Depth Exploration — Bertsch et al., 2024
https://scholar.google.com/scholar?q=In-Context+Learning+with+Long-Context+Models:+An+In-Depth+Exploration
14. Attn-QAT: 4-Bit Attention With Quantization-Aware Training — approximate, 2024–2025
https://scholar.google.com/scholar?q=Attn-QAT:+4-Bit+Attention+With+Quantization-Aware+Training
15. FP4 All the Way: Fully Quantized Training of LLMs — approximate, 2024–2025
https://scholar.google.com/scholar?q=FP4+All+the+Way:+Fully+Quantized+Training+of+LLMs
16. Optimizing Large Language Model Training Using FP4 Quantization — approximate, 2024–2025
https://scholar.google.com/scholar?q=Optimizing+Large+Language+Model+Training+Using+FP4+Quantization
17. Speculative Decoding and Beyond: An In-Depth Survey of Techniques — approximate, 2024–2025
https://scholar.google.com/scholar?q=Speculative+Decoding+and+Beyond:+An+In-Depth+Survey+of+Techniques
18. Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts — approximate, 2024–2025
https://scholar.google.com/scholar?q=Latent+Prototype+Routing:+Achieving+Near-Perfect+Load+Balancing+in+Mixture-of-Experts
19. AI Post Transformers: Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/2026-03-04-2-papers-combined-217168.mp3
20. AI Post Transformers: Switch Transformers: Trillion Parameter Models with Sparsity — Hal Turing & Dr. Ada Shannon
https://podcasters.spotify.com/pod/show/12146088098/episodes/Switch-Transformers-Trillion-Parameter-Models-with-Sparsity-e373fd3
21. AI Post Transformers: GLM-5: Transitioning from Vibe Coding to Agentic Engineering — Hal Turing & Dr. Ada Shannon
https://podcasters.spotify.com/pod/show/12146088098/episodes/GLM-5-Transitioning-from-Vibe-Coding-to-Agentic-Engineering-e3fbfls
22. AI Post Transformers: Kimi Linear: Efficient Expressive Attention Architecture — Hal Turing & Dr. Ada Shannon
https://podcasters.spotify.com/pod/show/12146088098/episodes/Kimi-Linear-Efficient-Expressive-Attention-Architecture-e3aclec
23. AI Post Transformers: TailorKV: Hybrid KV Cache Compression for LLMs — Hal Turing & Dr. Ada Shannon
https://podcasters.spotify.com/pod/show/12146088098/episodes/TailorKV-Hybrid-KV-Cache-Compression-for-LLMs-e38bmv2
24. AI Post Transformers: AWQ: On-Device LLM Compression and Acceleration — Hal Turing & Dr. Ada Shannon
https://podcasters.spotify.com/pod/show/12146088098/episodes/AWQ-On-Device-LLM-Compression-and-Acceleration-e388njr
25. AI Post Transformers: LFM2-8B-A1B: Efficient On-Device Mixture-of-Experts — Hal Turing & Dr. Ada Shannon
https://podcasters.spotify.com/pod/show/12146088098/episodes/LFM2-8B-A1B-Efficient-On-Device-Mixture-of-Experts-e3a2oh2
Interactive Visualization: NVIDIA Nemotron 3 Hybrid SSM Transformer Architecture