May 20, 2026

The Agent Needs a Longer Memory

8 minutes

For most of the AI boom, inference meant a person asking a model a question and waiting for an answer. This episode looks at the shift Ben Thompson calls “agentic inference”: systems doing long-running work, where the bottleneck is not only response speed but persistent context, state, and memory.

Sam Ellis reports on why agent memory is becoming infrastructure. MinIO’s MemKV announcement frames context loss as a “recompute tax,” with GPUs repeating work they already did. NVIDIA’s Dynamo and BlueField-4 context-memory material describes the same pressure around KV cache: prompt context grows, GPU memory is scarce, and systems have to choose between recomputation, smaller context windows, or more hardware. OpenAI’s Codex mobile rollout and Agents SDK point to the operator-facing side of the same story: long-running agent work needs live state, approvals, filesystem tools, sandboxing, and resumable execution.
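To make the "recompute tax" concrete, here is a minimal Python sketch. It is a toy model, not MinIO's or NVIDIA's implementation: the class `ToyContextCache`, its methods, and the token lists are all hypothetical, and "computing" a token is reduced to counting it. The only point is to show that when an agent's earlier context is kept, a follow-up turn pays only for the new tokens, and when it is lost, the whole prompt is processed again.

```python
from dataclasses import dataclass, field


@dataclass
class ToyContextCache:
    """Hypothetical stand-in for a context (KV) cache.

    It only remembers which prompt prefixes were already processed
    and counts how many tokens had to be computed.
    """
    seen: set = field(default_factory=set)
    tokens_computed: int = 0

    def process(self, tokens: list) -> int:
        """Process a prompt and return how many tokens were computed."""
        # Find the longest prefix of this prompt that is already cached.
        reused = 0
        for end in range(len(tokens), 0, -1):
            if tuple(tokens[:end]) in self.seen:
                reused = end
                break
        # Only the uncached suffix is computed; remember each new prefix.
        for end in range(reused + 1, len(tokens) + 1):
            self.seen.add(tuple(tokens[:end]))
        computed = len(tokens) - reused
        self.tokens_computed += computed
        return computed

    def evict_all(self) -> None:
        """Simulate context loss, e.g. GPU memory reclaimed between turns."""
        self.seen.clear()


# Illustrative data: turn 2 repeats turn 1's context and adds two tokens.
turn1 = ["system", "repo", "task", "plan"]
turn2 = turn1 + ["tool_result", "next_step"]

warm = ToyContextCache()
warm.process(turn1)
warm.process(turn2)          # reuses the 4 cached tokens

cold = ToyContextCache()
cold.process(turn1)
cold.evict_all()             # context lost between turns
cold.process(turn2)          # all 6 tokens computed again

recompute_tax = cold.tokens_computed - warm.tokens_computed
print(warm.tokens_computed, cold.tokens_computed, recompute_tax)  # 6 10 4
```

In this toy run the tax is the four tokens of the first turn, processed twice. Real systems pay it in GPU time over far longer contexts, which is why the choice of where to keep that state matters.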

The through-line is simple: if agents become workers, memory becomes workplace infrastructure — something companies have to buy, secure, meter, audit, and explain.

Sources

Ben Thompson, Stratechery: “The Inference Shift”

MinIO: “MinIO Announces MemKV, Purpose-Built Context Memory Store for AI Inference”

NVIDIA Developer Blog: “How to Reduce KV Cache Bottlenecks with NVIDIA Dynamo”

NVIDIA Developer Blog: “Introducing NVIDIA BlueField-4-Powered CMX Context Memory Storage Platform for the Next Frontier of AI”

OpenAI: “Introducing Codex”

Pulse 2.0: “OpenAI: Codex Expands To Mobile App, Bringing AI Coding Workflows To Phones”

OpenAI Agents SDK documentation


By Sam Ellis
