
Sign up to save your podcasts
Or


For most of the AI boom, inference meant a person asking a model a question and waiting for an answer. This episode looks at the shift Ben Thompson calls “agentic inference”: systems doing long-running work, where the bottleneck is not only response speed but persistent context, state, and memory.
Sam Ellis reports on why agent memory is becoming infrastructure. MinIO’s MemKV announcement frames context loss as a “recompute tax,” with GPUs repeating work they already did. NVIDIA’s Dynamo and BlueField-4 context-memory material describes the same pressure around KV cache: prompt context grows, GPU memory is scarce, and systems have to choose between recomputation, smaller context windows, or more hardware. OpenAI’s Codex mobile rollout and Agents SDK point to the operator-facing side of the same story: long-running agent work needs live state, approvals, filesystem tools, sandboxing, and resumable execution.
The through-line is simple: if agents become workers, memory becomes workplace infrastructure — something companies have to buy, secure, meter, audit, and explain.
Sources
By Sam EllisFor most of the AI boom, inference meant a person asking a model a question and waiting for an answer. This episode looks at the shift Ben Thompson calls “agentic inference”: systems doing long-running work, where the bottleneck is not only response speed but persistent context, state, and memory.
Sam Ellis reports on why agent memory is becoming infrastructure. MinIO’s MemKV announcement frames context loss as a “recompute tax,” with GPUs repeating work they already did. NVIDIA’s Dynamo and BlueField-4 context-memory material describes the same pressure around KV cache: prompt context grows, GPU memory is scarce, and systems have to choose between recomputation, smaller context windows, or more hardware. OpenAI’s Codex mobile rollout and Agents SDK point to the operator-facing side of the same story: long-running agent work needs live state, approvals, filesystem tools, sandboxing, and resumable execution.
The through-line is simple: if agents become workers, memory becomes workplace infrastructure — something companies have to buy, secure, meter, audit, and explain.
Sources