Data Engineering Podcast

From RAG to Relational: How Agentic Patterns Are Reshaping Data Architecture



Summary
In this episode of the Data Engineering Podcast, Marc Brooker, VP and Distinguished Engineer at AWS, talks about how agentic workflows are transforming database usage and infrastructure design. He discusses the evolving role of data in AI systems, from traditional models to more modern approaches built on vectors, RAG, and relational databases. Marc explains why agents require serverless, elastic, and operationally simple databases, and how AWS solutions like Aurora and DSQL address these needs with features such as rapid provisioning, automated patching, geodistribution, and support for spiky usage. The conversation covers topics including tool calling, improved model capabilities, state in agents versus stateless LLM calls, and the role of Lambda and AgentCore for long-running, session-isolated agents. Marc also touches on the shift from local MCP tools to secure, remote endpoints, the rise of object storage as a durable backplane, and the need for better identity and authorization models. The episode highlights real-world patterns like agent-driven SQL fuzzing and plan analysis, while identifying gaps in simplifying data access, hardening ops for autonomous systems, and evolving serverless database ergonomics to keep pace with agentic development.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
  • Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
  • Your host is Tobias Macey and today I'm interviewing Marc Brooker about the impact of agentic workflows on database usage patterns and how they change the architectural requirements for databases
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what the role of the database is in agentic workflows?
    • There are numerous types of databases, with relational being the most prevalent. How does the type and purpose of an agent inform the type of database that should be used?
  • Anecdotally I have heard about how agentic workloads have become the predominant "customers" of services like Neon and Fly.io. How would you characterize the different patterns of scale for agentic AI applications? (e.g. proliferation of agents, monolithic agents, multi-agent, etc.)
  • What are some of the most significant impacts on workload and access patterns for data storage and retrieval that agents introduce?
    • What are the categorical differences in that behavior as compared to programmatic/automated systems?
  • You have spent a substantial amount of time on Lambda at AWS. Given that LLMs are effectively stateless, how does the added ephemerality of serverless functions impact design and performance considerations around having to "re-hydrate" context when interacting with agents?
  • What are the most interesting, innovative, or unexpected ways that you have seen serverless and database systems used for agentic workloads?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on technologies that are supporting agentic applications?
Contact Info
  • Blog
  • LinkedIn
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
Links
  • AWS Aurora DSQL
  • AWS Lambda
  • Three Tier Architecture
  • Vector Database
  • Graph Database
  • Relational Database
  • Vector Embedding
  • RAG == Retrieval Augmented Generation
    • AI Engineering Podcast Episode
  • GraphRAG
    • AI Engineering Podcast Episode
  • LLM Tool Calling
  • MCP == Model Context Protocol
  • A2A == Agent 2 Agent Protocol
  • AWS Bedrock AgentCore
  • Strands
  • LangChain
  • Kiro
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA