Data Engineering Podcast

From Bits to Tables: The Evolution of S3 Storage



Summary
In this episode of the Data Engineering Podcast, Andy Warfield talks about the capabilities of S3 Tables and S3 Vectors and how they fit into modern data stacks. Andy shares his path through the tech industry and his role at Amazon, where he works on extending S3's storage capabilities. He traces the evolution of S3 from a simple storage service into a system that supports richer data types such as tables and vectors, which are central to analytics and AI-driven applications. He explains the motivations behind S3 Tables and S3 Vectors, how they simplify data management and improve performance for complex workloads, and the technical challenges and design considerations involved in building them. The conversation also explores potential applications in fields such as AI, genomics, and media, and discusses future directions for S3 in supporting data-driven work.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management.
  • Tired of data migrations that drag on for months or even years? What if I told you there's a way to cut that timeline by up to 6x while guaranteeing accuracy? Datafold's Migration Agent is the only AI-powered solution that doesn't just translate your code; it validates every single data point to ensure perfect parity between your old and new systems. Whether you're moving from Oracle to Snowflake, migrating stored procedures to dbt, or handling complex multi-system migrations, they deliver production-ready code with a guaranteed timeline and fixed price. Stop burning budget on endless consulting hours. Visit dataengineeringpodcast.com/datafold to book a demo and see how they're turning months-long migration nightmares into week-long success stories.
  • Your host is Tobias Macey and today I'm interviewing Andy Warfield about S3 Tables and Vectors.
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what your goals are with the Tables and Vectors features of S3?
  • How did the experience of building S3 Tables inform your work on S3 Vectors?
  • There are numerous implementations of vector storage and search. How do you view the role of S3 in the context of that ecosystem?
  • The most directly analogous implementation that I'm aware of is the Lance table format. How would you compare the implementation and capabilities of Lance with what you are building with S3 Vectors?
    • What opportunity do you see for offering a protocol-compatible implementation, similar to the Iceberg compatibility that you provide with S3 Tables?
  • Can you describe the technical implementation of the Vectors functionality in S3?
    • What are the sources of inspiration that you looked to in designing the service?
  • Can you describe some of the ways that S3 Vectors might be integrated into a typical AI application?
  • What are the most interesting, innovative, or unexpected ways that you have seen S3 Tables/Vectors used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on S3 Tables/Vectors?
  • When is S3 the wrong choice for Iceberg or Vector implementations?
  • What do you have planned for the future of S3 Tables and Vectors?
Contact Info
  • LinkedIn
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
Links
  • S3 Tables
  • S3 Vectors
  • S3 Express
  • Parquet
  • Iceberg
  • Vector Index
  • Vector Database
  • pgvector
  • Embedding Model
  • Retrieval Augmented Generation
  • TwelveLabs
  • Amazon Bedrock
  • Iceberg REST Catalog
  • Log-Structured Merge Tree
  • S3 Metadata
  • Sentence Transformer
  • Spark
  • Trino
  • Daft
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Data Engineering Podcast by Tobias Macey