The Data Engineering Show

Postgres vs. Elasticsearch: The Unexpected Winner in High-Stakes Search for Instacart with Ankit Mittal



In this episode of The Data Engineering Show, Benjamin Wagner sits down with Ankit Mittal, former Senior Engineer at Instacart, to explore how Instacart revolutionized its search infrastructure by transitioning from Elasticsearch to PostgreSQL. Learn how Instacart tackled the unique challenges of fast-moving grocery inventory, achieved high-performance search capabilities, and leveraged PostgreSQL extensions for complex retrieval operations. Whether you're scaling search functionality or optimizing database performance, this deep dive offers valuable insights into building robust, production-ready search systems using PostgreSQL.
  • Discover why Instacart moved from Elasticsearch to PostgreSQL for retailer search
  • Learn about handling real-time inventory updates and search optimization
  • Explore PostgreSQL extensions, sharding strategies, and data flow architecture
  • Understand the trade-offs between different search infrastructure approaches

What You'll Learn:

  • How Instacart managed fast-moving grocery inventory data by consolidating search, ranking, and filtering into a single PostgreSQL cluster
  • Why pushing compute closer to the data layer can significantly improve search performance and reduce network calls
  • The architecture decisions behind using PostgreSQL extensions like pgvector and custom solutions for search functionality
  • How to implement efficient data ingestion through S3-based pipelines and bulk writes instead of real-time updates
  • Why table maintenance with tools like pg_repack is crucial for optimizing read throughput in production environments
  • The trade-offs between traditional search engines and relational databases for complex search implementations
  • The challenges of maintaining self-hosted PostgreSQL in a predominantly cloud-managed environment

If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts. Instructions on how to do this are here.

About the Guest(s)

Ankit is a Software Engineer at ParadeDB and former Senior Engineer at Instacart, where he specialized in PostgreSQL infrastructure and search systems. With extensive experience in database optimization and search architecture, he played a key role in modernizing Instacart's search infrastructure by transitioning from Elasticsearch to a custom PostgreSQL solution. In this episode, Ankit shares deep insights into building and scaling high-performance search systems for e-commerce, particularly focusing on the unique challenges of grocery retail's fast-moving inventory. His work at Instacart revolutionized their single-retailer search functionality, demonstrating how traditional relational databases can be adapted for complex search operations. His expertise in database systems and their practical applications in high-scale environments makes this conversation particularly valuable for engineers interested in modern search architecture and database optimization.
Quotes

"Think about it. If there's a lot of things that you can get the database to do, then the applications become simpler." - Ankit
"My non-Instacart experience has largely been in pre-PMF startups where the approach of abuse your database to its absolute limits works wonders." - Ankit
"Almost everything that we got retrieved had to be filtered out. So we go back to Elasticsearch again." - Ankit

"We traded off the quality of retrieval, hardcore core retrieval, with the whole system reducing the network calls." - Ankit
"It's a place to go to find what item is available, in what store, what item is available, at what price, including full product taxonomy graph and product and ontology." - Ankit
"The grand theme here is that we wanted more control over the cluster, how to spin it off, what kind of disks it would have." - Ankit
"We tell teams who want to have their data in this cluster, create an s3 home, create either a bucket or a home, whatever they want to do, and tell us that we would sync ourselves." - Ankit
"What we found is that the read throughput, we can throw more data if the tables are repacked nicely." - Ankit
"Most engineers who want to work on search, they are more used to the Elasticsearch shape of the query." - Ankit
"The relevance is better because they could join more things in the database. They also saw the cost of the normalized data reduced." - Ankit
Resources

Company Websites:
- Instacart - Grocery delivery platform
- ParadeDB - Database technology company
- Firebolt - Cloud data warehouse (firebolt.io)

Tools & Technologies:
- PostgreSQL - Database system
- Elasticsearch - Search engine
- PgCat/PgDog - PostgreSQL proxy tools
- pgvector - PostgreSQL vector extension
- pg_repack - PostgreSQL table repacking tool
- ClickHouse - Column-oriented DBMS
- Tantivy - Rust-based search engine library

Articles:
- Instacart Search Modernization Blog Posts (Series on hybrid retrieval)
- Target's AlloyDB Migration Blog Post


For Feedback & Discussions on Firebolt Core:

  • Join Firebolt Discord Community
  • Join Firebolt GitHub Discussions
  • Firebolt Core GitHub Repository

Primary Speakers:

  • Ankit Mittal
  • Benjamin Wagner

The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at: fame.so

Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of The Fundamentals of Data Engineering, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.

Check out our three most downloaded episodes:
  • Zach Wilson on What Makes a Great Data Engineer
  • Joe Reis and Matt Housley on The Fundamentals of Data Engineering
  • Bill Inmon, The Godfather of Data Warehousing
The Data Engineering Show, by The Firebolt Data Bros