Search Off the Record

How Googlebot crawls the web



In this episode of Search Off the Record, Martin and Gary from the Google Search Relations team take a deep dive into how Googlebot and web crawling work—past, present, and future. Through their humorous and thoughtful conversation, they explore how crawling evolved from the early days of the internet, when scripts could index a chunk of the web from a single homepage, to the more complex and considerate systems used today. They discuss the basics of what a crawler is, how tools like cURL or Wget relate, and how policies like robots.txt ensure crawlers play nice with web infrastructure.
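The idea of a minimal, polite crawler can be sketched in a few lines. The example below is an illustration only, not Googlebot's implementation: it uses Python's standard library to extract links from a page and keep only those that a robots.txt file allows. The robots.txt rules, the HTML snippet, the `example.com` host, and the `example-crawler` user agent are all hypothetical stand-ins, and nothing is fetched over the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch it from the host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical page body standing in for a fetched homepage.
HOMEPAGE_HTML = '<a href="/about">About</a> <a href="/private/notes">Notes</a>'


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def allowed_links(base_url, html, robots_txt, user_agent="example-crawler"):
    """Return absolute links found in html that robots_txt permits."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    collector = LinkCollector()
    collector.feed(html)

    # Resolve relative links against the page URL before checking the rules.
    absolute = [urljoin(base_url, href) for href in collector.links]
    return [url for url in absolute if rules.can_fetch(user_agent, url)]


# URLs the crawler would be allowed to visit next.
frontier = allowed_links("https://example.com/", HOMEPAGE_HTML, ROBOTS_TXT)
print(frontier)
```

A real crawler would also need to fetch pages, track which URLs it has already visited, and limit its request rate so it does not overload the host.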

The conversation also covers Google's internal shift to unified infrastructure for all crawling needs, highlighting how different teams moved from separate crawlers to a shared system that enforces consistent policies. They explain why some fetches bypass robots.txt (like user-initiated actions) and the rising impact of automated traffic from new products and AI agents. With a nod to initiatives like Common Crawl, the episode ends with a look at the road ahead, acknowledging growing internet congestion but remaining optimistic about the web's capacity to adapt.

Resources:

Episode transcript → https://goo.gle/sotr092-transcript

Chapters:

0:00 - Intro
0:53 - What is a Web Crawler?
3:11 - Building a Minimal Crawler
6:12 - Ethical Crawling: Robots.txt & Host Health
7:42 - BackRub and Early Crawling Challenges
11:02 - The Anatomy of a Search Engine Paper
13:09 - Crawling Across Google Products
16:51 - New Crawlers & User Agent Strings
22:38 - Crawlers Beyond Google
23:17 - The Evolution of Crawlers
26:32 - Bad Actors and Overpowering Servers
27:31 - Reducing the Footprint on the Internet
28:44 - The Future of Crawlers
31:29 - Conclusion

Listen to more Search Off the Record → https://goo.gle/sotr-yt

Subscribe to Google Search Channel → https://goo.gle/SearchCentral

Search Off the Record is a podcast series that takes you behind the scenes of Google Search with the Search Relations team.

#SOTRpodcast #SEO #SearchOfTheRecord

Speakers: Martin Splitt, Gary Illyes

Products Mentioned: Googlebot, Search


Search Off the Record by Google

4.2 (137 ratings)

