Search Off the Record

Analysing Robots.txt at scale with HTTP Archive and BigQuery


In this episode of Search Off the Record, Martin and Gary turn a simple robots.txt question into a data‑driven deep dive using HTTP Archive, WebPageTest, custom JavaScript metrics, and BigQuery. They explore how millions of real robots.txt files are actually written in 2025–2026, which directives and user‑agents are most common, and what that means for modern crawling and AI bots.

Aimed at beginner to mid‑level developers and SEOs, this episode explains how large‑scale web measurement works (HTTP Archive, Chrome UX Report, Web Almanac) and how to turn raw crawl data into actionable SEO insights. Subscribe for more candid conversations about crawling, indexing, and the data behind how Google Search and the web really work.

Resources:

Web Almanac → https://almanac.httparchive.org/en/2025/
Robotstxt custom metric for the HTTP Archive → https://github.com/HTTPArchive/custom-metrics/pull/191
robots.txt parser change → https://github.com/google/robotstxt/commit/4af32e54b715442bb04cd0470e99192f0ffb9792#commitcomment-178586774

Episode transcript → https://goo.gle/sotr108-transcript

Listen to more Search Off the Record → https://goo.gle/sotr-yt
Subscribe to Google Search Channel → https://goo.gle/SearchCentral

Search Off the Record is a podcast series that takes you behind the scenes of Google Search with the Search Relations team.

#SOTRpodcast #SEO #GoogleSearch

Speakers: Martin Splitt, Gary Illyes

Search Off the Record by Google

4.2

137 ratings

