
R. Tyler Croy, a principal engineer at Scribd, joins Corey Quinn to explain what happens when simple tasks cost $100,000. Checking if files are damaged? $100K. Using newer S3 tools? Way too expensive. Normal solutions don't work anymore. Tyler shares how, with this much data, you can't just throw money at the problem; you have to engineer your way out.
About R. Tyler:
R. Tyler Croy leads infrastructure architecture at Scribd and has been an open source developer for over 14 years. His work spans the FreeBSD, Python, Ruby, Puppet, Jenkins, and Delta Lake communities. Under his leadership, Scribd’s Infrastructure Engineering team built Delta Lake for Rust to support a wide variety of high-performance data processing systems. That experience led to Tyler developing the next big iteration of storage architecture to power the large-scale full-text compute challenges facing the organization.
Show Highlights:
01:48 Scribd's 18-Year History
04:00 One Document Becomes Billions of Files
05:47 When Normal Physics Stop Working
08:02 Why S3 Metadata Costs Too Much
10:50 How AI Made Old Documents Valuable
13:30 From 100 Billion to 100 Million Objects
15:05 The Curse of Retail Pricing
19:17 How Data Scientists Create Growth
21:18 De-Normalizing Data Problems
25:29 Evolving Old Systems
27:45 Billions Added Since Summer
29:29 Underused S3 Features
31:48 Where to Find Tyler
Links:
Scribd: https://tech.scribd.com
Mastodon: https://hacky.town/@rtyler
GitHub: https://github.com/rtyler
Sponsored by:
duckbillhq.com
By Corey Quinn
