10-Minute System Design
By 10min Tech
The podcast currently has 16 episodes available.
In this episode, we'll take a look at Meta’s ambitious approach to scaling large language models. We'll explore the shift from handling many smaller models for recommendation engines to building colossal generative AI models, and the immense challenges that come with it. From hardware and software optimizations to managing power and dealing with inevitable hardware failures, we'll break down the critical pieces that make Meta's infrastructure tick. What does it take to run systems this large without breaking? Tune in to learn how Meta did it.
In this episode, let's explore how Netflix revamped their video processing pipeline, moving from a monolithic system to a microservices architecture. What drove such a major shift? You'll hear how their original platform, Reloaded, couldn’t keep up with Netflix’s rapid pace of innovation, and why Cosmos, their new system, is now the backbone of everything from streaming to studio operations. But what challenges did they face along the way? And is Cosmos truly the future-proof solution it promises to be? Tune in and find out.
In this episode, we'll explore the intricate system and architecture design behind Apple's iCloud. We'll break down how Apple seamlessly handles billions of users by combining Cassandra and FoundationDB to power iCloud's backbone. What prompted Apple to shift from Cassandra to FoundationDB, and how does this choice impact scalability and performance? Get a closer look at the architecture that makes iCloud tick, and discover how it enables such a smooth user experience. The surprising reason behind Apple’s tech pivot might just change the way you think about designing cloud storage systems.
In this episode, we explore the system behind Uber's driver-matching functionality, capable of handling an incredible one million requests per second. We break down the key technologies that make it work, from H3, the hexagonal grid system for location indexing, to Ringpop, which scales services across servers. You'll hear about how GPS data is transformed into road segments, and how databases like Cassandra and Redis power this high-demand platform. Whether you're curious about large-scale systems or just fascinated by Uber's tech, this episode simplifies complex engineering into something anyone can understand.
In this episode, we'll learn how Instagram scaled to 2.5 billion users. We'll discuss the major challenges Instagram faced, from resource constraints to data consistency and performance, and unpack the innovative strategies the team used to tackle them. From replacing Python with more performant languages to leveraging Cassandra for distributed data storage, we'll learn how Instagram managed to keep things running smoothly at such massive scale. Curious how they did it? Tune in to hear how a mix of clever optimizations and solid technology choices helped them manage internet-scale traffic.
In this episode, we explore how Facebook engineers scaled Memcached, the open-source caching system, to handle billions of requests and trillions of items. We’ll break down the challenges they faced and the smart solutions they developed — from reducing latency to optimizing memory usage. Join us as we uncover how they transitioned from a single cluster to a distributed system spread across the globe, tackling data replication, load balancing, and more. If you’re curious about the inner workings of high-performance caching at massive scale, this one’s for you.
In this episode, we explore another important piece of technology from Google: Spanner, a globally distributed database that reshapes how massive datasets are managed. We’ll talk about its unique architecture, including the TrueTime API, which exposes clock uncertainty as explicit bounds so that Spanner can keep data consistent across data centers. We’ll also cover Spanner’s concurrency control, two-phase commit, and lock-free read-only transactions. Plus, discover how F1, the database behind Google’s ad platform, leverages Spanner to handle millions of transactions with impressive speed and reliability.
This episode focuses on Kafka, the distributed messaging system born at LinkedIn. Learn how Kafka was designed to tackle the massive streams of log data driving personalized recommendations, search algorithms, and real-time security. We'll explore how it outperforms traditional systems like ActiveMQ and RabbitMQ with its streamlined architecture, decentralized coordination, and focus on efficiency. Tune in to explore Kafka's unique design and how it’s becoming essential for modern data processing.
Ever wondered how multiple processes can safely share resources without stepping on each other's toes? In this episode, we'll talk about Redis's distributed lock and discover how it ensures mutual exclusion for shared resources across a network of Redis servers, allowing only one process at a time to gain access. We’ll delve into its safety and liveness properties that guarantee reliable lock management, even amidst failures. Join us as we unpack potential challenges like network partitions and discuss solutions that improve the Redlock algorithm's resilience.
In this episode, we take a closer look at the Hadoop Distributed File System (HDFS), a key part of the Hadoop framework that helps store and manage huge amounts of data. We’ll explore how HDFS spreads data across many affordable servers, making it both scalable and cost-effective. You’ll learn about its main components, like the NameNode and DataNodes, and how they work together. We’ll also discuss features that keep your data safe and ensure it moves efficiently. Join us as we touch on the challenges of managing large data clusters and what the future might hold for HDFS.