May 21, 2026

Cloud Fragility & Distributed Systems with Somtochi Onyekwere

46 minutes

In Elixir Wizards S15E04, Charles Suggs and Emma Whamond are joined by Somtochi Onyekwere, a software engineer at Fly.io and contributor to the Corrosion distributed database project, to talk about distributed systems, infrastructure resilience, and the growing fragility of centralized cloud platforms.

We discuss what recent outages across major providers reveal about modern infrastructure and why more teams are starting to rethink assumptions around reliability, failover, and system design. Somtochi explains how Fly.io approaches geographic distribution, eventual consistency, and replication across nodes, along with the trade-offs that come with building systems this way.

The conversation explores CRDTs (Conflict-free Replicated Data Types), consensus, split-brain prevention, and what actually happens when distributed systems fail in production. We also talk about testing strategies, rollback planning, property-based testing tools, and how teams can reduce blast radius when things inevitably go wrong.

Along the way, we discuss AI infrastructure, sandboxing AI agents, and how newer workloads may add pressure to already centralized systems. The episode closes with practical advice for developers who want to build more resilient applications without over-complicating their architecture.

Topics Discussed in this Episode:

Corrosion and distributed database replication

Centralized cloud fragility and recent outage patterns

Distributed systems versus traditional cloud architectures

Multi-region deployment strategies for Phoenix applications

CRDTs and conflict resolution in distributed systems

Eventual consistency versus strict consistency trade-offs

Consensus, leader election, and split-brain prevention

Testing failover and recovery scenarios

Property-based testing and Antithesis

Rollback planning for database schema migrations

Reducing blast radius through system isolation

Health checks and blue-green deployment strategies

Fly Proxy request routing and replay behavior

Cross-region synchronization and replication challenges

Single points of failure inside “redundant” systems

Backup restoration testing and disaster recovery planning

Network partitions and failure handling in production

Infrastructure monitoring and operational visibility

AI infrastructure workloads and operational strain

Sandboxing and securing AI agents

Sprites and AI workflows at Fly.io

Latency improvements from geographic distribution

Distributed systems trade-offs in real-world environments

Transitive dependency failures across cloud providers

Practical resilience strategies for modern engineering teams

Links Mentioned:

Fly.io https://fly.io

Corrosion https://github.com/superfly/corrosion

Weave GitOps documentation https://docs.gitops.weaveworks.org/

FluxCD https://fluxcd.io/

Fly.io Stateful Sandbox Environments https://sprites.dev/

Cloudflare Workers AI Inference Platform https://www.cloudflare.com/products/workers-ai/

“An AI Agent Just Destroyed Our Production Data. It Confessed in Writing” Twitter post from PocketOS founder: https://x.com/lifeof_jer/status/2048103471019434248

October 2025 AWS Outage https://www.theguardian.com/technology/2025/oct/24/amazon-reveals-cause-of-aws-outage

December 2025 Cloudflare Outage https://www.theguardian.com/technology/2025/dec/05/another-cloudflare-outage-takes-down-websites-linkedin-zoom

July 2024 CrowdStrike Outage https://www.ibm.com/think/news/recent-crowdstrike-outage-what-you-should-know

March 2026 Stryker Cyberattack https://www.stryker.com/us/en/about/news/2026/a-message-to-our-customers-03-2026.html

Amazon Web Services https://aws.amazon.com/

Google Cloud https://cloud.google.com/

Microsoft Azure https://azure.microsoft.com/en-us

Fly.io Elixir documentation https://fly.io/docs/elixir/

CRDTs!! (Elixir Wizards S13E03) https://smartlogic.io/podcast/elixir-wizards/s13-e03-local-first-liveview-svelte-pwa/

Antithesis on property-based testing https://antithesis.com/docs/resources/property_based_testing/

PropEr, property-based testing for Erlang https://hex.pm/packages/proper

By SmartLogic LLC