The InfoQ Podcast

Oliver Gould About Architecting to Avoid and Recover from Failure



In this week’s podcast, Robert Blumen talks to Oliver Gould at QCon San Francisco 2016. Oliver is the CTO of Buoyant, where he leads open source development efforts. Prior to Buoyant he was a Staff Infrastructure Engineer at Twitter, where he was technical lead on the Observability, Traffic, Configuration and Co-ordination teams.
Why listen to this podcast:
- Stratification allows applications to own their logic while libraries take care of the different mechanisms, such as service discovery and load balancing
- Cascading failures can’t be tested or protected against, so having a fast time to recovery is important
- Having developers own their services with on-call mechanisms improves the reliability of the service; it’s best to optimise automatic restarts so problems can be addressed during normal working hours
- Post-mortem analysis of failures is important to improve run books or checklists and to share learning between teams
- Incremental roll out of features with feature flags or weighted routing provides agility while testing with production load, which highlights issues that aren’t seen during limited developer testing
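
The incremental rollout idea in the last point can be sketched in a few lines. This is a minimal illustration, not something taken from the podcast: the function names (`rollout_bucket`, `is_enabled`, `choose_backend`), the backend labels, and the choice of SHA-256 hashing are all assumptions made for the example. It assigns each user a stable bucket so that raising the rollout fraction only ever adds users to the new code path.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> float:
    """Map a user to a stable value in [0, 1) for a given feature.

    Hypothetical helper: hashing the feature name together with the user id
    gives each feature an independent, repeatable assignment.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    # Use 48 bits so the quotient is exactly representable and always < 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def is_enabled(user_id: str, feature: str, rollout_fraction: float) -> bool:
    """Feature flag check: True for roughly `rollout_fraction` of users."""
    return rollout_bucket(user_id, feature) < rollout_fraction


def choose_backend(user_id: str, feature: str, rollout_fraction: float) -> str:
    """Weighted routing: send the enabled share of users to the new version."""
    if is_enabled(user_id, feature, rollout_fraction):
        return "new"
    return "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    for fraction in (0.01, 0.10, 0.50):
        share = sum(is_enabled(u, "new-timeline", fraction) for u in users) / len(users)
        print(f"target {fraction:.0%}, observed {share:.1%}")
```

Because the bucket is derived from the user id rather than drawn at random per request, a given user sees consistent behaviour while the fraction is raised step by step under production load.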
Notes and links can be found on: http://bit.ly/2ivoz9w
4m:05s - Each domain has different failure and operating modes, and the layered approach to resiliency means that each layer handles its own failures automatically
4m:30s - Large systems may fail in unexpected ways
4m:35s - Twitter originally had the “Fail Whale” but this has been phased out as the system has become more stable
4m:50s - As Twitter grew, it needed to move quicker, with more engineers and less whale time
5m:10s - Automation and social tools were needed to improve the situation
More on this: scan our curated show notes on InfoQ at http://bit.ly/2ivoz9w
You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq

The InfoQ Podcast, by InfoQ


