Screaming in the Cloud

Episode 34: Slack and the Safety Dance of Chaos Engineering



In the early days, angry nerd corners of the Internet dismissed Slack and some of its predecessors with, “Oh, it’s just IRC. Now, you pay someone for it.” Many fell into that trap, wondering what value such systems offered. The big differentiator? Slack is built as a collaborative business tool.

Today, we’re talking to Holly Allen, who helped make government software better while serving as the director of engineering at 18F. Now, she’s a senior engineering manager at Slack, a collaborative chat program where you can do most of your work through a rich platform of integrations. Holly enjoys taking a weird set of skills that make a computer do things and applying it to convincing people who know how to make computers do things to do things.

Some of the highlights of the show include:

  • Safety engineering brings chaos and resilience engineering, incident management, and post-mortem processes together for resiliency and reliability
  • Slack strives to move really fast while being in complete control
  • Slack runs primarily on AWS, but is working on a multi-cloud strategy because Slack still needs to work if AWS is down
  • Slack has a close relationship with AWS and is a collaborative company; it has immediate access to AWS staff anytime there’s a problem
  • Slack uses Terraform and Chef, and is working to determine whether running its production workflows in Kubernetes would be worthwhile
  • Disasterpiece Theater: take a real scenario that might happen and surmise what will happen; the aim is not to cause production issues, but to teach Slack employees
  • Slack hires collaborative, empathetic people to create a collaborative environment where everyone works together toward a goal
  • Slack was firmly in a centralized operations model, but is shifting toward development teams taking on more responsibility and service ownership
  • Slack doesn’t encourage remote work because it’s not in a position to put in that investment; day-to-day work happens in hallways and between desks
  • Slack sees itself as an enterprise software company; an enterprise software company must have enterprise software reliability, stability, and processes
  • Slack has thousands of servers, so events and disruptions happen more often; the system needs to respond, react, and repair itself without human intervention
  • Links:

    • Holly Allen on Twitter
    • 18F
    • Slack
    • Freenode IRC
    • HipChat
    • AWS
    • Kubernetes
    • Terraform
    • Chef
    • QCon
    • Datadog

Screaming in the Cloud, by Corey Quinn
