
In this episode of Tech Leadership with Fexingo, Lucas and Luna dive into a case study of a mid-stage SaaS company that cut its mean time to acknowledge (MTTA) from 12 minutes to under 4 minutes, a reduction of roughly 70%, without adding headcount or buying expensive tools. They break down the three specific changes the team made: redesigning the on-call rotation around a 'follow-the-sun' model, implementing a tiered escalation protocol that routes alerts based on severity, and introducing a 'swarming' practice in which the first responder owns the incident until resolution. Lucas argues that most incident response improvements fail because teams optimize for alert volume instead of alert quality, and Luna pushes back on whether these practices scale beyond small teams. They also discuss how the team used a simple pre-mortem exercise to identify its biggest bottlenecks before making changes. This episode is packed with actionable advice for engineering leaders looking to reduce burnout and improve reliability.
#IncidentResponse #OnCall #EngineeringLeadership #SiteReliabilityEngineering #DevOps #IncidentManagement #FollowTheSun #Swarming #AlertFatigue #MeanTimeToAcknowledge #MTTA #PreMortem #BurnoutPrevention #Observability #TechLeadershipWithFexingo #FexingoBusiness #BusinessPodcast #Technology
Keep every episode free: buymeacoffee.com/fexingo
By Fexingo