The Chief AI Officer Show

How incident.io built AI agents that draft code fixes within 3 minutes of an alert



Lawrence Jones, product engineer at incident.io, describes how their AI incident response system evolved from basic log summaries to agents that analyze thousands of GitHub PRs and Slack messages to draft remediation pull requests within three minutes of an alert firing. The system doesn't pursue full automation because the real value lies elsewhere: eliminating the diagnostic work that consumes the first 30-60 minutes of incident response, and filtering out the false positives that wake engineers unnecessarily at 3am.

The core architectural decision treats each organization's incident history as a unique immune system rather than applying generic playbooks. By pre-processing and indexing how a specific company has resolved incidents across dimensions like affected teams, error patterns, and system dependencies, incident.io generates ephemeral runbooks that surface the 3-4 commands that actually worked the last time this type of failure occurred. This approach emerged from recognizing that cross-customer meta-models fail because incident response is fundamentally organization-specific: one company's SEV-0 is an airline bankruptcy, another's is a stolen laptop.
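The indexing idea described above can be sketched in Go. This is a minimal illustration, not incident.io's actual schema: the `Incident` fields, the pattern-matching key, and the "deprecated" flag (for approaches the org later learned to avoid) are all hypothetical stand-ins for what their real pipeline extracts from PRs, Slack threads, and postmortems.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Incident is a hypothetical record distilled from an organization's
// own incident history — field names are illustrative only.
type Incident struct {
	Team       string
	Pattern    string   // e.g. "db_connection_exhausted"
	Commands   []string // commands that actually resolved it
	Deprecated bool     // marked as a NOT-to-do in a later incident
}

// BuildRunbook indexes past incidents by error pattern and returns the
// most frequently successful commands for the matching pattern,
// filtering out approaches the org later learned to avoid.
func BuildRunbook(history []Incident, pattern string, topN int) []string {
	counts := map[string]int{}
	for _, inc := range history {
		if inc.Deprecated || !strings.EqualFold(inc.Pattern, pattern) {
			continue
		}
		for _, cmd := range inc.Commands {
			counts[cmd]++
		}
	}
	cmds := make([]string, 0, len(counts))
	for c := range counts {
		cmds = append(cmds, c)
	}
	// Rank by how often each command resolved this failure mode,
	// breaking ties alphabetically for a stable runbook.
	sort.Slice(cmds, func(i, j int) bool {
		if counts[cmds[i]] != counts[cmds[j]] {
			return counts[cmds[i]] > counts[cmds[j]]
		}
		return cmds[i] < cmds[j]
	})
	if len(cmds) > topN {
		cmds = cmds[:topN]
	}
	return cmds
}

func main() {
	history := []Incident{
		{Team: "payments", Pattern: "db_connection_exhausted",
			Commands: []string{"kubectl rollout restart deploy/api", "check pgbouncer pool"}},
		{Team: "payments", Pattern: "db_connection_exhausted",
			Commands: []string{"check pgbouncer pool"}},
		{Team: "search", Pattern: "db_connection_exhausted",
			Commands: []string{"drop all connections"}, Deprecated: true},
	}
	// The ephemeral runbook surfaces only the commands that worked,
	// ranked by historical success for this failure pattern.
	fmt.Println(BuildRunbook(history, "db_connection_exhausted", 3))
}
```

The point of the sketch is the shape of the data flow: history is pre-processed into a queryable index ahead of time, so at alert time the system only has to look up and rank, not reason from scratch.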

The engineering challenge centers on building trust with deeply skeptical SRE teams who view AI as non-deterministic chaos in their deterministic infrastructure. Lawrence's team addresses this through custom Go tooling that enables backtest-driven development: they rerun thousands of historical investigations with different model configurations and prompt changes, then use precision-focused scorecards to prove improvements objectively before deploying. This workflow revealed that traditional product engineers struggle with AI's slow evaluation cycles, while the team succeeded by hiring for methodical ownership over velocity.
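The backtest-driven workflow can be illustrated with a toy precision scorecard in Go. Everything here is a simplified assumption about how such a harness might look — the `CaseResult` shape and the two hard-coded configurations are invented for illustration; the actual incident.io tooling replays thousands of real historical investigations.

```go
package main

import "fmt"

// CaseResult is a hypothetical outcome of replaying one historical
// investigation under a given model/prompt configuration.
type CaseResult struct {
	Flagged bool // did the agent raise a finding?
	Correct bool // did humans confirm the finding was real?
}

// Precision = confirmed findings / all findings raised. A precision-
// focused scorecard penalizes false alarms, which is what earns trust
// from skeptical SRE teams.
func Precision(results []CaseResult) float64 {
	var flagged, correct int
	for _, r := range results {
		if r.Flagged {
			flagged++
			if r.Correct {
				correct++
			}
		}
	}
	if flagged == 0 {
		return 0
	}
	return float64(correct) / float64(flagged)
}

func main() {
	// Replay the same historical cases under two configurations.
	baseline := []CaseResult{{true, true}, {true, false}, {true, true}, {false, false}}
	candidate := []CaseResult{{true, true}, {false, false}, {true, true}, {false, false}}
	// The candidate prompt raises fewer findings, but every one is
	// correct — the tradeoff a precision-first scorecard rewards.
	fmt.Printf("baseline=%.2f candidate=%.2f\n", Precision(baseline), Precision(candidate))
}
```

The design choice this mirrors is that a prompt change only ships when its scorecard objectively beats the baseline across the replayed history, replacing "it feels better" with a measurable comparison.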

Topics discussed:

  • Balancing precision versus recall in agent outputs to earn trust from SRE teams who are "hardcore AI holdouts"

  • Pre-processing incident artifacts (PRs, Slack threads, transcripts) into queryable indexes that cross-reference team ownership, system dependencies, and historical resolution patterns

  • Model selection strategy: GPT-4.1 for cost-effective daily operations, Claude Sonnet for superior code analysis and agentic planning loops

  • Backtest infrastructure that reruns thousands of past investigations with modified prompts to objectively validate changes through scorecard comparisons

  • Building ephemeral runbooks by extracting which historical commands and fixes worked for similar incidents, filtered by what the organization learned NOT to do in subsequent incidents

  • Prioritizing alert noise reduction over autonomous remediation because the false positive problem has clearer ROI and lower risk

  • Why AI engineering teams fail when staffed with traditional engineers optimized for fast feedback loops rather than tolerance for non-deterministic iteration

  • Building entirely custom tooling in Go without vendor frameworks due to early ecosystem constraints and desire for native product integration

  • The evaluation problem where only engineers who invested hundreds of hours building a system can predict how prompt changes cascade through multi-step agentic workflows



The Chief AI Officer Show, by Front Lines