Platform Engineering Playbook Podcast

HolmesGPT: AI Root Cause Analysis for Kubernetes



A deep dive into HolmesGPT, the CNCF Sandbox AI agent for root cause analysis in cloud-native environments. This episode covers what it is, its 40+ integrations, the project roadmap, and how to set it up today.

News Segment:

  • Air France-KLM's secure automation platform built with Terraform, Vault, and Ansible
  • AWS ECS tmpfs mounts on Fargate for secure secrets handling
  • Qwen 30B running on Raspberry Pi - democratizing edge AI
  • AWS European Sovereign Cloud with independent EU governance

Main Topic - HolmesGPT:

  • CNCF Sandbox project (accepted October 2025) with 1,600+ GitHub stars
  • Agentic architecture: creates investigation task lists, queries systems, synthesizes findings
  • 40+ built-in toolsets: Prometheus, Grafana Loki/Tempo, Kubernetes, ArgoCD, Datadog, and more
  • Privacy-first: bring your own LLM keys, read-only access, respects RBAC
  • End-to-end automation through Alertmanager, PagerDuty, and Opsgenie integrations
  • Installation options: pip, Homebrew, Helm, Web UI, K9s plugin

Resources:

  • HolmesGPT GitHub
  • HolmesGPT Documentation
  • Full Transcript

Episode Type: full
Episode Number: 83
Season: 1
Tags: HolmesGPT, CNCF, Kubernetes, root cause analysis, AI ops, troubleshooting, observability, SRE, platform engineering, Robusta, agentic AI


Platform Engineering Playbook Podcast, by vibesre