


Alexander Page transitioned from sales engineer to engineering director by prototyping LLM applications after ChatGPT's launch, moving from initial prototype to customer GA in under four months. At Big Panda, he's building Biggie, an AIOps co-pilot where reliability isn't negotiable: a wrong automation execution at a major bank could make headlines.
Big Panda's core platform correlates alerts from 10-50 monitoring tools per customer into unified incidents. Biggie operates at L2/L3 escalation: investigating root causes through live system queries, surfacing remediation options from Ansible playbooks, and managing incident workflows. The architecture challenge is building agents that traverse ServiceNow, Dynatrace, New Relic, and other APIs while maintaining human approval gates for any write operations in production environments.
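The approval-gate idea can be illustrated with a minimal sketch. This is not Biggie's actual implementation: the `ToolCall` type, the `execute` function, and the `run` and `approve` callbacks are hypothetical names chosen for illustration, and the only assumption carried over from the episode is that read operations run freely while write operations wait for a human.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    """A single action an agent wants to take (hypothetical shape)."""
    tool: str        # e.g. "servicenow" or "ansible"
    action: str      # e.g. "get_incident" or "run_playbook"
    is_write: bool   # True if the call changes production state


def execute(
    call: ToolCall,
    run: Callable[[ToolCall], str],
    approve: Callable[[ToolCall], bool],
) -> str:
    """Run read calls directly; run write calls only after human approval."""
    if call.is_write and not approve(call):
        return f"blocked: {call.tool}.{call.action} needs human approval"
    return run(call)
```

In this sketch `approve` stands in for whatever review step a team uses (a chat prompt, a ticket, a UI button), and the agent never reaches `run` for a write call unless that step returns true.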
Page's team invested months building a dedicated multi-agent system (15-20 steps with nested agent teams) solely for knowledge graph operations. The insertion pipeline transforms unstructured data such as Slack threads, call transcripts, and technical PDFs with images into graph representations, validating each change against the graph's existing state before committing it. This architectural discipline makes retrieval straightforward and lets users correct outdated context directly, updating graph relationships in real time. Where vector search finds similar past incidents, the knowledge graph traces server dependencies to surface common root causes across connected infrastructure.
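The contrast between similarity search and dependency tracing can be made concrete with a small sketch. It assumes the graph is a plain adjacency map of "depends on" edges; the server names and the `upstream` and `common_root_candidates` helpers are hypothetical and are not taken from BigPanda's product.

```python
from collections import deque


def upstream(graph: dict[str, list[str]], node: str) -> set[str]:
    """Return everything `node` depends on, directly or transitively (BFS)."""
    seen: set[str] = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for dep in graph.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def common_root_candidates(graph: dict[str, list[str]], affected: list[str]) -> set[str]:
    """Dependencies shared by every affected server: candidate common root causes."""
    sets = [upstream(graph, node) for node in affected]
    return set.intersection(*sets) if sets else set()


# Hypothetical topology: each key depends on the servers in its list.
DEPENDS_ON = {
    "web-01": ["api-01"],
    "web-02": ["api-02"],
    "api-01": ["db-primary"],
    "api-02": ["db-primary", "cache-01"],
}

print(common_root_candidates(DEPENDS_ON, ["web-01", "web-02"]))  # {'db-primary'}
```

Two alerting web servers with no textual similarity would still be linked here through their shared database dependency, which is the kind of connection a vector lookup over past incidents would not surface on its own.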
Topics discussed:
By Front Lines