Welcome to episode 329 of The Cloud Pod, where the forecast is always cloudy! Matt, Jonathan, and special guest Elise are in the studio to bring you all the latest in AI and cloud news, including – you guessed it – more outages, and more OpenAI team-ups. We’ve also got GPUs, K8s news, and Cursor updates. Let’s get started!
Titles we almost went with this week
Azure Front Door: Please Use the Side Entrance – el, jb
Azure and NVIDIA: A Match Made in GPU Heaven – mk
Azure Goes Down Under the Weight of Its Own Configuration – el
GitHub Turns Your Copilot Subscription Into an All-You-Can-Eat Agent Buffet – mk, el
Microsoft Goes Full Blackwell: No Regrets, Just GPUs
Jules Verne Would Be Proud: Google’s CLI Goes 20,000 Bugs Under the Codebase
RAG to Riches: AWS Makes Retrieval Augmented Generation Turnkey
Kubectl Gets a Gemini Twin: Google Teaches AI to Speak Kubernetes
I’m Not a Robot: Azure WAF Finally Learns to Ask the Important Questions
OpenAI Puts 38 Billion Eggs in Amazon’s Basket: Multi-Cloud Gets Complicated
The Root Cause They’ll Never Root Out: Why Attrition Stays Off the RCA
Google’s New Extension Lets You Deploy Kubernetes by Just Asking Nicely
Cursor 2.0: Now With More Agents Than a Hollywood Talent Agency
Follow Up
04:46 Massive Azure outage is over, but problems linger – here’s what happened | ZDNET
Azure experienced a global outage on October 29, affecting all regions simultaneously, unlike the recent AWS outage that was limited to a single region. The incident lasted approximately eight hours, from noon to 8 PM ET, impacting major services including Microsoft 365, Teams, Xbox Live, and critical infrastructure for Alaska Airlines, Vodafone UK, and Heathrow Airport, among others.
The root cause was an inadvertent tenant configuration change in Azure Front Door that bypassed safety validations due to a software defect. Microsoft’s protection mechanisms failed to catch the erroneous deployment, allowing invalid configurations to propagate across the global fleet and cause HTTP timeouts, server errors, and elevated packet loss at network edges.
Recovery required rolling back to the last known good configuration and gradually rebalancing traffic across nodes to prevent overload conditions. Some customers experienced lingering issues even after the official recovery time, with Microsoft temporarily blocking configuration changes to Azure Front Door.
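To make the “last known good” recovery pattern concrete, here is a minimal, hypothetical sketch in Python. It is not Microsoft’s actual tooling; the EdgeFleet class, the routes/backend_pool config shape, and the validation rule are all invented for illustration. The idea is the one described above: a deployment gate validates a tenant config before it reaches the fleet, and recovery falls back to the last snapshot that passed validation.

```python
# Hypothetical sketch of config validation + last-known-good rollback.
# All names here (EdgeFleet, routes, backend_pool) are illustrative, not Azure APIs.
from dataclasses import dataclass, field


@dataclass
class EdgeFleet:
    """Stand-in for a global fleet of edge nodes sharing one active config."""
    active_config: dict = field(default_factory=dict)
    last_known_good: dict = field(default_factory=dict)

    def validate(self, config: dict) -> bool:
        # Minimal safety check: every route must name a backend pool.
        return all(route.get("backend_pool") for route in config.get("routes", []))

    def deploy(self, config: dict) -> None:
        if not self.validate(config):
            # This is the kind of guard the write-up says a defect bypassed.
            raise ValueError("config rejected: failed safety validation")
        # Snapshot the previously accepted config before promoting the new one.
        self.last_known_good = self.active_config or config
        self.active_config = config

    def rollback(self) -> None:
        # Recovery path: restore the last configuration that passed validation.
        self.active_config = self.last_known_good


if __name__ == "__main__":
    fleet = EdgeFleet()
    fleet.deploy({"routes": [{"backend_pool": "pool-a"}]})  # accepted
    try:
        fleet.deploy({"routes": [{"backend_pool": None}]})  # rejected by the gate
    except ValueError:
        fleet.rollback()
    print(fleet.active_config)  # still the last known good config
```

The sketch also shows why recovery still takes time in practice: restoring the snapshot is cheap, but as the summary notes, traffic has to be rebalanced gradually so the restored nodes aren’t overloaded all at once.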