Cognixia Podcast

What caused the massive global Microsoft outage?



Hello everyone and welcome back to the Cognixia podcast! Endpoint protection has been a buzzword in the world of cybersecurity for quite a while now. It involves software running on local machines to keep them from running malicious software or any unintended code. It sounds like a more modern name for good old antivirus and firewalls, doesn’t it? It has two key components: a backend control center and agent software installed on the endpoint devices. And, if you haven’t guessed already, endpoint devices are the user devices: mobile phones, laptops, desktops, and so on. The endpoint protection agent software runs constantly on the endpoint devices. When you launch a program or application, the operating system notifies a sensor, and if the agent decides the program should not run, the sensor prevents its execution. The main endpoint application is also notified about the blocked execution and, in turn, notifies the control center over the internet.
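To make that flow a little more concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not how any real endpoint protection product is built: the class names `EndpointAgent` and `ControlCenter`, the `blocklist`, and the `on_execution_request` hook are all hypothetical, and the control center is an in-memory stand-in rather than a service reached over the internet.

```python
from dataclasses import dataclass, field


@dataclass
class ControlCenter:
    """Hypothetical backend; a real one would be reached over the network."""
    reports: list = field(default_factory=list)

    def notify(self, device_id: str, program: str) -> None:
        # Record which device blocked which program.
        self.reports.append((device_id, program))


@dataclass
class EndpointAgent:
    """Hypothetical agent running on one endpoint device."""
    device_id: str
    control_center: ControlCenter
    blocklist: set = field(default_factory=set)

    def on_execution_request(self, program: str) -> bool:
        """Stand-in for the sensor hook: returns True if the program may run."""
        if program in self.blocklist:
            # Block the execution and report it to the control center.
            self.control_center.notify(self.device_id, program)
            return False
        return True


center = ControlCenter()
agent = EndpointAgent("laptop-01", center, blocklist={"malware.exe"})

allowed = agent.on_execution_request("notepad.exe")  # not on the blocklist
blocked = agent.on_execution_request("malware.exe")  # on the blocklist
```

In this sketch the decision is a simple blocklist lookup; a real agent would rely on far richer detection logic and on hooks provided by the operating system.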


Simply put, this is a surveillance system of sorts. To be truly effective, it needs to be deeply embedded in the operating system. It also needs the capability and the requisite permissions to bypass many internal security systems.


So what exactly happened that affected more than 8.5 million systems? Banking services came to a halt, countless flights were canceled, travelers were stranded, retail services came grinding to a stop, and an unimaginable number of workplaces were left staring at what is popularly called “The Blue Screen of Death”. While this number is less than 1% of all Windows devices in operation globally, the broad economic and societal impact of even that 1% is hard to fathom.


This is the first time figures have been revealed for such an incident, and figures of this magnitude at that. It is believed that this could be the worst cyber event in history. And while the event is largely being labeled a “Microsoft outage”, it was actually caused by an update rolled out by CrowdStrike, not Microsoft. The next biggest incident would be the WannaCry cyberattack of 2017, in which over 300,000 devices were affected in over 150 countries. But do you see the difference between 8.5 million devices and 300,000 devices?


On 19 July 2024 at 04:09 UTC, CrowdStrike rolled out a routine update for one such ‘sensor’ on Windows, one of the updates that form part of the ongoing protection mechanisms of the Falcon platform. The sensor runs as a Windows device driver, which hooks deep into Windows. To do this, it needs special permissions, of course. Such drivers are written in C and C++, the same as the Windows kernel and core libraries. The configuration update triggered a logic error, leading to a system crash and the blue screen of death, or BSOD, on impacted systems.
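To get a feel for how a configuration update can bring down the code that reads it, here is a small, generic sketch in Python. It is not CrowdStrike's code and does not reproduce the actual defect; the rule format, the field count, and every function name are assumptions made purely for illustration. The unchecked parser trusts that every entry has all of its fields, while the checked one validates the entry first and rejects anything malformed.

```python
EXPECTED_FIELDS = 3  # hypothetical: name, pattern, action


def parse_rule_unchecked(line: str) -> dict:
    """Logic error: assumes every entry carries all three fields."""
    parts = line.split(",")
    return {"name": parts[0], "pattern": parts[1], "action": parts[2]}


def parse_rule_checked(line: str):
    """Validate the entry before reading from it; return None if malformed."""
    parts = line.split(",")
    if len(parts) != EXPECTED_FIELDS:
        return None
    return {"name": parts[0], "pattern": parts[1], "action": parts[2]}


def load_rules(lines):
    """Load well-formed rules and set aside the ones that fail validation."""
    rules, rejected = [], []
    for line in lines:
        rule = parse_rule_checked(line)
        if rule is None:
            rejected.append(line)
        else:
            rules.append(rule)
    return rules, rejected


# A hypothetical update in which the second entry is missing a field.
update = ["rule-a,*.tmp,block", "rule-b,*.dll"]
rules, rejected = load_rules(update)
```

In Python, calling `parse_rule_unchecked` on the malformed entry raises an `IndexError` that the program can catch. In kernel-mode C or C++ code, a comparable mistake can end up reading memory that is not valid, and because the driver runs inside the operating system itself, the result is a crash of the whole machine rather than of a single application.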


Cognixia Podcast, by Cognixia