This blog post discusses the recent CrowdStrike incident where a flawed update caused blue screen errors (BSODs) on Windows systems, leading to widespread disruptions. The issue was due to an out-of-bounds read, a type of memory safety problem, which resulted in system crashes because the Falcon sensor operates in the kernel-space. To prevent similar incidents, robust memory safety practices like automated vulnerability testing are crucial.

On July 19, 2024, CrowdStrike released a flawed update to its Falcon Platform that caused blue screens of death (BSODs) on Microsoft Windows hosts. Millions of computers worldwide were affected, grounding commercial flights, disrupting banking services, and temporarily taking broadcasters offline. This blog post explores the root cause of this event and the steps that could have been taken to prevent it.

CrowdStrike is a cybersecurity company that provides endpoint security, threat intelligence, and cyberattack response services. The CrowdStrike Falcon Platform monitors and responds to network activity, aiming to prevent attacks. To do this, CrowdStrike places a driver/kernel-level software sensor on all devices being monitored in an environment. A version update to this sensor resulted in a memory safety issue on Windows hosts, subsequently causing the global outage.

What Memory Safety Issue?

CrowdStrike describes the issue as follows:

“When received by the sensor and loaded into the Content Interpreter, problematic content in Channel File 291 resulted in an out-of-bounds memory read triggering an exception. This unexpected exception could not be gracefully handled, resulting in a Windows operating system crash (BSOD).”

The CrowdStrike Falcon Platform uses a content distribution system for rapid updates, wherein Channel Files are delivered to and interpreted by all software sensors in a monitored network. This is different from a traditional software update, which would modify the Falcon software sensor’s code directly. Instead, Channel Files enable already deployed software sensors to be reconfigured at runtime. When interpreted, Channel File 291 (the file associated with this update) caused the Falcon software sensor to encounter a memory safety issue.
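To make the distinction between a code update and a content update concrete, here is a minimal sketch in Python. The `Sensor` class, its methods, and the rule patterns are all hypothetical and are not CrowdStrike's design; the point is only that the interpreter code stays fixed while the data it interprets changes its behavior at runtime.

```python
import fnmatch


class Sensor:
    """Hypothetical sensor: the code is fixed, only its rule data changes."""

    def __init__(self):
        self.rules = []  # glob patterns loaded from content, not compiled in

    def load_content(self, content_lines):
        # Replace the active rule set at runtime; no sensor code is modified.
        self.rules = [line.strip() for line in content_lines if line.strip()]

    def is_suspicious(self, name):
        # The same interpreter logic runs regardless of which rules are loaded.
        return any(fnmatch.fnmatchcase(name, pattern) for pattern in self.rules)


sensor = Sensor()
sensor.load_content(["*.tmp.exe"])
before = sensor.is_suspicious("evil_7")   # no rule matches yet

sensor.load_content(["*.tmp.exe", "evil_*"])
after = sensor.is_suspicious("evil_7")    # new content changes the verdict

print(before, after)
```

Because the content is interpreted by code that is already running, any mistake in how the interpreter handles unexpected content takes effect as soon as that content arrives.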

The memory safety issue in question was an out-of-bounds memory read. This type of issue occurs when a program intending to read from a certain buffer in memory instead reads from before or after the buffer, accessing an invalid memory region. MITRE classifies this issue as CWE-125: Out-of-bounds Read, describing it as: “The product reads data past the end, or before the beginning, of the intended buffer.”
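The following is a minimal model of this behavior, assuming a simulated flat memory region rather than real process memory. Python itself rejects out-of-range indices, so the sketch uses a `bytearray` named `MEMORY` to stand in for raw memory and treats a slice of it as the intended buffer. All names and contents are illustrative.

```python
# Simulated flat memory: unrelated data sits before and after a 4-byte buffer.
MEMORY = bytearray(b"HDR!" + b"DATA" + b"SECRETKEY")
BUF_START, BUF_LEN = 4, 4  # the intended buffer is MEMORY[4:8], i.e. b"DATA"


def unchecked_read(index):
    # Mimics a raw pointer access: the index is trusted and never validated.
    return MEMORY[BUF_START + index]


def checked_read(index):
    # Reject any index outside the intended buffer before touching memory.
    if not 0 <= index < BUF_LEN:
        raise IndexError(f"index {index} outside buffer of length {BUF_LEN}")
    return MEMORY[BUF_START + index]


past_end = chr(unchecked_read(4))       # one element past the end of the buffer
before_start = chr(unchecked_read(-1))  # one element before the beginning
print(past_end, before_start)
```

In this model the unchecked read silently returns neighboring data, which is the information-leak case. On real hardware the neighboring address may not be mapped at all, in which case the read faults instead.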

Out-of-bounds reads can have a wide variety of negative effects, including program crashes, segmentation faults, and other unintended behavior. They can also have significant security ramifications beyond denial of service: they can be used to read secret values such as cryptographic keys, and to gain the information needed to bypass ASLR and other binary protection mechanisms.

In an operating-system context, programs run either in the user-space or in the kernel-space. In the user-space, a fault such as an invalid memory read typically causes only the offending program to exit, leaving other system processes unaffected. In the kernel-space, however, there is no outer layer to contain the fault: an unhandled exception in kernel-mode code typically forces the operating system to halt, crashing the entire system. The CrowdStrike Falcon software sensor runs in the kernel-space.
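The user-space half of this contrast can be observed directly. The sketch below starts a child Python process that reads from address 0 through `ctypes.string_at`, which is an invalid read in a normal process. The exact exit status is platform-dependent (on Linux the child is usually killed by a segmentation fault signal), so the example only checks that it is nonzero.

```python
import subprocess
import sys

# Child program: read a C string from address 0, which is not mapped.
CHILD_CODE = "import ctypes; ctypes.string_at(0)"

result = subprocess.run([sys.executable, "-c", CHILD_CODE],
                        capture_output=True)

# The child died from the invalid read, but this parent is still running.
child_crashed = result.returncode != 0
print("child exit status:", result.returncode)
print("parent still running")
```

The operating system contains the fault to the child process. Kernel-mode code has no equivalent supervisor above it, which is why the same class of bug in a driver brings down the whole machine.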

How Could This Have Been Prevented?

In their public statement, CrowdStrike lists a few possible techniques to prevent a similarly devastating event from happening again:

Memory safety issues are extremely pervasive. During its lifecycle, a software offering will likely be affected by hundreds of unique memory safety issues that must be addressed. Ideally, developers formalize and optimize the detection and remediation of these issues. As the CrowdStrike example shows, this does not always happen until it is too late.

For software developers and firmware manufacturers, CrowdStrike’s post-incident advice rings true. Automated testing, rollback, fault injection, fuzzing, and other software validation techniques are required for the delivery of secure and reliable software. Software development companies must be cognizant of memory safety issues, especially if their software runs at the kernel level and is a dependency of critical infrastructure.
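As an illustration of one of these techniques, here is a minimal random-input fuzzing sketch. The `interpret` function and its `TABLE` are hypothetical stand-ins for a content interpreter with an unvalidated index; they are not modeled on the Falcon sensor. Production fuzzers are coverage-guided and far more capable, but the loop has the same basic shape.

```python
import random

TABLE = ["allow", "block", "log", "quarantine"]  # hypothetical action table


def interpret(record: bytes) -> str:
    """Hypothetical buggy interpreter: trusts an index taken from the input."""
    if not record:
        return "noop"
    return TABLE[record[0]]  # BUG: no check that record[0] < len(TABLE)


def fuzz(target, runs=1000, seed=0):
    """Feed random byte strings to target and collect inputs that crash it."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    crashes = []
    for _ in range(runs):
        length = rng.randrange(0, 8)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except IndexError:
            # Python raises on the bad index; in C the same access would be
            # an out-of-bounds read of whatever follows the table.
            crashes.append(data)
    return crashes


crashes = fuzz(interpret)
print(f"{len(crashes)} crashing inputs found")
```

Each crashing input can be saved as a regression test, so the missing bounds check is caught before content of that shape ever reaches a deployed system.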

In many cases, out-of-bounds reads are very tricky to detect statically. SAST techniques often miss non-trivial instances of this type of issue: some instances of program behavior can only be determined at runtime. C, C++, and other lower-level programming languages offer limited guard rails to prevent developers from violating memory safety in this way.

The ObjectSecurity OT.AI Platform uses symbolic execution to analyze binary programs and detect vulnerable states, such as those wherein out-of-bounds reads occur. The ObjectSecurity OT.AI Platform performs post-build binary analysis, inspecting a program’s runtime behavior as it would occur on the CPU. By inspecting the “ground-truth” of program behavior in this way, OT.AI can find the tricky vulnerable states missed by most SAST tools. OT.AI’s vulnerability analysis capabilities can be integrated directly into your CI/CD pipeline. Novel memory safety issues are detected and reported automatically, preventing your organization from being responsible for the next global computer outage.

Resources