The CrowdStrike Catastrophe: Lessons from the Biggest IT Disruption in History

Around 3 PM AEST on Friday, 19th July 2024, a defective software update from CrowdStrike triggered the most widespread IT disruption in history.

Medical systems, banking, critical infrastructure, and airlines were disrupted; some operations came to a complete halt. While IOR did not encounter any direct impact, many suppliers and vendors (including Microsoft and some customer banks) were impacted to varying degrees. In all, approximately 8.5 million Windows devices were estimated to have been affected globally, and full resolution is expected to take weeks.

Although essentially a benign mistake rather than a cybersecurity incident, the defect caused Windows machines to crash before any automatic recovery process could start, so recovery had to be performed manually using local administrator credentials. It is the nature of this recovery process which, somewhere in the world, may have laid the foundations for another cybersecurity incident.

The incident has brought a number of core issues into focus:

Consolidated Supply Chains

  • Technology supply chains are increasingly consolidating around a small number of enterprise tools with privileged systems access.

  • The disruption caused by this otherwise benign incident has significantly raised the bar for the scale of disruption that a single malicious incident, or indeed a single malicious file deployed in a central location, could cause. For an entity with malicious intent seeking to cause real damage at scale, these tools now present a target with both systemic and global reach.

The Limitations of Endpoint Resilience

  • Distributed endpoints can in most circumstances provide higher overall redundancy than Terminal Services, but they do not guarantee the absence of single points of failure, especially from the point of view of individual users.

  • Properly managing endpoint encryption is crucial to business continuity. Organisations that have overlooked the management of BitLocker recovery keys on Windows endpoints will face prolonged recovery times, particularly for essential yet low-maintenance systems like cash registers.

  • Technical Support teams cannot fix everything that can go wrong from behind a screen on the other side of the country.
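One way to find out ahead of time whether recovery keys have been overlooked is to compare the list of encrypted endpoints against the keys actually held in escrow. The following is a minimal sketch only: the hostnames, the key identifiers, and the idea that both lists are available as simple exports are assumptions made for illustration, not a description of any particular management tool.

```python
def endpoints_missing_recovery_key(encrypted_endpoints, escrowed_keys):
    """Return the encrypted endpoints that have no recovery key in escrow.

    encrypted_endpoints: iterable of hostnames known to be encrypted.
    escrowed_keys: mapping of hostname -> escrowed recovery key identifier.
    """
    return sorted(set(encrypted_endpoints) - set(escrowed_keys))


# Hypothetical inventory: three encrypted endpoints, two with keys in escrow.
encrypted = ["till-01", "till-02", "laptop-17"]
escrowed = {"till-01": "key-id-a", "laptop-17": "key-id-b"}

missing = endpoints_missing_recovery_key(encrypted, escrowed)
print(missing)
```

Any hostname this reports is a machine that would need to be rebuilt rather than recovered if it were caught in a crash loop.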

The Significance of Local Administrator Credentials

Despite extensive media coverage of the above issues, one further issue appears to have been overlooked: the fix for the CrowdStrike issue is manual and (in most cases) requires local administrative privileges on the endpoint.

Local Administrator passwords are one of the key vectors in the lateral spread of ransomware, and it is in this rush to fix machines post-CrowdStrike that industry may have laid the foundations for another cybersecurity incident.

At many heavily impacted companies, any IT-literate employee was deployed over the weekend to assist. In doing so, most of these employees were given access to local administrator passwords they did not previously have. Given the urgency and the lack of secure methods for communicating such passwords at that scale, it is near certain that these passwords have been shared excessively through inappropriate channels, including SMS and Teams.

If, post-CrowdStrike, entire IT Operations groups now have local administrator credentials for everything from endpoints to server infrastructure, and if these credentials are stored insecurely, what was a benign mistake in 2024 could lead to a material cybersecurity incident at a later date.

Microsoft recommends managing local administrator accounts through a dedicated password management policy, historically via its LAPS (Local Administrator Password Solution) tool and more recently through Windows LAPS, which can be managed through Intune. Given the role of these accounts in the spread of ransomware, it’s essential for more engineers to be aware of and utilise these tools to manage and rotate these credentials, and to make them unique to each endpoint.
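The principle behind this can be illustrated with a short sketch. It is not how LAPS is implemented, and it is no substitute for it: the hostnames, the 30-day rotation period, the password length, and every function name below are assumptions chosen purely for illustration. It shows the two properties that matter after an incident like this one: every endpoint gets its own random password, and a password that has been disclosed is replaced immediately rather than at its next scheduled rotation.

```python
import secrets
import string
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed policy values, for illustration only.
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*-_=+"
ROTATION_PERIOD = timedelta(days=30)


@dataclass
class LocalAdminCredential:
    hostname: str
    password: str
    expires_at: datetime


def generate_password(length: int = 24) -> str:
    """Build a random password from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def issue_credential(hostname: str, now: datetime) -> LocalAdminCredential:
    """Issue a fresh password that belongs to this endpoint alone."""
    return LocalAdminCredential(hostname, generate_password(), now + ROTATION_PERIOD)


def rotate_if_due(cred: LocalAdminCredential, now: datetime,
                  disclosed: bool = False) -> LocalAdminCredential:
    """Replace the credential if it has expired or is known to have been shared."""
    if disclosed or now >= cred.expires_at:
        return issue_credential(cred.hostname, now)
    return cred


# Hypothetical endpoints: each receives a different password.
now = datetime.now(timezone.utc)
vault = {host: issue_credential(host, now) for host in ["till-01", "till-02"]}

# The till-01 password was sent over SMS during the recovery, so rotate it now.
old_password = vault["till-01"].password
vault["till-01"] = rotate_if_due(vault["till-01"], now, disclosed=True)
```

Because each password is unique to its endpoint, a credential leaked during the recovery effort unlocks one machine rather than the whole fleet, and rotating it closes even that gap.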

// Elliot Mackenzie //
