“Threat content, and queries, and use cases… oh my!”
~Dorothy (If she visited a SOC instead of Oz)
Security operations centers (SOCs) are marvelous and complex machines. In their complexity, they often remind me of antique clocks: awash with advanced technology, full of (digital) moving parts, (virtual) spinning cogs, and a dizzying number of people. These elements all work in tandem to secure their parent organizations from attacks of all sizes. With all these moving parts, though, it can sometimes be hard, as the saying goes, to “see the forest for the trees” when it comes to improvement.
The concept of “improvement” in modern SOCs is often synonymous with “rip and replace.” As new technologies become available and old technologies fail to live up to their promises, life cycling occurs. In a way, it is the SOC Circle of Life. Organizations will often refer to these upgrades as improvements, and it is hard not to see why. The average SOC with tools like SOAR and EDR/XDR is head and shoulders above the SOCs of 10 years ago when it comes to capabilities. But, as any analyst who has cut their teeth in a SOC can tell you, not every problem is best fixed with technology alone.
One of the best examples of this is the threat content that drives all these security tools. This threat content, despite being central to the security of organizations, has not seen a lot of improvement since its inception. But it is my opinion that organizations need to start focusing on threat content more now than ever before.
Before we dig into the concept of threat content and how to improve it, let’s first figure out what it is. “Threat content” is an ambiguous term in the industry. Some organizations don’t use the term at all, often because they see “content” and their corresponding technology solutions as one and the same. Others may use it to refer to the individual indicators of compromise (IOCs) fed into their technologies. And still others may use it synonymously with threat intelligence reports and blog articles. These definitions all capture elements of “threat content,” but they all still miss the mark.
Threat content is the set of individual queries that run in SIEM, EDR, and data lake platforms (to name only a few common homes). These differ from the other queries run in these platforms, such as those that detect service interruptions, because they focus on detecting threats. The queries can be simple, matching against a single IOC, or complex, looking for specific behaviours.
These “queries” are what power the various security tools. They are also what drives security analysis and threat hunting. And they are what generates alerts for investigation. Threat content serves a very central role to the security process.
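The contrast between simple and complex content can be sketched in a few lines. The event format and field names below are illustrative, not any particular platform's schema; real threat content would run inside a SIEM or EDR in its own query language.

```python
KNOWN_BAD_HASH = "44d88612fea8a8f36de82e1278abb02f"  # example IOC (the EICAR test file's MD5)

def ioc_match(event):
    """Simple content: match a single known indicator."""
    return event.get("file_hash") == KNOWN_BAD_HASH

def behavioral_match(event):
    """Complex content: flag a behaviour rather than an indicator --
    here, an Office application spawning a command shell."""
    return (
        event.get("parent_process", "").endswith("winword.exe")
        and event.get("process", "").endswith("cmd.exe")
    )

events = [
    {"file_hash": "44d88612fea8a8f36de82e1278abb02f", "process": "a.exe"},
    {"parent_process": r"C:\Program Files\winword.exe",
     "process": r"C:\Windows\System32\cmd.exe"},
]

print([ioc_match(e) for e in events])         # which events hit the IOC rule
print([behavioral_match(e) for e in events])  # which hit the behavioural rule
```

The IOC rule is trivial to write but brittle (change one byte, change the hash); the behavioural rule survives new samples of the same technique, which is why modern content leans toward the latter.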
Threat content can come from a wide variety of places, many of which we covered in this article. The TL;DR is that most organizations will get a lot of their threat content from so-called “free” sources like code sharing sites and repositories. Another very common source is the content built right into various technologies as part of the annual license.
“Threat content” really hasn’t changed much, despite evolving to take advantage of new technologies. Now, my de facto stance is that change isn’t always required – especially if the thing being changed works well enough.
However, the reality is that this content isn’t working “well enough,” and the statistics seem to bear that out. For instance, false positive rates for threat content are still often more than 50%. And analysts still have to spend more than 10 minutes (often much more!) analyzing a single alert, only to find that the alert is statistically likely to be a false positive. This highlights that the problem isn’t purely a technological one. There is more at play here, and threat content seems a likely suspect.
And the statistics tell us this problem affects more than just the outcomes; it affects the people as well. Anyone with time in the industry knows that many SOCs continue to face high per-analyst workloads, and that these workloads can contribute heavily to analyst burnout.
With the importance of threat content, and the challenges the industry faces with workload and personnel, there must be a better way. (Cue the Shark Tank theme song… Or Dragon’s Den for you Brits, Aussies, and Canadians)
The solution to this problem is multi-faceted. It has to look at the totality of challenges organizations face with threat content today. From development to deployment, and from triage to analysis and remediation.
The solution is that threat content needs to become more. Indeed, the threat content itself is really only a component of what I have called a broader threat “package.” These packages include key elements that have long been missing from traditional threat content.
One of the essential elements to address with threat content is the query development process itself.
The numbers have shown that a lot of threat content has a false positive rate of 50% or higher. This means that, statistically, any given alert is more likely to be a false positive than a true positive. It also means that, across all alerts, analysts are more likely to see false positives. This can lead to analysts becoming numb to alerts generated from this content. The outcome is that analysts may start approaching alerts with the preconception that they will be false positives. This can be a large liability for security teams and organizations.
The solution to this is to optimize the content development process and establish accuracy thresholds. Test the content during the development or deployment phase against common and up-to-date data sets. Further, review any content that doesn’t meet those established thresholds before deploying it. This means that when something is detected by the content, analysts are more likely to approach it as a genuine threat.
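A minimal sketch of such an accuracy gate: replay the new query against a labelled test data set and block deployment when the false positive rate exceeds a threshold. The data set and the 50% threshold here are illustrative, not prescriptive.

```python
FP_THRESHOLD = 0.50  # maximum acceptable false positive rate (illustrative)

def false_positive_rate(alerts):
    """alerts: list of (fired, is_true_positive) tuples from a test replay."""
    fired = [a for a in alerts if a[0]]
    if not fired:
        return 0.0
    false_positives = sum(1 for _, is_tp in fired if not is_tp)
    return false_positives / len(fired)

def deployment_decision(alerts):
    """Gate the content: anything over threshold goes back for review."""
    rate = false_positive_rate(alerts)
    if rate > FP_THRESHOLD:
        return f"REVIEW: FP rate {rate:.0%} exceeds threshold"
    return f"DEPLOY: FP rate {rate:.0%} within threshold"

# Replay results against a labelled data set: (query fired?, real threat?)
replay = [(True, True), (True, False), (True, False), (False, True)]
print(deployment_decision(replay))  # -> REVIEW: FP rate 67% exceeds threshold
```

Running this check during development, against data sets that are refreshed as the environment changes, is what keeps the 50%-plus false positive rates from reaching production in the first place.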
One of the key challenges analysts face on a constant basis is an information deficit. When investigating, the analyst often has little more than an alert name and the detection logic to work with, and sometimes even that is lacking.
This means that it is up to the defenders to try to fill in the blanks even before they have gotten to the data itself. Couple this with the sometimes-quixotic naming conventions organizations use for actors and malware, and it becomes easy to see how this is an unnecessary time sink.
The solution is fairly straightforward. Every query should include a human readable “use case.” The term “use case” is another ambiguous term in the industry. Here, though, we mean a human readable description of what the threat content is detecting, along with why and how it functions. This means that the analyst is able to understand the nature of the threat as soon as they acknowledge the alert.
As previously mentioned, defenders face an ongoing information deficit when responding to detections. This results in an often-challenging slog on the part of the analyst to piece together as much information as possible.
The solution is to provide meaningful contextualization alongside threat content and detections, which helps reduce this kind of unnecessary searching for additional data. This could include tagging known information about the threat, mapping to frameworks (like MITRE ATT&CK), and even attaching previous analyst notes.
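One possible shape for a use case plus context attached to a query is sketched below. The field names are illustrative rather than a standard schema; T1059.003 is the real ATT&CK ID for Windows Command Shell, while the actor and note entries are placeholders.

```python
# A detection bundled with its human-readable use case and context tags.
content = {
    "name": "Office Application Spawning Shell",
    "query": 'parent_process="winword.exe" process="cmd.exe"',
    "use_case": (
        "Detects Microsoft Word launching a command shell, a common "
        "macro-based initial access behaviour. Legitimate in some "
        "automation setups, so review the command line before escalating."
    ),
    "context": {
        "mitre_attack": ["T1059.003"],  # Windows Command Shell
        "known_actors": ["<actor names from your intel team>"],
        "analyst_notes": ["Previously seen as FP with the finance macro suite"],
    },
}

# What the analyst sees the moment they acknowledge the alert:
print(content["use_case"])
print("ATT&CK:", ", ".join(content["context"]["mitre_attack"]))
```

The point is not the exact fields but that the why and how travel with the query, so the analyst never starts an investigation from a bare alert name.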
Many teams and developers view documentation as nothing more than a chore that eats up cycles. And indeed, it can seem that way, right up until the documentation is actually needed and the engineer responsible left 6 months ago. This is a problem that seems to recur regularly.
Another consideration is that threat content, especially for threat hunting, can be deployed many times during hunts. This means that the content may not live in a SIEM all the time, instead requiring teams to roll it out during each hunt. If specific components of the content need altering, the documentation is critical.
The solution is that teams facing turnover, or repeated deployments, need deployment documentation as a part of their threat content. This should be clear and concise, and laid out in a step-by-step format. Having detailed deployment guides can save a lot of engineering time and effort.
What will likely be the most controversial part of this proposed solution for threat content are runbooks. Runbooks don’t have to be step-by-step guides, though they can be, but they should generally guide the analyst through an investigation.
The security industry continues to face higher-than-average turnover in its ranks. This means that, in a year, some organizations will see more than a quarter of their corporate knowledge walk out the door. It also means that, when coupled with the ever-present skills shortage, they will likely have to train more junior analysts to fill those vacant positions. During this period, having runbooks ensures consistent analysis at all times.
Runbooks can also serve another valuable purpose for organizations. They can help organizations prepare for automation. By having runbooks in place, security teams can more easily design automations.
Some organizations are reticent to put in place runbooks in their threat content for various reasons. These can range from making runbooks too explicit, to making runbooks too long and complicated, to taking the “fun” out of analysis. Whatever the reason, it is important to remember that these runbooks don’t have to be step-by-step guides. Security analysis does demand adaptability, after all. But these guides can at least ensure consistent analysis, serve as an excellent training aid, and better prepare SOCs for automation.
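One way to keep runbooks from becoming rigid step-by-step scripts, while still preparing for automation, is to structure them as steps with an explicit automatable/manual marker. Everything below is an illustrative sketch, not a SOAR product's schema.

```python
# A runbook as structured steps. Marking which steps are safely
# automatable lets a SOAR playbook pick those up later, while
# judgement calls stay with the analyst.
runbook = [
    {"step": "Pull the process tree for the alerted host", "automatable": True},
    {"step": "Check reputation of the child process hash", "automatable": True},
    {"step": "Review command line for legitimate business use", "automatable": False},
    {"step": "If malicious, isolate the host and open an incident", "automatable": False},
]

auto_steps = [s["step"] for s in runbook if s["automatable"]]
manual_steps = [s["step"] for s in runbook if not s["automatable"]]

print(f"{len(auto_steps)} steps ready for SOAR automation")
print(f"{len(manual_steps)} steps remain with the analyst")
```

A junior analyst reads the same list top to bottom as a training aid; the automation team reads only the `automatable` entries. One artifact, two audiences.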
In the same vein as runbooks, another analysis “challenge” can be the remediation.
We’ve all heard (or experienced!) the horror stories in the industry around failed remediation. An analyst caught the activity, but when it came time to remediate the threat, the compromised host was re-imaged. In the process, valuable evidence was destroyed, covering the adversary’s tracks while letting them know someone is on to them. This drives the adversary deeper into hiding and prolongs their dwell time. The problem, in this scenario, wasn’t that the technology didn’t work or that something was missed. Instead, it was that improper remediation was carried out.
The solution in this case is to build remediation guidelines for analysts into threat content. These could cover when to engage tech support for a simple re-image, when to engage incident response teams, or when to escalate to a higher-tier analyst. More mature teams can also build high-level remediation guidelines that state, for instance, what evidence should be collected before remediation.
A concept that often gets far too little consideration in threat content is that of emulation and validation. This component allows organizations to perform targeted simulations of a threat. More importantly, the simulation is in their own environment, and the generated alert is on their own tools.
A lot of things in the cyber security industry, including threat content, work on faith. This is because solutions are too often a black box, which means organizations must trust that the provided solution does as advertised. Now, this is not to say that vendors are lying. But the ongoing number of breaches would tend to indicate that things get through. Emulation and validation tools allow organizations to trust but verify the capabilities of their technology and its configuration.
Additionally, emulation and validation are great tools for incident response training. They allow teams to simulate threats to validate that their escalation and IR processes work as intended.
The solution of building emulation and validation into threat content is actually relatively simple these days, at least to begin with. Organizations can use free tools like Red Canary’s Atomic Red Team, or vendor-built solutions like breach and attack simulation (BAS) tools. With this type of tooling in place, organizations can ensure that threat content works as intended, not just in a lab but in a production environment.
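The validation loop itself is simple in shape: trigger an emulation of the technique, then confirm the deployed content actually raised an alert. In the sketch below, `run_emulation()` and `poll_for_alert()` are hypothetical stand-ins for real integrations (for example, launching an Atomic Red Team test and querying your SIEM's API); they are stubbed so the control flow is runnable.

```python
def run_emulation(technique_id):
    """Stub: in production, kick off the matching emulation test."""
    print(f"Emulating {technique_id} ...")
    return True

def poll_for_alert(technique_id):
    """Stub: in production, query the SIEM/EDR for the expected alert
    within a timeout window."""
    return True  # pretend the alert appeared

def validate_content(technique_id):
    """The trust-but-verify loop: emulate, then check the alert fired."""
    if not run_emulation(technique_id):
        return "emulation failed"
    return "validated" if poll_for_alert(technique_id) else "content did not fire"

print(validate_content("T1059.003"))
```

The same loop doubles as an incident response drill: schedule it, and the escalation path gets exercised along with the content.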
Threat content is a central element that drives protection, detection, and threat hunting. Despite this central role, though, it remains an area that can often go completely overlooked, regardless of where the content came from.
The continuously high false positive rates and the often-long analysis times show that we cannot rely solely on new technologies to “improve” security operations. The associated workloads and burnout rates also demonstrate that this issue has broader implications for security teams and organizations.
The solution to this problem of threat content improvement lies in realizing that threat content alone is not enough. Instead, the threat content or query should constitute only a part of a broader package. Alongside the query should be human readable use cases, broader context, deployment guides, runbooks, remediations, and emulation and validation components. These “packages” enable a more comprehensive and robust approach to threat content, and serve to improve security for teams overall.
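Pulling the pieces together, one possible shape for such a package is sketched below, with the query as just one field among the supporting elements argued for above. The schema and all field contents are illustrative, not a standard.

```python
from dataclasses import dataclass

@dataclass
class ThreatPackage:
    query: str               # the detection logic itself
    use_case: str            # human-readable what/why/how
    context: dict            # ATT&CK IDs, actor names, analyst notes
    deployment_guide: list   # step-by-step rollout instructions
    runbook: list            # investigation guidance
    remediation: list        # containment and escalation guidance
    emulation: str           # how to validate the content end to end

pkg = ThreatPackage(
    query='parent_process="winword.exe" process="cmd.exe"',
    use_case="Detects Word spawning a shell, a common macro behaviour.",
    context={"mitre_attack": ["T1059.003"]},
    deployment_guide=["Import the saved search", "Set schedule to 5 minutes"],
    runbook=["Pull the process tree", "Review the command line"],
    remediation=["Collect a memory image before re-imaging", "Escalate to IR"],
    emulation="Run the matching emulation test and confirm the alert fires",
)

# A completeness check a content pipeline could enforce before release:
missing = [name for name in vars(pkg) if not getattr(pkg, name)]
print("complete" if not missing else f"missing: {missing}")
```

A check like the last two lines is the practical payoff: a package missing its runbook or remediation guidance never ships, which is exactly the discipline traditional bare queries lack.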