SOC Metrics, Part I: Foundational Metrics

I prefer a hands-off approach to management. Describe a goal, give me the freedom and resources to achieve it, and I will. The success of that approach, which the Army calls “mission command”, ultimately depends on the right person doing the right things in the right ways. Unfortunately, many focus on that first criterion but neglect the second and the third. In many cases, that leads to failure. In the final days of my last job, in a security operations center (SOC), I realized that this explained many of our systemic problems. We had the right people, they just did the right things in the wrong ways or — in some cases — did the wrong things altogether. This was not an inherent consequence of that hands-off approach, of mission command, but rather a necessary consequence of its partial implementation: we lacked measures of performance (MOPs), to assess whether or not (and to what degree) we were doing the right things, and measures of effectiveness (MOEs), to assess whether or not (and to what degree) we were doing them in the right ways. In their absence the right things and the right ways became inconsistent and subjective, and so did our success. I wrote this series to fix that.

MOPs and MOEs are military terms rooted in Joint doctrine. A 2014 edition of ARMOR magazine, the Armor Branch’s professional development bulletin, included an article by Captains Tom Westphal and Jason Guffey titled Measures of Effectiveness in Army Doctrine. That comprehensive review highlighted many discrepancies in doctrinal definitions of MOPs and MOEs; this article relies on definitions from JP 5-0: Joint Planning, dated 01 December 2020. JP 5-0 defines a measure of performance as “an indicator used to measure a friendly action that is tied to measuring task accomplishment”, and a measure of effectiveness as “an indicator used to measure a current system state, with change indicated by comparing multiple observations over time.” MOPs concern themselves with friendly action (the right things), while MOEs concern themselves with those actions’ ability to change the system (the right things done in the right ways). Note that that change may be tangible (a new firewall) or intangible (sufficient evidence that a questionable environment is secure).

“Right”, here, depends on the SOC’s purpose. I have encountered several organizations that had a SOC just to say they had a SOC; what it actually did, or how well it did it, was irrelevant. This series assumes that the SOC’s purpose is to efficiently detect, thoroughly investigate, and effectively remediate malicious activity. MOPs help ensure friendly actions support that goal, and MOEs help ensure those actions actually achieve it. It may help to think of a more literal interpretation of these terms, where measures of performance measure the SOC’s ability to perform an action, and measures of effectiveness measure the SOC’s ability to do so effectively.

This series details a selection of SOC-specific MOPs and MOEs I refer to generally as “SOC metrics.”1 In this post, in part one, I touch briefly on the controversy surrounding SOC metrics. I then describe foundational metrics, the crucial but often overlooked metrics that measure the SOC’s ability to produce meaningful results. That topic then leads into part two, which focuses on measures of performance, which help assess whether or not (and to what degree) the SOC is doing the right things. Part three focuses on measures of effectiveness, which help assess whether or not (and to what degree) the SOC is doing those right things in the right ways. I based my selection on extensive research and on personal operational experience, but I also include several sources for other metrics in the resources section at the end of each article. In the final days of my last job, I realized that these were the missing pieces inhibiting our success; if your SOC also struggles to recruit, resource, and retain talented information security professionals, perhaps you are missing them, too.

The Metrics Controversy #

I drew on a number of resources for this series. The primary one was a Twitter thread by Dr. Anton Chuvakin in which he polled the information security community for SOC metrics. Many replied with good suggestions or links to helpful articles.2 Many also pointed out the challenges of working in a metrics-driven SOC. Justin Lister, for example, commented that “Cyber metrics are like Schrödinger’s cat. In that you can’t manage what you can’t measure. But if you measure it then it starts to drive the wrong outcomes.” His was not an uncommon opinion.

On the one hand, SOC metrics give managers ways to highlight the good work their teams do (the right things done in the right ways) and to identify areas in need of improvement (the right things done in the wrong ways or the wrong things altogether). They provide the necessary foundation for a learning organization that improves over time, which leads to an effective security program, boosts job satisfaction, and increases retention. They also give decision makers palpable returns on their investments, which helps justify appropriate resourcing, which then further improves the SOC.

On the other hand, few use metrics in general — and SOC metrics in particular — well. This series does not ignore the potential downsides of introducing metrics into an organization not ready for them. As Mick Douglas alluded to in Rapid-er Incident Response: How Fast Should You Go?, organizations that prioritize speed over effectiveness just rush to failure. The root of this problem, however, lies in poor management. Any discussion of SOC metrics must separate their value as enablers of the positives I described in the previous paragraph from the downsides that result from their improper implementation. A corollary to one of this post’s central ideas, the notion of success resulting from the right person doing the right things in the right ways, is that successfully implementing SOC metrics requires the right person acting on the right metrics in the right ways. I have little to offer in the way of recommendations for selecting the right person, but I will offer my take on the right metrics and the right ways. I did, however, select many of these metrics such that over-indexing on any one would cause one or more others to fail. This will not solve the problem of bad management, but it may at least help reduce its impact.

Foundational Metrics #

Foundational metrics are the crucial but often overlooked metrics that measure the SOC’s ability to produce meaningful results. They contextualize all other measures, and they lend weight to common statements that, on their own, lack real meaning.

Consider, for example, this statement: “The SOC processed one billion events last month.” Although perhaps impressive, that statement means little if those events came from just 10% of the environment; any subsequent conclusions do not hold for the other 90%. Consider a common follow-on statement, “And we found no evidence of malicious activity.”3 Again, that does not hold for 90% of the environment — but also, how did the analysts reach that conclusion? Did they collect the proper data to reasonably support such a bold claim? An absence of evidence is not the same as evidence of absence. This applies not just to the presence or absence of data in general (data coverage), but also to the presence or absence of the right data specifically (data quality). Foundational metrics measure both.

Data Coverage #

Data coverage is the first foundational metric. It measures the SOC’s ability to observe the environment. I find it helpful to discuss this metric in terms of Donald Rumsfeld’s famous “known knowns”: “There are known knowns, things we know that we know; and there are known unknowns, things that we know we don’t know. But there are also unknown unknowns, things we do not know we don’t know.” That idea applies to data coverage just as well as to intelligence. SpecterOps had a helpful description of those terms in its post The Attack Path Management Manifesto, which also included a definition for “unknown knowns”, things we are not aware of but still understand. Measuring data coverage means quantifying the number of devices that fall into each category.

Known knowns #

This class includes devices the SOC knows about and receives data from. I generally use the simpler label, “monitored.”

Monitored: The number of known devices feeding the SIEM.

At the very least, break this down by host and network events: count the number of devices feeding network events into the SOC’s SIEM, then the number of devices feeding host events into it. Some also delineate by enclave or domain, by layer of the computing stack, or by device type such as workstation, server, infrastructure, or appliance. Others distinguish between log collection, antivirus alerts, and endpoint detection and response events, or track coverage on a per-tool basis. I recommend starting with a basic host and network breakdown before getting more specific.
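As a minimal sketch of that breakdown, the snippet below counts distinct monitored devices per feed type from a hypothetical export of SIEM events. The feed and device field names are assumptions for illustration, not any particular SIEM’s schema.

```python
from collections import defaultdict

# Sketch: count distinct monitored devices per feed type.
# Each record is assumed to carry the feed it arrived on and the device it describes.

events = [
    {"feed": "host", "device": "ws-001"},
    {"feed": "host", "device": "ws-002"},
    {"feed": "network", "device": "fw-edge-01"},
    {"feed": "network", "device": "fw-edge-01"},  # duplicates collapse into the set
]

devices_per_feed: dict[str, set[str]] = defaultdict(set)
for event in events:
    devices_per_feed[event["feed"]].add(event["device"])

for feed, devices in sorted(devices_per_feed.items()):
    print(f"{feed}: {len(devices)} monitored devices")
```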

Known unknowns #

This class includes devices the SOC knows about but does not receive data from. I generally use the simpler label, “unmonitored.”

Unmonitored: The number of known devices not feeding the SIEM.

Devices may fall into the “unmonitored” category for any number of reasons. For example, a newly provisioned but not yet configured server may not send events to the SIEM. This class also includes devices the SOC cannot or will not monitor, such as legacy systems or personal devices. These pose persistent risk that, even if accepted, must remain at the front of the security team’s mind lest they become normalized and thus unduly disregarded. Potential sources for identifying unmonitored devices include, but are not limited to, Active Directory inventories, network scanners, network sensors, and logs from an appliance like a DHCP server.

Unknown knowns #

This class includes devices that send no data to the SOC and that the SOC is unaware of, but that someone else in the organization knows about. Most organizations understand that these devices exist but lack awareness of the individual machines themselves. I generally use the common term “shadow IT” for these systems.

Shadow IT: The number of unknown devices not feeding the SIEM.

The unsanctioned nature of these devices makes it unlikely that anyone will voluntarily identify them. Others may not even realize they exist. Potential sources for quantifying shadow IT include, but are not limited to, Active Directory inventories, network scanners, network sensors, logs from an appliance like a DHCP server, or even the accounting department’s acquisition records. Again, the SOC might be unaware of these devices, but someone is.
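To make the bookkeeping concrete, the sketch below derives the monitored, unmonitored, and shadow IT counts with simple set arithmetic over three hypothetical device lists: an Active Directory inventory export, the set of hosts currently feeding the SIEM, and the set of hosts observed on the network via DHCP leases or scanner output. The file names and the assumption of newline-delimited hostname lists are mine; substitute whatever exports your environment actually produces.

```python
# Sketch: derive data coverage categories with set arithmetic.
# Assumes three newline-delimited hostname lists exported from your own tools.

def load_hosts(path: str) -> set[str]:
    """Read one hostname per line, normalized to lowercase."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

inventory = load_hosts("ad_inventory.txt")       # devices the organization knows about
siem_hosts = load_hosts("siem_reporting.txt")    # devices currently feeding the SIEM
observed = load_hosts("network_observed.txt")    # devices seen via DHCP leases or scans

monitored = inventory & siem_hosts               # known and feeding the SIEM
unmonitored = inventory - siem_hosts             # known but silent
shadow_it = (observed - inventory) - siem_hosts  # on the network, absent from the inventory

print(f"Monitored:   {len(monitored)}")
print(f"Unmonitored: {len(unmonitored)}")
print(f"Shadow IT:   {len(shadow_it)}")
```

Note that this data alone cannot separate shadow IT from rogue devices; distinguishing the two still requires the investigative work described above and below.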

Unknown unknowns #

This class includes devices the entire organization, including the SOC, is unaware of that also do not send any data to the SOC. Unlike shadow IT, which well-intentioned personnel might maintain to circumvent onerous IT controls in service of legitimate business needs, these devices are inserted into the environment by outsiders for malicious purposes. These rogue devices pose significant risk to the organization.

Rogue devices: The number of illicit devices in the environment.

The ease with which most attackers can gain access to an environment through its network makes this vector unlikely but not impossible. Detect these devices by analyzing network logs from passive network sensors or by physically inspecting hardware, network closets, and data centers for illicit devices.

Given these four categories, data coverage is typically expressed as a percentage:

Data coverage: Monitored / (Monitored + Unmonitored + Shadow IT + Rogue devices)

Each category, as well as the main metric itself, should be tracked over time to identify progress (as monitored increases and data coverage approaches one) or deterioration (as the denominator increases and data coverage approaches zero). Any modern SIEM should have the ability to automate this entire process in a scheduled report.
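As a sketch of what such a scheduled report might compute, the snippet below calculates the coverage ratio for a series of reporting periods and flags the trend. The counts are placeholders for illustration; a real report would pull them from the inventory and SIEM sources described above.

```python
# Sketch: compute data coverage per reporting period and flag the trend.
# The counts below are placeholders; a scheduled SIEM report would supply real ones.

periods = [
    # (period, monitored, unmonitored, shadow_it, rogue)
    ("2021-10", 812, 240, 35, 0),
    ("2021-11", 861, 198, 31, 0),
    ("2021-12", 903, 167, 28, 1),
]

previous = None
for period, monitored, unmonitored, shadow_it, rogue in periods:
    coverage = monitored / (monitored + unmonitored + shadow_it + rogue)
    if previous is None:
        trend = "baseline"
    elif coverage > previous:
        trend = "improving"
    elif coverage < previous:
        trend = "deteriorating"
    else:
        trend = "steady"
    print(f"{period}: {coverage:.1%} ({trend})")
    previous = coverage
```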

Measuring data coverage is one of the ways the SOC can support more traditional IT functions. From an environmental standpoint, this moves the organization away from wilderness and toward managed territory. Carson Zimmerman (author of the excellent Ten Strategies for a World-Class Cybersecurity Operations Center) introduced this concept in his presentation Practical SOC Metrics: if a system is inventoried, tied to a business owner, tied to a business function, subject to configuration management, assigned to a responsible security team, and assessed for risk, it is managed; otherwise, it is wilderness. Measuring data coverage can help quantify that wilderness so that the IT team can bring it under control.

Recall, though, that it is not enough just to have data from an acceptable portion of the network.4 Consider again my question from earlier, in response to the common statement, “And we found no evidence of malicious activity.” Did the analysts collect the proper data to reasonably support such a bold claim? After data coverage, measuring data quality helps answer that important question.

Data Quality #

Whereas data coverage measures the SOC’s ability to observe the environment, data quality measures the SOC’s ability to detect malicious activity occurring within it. This requires assessing the data itself, as the enabler of that detection, as well as the feeds in general, as the enabler of that detection over time. Together, these two dimensions quantify which malicious tactics, techniques, and procedures the SOC can uncover and when it may do so; paired with data coverage, these metrics allow the SOC to define exactly what it can defend, what it can defend against, and when it can provide that defense. For under-resourced SOCs unable to effectively defend their environment, trapped in a “safety blanket” role, this is a game changer.

To assess the tactics, techniques, and procedures the data itself enables the SOC to uncover, we must first define a range of possible nefarious activities. From there we may then determine which subset of those actions the SOC can uncover given its present collection. Without a denominator representing the realm of the possible, that measure would lack meaning.

This article relies on the MITRE ATT&CK Matrix to define the realm of the possible. The MITRE ATT&CK Matrix lists actions an adversary might execute from initial access to end objectives. It consists of four components: tactics, the adversary’s technical goals; techniques, how the adversary achieves a technical goal; sub-techniques, more granular explanations of how the adversary achieves it; and procedures, the specific implementations of a technique or sub-technique. Although the Matrix may change slightly over time, it has matured enough that it is unlikely to change significantly or often. This does not reduce the importance of malleable collection to account for emerging threats, but it does make the Matrix a suitable starting point for understanding the range of possible nefarious activities.

Using the MITRE ATT&CK Matrix to represent the realm of the possible, the SOC can quantify the subset of those techniques it has the ability to detect given its present collection in a number of ways. The DeTT&CT project presents the easiest route: simply identify data sources and provide a rough assessment of their coverage, and the DeTT&CT tool will generate a heat map depicting the degree to which the SOC can detect each technique on the MITRE ATT&CK Matrix. The excellent ATT&CK Data Sources repository on GitHub, which includes a helpful Jupyter Notebook for working with the ATT&CK Matrix, could also help: use the data_source or data_component field to filter the techniques down to those that match the SOC’s present collection, then count them. A more rigorous approach would involve performing each action in a purple team-style engagement and then reviewing the SOC’s ability to detect it, but that would require a significant investment of time and resources. While such an engagement is certainly an important milestone in a SOC’s maturation, these less comprehensive yet far more efficient assessments are a suitable starting point for defining MITRE ATT&CK coverage.

MITRE ATT&CK coverage: The percentage of techniques the SOC has the ability to detect.
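As one rough way to arrive at that percentage, the sketch below walks the enterprise ATT&CK STIX bundle (enterprise-attack.json from the mitre/cti repository) and counts techniques whose listed data sources intersect the SOC’s collection. The field names (x_mitre_data_sources, x_mitre_is_subtechnique, revoked, x_mitre_deprecated) reflect the bundle’s structure at the time of writing, and the collected set is a placeholder for your own data source inventory.

```python
import json

# Sketch: estimate MITRE ATT&CK coverage from the enterprise STIX bundle.
# Download enterprise-attack.json from the mitre/cti repository beforehand.

# Placeholder for the data sources the SOC actually collects.
collected = {
    "Process: Process Creation",
    "Command: Command Execution",
    "Network Traffic: Network Traffic Flow",
}

with open("enterprise-attack.json") as f:
    bundle = json.load(f)

techniques = [
    obj for obj in bundle["objects"]
    if obj.get("type") == "attack-pattern"
    and not obj.get("x_mitre_is_subtechnique", False)  # techniques only, per the metric
    and not obj.get("revoked", False)
    and not obj.get("x_mitre_deprecated", False)
]

detectable = [
    t for t in techniques
    if collected & set(t.get("x_mitre_data_sources", []))
]

coverage = len(detectable) / len(techniques)
print(f"MITRE ATT&CK coverage: {coverage:.1%} ({len(detectable)}/{len(techniques)})")
```

Treat the output as a first approximation: as the next paragraph explains, the presence of a matching data source does not mean the SOC can detect every procedure beneath a technique.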

“Starting point”, here, is the operative term. MITRE’s ATT&CK Matrix describes the “known knowns” (actions we understand based on previous compromises), but not the “known unknowns” (actions we can conceptualize but have yet to observe), “unknown knowns” (novel ways to achieve an understood objective), or “unknown unknowns” (actions we cannot conceptualize). The MITRE ATT&CK Matrix is a catalog of known threats that provides a framework for categorizing future ones, not a complete encyclopedia of every possible malicious activity. Further, as Jared Atkinson explained in a Twitter thread, the ATT&CK Matrix is a very high-level abstraction of the activities a threat actor might execute from initial access to end objectives. This makes detecting T1543: Create or Modify System Process, for example, a question of degree rather than a binary: to what degree can the SOC detect this technique given the many paths a threat actor could take to achieve it? Christopher Peacock’s TTP Pyramid is a useful tool for visualizing the level of abstraction for each type of action, as it decreases from tactic to technique to procedure. MITRE ATT&CK coverage measures technique coverage; those techniques are abstractions of sub-techniques, which are themselves abstractions of the actual activity occurring on a host. For the sake of time and simplicity, this article treats that question of degree as a binary “yes” or “no”; in a real environment, the SOC must make a much more nuanced distinction. The map is not the territory.

MITRE ATT&CK coverage, and particularly the heat maps the DeTT&CT project generates, is a great way to visualize the present state and to justify the resources needed to reach the desired future state, as Daniel Gordon explained on Twitter.

MITRE ATT&CK coverage describes the ideal state. It does not account for the complexity of implementing collection, particularly at scale, where the SOC’s ability to detect a given technique may vary from hour to hour based on factors completely outside of its control. Scheduled maintenance, network outages, and hardware failures all impact the data feeds upon which the SOC relies to do its job. Assessing data feeds in general, the second dimension of data quality, gives the SOC a way to monitor those critical inputs.

Carson Zimmerman called this “data feed health” in his presentation Practical SOC Metrics, where he suggested monitoring presence, a binary indicating whether or not the SOC is receiving a particular data feed at any given time; latency, a leading indicator that highlights bottlenecks before they become catastrophic failures; and volume, a basic count of events. I recommend also adding constitution, a count of unique sources, such as systems for endpoint events or sensors for network events.
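The sketch below computes those four measures for a single feed from a batch of recently received events. The event structure (a source identifier plus creation and ingest timestamps) is an assumption for illustration; in practice a SIEM would expose the same figures through a scheduled search or dashboard panel.

```python
from datetime import datetime, timedelta, timezone

# Sketch: compute data feed health measures for one feed from recent events.
# Each event is assumed to carry a source identifier and creation/ingest timestamps.

events = [
    {"source": "sensor-01", "created": "2021-12-01T10:00:05+00:00", "ingested": "2021-12-01T10:00:09+00:00"},
    {"source": "sensor-02", "created": "2021-12-01T10:00:07+00:00", "ingested": "2021-12-01T10:00:20+00:00"},
]

now = datetime(2021, 12, 1, 10, 5, tzinfo=timezone.utc)  # evaluation time for the example
window = timedelta(minutes=15)

recent = [e for e in events if now - datetime.fromisoformat(e["ingested"]) <= window]

presence = bool(recent)                            # is the feed delivering anything at all?
volume = len(recent)                               # basic count of events in the window
constitution = len({e["source"] for e in recent})  # count of unique sources
latencies = [
    (datetime.fromisoformat(e["ingested"]) - datetime.fromisoformat(e["created"])).total_seconds()
    for e in recent
]
latency = max(latencies) if latencies else None    # worst-case ingest delay, in seconds

print(f"presence={presence} volume={volume} constitution={constitution} latency={latency}s")
```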

Unlike MITRE ATT&CK coverage, which is expressed as a percentage, data feed health requires a much more nuanced expression. I convey it on a per-feed basis using a series of dashboard panels as depicted below.

These panels would immediately convey that the endpoint detection and response (EDR) feed was down, but that the host and network event feeds were operational. This would indicate that during the given period, the SOC lacked the ability to detect techniques reliant on the EDR feed. This is helpful to understand in real time, so that the SOC can rapidly fix an issue with a data feed, but also when looking back during a postmortem to figure out why it missed something.

Closer inspection would also reveal that one or more network sensors had stopped sending events to the SIEM, which likely led to the downward trend in volume for that feed. Again, this is helpful to understand in real time but also during a postmortem. The host feed appears healthy, although an upward trend in event volume may indicate something unusual; an analyst could drill down on that uptick to rule out password spraying or brute forcing, for example.

I frequently encountered environments without this basic level of “meta monitoring”, where an administrator had set up network collection but had not noticed that many of the sensors had since gone offline. Network events appeared in the SIEM, but over time they came to represent less and less of the environment. It is imperative that SOCs assess not only MITRE ATT&CK coverage, as the enabler of detection, but also data feed health, as the enabler of detection over time.

As with data coverage, data quality — and its sub-categories — should also be tracked over time to identify progress or deterioration. The SOC must accept some variability, particularly with data feed health, but wild swings in either direction — or sustained trends in the wrong direction — should be highlighted, investigated, and acted upon. In a dashboard like the one above, the SOC should aim for green boxes and horizontal arrows at all times.

Together, data coverage and data quality provide the foundation upon which measures of performance and measures of effectiveness rely. Organizations that do not assess these foundational metrics, or that have not made the investment necessary to bring them to a suitable level, should do so before shifting their focus elsewhere. These metrics contextualize all other measures, and ensure the SOC possesses the ability to produce meaningful results; in their absence, take those results with a grain of salt.

Resources #

This section lists several resources for SOC metrics, some of which were cited throughout this article. This post contained the metrics that would have been most effective in my organization as judged by my personal experience working in a SOC, but you may find these articles helpful as well.

1. Note that “SOC metrics” are not limited to security operations centers. In this article, I use “SOC” as a generic term for all groups responsible for securing an organization’s information systems. This includes the IT administrators who provision and maintain those systems, security and compliance monitors who deal with known threats and policy violations, and threat hunters who deal with emerging and novel threats. Each of these entities plays a distinct but critical role in an effective security program, but I will use “SOC” as a general term for all of them here.

2. I do not cite authors for well-known metrics like Mean Time to Detect. So many suggested these measures that it made little sense to attribute them to a single person. I do, however, cite individuals who came up with unique metrics, and I cite external resources as much as possible.

3. Once, at the beginning of an incident response, a technical advisor at the headquarters organization tried to derail the investigation. He said that he had already searched the endpoint agent for the key indicator and had found nothing, so we could all go home. I asked him how confident he was that the endpoint agent was deployed across the entire environment; he was not, and so we stayed. During another investigation, a different advisor tried to discount the findings of an analyst at a satellite location. He could not reproduce their work in the organization’s SIEM and so he lobbied to close the investigation as an unexplained anomaly. I asked him how confident he was that the satellite location actually forwarded any data to that SIEM; he was not, and so we continued.

4. “Acceptable” will vary across organizations and over time. Some may not consider the high cost of incremental improvement worth it: if automated deployment mechanisms can get the SIEM to, say, 75% coverage, some organizations may elect not to invest in shrinking the remaining 25%. Entities dealing with sensitive information, on the other hand, may not have that luxury. Although complete coverage is maddeningly elusive, it is not uncommon for some organizations to set a threshold well above 90%. As I once wrote elsewhere, “Many think collection exists on a spectrum. Maximum collection, at one end, puts analysts in the ideal position to answer almost any information requirement. Marginal collection, at the other, makes those assessments impossible. Adequate collection lies somewhere between the two. This assumption is incorrect. Collection is not subjective, it is binary. It is either present in its entirety, or effectively absent; it is either sufficiently detailed to enable the detection of malicious cyber actors, or wholly inadequate. In this domain of great power competition against advanced, persistent threats, our defenses must not only be advanced, but also persistent. In the fifth domain, where defensive cyberspace operations forces gain and maintain enemy contact every day, there is no ‘good enough.’ There is all, or there is nothing, and nothing is unacceptable.”
