April 13, 2026

Binarly Risk Score: A New Approach to Vulnerability Prioritization

By Lucy Tretiakova, Claudiu-Vlad Ursache

Introduction

Security teams today are drowning in vulnerabilities, and prioritization is not a solved problem. Systems like CVSS and EPSS, or frameworks like SSVC, aim to facilitate prioritization using different data points, properties, and potential impacts of vulnerabilities, but they can end up focused on aspects that are too generic, leading to occasional contradictions and, more importantly, inflexibility. Even worse, these systems are mostly built to work with known vulnerabilities, missing entire categories of potentially critical security risks such as leaked secrets, unknown vulnerabilities detected by advanced static analysis tools, or malware hiding inside software components. These are among the most severe issues organizations can face, yet traditional scoring systems and frameworks treat them as a side note. To solve this blind spot, we built the Binarly Risk Score (BRS) system. It uses a unified scoring algorithm that can evaluate any type of security finding by combining the strengths of existing scoring systems and frameworks with organization-specific custom metrics, distilling everything into a single normalized risk score between 0 and 1.

Motivation

There have been multiple attempts over the years to solve the prioritization problem, each with its own strengths and weaknesses. CVSS, for example, is readily available, but it provides only a theoretical severity signal because it is based mostly on the static assessment of analysts rather than concrete observations of a vulnerability’s impact on systems. EPSS is more data-driven than CVSS, using machine learning to estimate the probability that a vulnerability will be exploited within the next 30 days, but its probabilities are known to be inaccurate for vulnerabilities that were exploited before the model was trained yet are no longer actively exploited at training time. Binarly’s own EMS is a powerful guide when exploitation is the only signal taken into account, but in reality that will rarely be enough information for vulnerability management teams. CISA's KEV catalog tracks vulnerabilities confirmed to be exploited in the wild; while the data is high confidence, it is only backward-looking and may fail to capture emerging threats in a timely manner, and acting on a positive KEV status without confirmed reachability leads to clearly incorrect decisions. SSVC uses decision trees tailored to organizational context, but its coarse-grained outputs frequently leave large numbers of vulnerabilities clustered at the same severity level. Risk is inherently organization- and product-specific. In our opinion, combining the strengths of existing systems and frameworks while allowing for customization is a better solution to the prioritization problem than previous efforts, which fail to treat risk appetite as policy and the tailoring of metric importance as crucial.

Binarly Risk Score system internals

The BRS system consists of taxonomy groups, profiles, a formula and an algorithm. Taxonomy groups cluster different types of vulnerability metrics into categories. Profiles are combinations of weights and parameters intended to control the influence metrics from particular taxonomy groups have on the final score. The formula, a deterministic weighted sum model, drives the calculation. Finally, the algorithm defines how metrics from a finding are translated into numerical values, how scoring input parameters are balanced and how missing metrics are dealt with. The following figure provides an overview of the whole system:

Figure 1: Overview of the Binarly Risk Score system

Taxonomy groups

Vulnerabilities come with dozens of metrics such as CVSS, EPSS, EMS, KEV status, reachability information, etc. Taxonomy groups impose a structure on all possible metrics, which makes it possible to specify the importance that related metrics have when taken together. We have found that four groups cover everything: the Exploitation Group consists of metrics that answer the question “Is there evidence of active or potential use by attackers?”; the Impact Group consists of metrics that answer the question “How much damage could this vulnerability cause?”; the Decision Group consists of metrics that answer the question “Can we fix this, and is the data reliable?”; and finally the Kind Group consists of metrics that represent the type of vulnerability we’re dealing with. The following figure shows all metrics we consider and their corresponding taxonomy groups:

Figure 2: Metric Coverage by Taxonomy Groups

The full list of metrics with descriptions can be found in Appendix A.
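As a sketch, the grouping can be represented as a simple mapping from metric to taxonomy group. The snake_case metric names below are illustrative shorthand, not Binarly's internal identifiers:

```python
from enum import Enum

class TaxonomyGroup(Enum):
    EXPLOITATION = "exploitation"
    IMPACT = "impact"
    DECISION = "decision"
    KIND = "kind"

# Illustrative mapping of a handful of metrics to their taxonomy groups
# (see Appendix A for the full list and descriptions).
METRIC_GROUPS = {
    "epss": TaxonomyGroup.EXPLOITATION,
    "kev": TaxonomyGroup.EXPLOITATION,
    "reachability_kind": TaxonomyGroup.EXPLOITATION,
    "cvss_impact": TaxonomyGroup.IMPACT,
    "severity": TaxonomyGroup.IMPACT,
    "confidence": TaxonomyGroup.DECISION,
    "known_fixed_version": TaxonomyGroup.DECISION,
    "finding_name": TaxonomyGroup.KIND,
    "finding_kind": TaxonomyGroup.KIND,
}

def metrics_in_group(group):
    """Return all metric names clustered under one taxonomy group."""
    return [m for m, g in METRIC_GROUPS.items() if g is group]
```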

Profiles

Profiles are configurations for the BRS calculation. Each defines a set of weights for individual metrics and alpha parameters that control the influence of specific taxonomy groups. The same vulnerability can be ranked very differently depending on which profile is selected, because each one prioritizes different aspects of risk. We have created three default profiles: the Balanced Profile is intended for general security assessments, with all taxonomy groups receiving an equal 25% weight; the Exploitation Profile is intended for threat hunting and active exploitation scenarios, with 55% for Exploitation, 20% for Impact, 20% for Decision, and 5% for Kind; finally, the DecisionImpact Profile is intended for risk management and remediation planning, with 40% for Decision, 35% for Impact, 10% for Exploitation, and 15% for Kind. The BRS system makes it easy to define new custom profiles.
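A minimal sketch of a profile as a data structure, using the group distributions quoted above for the Balanced and Exploitation profiles (per-metric weights, which real profiles also carry, are omitted here for brevity):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Profile:
    """A named set of alpha parameters (one per taxonomy group) and
    optional per-metric weight overrides."""
    name: str
    alphas: dict                                   # taxonomy group -> influence on the final score
    weights: dict = field(default_factory=dict)    # metric name -> weight

BALANCED = Profile(
    name="Balanced",
    alphas={"exploitation": 0.25, "impact": 0.25, "decision": 0.25, "kind": 0.25},
)
EXPLOITATION = Profile(
    name="Exploitation",
    alphas={"exploitation": 0.55, "impact": 0.20, "decision": 0.20, "kind": 0.05},
)
```

Each default profile's alphas sum to 1, so the final score stays normalized.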

The Formula

At the heart of the BRS system stands the formula. Its inputs are the boosting parameters (alphas), which control the impact of individual taxonomy groups; the weights for the metrics in a vulnerability; and the numeric values of those metrics.

Figure 3: Binarly Risk Score formula
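As one plausible reading of the description above (an interpretation, not a transcription of the Figure 3 formula), the weighted sum could be sketched as:

```python
def risk_score(alphas, weights, values, groups):
    """Sketch of a weighted-sum risk score in which each taxonomy
    group's alpha scales the weighted sum of the metric values inside
    that group. This is an illustration of the prose description, not
    the exact BRS formula.

    alphas:  taxonomy group -> boosting parameter
    weights: metric name -> weight
    values:  metric name -> normalized value in [0, 1]
    groups:  metric name -> taxonomy group
    """
    score = 0.0
    for group, alpha in alphas.items():
        members = [m for m in values if groups.get(m) == group]
        if not members:
            continue
        total_w = sum(weights[m] for m in members)
        # Renormalize weights within the group so each group's
        # contribution stays in [0, alpha].
        group_score = sum(weights[m] / total_w * values[m] for m in members)
        score += alpha * group_score
    return score
```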

Algorithm

The algorithm defines all steps needed to compute a final score from a vulnerability with a set of metrics, a profile and a mapping from string metrics into numeric values.

The complete list of steps is the following:

1. Normalize weights and metrics (to prevent the dominance of individual metrics)

  • Metric normalization: scale each metric to the range [0,1]
  • Weight normalization: normalize weights within each taxonomy group so they sum to 1:
Figure 4: Normalization details
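Assuming plain min-max scaling for metrics (the per-metric ranges are not spelled out here), step 1 might look like this:

```python
def normalize_metric(value, lo, hi):
    """Min-max scale a raw metric value into [0, 1], clamping outliers."""
    if hi == lo:
        return 0.0
    clamped = min(max(value, lo), hi)
    return (clamped - lo) / (hi - lo)

def normalize_weights(group_weights):
    """Normalize the weights within one taxonomy group so they sum to 1."""
    total = sum(group_weights.values())
    if total == 0:
        return {name: 0.0 for name in group_weights}
    return {name: w / total for name, w in group_weights.items()}
```

For example, a CVSS base score of 7.5 on the [0, 10] scale normalizes to 0.75.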


2. Replace string constants in the formula with their numeric equivalents (the string-to-score mappings can be found in Appendix B)
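Step 2 amounts to a table lookup. Using the Severity and Reachability kind mappings from Appendix B (the function name and dictionary layout are illustrative):

```python
# String-to-score values taken from Appendix B.
SEVERITY_SCORES = {
    "CRITICAL": 1.0, "HIGH": 0.85, "MEDIUM": 0.5, "LOW": 0.3, "UNSPECIFIED": 0.0,
}
REACHABILITY_SCORES = {
    "Entrypoint": 1.0, "Exported": 0.8, "Referenced": 0.5, "Undetermined": 0.2,
}

STRING_TABLES = {
    "severity": SEVERITY_SCORES,
    "reachability_kind": REACHABILITY_SCORES,
}

def to_numeric(metric_name, raw_value):
    """Replace a string metric value with its numeric equivalent;
    values that are already numeric pass through unchanged."""
    table = STRING_TABLES.get(metric_name)
    if table is not None:
        return table[raw_value]
    return float(raw_value)
```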

3. Decide on how to process each finding metric:

  • metric None, weight None: the metric is not considered in the calculation.
  • metric None, weight set: the metric is sent to the redistribution algorithm.
  • metric set, weight None: the metric value is excluded from the calculation.
  • metric set, weight set: the metric's name is replaced with its numeric equivalent.
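The four cases above reduce to a small dispatch (the return labels are illustrative):

```python
def classify(metric_value, weight):
    """Decide how a (metric value, weight) pair is processed in step 3."""
    if metric_value is None and weight is None:
        return "skip"          # not considered in the calculation at all
    if metric_value is None:
        return "redistribute"  # handed to the redistribution algorithm
    if weight is None:
        return "exclude"       # value excluded from the calculation
    return "substitute"        # string name replaced with numeric equivalent
```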

4. Send input parameters to the redistribution algorithm: finding metrics, normalized weights, and redistribution metrics (a predefined list specifying the percentage of each weight that will be reassigned):

  • Collect all weights associated with metrics that have None values, grouped by their categories.
  • Create a set of redistribution parameters:
    redistributionParam = weight * redistributionMetric (where redistributionMetric is the percentage of the weight that can be redistributed, ranging from [0,1]).
  • Add all redistribution parameters and calculate new weights for metrics with non-None values:
    newWeight = oldWeight + percentageOfInfluence * redistributeAmount
    with: percentageOfInfluence = oldWeight / totalSumWithoutNull
Figure 5: Redistribution steps
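Putting the redistribution steps together, a sketch using flat dictionaries (the real algorithm groups weights by category, which is omitted here):

```python
def redistribute(weights, values, redistribution_metrics):
    """Reassign the weights of metrics with None values to the remaining
    metrics, in proportion to their existing weights.

    redistribution_metrics maps each metric to the fraction of its
    weight (in [0, 1]) that may be redistributed; metrics not listed
    are assumed fully redistributable here.
    """
    missing = [m for m, v in values.items() if v is None]
    present = [m for m, v in values.items() if v is not None]
    # redistributionParam = weight * redistributionMetric, summed over
    # all missing metrics.
    redistribute_amount = sum(
        weights[m] * redistribution_metrics.get(m, 1.0) for m in missing
    )
    total_without_none = sum(weights[m] for m in present)
    new_weights = {}
    for m in present:
        influence = weights[m] / total_without_none  # percentageOfInfluence
        new_weights[m] = weights[m] + influence * redistribute_amount
    return new_weights
```

For example, with weights {a: 0.5, b: 0.3, c: 0.2} and metric c missing (fully redistributable), c's 0.2 is split between a and b in a 5:3 ratio, keeping the total weight at 1.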

5. Convert the formula into a numeric expression and calculate the result.

Examples

With all the theory out of the way, let’s look at the BRS system in action using two examples, each a small set of vulnerabilities, and see how different profiles lead to very different final rankings.

In the first example, we have two unknown vulnerabilities, one known vulnerability, and one secret:

Figure 6: First scoring example overview

The Exploitation profile prioritizes vulnerabilities with strong evidence of exploitability. The unknown vulnerability "get-set-variable" ranks highest due to high reachability and strong estimated exploitation scores (EPSS and CVSS exploitability). Second is the unknown vulnerability "arbitrary-write-via-pointer-via-nvram-variable", which has slightly higher impact metrics but lower reachability. Third is the secret, which is low priority because it has no validated exploitation information. Last is the known vulnerability with the highest EPSS and CVSS exploitability scores but lacking reachability data and confidence.

Figure 7: First scoring example, exploitation profile

The Balanced profile gives equal weight to all taxonomy groups. The known vulnerability ranks first because its measured metrics (CVSS and EPSS) are higher than most others. Next is the unknown vulnerability "get-set-variable" due to strong reachability metrics, followed by the unknown vulnerability "arbitrary-write-via-pointer-via-nvram-variable", which ranks lower because its reachability score is lower than the other unknown vulnerability's. The secret ranks last because of its low confidence level.

Figure 8: First scoring example, balanced profile

The DecisionImpact profile emphasizes Impact and Decision-related signals. The unknown vulnerability "arbitrary-write-via-pointer-via-nvram-variable" is ranked first because of strong decision metrics (high confidence). The known vulnerability follows because of its high impact but lower confidence. Next is the unknown vulnerability "get-set-variable" because its impact metrics (severity, CVSS Base score, CVSS impact score, etc.) are relatively lower. The secret ranks last due to its low confidence level.

Figure 9: First scoring example, decisionImpact profile

In this second example, we have the following findings: one known vulnerability, one mitigation failure, one cryptographic finding, and one malicious-behaviour finding.

Figure 10: Second scoring example, overview

Exploitation profile: Ranked first is the malicious-behaviour finding. It has a specific exploitation metric (metric_poor) and very high kind and name scores, which amplify its overall score. Next is the cryptographic finding (“crypto/certificate/expired”), which has higher kind and name scores than other vulnerabilities and a high confidence level. After that comes the mitigation finding ("outdated-intel-microcode-version") with a high confidence level, followed by the known vulnerability with very low exploitation metrics (EPSS and CVSS exploitability) and low confidence.

Figure 11: Second scoring example, exploitation profile

Balanced profile: This profile ranks the cryptographic finding highest because of its confidence. Next is the malicious-behaviour finding, which has high kind and name scores. After that is the known vulnerability due to its EPSS, CVSS and other metrics, and last is the mitigation finding because its severity level is not very high.

Figure 12: Second scoring example, balanced profile

DecisionImpact profile: As with the Balanced profile, the cryptographic finding ranks highest because of its confidence and severity levels. It is followed by the mitigation finding, which has similar confidence and severity but lower kind and name scores. Next is the known vulnerability, which has relatively high impact metrics (severity and CVSS base score) but a low confidence level. Last is the malicious-behaviour finding, which has low severity and confidence metrics.

Figure 13: Second scoring example, decisionImpact profile

As the examples demonstrate, different profiles lead to different rankings of findings, showing the flexibility of the BRS system.

Prioritization differences: BRS vs. other scores

A final concrete prioritization example should make the value of BRS clearer. Given the following set of six known vulnerabilities:

Figure 14: Prioritization difference example overview

Here is how the priority ranks differ across CVSS, EPSS, and the three default BRS profiles:

Figure 15: Heatmap of Ranks Across Scoring Systems

Zooming in on one example (CVE-2023-52425), we notice a few things:

Figure 16: Rank Across Scoring Systems, single example

First, using CVSS as the ranking score, CVE-2023-52425 (a denial-of-service vulnerability in libexpat, CVSS 7.5) has a low priority in the set (rank 6). Second, using EPSS as the ranking score leads to the vulnerability being prioritized high (rank 2). Third, BRS with a Balanced profile doesn't let CVSS alone dominate the ranking, nor does it let EPSS ranking induce panic without context (rank 5). Fourth, BRS with a DecisionImpact profile refines the priority based on confidence and impact metrics, putting it as a low priority in the set (rank 6). Finally, BRS with an Exploitation profile emphasizes EMS, EPSS, and other exploitation metrics, mapping closely to active threat intelligence signals (rank 3).

Conclusion & Future Work

In this post we introduced the Binarly Risk Score, a transparent and customizable system that applies a single generalized formula to produce a normalized risk score for any finding type. It organizes metrics into taxonomy groups to reason about them based on what they signal, normalizes metrics and weights to prevent skewed importance, uses a redistribution algorithm to reassign the weight of missing metrics and finally defines profiles which drive the calculation based on specific security strategies. The result is a flexible scoring system which can guide prioritization based on individual organizational needs.

BRS will be included in the next major release of the Binarly Transparency Platform. In upcoming BRS updates, we plan to expand risk calculation beyond individual vulnerabilities to provide aggregated scores by software component and product, add more transparency into how each metric contributes to the final score, and add more profiles to handle ranking combinations of vulnerabilities we have not encountered in this initial iteration.

Appendix A: Metrics

EXPLOITATION GROUP
CVSS Exploitability score: Indicates how easily a vulnerability can be exploited.
EPSS: Indicates the likelihood of exploitation.
Estimated EPSS: Provided by Binarly; an estimated Exploit Prediction Scoring System score for unknown vulnerabilities, indicating the likelihood of exploitation.
KEV: Specifies whether a vulnerability is part of the Known Exploited Vulnerabilities catalog.
Reachability kind: Specifies whether a vulnerability is reachable and, if so, how.
Metric poor: Specifies that a finding does not have enough metrics for a comprehensive score calculation but should be scored higher.
Public exploit: Specifies whether there is a publicly known exploit for the vulnerability.
Public POC: Specifies whether there is a publicly known proof of concept for the vulnerability.
Public verified exploit: Specifies whether there is a publicly known verified exploit for the vulnerability.
Public weaponized exploit: Specifies whether there is a publicly known weaponized exploit for the vulnerability.
SSVC automatable: Indicates whether exploitation of the vulnerability can be automated, based on the SSVC framework.
SSVC exploitation: Indicates the likelihood of exploitation based on SSVC assessment.
Secret validated: Specifies whether a secret was valid at the time it was detected.
Exploitation maturity score: Provided by Binarly; a score indicating that a vulnerability has exploitation evidence.
LEV: Provided by Binarly; the Likely Exploited Vulnerabilities scoring system uses historical data to calculate the probability that a specific CVE has already been exploited in the wild.
IMPACT GROUP
CVSS Impact score: Indicates the potential damage of a vulnerability.
Severity: "CRITICAL", "HIGH", "MEDIUM", "LOW" or "UNSPECIFIED".
SSVC Technical impact: Indicates the potential damage based on SSVC assessment.
CVSS Base score: Indicates the severity of a vulnerability.
DECISION GROUP
Confidence: The confidence of the analysis that produced the finding.
Reachability confidence: The confidence level in the reachability assessment of the vulnerability.
SSVC Paranoid decision: Provided by Binarly based on the SSVC decision tree model, prioritizing relevant vulnerabilities into four possible decisions.
Known fixed version: Specifies whether a version of the software exists with a fix for the vulnerability.
Not disputed: Specifies that the vulnerability is not disputed, meaning there is consensus on its validity.
KIND GROUP
Finding name: Internal unique identifier for a vulnerability. Example: vulnerability/uefi/smram-write-via-commbuffer.
Finding kind: Internal categorization of finding names.

Appendix B: String-to-score mapping

SSVC Paranoid decision
  Act: 1
  Attend: 0.7
  Track*: 0.5
  Track: 0.3
Reachability kind
  Entrypoint: 1
  Exported: 0.8
  Referenced: 0.5
  Undetermined: 0.2
Severity
  Critical: 1
  High: 0.85
  Medium: 0.5
  Low: 0.3
  Unspecified: 0
SSVC Exploitation
  Active: 1
  PoC: 0.6
  None: 0.1
SSVC Technical impact
  Total: 1
  Partial: 0.5

Note: additional mappings defined internally for Binarly-specific finding names and kinds have been left out of this table.
