VulHunt in Depth: Inside the Binary Vulnerability Analysis Framework

By Francesco Evangelista and Sam L. Thomas

Introduction

In a previous post, we introduced VulHunt, a binary vulnerability analysis framework designed to bring semantic, code-level analysis to binaries. We discussed why traditional binary analysis approaches (signature matching and version inference) fail to scale in a world where software is increasingly complex, dependency-heavy, and AI-generated.

This post builds on that foundation and shifts focus from why VulHunt exists to how it works.

At a high level, VulHunt provides:

Dataflow analysis primitives to trace attacker-controlled input through functions to sensitive operations.
Semantic code pattern matching on decompiled code, enabling detections based on program structure and behavior.
Integration of type libraries and function signatures, enabling the tool to understand function calls and data structures even in stripped binaries.
Annotated decompiled code and explainability mechanisms, providing context and making findings actionable.
Byte pattern and IR-based matching, enabling instruction-level detection across architectures.

In this post, we will walk through VulHunt’s capabilities in detail, starting with high-level semantic analyses and progressing toward the lower-level matching primitives. The goal is to show not only what VulHunt can detect, but also why its approach to detection and result visualization matters, and how it differs fundamentally from traditional binary security tools.

Before diving into specific capabilities, it’s helpful to understand how a target is processed within the platform, since VulHunt builds on the results of prior analyses and structured metadata.

Scan Pipeline

Within the Binarly Transparency Platform (BTP), scanning a target, such as a firmware image or a Docker container, follows a multi-stage pipeline designed to normalize heterogeneous inputs and enable scalable, architecture-aware analysis.

The pipeline begins with normalization. During this phase, the uploaded artifact is unpacked and decomposed into its individual components (for example, binaries extracted from a firmware image). For each component, metadata are collected, such as file type, architecture, compiler hints, and dependency relationships. The output of this phase is a BA2 archive, which serves as a representation of the target and its components, enriched with metadata.

To enable scalability, the BA2 archive is then partitioned into smaller, independent units. These partitions can be processed in parallel, allowing the platform to efficiently analyze large and complex artifacts without sacrificing depth of analysis.

Each BA2 partition is processed by Corescan, a tool that performs the following:

Core analyses, which are applied uniformly across all components (e.g., disassembly, control-flow recovery, function identification).
Specialized analyses, which are platform-specific, such as identifying non-returning functions in POSIX binaries.

By consolidating these analyses into a single tool, BTP avoids redundant computation and ensures that all downstream consumers operate on a consistent view of the binary. The results produced by Corescan—along with the BA2 partitions themselves—are then passed downstream to VulHunt, which builds on this precomputed analysis state to perform vulnerability detection.

*Corescan runs in Phase 2, analyzing BA2 partitions in parallel. VulHunt sits between Phase 2 and Phase 3.*

With this foundation in place, VulHunt applies a set of carefully designed capabilities that allow analysts to reason about program behavior, detect vulnerabilities, and provide actionable, explainable insights.

VulHunt Capabilities

VulHunt provides a set of capabilities designed to enable broad and expressive vulnerability detection in binaries. These capabilities are intentionally chosen to maximize coverage across vulnerability classes while remaining generic enough to capture real-world variations.

At a high level, VulHunt combines semantic analysis and syntactic analysis. Semantic capabilities allow VulHunt to reason about program behavior—such as control flow, data flow, and function interactions—while syntactic capabilities enable precise matching over concrete code constructs, decompiled output, and raw bytes when needed.

The core design goal is flexibility in detection logic. Instead of relying on rigid signatures tied to specific implementations, VulHunt allows vulnerability logic to generalize across different architectures and ecosystems. This makes it possible to identify not only known vulnerabilities and their variants, but also recurring vulnerability patterns that can point to previously unknown issues, including potential zero-days.

A fundamental choice when building VulHunt was to own the program representation at every layer—disassembly, intermediate representation (IR), and decompiled code—which makes it straightforward to map directly between corresponding elements, such as disassembly instructions, IR statements, and decompiled constructs. While some of VulHunt’s capabilities could be replicated by combining multiple external tools, such an approach would rely on divergent sources of truth. In practice, different tools often implement different control-flow graph (CFG) reconstruction algorithms or lifting strategies, which can introduce subtle inconsistencies and make cross-layer reasoning more complex and error-prone. By maintaining a single, consistent representation throughout the analysis pipeline, VulHunt avoids these integration challenges and enables more reliable vulnerability detection logic.

In the following sections, we examine VulHunt’s capabilities in detail, starting with high-level semantic analyses and progressively moving toward the lower-level matching primitives. While these form the foundation of the framework, VulHunt also provides additional APIs and supports extending its core capabilities.

Dataflow Analysis

The first capability we introduce is dataflow analysis, and in this post we specifically focus on intra-procedural analysis. The enterprise edition of VulHunt supports inter-procedural dataflow analysis, but this feature is not available in the community edition.

The goal of dataflow analysis is to track how values propagate through a function. Two common cases are particularly important:

1. Parameter propagation: Tracking whether one or more function parameters flow into the arguments of a callee.

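#include <string.h>

void vulnerable_function(int len, char *path) {
  char buffer[256];
  /* Both parameters reach memcpy unchecked: len is never compared to sizeof(buffer) */
  memcpy(buffer, path, len);
}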

2. Vulnerable function tracking: Analyzing whether inputs or outputs of a function call propagate to another function’s arguments.

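#include <stdio.h>
#include <stdlib.h>

void vulnerable_function(char *cmd) {
  char buffer[256];

  /* cmd flows through snprintf's output (buffer) into system() */
  snprintf(buffer, sizeof(buffer), "sh -c %s", cmd);
  system(buffer);
}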

Dataflow analysis is especially useful for detecting vulnerabilities where untrusted data reaches sensitive operations without proper validation. This includes classic cases like command injection, as well as memory safety issues such as use-after-free.
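To make the idea of taint propagation concrete, here is a minimal sketch of forward taint tracking over a straight-line function body. It is not VulHunt's API or implementation: the statement model (`struct stmt`, `OP_ASSIGN`, `OP_SANITIZE`, `OP_SINK`) and the function `first_tainted_sink` are hypothetical names invented for this illustration, and a real intra-procedural analysis would also have to handle branches, loops, and memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical statement kinds for a toy, straight-line function body. */
enum op_kind {
  OP_ASSIGN,   /* dst = f(src): taint flows from src to dst   */
  OP_SANITIZE, /* dst = sanitize(src): dst is considered clean */
  OP_SINK      /* sensitive call that consumes src             */
};

struct stmt {
  enum op_kind kind;
  int dst; /* variable id written; unused for OP_SINK (set to 0) */
  int src; /* variable id read */
};

/*
 * Walks the statements in order, tracking tainted variables in a bitmask.
 * `tainted` holds the initially tainted variables (for example, parameters).
 * Variable ids must be in the range [0, 32).
 * Returns the index of the first sink that consumes tainted data, or -1.
 */
int first_tainted_sink(const struct stmt *body, size_t n, uint32_t tainted) {
  for (size_t i = 0; i < n; i++) {
    uint32_t src_bit = UINT32_C(1) << body[i].src;
    uint32_t dst_bit = UINT32_C(1) << body[i].dst;

    switch (body[i].kind) {
    case OP_ASSIGN:
      if (tainted & src_bit)
        tainted |= dst_bit;  /* propagate taint */
      else
        tainted &= ~dst_bit; /* overwritten with clean data */
      break;
    case OP_SANITIZE:
      tainted &= ~dst_bit;   /* sanitizer output is treated as clean */
      break;
    case OP_SINK:
      if (tainted & src_bit)
        return (int)i;
      break;
    }
  }
  return -1;
}
```

Modeling the command injection example above, variable 0 would be `cmd`, variable 1 would be `buffer`, the `snprintf` call becomes an `OP_ASSIGN` from 0 to 1, and `system(buffer)` becomes an `OP_SINK` reading variable 1. With variable 0 initially tainted, the sink is reported.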

Another important aspect is support for sanitization functions. VulHunt allows users to define custom sanitizers that explicitly stop taint propagation. When data passes through one of these functions, it is treated as validated or safely transformed, preventing false positives in downstream analysis.

For more complex validation logic—such as conditional guards that constrain a value before it reaches a sensitive sink—sanitization can be combined with code pattern matching. This lets users handle both simple sanitizers and more complex validation checks.
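The two kinds of validation mentioned above look quite different in code. The sketch below shows one of each, as hardened counterparts to the earlier vulnerable examples. All names (`is_safe_token`, `build_command`, `copy_bounded`) are hypothetical and chosen for illustration; they are not functions that VulHunt ships or recognizes by default.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sanitizer: accepts only non-empty strings of [A-Za-z0-9_-]. */
int is_safe_token(const char *s) {
  if (s == NULL || *s == '\0')
    return 0;
  for (; *s != '\0'; s++) {
    if (!isalnum((unsigned char)*s) && *s != '_' && *s != '-')
      return 0;
  }
  return 1;
}

/*
 * Sanitizer-style validation: the argument only reaches the command string
 * after is_safe_token() accepts it. Returns 0 on success, -1 on rejection
 * or truncation. The caller would pass `out` to the sensitive operation.
 */
int build_command(char *out, size_t out_size, const char *arg) {
  if (!is_safe_token(arg))
    return -1;
  int n = snprintf(out, out_size, "sh -c %s", arg);
  return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/*
 * Guard-style validation: a conditional constrains len before the sink.
 * This is the kind of check a code pattern would need to recognize.
 * Returns 0 on success, -1 if len does not fit in dst.
 */
int copy_bounded(char *dst, size_t dst_size, const char *src, size_t len) {
  if (len > dst_size)
    return -1;
  memcpy(dst, src, len);
  return 0;
}
```

In the first case a user could register `is_safe_token` as a custom sanitizer so that taint stops at the call. In the second case there is no function to register, only an `if` statement, which is why combining dataflow with pattern matching on the decompiled code is useful.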

Code Pattern Matching on Decompiled Code

Some vulnerabilities are difficult to detect with traditional approaches, particularly those that follow recognizable high-level code structures or rely on well-known unsafe programming patterns. In these cases, VulHunt can search for code patterns in the decompiled code, rather than just raw bytes. Its pattern-matching engine is modular and can be extended with alternative implementations; the version currently included is adapted from Weggli, allowing flexible and expressive pattern definitions.

This approach is both flexible and architecture-independent, allowing the same detection logic to apply across different processor architectures. By reasoning over the decompiled structure, VulHunt can capture patterns that would be missed by simpler syntactic or signature-based methods, including variants and subtle modifications of known vulnerabilities.

For example, consider CVE-2023-52425: a processor function (parser->m_processor) is called without any guards, leading to a denial-of-service (DoS) via quadratic parsing behavior. Searching at the decompiled code level makes it straightforward to express and detect this pattern, even if the surrounding code is complex or slightly modified from known vulnerable variants.
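The difference between a guarded and an unguarded call can be illustrated with a deliberately simplified sketch. Apart from the `m_processor` field name mentioned above, everything here is hypothetical: the struct layout, `counting_processor`, `feed_unguarded`, `feed_guarded`, and the buffer-doubling heuristic used as the guard. It is not libexpat's source code or its actual fix, only an example of the structural shape a pattern could distinguish.

```c
#include <stddef.h>

struct parser;
typedef int (*processor_fn)(struct parser *p, const char *buf, size_t len);

struct parser {
  processor_fn m_processor; /* field name taken from the text above */
  size_t last_attempt_len;  /* hypothetical: buffer size at the last parse attempt */
  unsigned calls;           /* instrumentation for this sketch only */
};

/* Stand-in processor: counts invocations and pretends more data is needed. */
int counting_processor(struct parser *p, const char *buf, size_t len) {
  (void)buf;
  (void)len;
  p->calls++;
  return 0;
}

/* Unguarded shape: every new chunk triggers a full reparse of the buffer. */
void feed_unguarded(struct parser *p, const char *buf, size_t len) {
  p->m_processor(p, buf, len);
}

/* Guarded shape: reparse only once the buffer has at least doubled in size. */
void feed_guarded(struct parser *p, const char *buf, size_t len) {
  if (p->last_attempt_len != 0 && len < 2 * p->last_attempt_len)
    return;
  p->last_attempt_len = len;
  p->m_processor(p, buf, len);
}
```

If a buffer grows one byte at a time, the unguarded variant invokes the processor on every call, so the total work grows quadratically with the input size, while the guarded variant invokes it only a logarithmic number of times. A pattern over the decompiled code can target the first shape: a call through `m_processor` with no dominating condition.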

Once a pattern is matched in the decompiled code, VulHunt can easily map it back to the corresponding instruction addresses. This direct mapping across layers is difficult to achieve when combining multiple external tools, which often rely on divergent representations.

But what about binaries that contain no type information?

Type Libraries

VulHunt leverages type libraries—collections of C header files containing type definitions and function prototypes—to assist with vulnerability analysis and improve explainability. It is possible to build separate libraries for 32-bit and 64-bit targets, ensuring the correct types are used based on the target architecture. It is also possible to create custom type libraries by specifying build configurations, allowing users to tailor type information to specific environments or compilation settings.

Type libraries serve two key purposes:

Improved explainability: Analysts can refer to specific types, function signatures, and structure fields when reviewing a finding. This makes it easier to understand why a vulnerability exists and allows security teams to map the decompiled code back to the original source code. Clear type information also helps AI models understand the context of a vulnerability.
Enhanced detection reliability: By knowing the types of variables and structures, VulHunt can make pattern matching more precise. For example, it can detect vulnerabilities that depend on the usage of particular structure fields, rather than relying solely on generic code patterns.

*Decompiled function enriched with type library information*

In short, type libraries give VulHunt the semantic context needed to make both detection and explanation more robust.
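As a small illustration of why separate 32-bit and 64-bit libraries matter, consider the kind of declarations a header in a type library might contain. The structure `struct session`, the prototype `session_set_name`, and the helper `session_name_len_offset` are hypothetical and exist only for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type library content: a structure and a function prototype. */
struct session {
  uint32_t id;
  char *name;      /* pointer width differs between 32-bit and 64-bit targets */
  size_t name_len; /* so does size_t, which shifts every later field */
};

/* Prototype only: this is what lets a decompiler type the call's arguments. */
int session_set_name(struct session *s, const char *name, size_t len);

/* Offset of name_len as laid out for the target this file is compiled for. */
size_t session_name_len_offset(void) {
  return offsetof(struct session, name_len);
}
```

On common ABIs the offset of `name_len` is typically 8 on a 32-bit target and 16 on a 64-bit one. A detection rule that refers to the `name_len` field by name stays the same in both cases, but it only resolves to the right memory access if the type library matches the target's data model.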

Function Signatures

Most binaries in the wild are stripped of symbols, which introduces two main challenges:

Detection logic becomes harder to write, as we cannot rely on function names.
The same detection logic is less flexible, making it easier to miss vulnerable functions.

Additionally, function prototype application is more difficult, as the decompiler cannot automatically assign the correct types.

VulHunt addresses these issues through function signature support. For each component, a collection of FLIRT signature files is maintained, organized by version and architecture. During analysis, the engine automatically applies the appropriate signatures based on the target component and its architecture, restoring function names and enabling more accurate type and pattern-based analysis.

*FLIRT signatures for different versions of the libexpat library*

This mechanism improves both detection coverage and explainability, allowing analysts to reason about functions even in stripped binaries.

Annotations

VulHunt is designed not only to detect vulnerabilities, but also to improve explainability and help analysts understand why a finding matters. To support this, the framework allows users to annotate decompiled code using instruction addresses.

Under the hood, VulHunt maintains a mapping between instruction addresses and code ranges, which ensures that annotations are precise. This enables users to highlight a wide variety of code elements—from individual instructions and variable assignments to function calls and control structures such as for loops and if statements—directly in the decompiled code.
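One plausible way to picture such a mapping is a sorted table of address ranges, each tied to a span of the decompiled text. The sketch below assumes that layout; `struct addr_span` and `span_for_address` are hypothetical names, and VulHunt's internal data structures are not described in this post.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: an instruction address range and the span of
 * decompiled text (as character offsets) that it corresponds to. */
struct addr_span {
  uint64_t addr_start; /* inclusive */
  uint64_t addr_end;   /* exclusive */
  size_t text_start;   /* inclusive */
  size_t text_end;     /* exclusive */
};

/*
 * Binary search over entries sorted by addr_start with non-overlapping
 * ranges. Returns the entry covering addr, or NULL if no entry covers it.
 */
const struct addr_span *span_for_address(const struct addr_span *map,
                                         size_t n, uint64_t addr) {
  size_t lo = 0, hi = n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (addr < map[mid].addr_start)
      hi = mid;
    else if (addr >= map[mid].addr_end)
      lo = mid + 1;
    else
      return &map[mid];
  }
  return NULL;
}
```

Given an instruction address supplied with an annotation, a lookup like this yields the exact region of decompiled text to highlight. Addresses that fall in gaps, such as padding between functions, simply return no span.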

Byte Pattern Matching

Sometimes, even the simplest approaches can be highly effective. For this reason, VulHunt supports byte pattern matching, enabling detection of specific sequences of bytes within binaries. The framework is compatible with FwHunt patterns, allowing analysts to target granular instructions or recurring code patterns that might indicate vulnerabilities.

*Byte pattern generated using FwHunt’s Ghidra script*

What makes VulHunt different is that matched byte sequences can be immediately correlated with higher-level representations—such as decompiled code—allowing analysts to understand the context of each match.
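At its core, byte pattern matching is a search for a byte sequence in which some positions may be wildcards. The sketch below shows that idea with a pattern and a parallel mask array. The function name `find_pattern` and the mask convention are assumptions made for this example; this is not FwHunt's rule format or VulHunt's matching engine.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Searches data for the first position where every non-wildcard byte of
 * pattern matches. mask[j] == 0 marks position j as a wildcard.
 * Returns the offset of the first match, or -1 if there is none.
 */
long find_pattern(const uint8_t *data, size_t data_len,
                  const uint8_t *pattern, const uint8_t *mask,
                  size_t pat_len) {
  if (pat_len == 0 || pat_len > data_len)
    return -1;

  for (size_t i = 0; i + pat_len <= data_len; i++) {
    size_t j = 0;
    while (j < pat_len && (mask[j] == 0 || data[i + j] == pattern[j]))
      j++;
    if (j == pat_len)
      return (long)i;
  }
  return -1;
}
```

For example, on x86-64 the pattern `E8 ?? ?? ?? ?? C3` (a relative call followed immediately by a return) would be expressed as pattern bytes `{0xE8, 0, 0, 0, 0, 0xC3}` with mask `{1, 0, 0, 0, 0, 1}`. The wildcards absorb the call displacement, which changes from build to build.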

Beyond vulnerability detection, byte pattern matching is particularly useful for malware analysis. Analysts can identify indicators such as embedded shellcode, unusual instruction sequences, obfuscated code, and constructs typically optimized out of decompiler listings.

Intermediate Representation (IR) Matching

When detecting specific behavior at the instruction level, dealing with multiple architectures can be a major challenge. VulHunt addresses this by using an Intermediate Representation (IR). During analysis, assembly code obtained from the disassembler is lifted into IR based on Ghidra’s PCode, providing a unified, architecture-independent view of the binary.

By operating on IR instead of raw assembly, VulHunt can leverage instruction-level detection logic that works consistently across different processor architectures. This enables precise matching of low-level operations—such as arithmetic on pointers, memory accesses, or control-flow patterns—without worrying about variations in instruction sets.
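The sketch below illustrates what an architecture-independent match might look like: finding a memory load whose address was produced by an addition, which is a common shape for pointer arithmetic followed by a dereference. The IR here is hypothetical and heavily simplified. The opcode names mirror P-Code operations, but `struct ir_op` and `find_add_then_load` are invented for this example, and a real P-Code LOAD also carries an address-space operand that is omitted here.

```c
#include <stddef.h>

/* Hypothetical, much-simplified IR; opcode names mirror P-Code operations. */
enum ir_opcode { IR_COPY, IR_INT_ADD, IR_LOAD, IR_STORE, IR_CALL };

struct ir_op {
  enum ir_opcode opcode;
  int output;    /* id of the value written, or -1 if none */
  int inputs[2]; /* ids of the values read, -1 if unused */
};

/*
 * Finds the first LOAD whose address operand (inputs[0] in this sketch) was
 * most recently defined by an INT_ADD earlier in the same block.
 * Returns the index of that LOAD, or -1 if there is none.
 */
int find_add_then_load(const struct ir_op *ops, size_t n) {
  for (size_t i = 0; i < n; i++) {
    if (ops[i].opcode != IR_LOAD)
      continue;
    /* Walk backwards to the most recent definition of the address. */
    for (size_t j = i; j-- > 0;) {
      if (ops[j].output != ops[i].inputs[0])
        continue;
      if (ops[j].opcode == IR_INT_ADD)
        return (int)i;
      break; /* most recent definition is not an addition */
    }
  }
  return -1;
}
```

Because the match is written against IR operations rather than instructions, the same logic applies whether the original code was an x86 `mov` with a displacement, an ARM `ldr` with an offset register, or a separate add and load on MIPS, provided the lifter expresses them with the same operations.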

Putting It All Together

VulHunt’s strength comes from combining multiple analysis techniques within a single, unified framework. By correlating findings across different layers, the tool can provide a clear and actionable view of potential vulnerabilities in a binary.

The example below shows VulHunt in action: the decompiled code is annotated to highlight a detected vulnerability, illustrating how the different analyses come together to produce precise results.

Conclusion

In this post, we explored the core capabilities of VulHunt, from high-level semantic analyses like dataflow tracking and code pattern matching, to lower-level approaches such as IR and byte pattern matching. We also highlighted the features that make VulHunt actionable and explainable, including type libraries, function signatures, and code annotations.

These building blocks combine to give analysts, security researchers, and security teams a flexible, architecture-aware framework for detecting and reasoning about vulnerabilities in binaries, whether they are known issues, modified variants, or previously unknown flaws. The same building blocks also support malware detection.

In the next post in this series, we will explore VulHunt’s integration with large language models (LLMs) and demonstrate how it can be leveraged to enhance vulnerability analysis and detection.
