March 5, 2026

Agentic Vulnerability Research with VulHunt

By Francesco Evangelista and Sam L. Thomas

In previous posts, we introduced VulHunt and its core capabilities, showed how it can identify known vulnerabilities, and explored its applications in vulnerability research. In this post, we explore how we exposed some of VulHunt’s core functionality to LLMs, and how this integration can scale vulnerability research and reduce the effort required for manual analysis.

One of VulHunt’s strengths is that it owns the representations at every layer: disassembly, intermediate representation (IR), and decompiled code. This gives LLMs richer context and improves their ability to reason about potential vulnerabilities.

Architecture Overview

VulHunt exposes its API through an MCP server. Through this server, clients can:

  • load a project, which automatically performs disassembly and lifting and runs a predefined set of analyses;
  • run Lua queries against the VulHunt API;
  • manage function metadata, such as renaming functions, adding notes, and loading signature and type libraries.

The Lua query interface makes this approach highly flexible, since it exposes VulHunt’s core functionality. This lets LLMs decide dynamically which queries to execute based on the current analysis context.

| MCP Tool | Description |
|---|---|
| open_project | Open a project from a given path, with optional platform attributes |
| query_project | Execute a Lua script against the currently open project and return the result as JSON |
| update_function_name | Update the name of a function in the currently open project |
| set_function_notes | Set notes for a function in the currently open project |
| get_function_notes | Get the notes for a function in the currently open project |
| load_signatures | Load FLIRT signature databases for function identification |
| load_types | Load a type library for function type information |
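MCP clients invoke server tools by sending JSON-RPC 2.0 `tools/call` requests. The Python sketch below shows roughly what a request to these tools might look like on the wire. The argument names (`path`, `script`) and the file path are assumptions for illustration only, because the actual VulHunt tool schemas are not described in this post.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical argument names: the real VulHunt tool schemas may differ.
open_req = make_tool_call(1, "open_project", {"path": "/path/to/target.bin"})
query_req = make_tool_call(2, "query_project", {"script": "-- Lua query text"})
print(query_req)
```

In practice, an MCP client such as Claude Desktop or Claude Code builds and sends these messages itself. The model only decides which tool to call and with what arguments.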

To guide LLMs, we developed a set of Claude Skills (folders of instructions that extend Claude’s capabilities) covering common tasks, including:

  • Searching for functions based on criteria (e.g., function calls, byte patterns, regex)  
  • Identifying call sites  
  • Leveraging the dataflow engine  
  • Performing pattern matching on decompiled code or raw bytes  
  • Interacting with the Binarly Transparency Platform

| Skill | Description |
|---|---|
| call-sites | Find function call sites in a binary |
| code-pattern-matching | Search for code patterns in decompiled output using Weggli |
| dataflow-analysis | Track data flow between function parameters, calls, and arguments |
| decompiler | Decompile a function in a binary to C-like pseudocode |
| functions | Find and list functions in a binary |
| byte-pattern-matching | Search for raw byte patterns in binary code |
| btp-ba2-cli | Interact with the Binarly Transparency Platform and Binarly Analysis Archives (BA2 files) |
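The byte-pattern-matching skill searches raw code for byte sequences. The Python sketch below is a rough, self-contained illustration of the technique. It is not VulHunt’s implementation, and the `??` wildcard syntax is an assumption borrowed from common tooling. The sketch compiles a hex pattern with single-byte wildcards into a bytes regular expression.

```python
import re


def compile_pattern(pattern: str) -> re.Pattern:
    """Turn a hex pattern such as "48 8B ?? E8" into a bytes regex.

    Each "??" token matches any single byte; every other token must be a hex byte.
    """
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so that "." also matches the newline byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


def find_pattern(data: bytes, pattern: str) -> list[int]:
    """Return the start offsets of all non-overlapping matches of pattern in data."""
    return [m.start() for m in compile_pattern(pattern).finditer(data)]


code = bytes.fromhex("90 48 8b 05 e8 00 48 8b 0a e8")
print(find_pattern(code, "48 8B ?? E8"))  # [1, 6]
```

Because `finditer` returns non-overlapping matches, patterns that can overlap themselves would need a lookahead-based regex to report every offset.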

This initial set of skills can be extended to handle more complex analyses or target specific vulnerability patterns.

To demonstrate the power of VulHunt combined with LLMs, we focus on two practical use cases: vulnerability triaging, which accelerates the assessment of known issues, and vulnerability hunting, which can uncover previously unknown flaws.

Vulnerability Triaging

The first use case, vulnerability triaging, addresses a common challenge: security teams are overwhelmed by the number of vulnerabilities, or need to quickly assess their exposure to newly disclosed issues. Combining VulHunt with an LLM automates parts of this analysis, speeding up prioritization and reducing manual effort.

To see this in action, let’s examine CVE-2019-14889, a command injection vulnerability in libssh, which allows an attacker to inject and execute arbitrary commands on a remote server under certain conditions.

In the demo below, we triage this vulnerability using Claude Desktop (Opus 4.6) alongside VulHunt MCP tools and VulHunt Claude Skills.

The prompt provided to the LLM is straightforward, and can be easily adapted to other CVEs. In particular, we ask the LLM to:

1. Gather CVE details to understand the nature of the vulnerability.  
2. Analyze a binary to check for the presence of vulnerable code.  
3. Generate an annotated report explaining the root cause and potential exploitation path.

As you can see, we supply Claude with only a potentially vulnerable binary and the CVE ID. By fetching the CVE description from NVD and the commits that fix the vulnerability, Claude builds a clear picture of the issue. It identifies the exploitation primitive (in this case, user-controlled input reaching shell execution) as well as the functions of interest, `ssh_scp_new` and `ssh_scp_init`.
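In the demo, Claude performs this lookup itself with its web tools. The Python sketch below shows one way to reproduce the same step programmatically against the public NVD CVE API 2.0. Filtering references on the substring `"commit"` is a simple heuristic of our own choosing, not necessarily what Claude does, and the `summarize` helper is hypothetical.

```python
import json
import urllib.parse
import urllib.request

NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def fetch_cve(cve_id: str) -> dict:
    """Download the NVD record for a single CVE (requires network access)."""
    url = f"{NVD_API}?{urllib.parse.urlencode({'cveId': cve_id})}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def summarize(record: dict) -> tuple[str, list[str]]:
    """Return the English description and reference URLs that look like commits."""
    cve = record["vulnerabilities"][0]["cve"]
    description = next(
        (d["value"] for d in cve.get("descriptions", []) if d.get("lang") == "en"),
        "",
    )
    # Heuristic: fix commits usually have "commit" somewhere in their URL.
    commits = [
        r["url"] for r in cve.get("references", []) if "commit" in r.get("url", "")
    ]
    return description, commits
```

A typical call would be `summarize(fetch_cve("CVE-2019-14889"))`. The NVD API applies stricter rate limits to requests made without an API key.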

The model then loads the relevant skills to learn its available analysis capabilities, loads the binary, and executes a query to find functions associated with the vulnerable code pattern, that is, functions whose names match `ssh_scp`. It then decompiles them and checks whether any input sanitization is performed.
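The function-selection step boils down to a name filter. In VulHunt, the model expresses this as a Lua query. The Python sketch below only mirrors the logic over a hypothetical list of symbol names; it is not the query Claude actually ran.

```python
import re


def match_functions(names: list[str], pattern: str) -> list[str]:
    """Return the function names that contain a match for the regex pattern."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


# Hypothetical symbol list standing in for the functions recovered from the binary.
symbols = ["ssh_connect", "ssh_scp_new", "ssh_scp_init", "ssh_scp_close", "sftp_open"]
print(match_functions(symbols, r"ssh_scp"))
# ['ssh_scp_new', 'ssh_scp_init', 'ssh_scp_close']
```

On stripped binaries, a name-based filter like this only works once function names have been recovered, for example through the FLIRT signatures that the `load_signatures` tool loads.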

The final result is a summary of the vulnerability and a report including VulHunt’s decompiled output, enriched with LLM-generated annotations explaining the root cause and potential exploitation path.

Vulnerability Hunting

Beyond triage, VulHunt enables LLM-driven vulnerability hunting to uncover previously unknown flaws. This is particularly valuable for logic vulnerabilities that depend on application context, issues that are traditionally difficult to detect without deep program understanding. LLMs are well suited to this task, as they can reason about how components interact and dynamically search for vulnerability patterns across execution paths.

To illustrate this, consider the `rex_cgi` binary from the Netgear RAX30 firmware (version V1.0.7.78), the same target we analyzed in a previous post (Vulnerability REsearch using VulHunt).

In this scenario, we adopt the mindset of a vulnerability researcher and use the LLM to perform a source-to-sink analysis, checking whether attacker-controlled input can propagate to dangerous functions. In particular, we prompt the LLM to use the VulHunt dataflow engine to search for an unsanitized path from the source function `json_object_get_string` to the sink function `cmsUtl_strcpy`.
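Conceptually, a source-to-sink query asks whether some data path connects a source’s output to a sink’s input without passing through a sanitizing check. The toy Python model below captures that idea as a breadth-first graph search. The graph, the intermediate node `value`, and the `check_len` sanitizer are invented for illustration and do not describe the actual flow in `rex_cgi`. VulHunt’s dataflow engine works on its IR rather than on a hand-built graph like this.

```python
from collections import deque


def find_unsanitized_path(edges, source, sink, sanitizers=frozenset()):
    """Breadth-first search for a source-to-sink path that avoids sanitizer nodes.

    `edges` maps each node to the nodes its value flows into.
    Returns the path as a list of nodes, or None if every path is sanitized.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == sink:
            return path
        for nxt in edges.get(node, ()):
            # Never expand through a sanitizer: flows past it count as checked.
            if nxt not in seen and nxt not in sanitizers:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical flow graph: one path is length-checked, the other is not.
flows = {
    "json_object_get_string": ["value"],
    "value": ["check_len", "cmsUtl_strcpy"],
    "check_len": ["cmsUtl_strcpy"],
}
print(find_unsanitized_path(flows, "json_object_get_string", "cmsUtl_strcpy", {"check_len"}))
# ['json_object_get_string', 'value', 'cmsUtl_strcpy']
```

If the direct `value -> cmsUtl_strcpy` edge were removed, every path would pass through `check_len` and the search would return `None`. That mirrors how a sanitized flow should not be reported.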

We then ask Claude Code to identify potential vulnerabilities involving these two functions.

Figure 1. Prompt provided to begin vulnerability hunting, including the rex_cgi binary and the source/sink specification

As we can see in the following figure, Claude loads the project, ingests the VulHunt Skills, and performs dataflow analysis to trace whether user-controlled input propagates from the defined source to the sink.  

Figure 2. Claude loads the project, reads VulHunt Skills, performs dataflow analysis, and decompiles candidate functions.

The model validates that the source originates from externally controllable input and confirms the absence of bounds checking before reaching the sink.  

Figure 3. The LLM verifies the source is externally controlled and reports the vulnerability reaching the sink (`cmsUtl_strcpy`)

As a result, it successfully identifies a vulnerability that was previously found through manual analysis. This demonstrates how LLM-guided workflows can replicate and scale traditional reverse engineering efforts.

This is only a starting point. By integrating frameworks such as clauders, a Rust crate providing bindings for the Claude Code CLI, it becomes possible to build agentic workflows that automatically discover sources and sinks, generate candidate vulnerabilities, and triage potential security issues at scale.
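The Python sketch below roughly illustrates that fan-out idea; it does not use clauders, which is a Rust crate. It dispatches one non-interactive Claude Code run per source/sink pair through the CLI’s `claude -p` print mode. The prompt wording, the `hunt` helper, and the assumption that the VulHunt MCP server is already configured for Claude Code are all hypothetical.

```python
import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical prompt template; real workflows would tune this carefully.
PROMPT = (
    "Using the VulHunt MCP tools, check whether data returned by {source} "
    "can reach {sink} in {binary} without sanitization. Report any path found."
)


def claude_cli(prompt: str) -> str:
    """Run one non-interactive Claude Code session via `claude -p`."""
    result = subprocess.run(
        ["claude", "-p", prompt], capture_output=True, text=True, check=True
    )
    return result.stdout


def hunt(binary, sources, sinks, run=claude_cli, workers=4):
    """Dispatch one agent run per (source, sink) pair and collect the reports."""
    pairs = list(itertools.product(sources, sinks))
    prompts = [PROMPT.format(source=s, sink=k, binary=binary) for s, k in pairs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reports = list(pool.map(run, prompts))
    return dict(zip(pairs, reports))
```

The `run` parameter keeps the dispatch logic independent of how the agent is invoked, so the CLI call can be swapped for a library binding or a stub in tests. Depending on your setup, non-interactive runs may also need MCP tool permissions granted up front.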

BTP Integration

To extend AI-assisted analysis beyond local workflows, VulHunt also integrates directly with the Binarly Transparency Platform (BTP). Using Claude Skills, security analysts can interact with BTP from within their analysis environment without switching tools.

This integration allows analysts to:

  • Upload and scan binary targets through prompts  
  • Retrieve findings and triage vulnerabilities with AI guidance  
  • Deploy custom detection rules to BTP via natural language requests  
  • Download BA2 archives for offline analysis  
  • Extract and inspect components from BA2 archives on demand

Researchers can provide instructions like:

  • “Scan this UEFI firmware and show me the critical findings.”  
  • “Extract this binary from a BA2 archive and triage the most critical vulnerabilities.”

For example, if a product is affected by a vulnerability, the LLM can be instructed to retrieve the target binary, perform deeper analysis, and verify specific properties, all without relying on third-party tools. The images below show this process in action: first, the target binary is located and extracted from a BA2 archive using Claude Skills; then, a dataflow analysis is performed on it with VulHunt MCP tools.

Figure 4. Vulnerability identified by Binarly Transparency Platform

Figure 5. Claude retrieves the target binary from BTP and extracts it from the BA2 archive
Figure 6. Dataflow analysis is performed on the extracted binary using VulHunt MCP tools

Conclusion

Integrating VulHunt with LLMs opens new possibilities for both vulnerability triaging and hunting. By providing rich context across multiple representations, VulHunt helps LLMs reason about software behavior, identify potential vulnerabilities, and reduce the manual effort required for analysis.

This post highlighted practical examples, from triaging known issues to hunting unknown vulnerabilities in binaries. We’ll be presenting VulHunt at RE//verse on March 7th, where it will also be officially released!

What's lurking in your firmware?