By Francesco Evangelista and Sam L. Thomas
In previous posts, we introduced VulHunt, its core capabilities, showed how it can identify known vulnerabilities, and explored applications in vulnerability research. In this post, we explore how we exposed some of VulHunt's core functionality to LLMs, and how this integration can effectively scale vulnerability research and reduce the effort required for manual analysis.
One of VulHunt’s strengths is that it owns the representations at every layer: disassembly, intermediate representation (IR), and decompiled code — providing LLMs with richer context and improving their ability to reason about potential vulnerabilities.
VulHunt exposes its API through an MCP server, which allows users to load a project — automatically performing disassembly, lifting, and executing a predefined set of analyses — run Lua queries to interact with the VulHunt API, and manage function metadata such as renaming functions, adding notes, and loading signature/type libraries.
The Lua query interface makes this approach highly flexible, since it exposes access to the core set of VulHunt functionalities. This enables LLMs to decide dynamically which queries to execute based on the current analysis context.
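To make this concrete, here is a minimal sketch of what a client-side call to such an MCP tool could look like. MCP uses JSON-RPC 2.0 `tools/call` messages under the hood; the tool name `run_lua_query`, its argument shape, and the Lua snippet are our assumptions for illustration, not VulHunt's actual interface.

```python
import json

def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request (MCP is JSON-RPC 2.0 based)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Hypothetical tool name and Lua query; the real VulHunt MCP server
# may expose different tool names and query conventions.
lua_query = "for _, f in ipairs(functions()) do print(f.name) end"
request = make_tool_call(1, "run_lua_query", {"query": lua_query})
print(json.dumps(request, indent=2))
```

Because the query is just a string in the request payload, the LLM is free to compose a different Lua snippet on every call, adapting to what it has learned so far about the binary.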
To guide LLMs, we developed a set of Claude Skills (folders of instructions that extend Claude's capabilities) covering common tasks, including:
This initial set of skills can be extended to handle more complex analyses or target specific vulnerability patterns.
To demonstrate the power of VulHunt combined with LLMs, we focus on two practical use cases: vulnerability triaging, which accelerates the assessment of known issues, and vulnerability hunting, which can uncover previously unknown flaws.
The first use case, vulnerability triaging, addresses the challenge faced by security teams overwhelmed by the number of vulnerabilities or needing to quickly assess exposure to newly disclosed issues. Combining VulHunt with an LLM allows automation of parts of the analysis, speeding up prioritization and reducing manual effort.
To see this in action, let’s examine CVE-2019-14889, a command injection vulnerability in libssh, which allows an attacker to inject and execute arbitrary commands on a remote server under certain conditions.
In the demo below, we triage this vulnerability using Claude Desktop (Opus 4.6) alongside VulHunt MCP tools and VulHunt Claude Skills.
The prompt provided to the LLM is straightforward, and can be easily adapted to other CVEs. In particular, we ask the LLM to:
1. Gather CVE details to understand the nature of the vulnerability.
2. Analyze a binary to check for the presence of vulnerable code.
3. Generate an annotated report explaining the root cause and potential exploitation path.
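Step 1 is worth a closer look: the fix commit alone often tells the model which functions to examine, because git records the enclosing function signature in each unified-diff hunk header. The sketch below extracts those names; the diff content is a simplified illustration, not the actual CVE-2019-14889 patch.

```python
import re

def changed_functions(diff_text):
    """Pull function names out of unified-diff hunk headers.

    Git appends the enclosing function's signature after the second
    "@@" marker, which is often enough to tell an LLM which functions
    a fix commit touched.
    """
    names = set()
    for match in re.finditer(r"^@@ .*? @@ .*?(\w+)\s*\(", diff_text, re.M):
        names.add(match.group(1))
    return sorted(names)

# Simplified, illustrative diff shaped like a libssh fix commit.
diff = """\
@@ -45,7 +45,9 @@ ssh_scp ssh_scp_new(ssh_session session, int mode, const char *location)
-    scp->location = strdup(location);
+    scp->location = ssh_quote_file_name(location);
@@ -80,6 +82,6 @@ int ssh_scp_init(ssh_scp scp)
-    rc = ssh_channel_request_exec(scp->channel, scp->location);
"""
print(changed_functions(diff))  # → ['ssh_scp_init', 'ssh_scp_new']
```

Feeding these names to the model turns a vague CVE description into a concrete list of analysis targets.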
As you can see, we supply Claude only with a potentially vulnerable binary and the CVE ID. By fetching the CVE description from NVD and the associated commits fixing the vulnerability, Claude is able to get a very clear picture of the vulnerability. It understands the exploitation primitive — in this case, user-controlled input reaching shell execution — but also that the functions of interest are named `ssh_scp_new` and `ssh_scp_init`.
The model then loads the relevant skills to understand its available analysis capabilities, loads the binary, and executes a query to look for functions associated with the vulnerable code pattern, i.e., functions whose names match `ssh_scp`. It then proceeds by decompiling them and checking whether any input sanitization is performed.
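The check the model performs can be reduced to a simple predicate: does a name-matched function ever call a known sanitizer before the dangerous operation? The toy model below makes that explicit; the function bodies and sanitizer list are invented for illustration, and in practice the bodies come from VulHunt's decompiler.

```python
import re

# Illustrative decompiled snippets keyed by function name; in the real
# workflow these come from VulHunt's decompiler via the MCP tools.
decompiled = {
    "ssh_scp_new": "scp->location = strdup(location);",
    "ssh_scp_init": 'snprintf(buf, sizeof(buf), "scp -t %s", scp->location);\n'
                    "ssh_channel_request_exec(scp->channel, buf);",
    "ssh_connect": "return socket_connect(host, port);",
}

# Hypothetical sanitizer names a fixed build would call.
SANITIZERS = ("ssh_quote_file_name", "escape_shell")

def unsanitized_matches(functions, name_pattern):
    """Return name-matched functions that never call a sanitizer."""
    hits = []
    for name, body in functions.items():
        if re.search(name_pattern, name) and not any(s in body for s in SANITIZERS):
            hits.append(name)
    return sorted(hits)

print(unsanitized_matches(decompiled, r"ssh_scp"))  # → ['ssh_scp_init', 'ssh_scp_new']
```

A patched binary would call the quoting routine before building the command line, so the same query returning an empty list is the "not vulnerable" signal.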
The final result is a summary of the vulnerability and a report including VulHunt’s decompiled output, enriched with LLM-generated annotations explaining the root cause and potential exploitation path.
Beyond triage, VulHunt enables LLM-driven vulnerability hunting to uncover previously unknown flaws. This is particularly valuable for identifying logic vulnerabilities that depend on application context, which are issues that are traditionally difficult to detect without deep program understanding. LLMs are well-suited for this task, as they can reason about component interactions and dynamically search for vulnerability patterns across execution paths.
To illustrate this, consider the `rex_cgi` binary from the Netgear RAX30 firmware (version V1.0.7.78), the same target analyzed in one of our previous posts (Vulnerability REsearch using VulHunt).
In this scenario, we adopt the mindset of a vulnerability researcher and use the LLM to perform a source-to-sink analysis, checking whether attacker-controlled input can propagate to dangerous functions. In particular, we prompt the LLM to use the VulHunt dataflow engine to search for an unsanitized path from the source function `json_object_get_string` to the sink function `cmsUtl_strcpy`.
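At its core, this kind of query is a reachability search over a dataflow graph. The sketch below shows the idea with a breadth-first search; the edges and the intermediate function name are invented for illustration, while in the real workflow VulHunt's dataflow engine derives them from the lifted IR.

```python
from collections import deque

# Toy dataflow edges: "data returned by key can reach each value".
# These edges are illustrative, not taken from rex_cgi.
FLOWS = {
    "json_object_get_string": ["setRemoteInfo"],
    "setRemoteInfo": ["cmsUtl_strcpy"],
    "getIfName": ["cmsUtl_strcpy"],
}

def find_path(source, sink):
    """Breadth-first search; returns one source→sink path or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == sink:
            return path
        for nxt in FLOWS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(find_path("json_object_get_string", "cmsUtl_strcpy"))
```

A returned path is only a candidate: the model still has to confirm that the source is externally controllable and that no sanitization occurs along the way.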
We then ask Claude Code to identify potential vulnerabilities involving these two functions.

As we can see in the following figure, Claude loads the project, ingests the VulHunt Skills, and performs dataflow analysis to trace whether user-controlled input propagates from the defined source to the sink.

The model validates that the source originates from externally controllable input and confirms the absence of bounds checking before reaching the sink.

As a result, it successfully identifies a vulnerability that was previously found through manual analysis. This demonstrates how LLM-guided workflows can replicate and scale traditional reverse engineering efforts.
This is only a starting point. By integrating frameworks such as clauders, a Rust crate providing bindings for the Claude Code CLI, it becomes possible to build agentic workflows that automatically discover sources and sinks, generate candidate vulnerabilities, and triage potential security issues at scale.
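The first stage of such a workflow could be as simple as heuristically proposing source/sink pairs from the binary's function list, which the agent then confirms with real dataflow queries. The hint lists and function names below are our own illustrative assumptions, not part of VulHunt or clauders.

```python
# Substring hints for functions likely to return external input
# (sources) or consume it dangerously (sinks). Purely heuristic.
SOURCE_HINTS = ("get_string", "recv", "read", "getenv")
SINK_HINTS = ("strcpy", "system", "exec", "sprintf")

def propose_pairs(function_names):
    """Cross every candidate source with every candidate sink."""
    sources = [f for f in function_names if any(h in f.lower() for h in SOURCE_HINTS)]
    sinks = [f for f in function_names if any(h in f.lower() for h in SINK_HINTS)]
    return [(src, snk) for src in sources for snk in sinks]

funcs = ["json_object_get_string", "cmsUtl_strcpy", "log_init", "doSystemCmd"]
print(propose_pairs(funcs))
```

Each proposed pair becomes one dataflow query for the agent, turning an open-ended hunt into a bounded work queue.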
To extend AI-assisted analysis beyond local workflows, VulHunt also integrates directly with the Binarly Transparency Platform (BTP). Using Claude Skills, security analysts can interact with BTP from within their analysis environment without switching tools.
This integration allows analysts to:
Researchers can provide instructions like:
For example, if a product is affected by a vulnerability, the LLM can be instructed to retrieve the target binary, perform deeper analysis, and verify specific properties — all without relying on third-party tools. In the images below, this process is shown in action: first, the target binary is located and extracted from a BA2 archive using Claude Skills, and then a dataflow analysis is performed on it with VulHunt MCP tools.



Integrating VulHunt with LLMs opens new possibilities for both vulnerability triaging and hunting. By providing rich context across multiple representations, VulHunt helps LLMs reason about software behavior, identify potential vulnerabilities, and reduce the manual effort required for analysis.
This post highlighted practical examples, from triaging known issues to hunting unknown vulnerabilities in binaries. We’ll be presenting VulHunt at RE//verse on March 7th, where it will also be officially released!