VulHunt in Practice: Detecting a Remote Code Execution Vulnerability in rsync

By Fernando Mercês

We walk you through writing a VulHunt rule to detect a known vulnerability in a widely used software component.
We show how VulHunt produces detailed results that accurately pinpoint the vulnerability’s root cause.

In our very first blogpost about VulHunt, we explained the problem our framework solves, showed how it differs from common approaches to detecting vulnerabilities in binaries, and introduced its core capabilities. As we previously discussed, VulHunt has two usage modes: standalone and agentic. In this blogpost, we’ll focus on the standalone mode, which relies on VulHunt rules: Lua scripts that interact with the core engine to detect vulnerabilities.

We’ll guide you through the process of writing a VulHunt rule for a known vulnerability in Rsync, a popular tool for transferring files over the network that is present on many systems. We start by understanding the vulnerability, then show how to use VulHunt to detect the exact point in the code where it occurs.

Understanding our target: CVE-2024-12084

The impacted Rsync versions range from 3.2.7 up to, but not including, 3.4.0. As part of the protocol negotiation, client and server must agree on a digest algorithm to verify the integrity of the file chunks being transferred. Let’s take a look at the relevant code now.

In `rsync.h` two structs are defined:

#define SUM_LENGTH 16
--snip--
struct sum_buf {
    OFF_T offset;           /**< offset in file of this chunk */
    int32 len;              /**< length of chunk of file */
    uint32 sum1;            /**< simple checksum */
    int32 chain;            /**< next hash-table collision */
    short flags;            /**< flag bits */
    char sum2[SUM_LENGTH];  /**< checksum  */
};

struct sum_struct {
    OFF_T flength;		/**< total file length */
    struct sum_buf *sums;	/**< points to info for each chunk */
    int32 count;		/**< how many chunks */
    int32 blength;		/**< block_length */
    int32 remainder;	/**< flength % block_length */
    int s2length;		/**< sum2_length */
};

Then, in the `receive_sums` function from `sender.c` we have:

struct sum_struct *s = new(struct sum_struct);
--snip--
s->sums = new_array(struct sum_buf, s->count);

for (i = 0; i < s->count; i++) {
    s->sums[i].sum1 = read_int(f);
    read_buf(f, s->sums[i].sum2, s->s2length);

Both `new` and `new_array` are wrappers for memory allocation functions. The `sums` field of the struct pointed to by `s` is a pointer to an array of `sum_buf` structs. Within the loop shown above, `s->s2length` bytes are read from `f` and written to the memory reserved for the `sum2` field of every `sum_buf` struct in the `sums` array.

The problem is that `sum2` has a fixed length of 16 bytes and `s->s2length` is controlled by the attacker, bounded by a `MAX_DIGEST_LEN` constant. All samples we found had this value set to 64. Nevertheless, if `s->s2length` is greater than 16, it will cause an out-of-bounds write to the `sum2` buffer.
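To make the size mismatch concrete, here is a minimal C sketch. It is not rsync code: the struct name `sum_buf_sketch`, its fixed-width field types, and the helper `bytes_past_sum2` are assumptions made for illustration, and `MAX_DIGEST_LEN` is set to the value of 64 reported above. The helper computes how many bytes a copy of `s2length` bytes would place beyond the end of `sum2`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUM_LENGTH 16
#define MAX_DIGEST_LEN 64 /* assumption: the value seen in the analyzed builds */

/* Illustrative stand-in for rsync's struct sum_buf (field types simplified). */
struct sum_buf_sketch {
    int64_t offset;
    int32_t len;
    uint32_t sum1;
    int32_t chain;
    short flags;
    char sum2[SUM_LENGTH];
};

/* Hypothetical helper: number of bytes that land outside sum2 when
 * s2length bytes are copied into it. */
size_t bytes_past_sum2(int s2length)
{
    const size_t capacity = sizeof(((struct sum_buf_sketch *)0)->sum2);

    /* Mirror the only bound the vulnerable code enforces on the length. */
    assert(s2length >= 0 && s2length <= MAX_DIGEST_LEN);

    if ((size_t)s2length <= capacity)
        return 0;
    return (size_t)s2length - capacity;
}
```

Under these assumptions, `bytes_past_sum2(16)` is 0, while `bytes_past_sum2(64)` is 48, so a maximum-length digest would write 48 bytes beyond the buffer for each chunk read in the loop.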

Another important thing to highlight is that `read_int` also calls `read_buf`, and it is commonly found inlined. The decompiled version of the relevant code is as follows:

Figure 1 - `send_files` function decompiled in IDA

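As you can see in Figure 1, both calls to `new` and `new_array` were replaced by calls to `my_alloc`. Also, `read_int` became a call to `read_buf`. This is because the compiler inlines these function calls for better performance. Note also that Figure 1 shows `send_files` rather than `receive_sums`: in the binaries discussed here, the body of `receive_sums` shows up inside its caller, `send_files`, which is why the rest of this post targets that function. We have to take this into account when writing VulHunt rules, as function names in the decompiled code may differ from those in the source code.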
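The effect of thin allocation wrappers on the final binary can be sketched in C. Everything below is hypothetical and only illustrates the idea: `alloc_sketch` stands in for `my_alloc`, while `NEW_SKETCH` and `NEW_ARRAY_SKETCH` stand in for `new` and `new_array`. These are not rsync's actual definitions. Because both wrappers reduce to the same underlying function, code that uses one of each ends up making two calls to that function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Counts how often the underlying allocator is reached. */
int alloc_calls = 0;

/* Hypothetical allocator standing in for rsync's my_alloc. */
void *alloc_sketch(size_t count, size_t size)
{
    alloc_calls++;
    return calloc(count, size);
}

/* Hypothetical thin wrappers in the spirit of new and new_array. */
#define NEW_SKETCH(type) ((type *)alloc_sketch(1, sizeof(type)))
#define NEW_ARRAY_SKETCH(type, count) ((type *)alloc_sketch((count), sizeof(type)))

struct item { int value; };
struct holder { struct item *items; int count; };

/* Allocates a holder and its item array; returns NULL on failure. */
struct holder *make_holder(int count)
{
    assert(count > 0);
    struct holder *h = NEW_SKETCH(struct holder);
    if (h == NULL)
        return NULL;
    h->count = count;
    h->items = NEW_ARRAY_SKETCH(struct item, (size_t)count);
    if (h->items == NULL) {
        free(h);
        return NULL;
    }
    return h;
}
```

After one call to `make_holder`, the counter `alloc_calls` equals 2, which mirrors the two `my_alloc` call sites the rule looks for in `send_files`.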

Now, if we were to draft an algorithm to detect this vulnerability, it could be:

In `rsync` binaries:

Ensure the value of MAX_DIGEST_LEN is 64 (or at least that it is greater than SUM_LENGTH).
Find the `send_files` function.
Verify it has two calls to `my_alloc`.
Find the second call to `read_buf`, which reads `s2length` bytes into `sum2`.

We should also make sure `SUM_LENGTH` is indeed 16, but since this was the case for all vulnerable binaries we analyzed, we’ll skip this check for the sake of simplicity.

Alright, let’s see how we translate this logic to VulHunt.

Writing a VulHunt rule

A rule is a small program written in the Lua programming language that reports a result to VulHunt. It contains metadata and detection logic, where the latter is split into one or more functions. Let’s see how it works.

Rule metadata

Each rule starts with a preamble containing rule metadata such as rule author, name, target platform and architecture, conditions for the rule to run, type libraries and function signatures needed. We start our rule with the following:

author = "Binarly"
name = "CVE-2024-12084"
platform = "posix-binary"
architecture = "*:*:*"
types = "rsync/v3.3.0/rsync"
signatures = {project = "rsync", from = "3.2.7", to = "3.3.9"}
conditions = {name_with_prefix = {"rsync"}}
scopes = scope:functions{
  target = {matching = "send_files", kind = "symbol"},
  with = check
}

The name field contains the rule name. As we're dealing with a known vulnerability, it’s reasonable to use its identifier.

Because VulHunt also supports other targets such as UEFI modules, we explicitly state in the platform field that this rule applies only to `posix-binary`.

The `"*:*:*"` triplet assigned to the architecture field tells VulHunt that this rule scans binaries compiled for all supported architectures (currently x86 and ARM, both 32- and 64-bit). Other possible values include “x86:LE:*” and “AARCH:LE:64”. An asterisk means any value and can be used in any member of the triplet.

As this rule applies to the rsync binary only, we set it as the component name in conditions. Think of container images containing thousands of binaries. We built VulHunt to be fast, so the conditions field lets a rule run only on the binaries where it makes sense.

Finally, we load any needed type library and signatures. The way they work is beyond the scope of this article, but keep in mind that a type library significantly improves the decompiler output, and function signatures allow our rule to find functions by name even when target binaries contain no symbols (stripped binaries).

There’s one more field required in the preamble: the scopes field. It tells VulHunt which scopes this rule uses. Think of scopes as strategies for finding the vulnerable piece of code you’re interested in. Effectively, they set the analysis context on which the detection logic focuses. The following scopes are supported:

Analysis Scope Table

Scope	Analysis Context	Description
Project	Whole binary	The entire executable file or library being analyzed.
Functions	Functions present in the binary	Individual subroutines or methods contained within the executable.
Calls	Call-sites within functions present in the binary	Specific locations where a function is invoked (called) by another function.

In this case, we want to look for the vulnerable function by its symbol name, so we’ll be using `scope:functions` this way:

scopes = scope:functions{
  target = {matching = "send_files", kind = "symbol"},
  with = check
}

Other possible values include `scope:project` and the powerful `scope:calls`. We’ll show how they work in a future article. For now, let’s continue.

The `with = check` part sets the checking function that will run once VulHunt finds our `send_files` function. This is where our main rule logic will be located. The function name here is not important, but `check` is a good generic name. Let’s get to the logic!

Initial rule logic

Because we used `scope:functions` with `target = {matching = "send_files", kind = "symbol"}`, we can safely assume the `check` function will only analyze the function of interest. To start small, let’s write a simple `check` function that reports a single call to `my_alloc` from within `send_files`:

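function check(project, context)
    -- get the calls to `my_alloc` made from within `send_files`
    local my_alloc = context:calls "my_alloc"

    -- fewer than two calls: not the code we are looking for
    if #my_alloc < 2 then
        return
    end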

    -- returns a `result:critical` table
    return result:critical{
        name = "CVE-2024-12084",
        description = "Heap overflow in rsync",
        evidence = {
            functions = {
                -- within `send_files`...
                [context.address] = {
                    -- ...annotate one call to `my_alloc`
                    annotate:at{
                        location = my_alloc[1],
                        message = "An allocation happens here"
                    }
                }
            }
        }
    }
end

NOTE: In Lua, a table is a versatile data type that implements associative arrays and can hold other tables as well. Tables are created with the `{}` expression.

Our `check` function takes two parameters: `project` and `context`. Let’s understand what they are for:

Parameter Description Table

Parameter	Description
`project`	This parameter allows our rule to access information and perform actions over the full binary. Using this, we could look for other functions, decompile them, search for byte patterns, etc.
`context`	This provides granular access to the context of the function of interest (`send_files` in this example). With this, we can get its inner function calls, search for code patterns within it, etc.

If there are at least two calls to `my_alloc`, our simple `check` function returns a `result:critical` table to VulHunt containing three required fields: name, description, and evidence. The latter holds a `functions` table keyed by function address (`context.address` here), and each entry lists the annotations for that function. We annotate the address stored at `my_alloc[1]` with a message. In other words, this message will be added to VulHunt’s output for this call to `my_alloc` from within `send_files`. Figure 2 shows the VulHunt output produced by the command-line version when this rule is used to scan a vulnerable binary.

Figure 2 - Output of a minimal rule to detect CVE-2024-12084

The example works, but it’s of course not complete. Ideally, our rule should:

Include an annotation for the function prototype.
Annotate both calls to `my_alloc`, not just one.
Use more descriptive messages.
Point out where exactly the vulnerability occurs (calling `my_alloc` is not a problem per se).
Return more information than just the CVE number and description.

In the following sections, we’ll improve the rule with the above. Stick with us!

Improving annotation

Let’s get the `send_files` function prototype annotated, cover both calls to `my_alloc`, and use more descriptive messages. Here’s a table that achieves this:

return result:critical{
    name = "CVE-2024-12084",
    description = "A heap-based buffer overflow flaw was found in the rsync daemon.\nThis issue is due to improper handling of attacker-controlled checksum lengths (s2length) in the code.\nWhen MAX_DIGEST_LEN exceeds the fixed SUM_LENGTH (16 bytes), an attacker can write out of bounds in the sum2 buffer.",
    evidence = {
        functions = {
            [context.address] = {
                annotate:prototype "void send_files(int f_in, int f_out)",
                annotate:at{
                    location = my_alloc[1],
                    message = "This call to `my_alloc` allocates memory for a `sum_struct`, which contains\nthe `s2length` field that will be populated from attacker-controlled data"
                }, annotate:at{
                    location = my_alloc[2],
                    message = "This call to `my_alloc` allocates memory for an array of `sum_buf` structures\nthat contain a fixed-size `sum2` field (16 bytes) and this array is\nlater assigned to the previously allocated `sum_struct->sums`"
                }
            }
        }
    }
}

We used `annotate:prototype` to give VulHunt more information about the `send_files` function we’re working with. Also, both calls to `my_alloc` were annotated. As a result, the output is nicer as Figure 3 shows:

Figure 3 - Rule output after annotating the `send_files` prototype and both calls to `my_alloc`, and using more descriptive messages

The output looks good, but we’re still missing the most important part: where exactly the vulnerability occurs. Also, because we use a different annotation for each call to `my_alloc`, we must ensure VulHunt attaches each one to the right call.

Order guarantee

When VulHunt sees the `context:calls "my_alloc"` code, it returns a table containing all calls to `my_alloc` from that specific context (`send_files` in our case), but the order of these function calls is not guaranteed for performance reasons. In this case though, we need to know which call comes first because we’re annotating them with different annotation messages. The following snippet achieves this:

    -- get the calls to `my_alloc`
    local my_alloc = context:calls "my_alloc"

    -- if the number of calls to `my_alloc` is less than 2, return
    if #my_alloc < 2 then return end

    -- two or more calls to `my_alloc`; work out which of the
    -- first two results comes first in `send_files`
    local my_alloc1
    local my_alloc2
    if context:precedes(my_alloc[1], my_alloc[2]) then
        my_alloc1 = my_alloc[1]
        my_alloc2 = my_alloc[2]
    elseif context:precedes(my_alloc[2], my_alloc[1]) then
        my_alloc1 = my_alloc[2]
        my_alloc2 = my_alloc[1]
    else
        -- neither call precedes the other; nothing reliable to annotate
        return
    end

Then we need to use `my_alloc1` and `my_alloc2` in the table returned by the `check` function.

Annotating only the vulnerable call to `read_buf`

From Figure 1, we can see there are two calls to `read_buf`, but we’re interested in the second one, which is where the vulnerability actually is. As we saw in the previous section, annotating a function call normally means finding its address first. Here’s a snippet that does this for us:

    -- get the calls to `read_buf`
    local read_buf = context:calls("read_buf")
    if #read_buf < 2 then return end

    -- a variable that will hold the address of the `read_buf` call we want
    local read_buf2

    -- two or more calls to `read_buf`; get the second one
    if context:precedes(read_buf[1], read_buf[2]) then
        read_buf2 = read_buf[2]
    elseif context:precedes(read_buf[2], read_buf[1]) then
        read_buf2 = read_buf[1]
    else
        -- neither call precedes the other; nothing reliable to annotate
        return
    end

Once again, we used `context:precedes` because we’re interested in the second call to `read_buf`. Expanding the evidence table to include both annotations to `my_alloc` (now honoring their order) and to the second call to `read_buf` is seamless:

        evidence = {
            functions = {
                [context.address] = {
                    annotate:prototype "void send_files(int f_in, int f_out)",
                    annotate:at{
                        location = my_alloc1,
                        message = "This call to `my_alloc` allocates memory for a `sum_struct`, which contains\nthe `s2length` field that will be populated from attacker-controlled data"
                    }, annotate:at{
                        location = my_alloc2,
                        message = "This call to `my_alloc` allocates memory for an array of `sum_buf` structures\nthat contain a fixed-size `sum2` field (16 bytes) and this array is\nlater assigned to the previously allocated `sum_struct->sums`"
                    }, annotate:at{
                        location = read_buf2,
                        message = "A heap-based buffer overflow occurs here as `sum_struct.sums[i].sum2` is\n16 bytes long and it's populated with `sum_struct.s2length` bytes, which is\nattacker-controlled and not properly checked"
                    }
                }
            }
        }

Our output has now reached a level of completeness we can be proud of:

Figure 4 - Rule output containing annotations for both calls to `my_alloc` and for the vulnerable call to `read_buf`

Verifying a constant value

As we previously mentioned, we need to check the value of `MAX_DIGEST_LEN`. This constant is used in the `read_sum_head` function. The following snippet extracted from rsync source code shows how it is used:

if (sum->s2length < 0 || sum->s2length > MAX_DIGEST_LEN) {
	rprintf(FERROR, "Invalid checksum length %d [%s]\n",
			sum->s2length, who_am_i());
	exit_cleanup(RERR_PROTOCOL);
}
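The gap left by this range test can be illustrated with a short C sketch. The function names are hypothetical, and `MAX_DIGEST_LEN` is assumed to be 64, as in the binaries discussed earlier. The sketch separates the lengths the test accepts from the lengths that actually fit in the 16-byte `sum2` buffer.

```c
#include <assert.h>
#include <stdbool.h>

#define SUM_LENGTH 16
#define MAX_DIGEST_LEN 64 /* assumption: the value seen in the analyzed builds */

/* Mirrors the range test quoted above: true when the length is accepted. */
bool accepted_by_range_check(int s2length)
{
    return !(s2length < 0 || s2length > MAX_DIGEST_LEN);
}

/* True when the length is accepted yet larger than the sum2 buffer. */
bool accepted_but_too_long(int s2length)
{
    return accepted_by_range_check(s2length) && s2length > SUM_LENGTH;
}

/* Counts the accepted lengths in [lo, hi] that would overflow sum2. */
int count_dangerous_lengths(int lo, int hi)
{
    assert(lo <= hi);
    int count = 0;
    for (int len = lo; len <= hi; len++) {
        if (accepted_but_too_long(len))
            count++;
    }
    return count;
}
```

With these definitions, every length from 17 to 64 passes the test and still exceeds the buffer, so `count_dangerous_lengths(-10, 100)` yields 48. This is why the rule cares whether the constant in the comparison is larger than `SUM_LENGTH`.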

It wouldn’t be easy to come up with a set of byte patterns that finds this comparison in binaries compiled for different architectures. However, our framework ships with a decompiler extension that supports powerful queries based on weggli. First, let’s take a look at a decompiled version of `read_sum_head` shown in Figure 5:

Figure 5 - Comparison between `var3` and `MAX_DIGEST_LEN` in `read_sum_head`

We’ll now write a Lua function that will look for the comparison and return its address. The following function does the trick:

function get_digest_len_check(project, faddr)
  -- decompile the function located at `faddr`
  local decomp = project:decompile(faddr)

  -- the same bound check may be decompiled in either of these two shapes
  local queries = {
    [[
        sum->s2length = $var;
        if ($var < 0x41) {}
      ]], [[
        sum->s2length = $var;
        if (0x40 < $var) {}
      ]]
  }

  -- try each query in turn and return the first address found
  for _, query in ipairs(queries) do
    local matches = decomp:query(query)
    local max_digest_len_check = matches:address_of_match(2)
    if max_digest_len_check then return max_digest_len_check end
  end
end

The `get_digest_len_check` function searches for a comparison between a variable that gets assigned to `sum->s2length` and the literal 64. This will make sure our rule only applies to binaries that have `MAX_DIGEST_LEN` set to 64. Because compilers may use a different set of instructions to achieve the same result, the decompiler can also decompile them differently. That’s the reason why we created two different queries and stored them in the `queries` variable, which is a Lua table.

NOTE: In Lua, double square brackets (`[[ ... ]]`) delimit long strings, which are allowed to contain newlines.
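The two queries in `get_digest_len_check` describe the same limit from opposite sides, which a small C sketch can show. The function names are hypothetical. One shape tests whether the value is still within the limit (`var < 0x41`), while the other tests whether it is over the limit (`0x40 < var`). For integers, each is the negation of the other.

```c
#include <assert.h>
#include <stdbool.h>

/* Shape 1: the accepting comparison, as in `if (var < 0x41)`. */
bool within_limit_shape1(int var)
{
    return var < 0x41;
}

/* Shape 2: the rejecting comparison, as in `if (0x40 < var)`. */
bool over_limit_shape2(int var)
{
    return 0x40 < var;
}

/* True when the two shapes are exact opposites for every value in [lo, hi]. */
bool shapes_agree(int lo, int hi)
{
    assert(lo <= hi);
    for (int var = lo; var <= hi; var++) {
        /* equal results would mean the shapes disagree about the limit */
        if (within_limit_shape1(var) == over_limit_shape2(var))
            return false;
    }
    return true;
}
```

Since both shapes encode a limit of 64 (`0x40`), a rule that wants to recognize the check regardless of how the compiler arranged the branch needs a query for each shape.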

We use `project:decompile(faddr)` to get the decompiled code of a function. Therefore, we should pass a function address to `get_digest_len_check` in its `faddr` parameter, and this should be the address of `read_sum_head`. Now we just need to add the following to the beginning of our `check` function:

    -- find the address of the `read_sum_head` function, where
    -- the comparison with MAX_DIGEST_LEN happens
    local read_sum_head = project:functions("read_sum_head")

    -- if the binary does not have a `read_sum_head` function, return
    if not read_sum_head then return end

    -- if the comparison with 64 is not found, return
    if not get_digest_len_check(project, read_sum_head.address) then return end

    -- from this point onwards, we are sure MAX_DIGEST_LEN is 64
    -- and can continue with the previous logic

There are smarter ways of doing this, but they involve introducing more concepts and we don’t want to overcomplicate things. But don’t worry, we’ll explore them in the near future.

Giving the rule its final polish

A vulnerability has more than just a description and a CVE number assigned to it. That’s why we support a full range of fields in the table returned to VulHunt. Most of them are self-explanatory:

    provenance = {
      kind = "posix.ELF",
      linkage = "project",
      vendor = "samba",
      product = "rsync",
      license = "GPL-3.0-or-later",
      affected_versions = {">=3.2.7", "<3.4.0"}
    },
    cwes = {"CWE-122", "CWE-787"},
    cvss = cvss:v3_1{
      base = "9.8",
      exploitability = "3.9",
      impact = "5.9",
      vector = "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"
    },
    advisory = "https://github.com/google/security-research/security/advisories/GHSA-p5pg-x43v-mvqj",
    identifiers = {"CVE-2024-12084", "GHSA-p5pg-x43v-mvqj"},
    references = {
      ["NVD"] = "https://nvd.nist.gov/vuln/detail/CVE-2024-12084",
      ["CVE"] = "https://www.cve.org/CVERecord?id=CVE-2024-12084",
      ["Carnegie Mellon University"] = "https://kb.cert.org/vuls/id/952657",
      ["RedHat"] = "https://access.redhat.com/security/cve/CVE-2024-12084",
      ["NetApp"] = "https://security.netapp.com/advisory/ntap-20250131-0002"
    },
    patch = "https://github.com/RsyncProject/rsync/commit/0902b52f6687b1f7952422080d50b93108742e53",
    source = "https://github.com/RsyncProject/rsync/blob/9615a2492bbf96bc145e738ebff55bbb91e0bbee/sender.c#L96-L100",

If you’re in doubt about the linkage field of the provenance table, this is where we tell VulHunt about the context this rule applies to. In most cases, it will contain the value project, meaning it is part of a project or product. There are other possible values, but we’ll skip them for now and focus on our rule!

Now the rule is at the level of completeness you get from Binarly Transparency Platform (BTP). Let’s claim the reward for our work.

Results in BTP

If you’ve made it this far, congratulations! We did some interesting work together and now it’s time to see the outcome. The video below shows the result when we upload a vulnerable Rsync binary to BTP.

Conclusion

In this article, we showed how to write a VulHunt rule that scans binaries for a known vulnerability using the functions scope and the decompiler extension in standalone mode, but there’s much more to come. VulHunt can scan UEFI components and POSIX binaries within firmware images or Docker containers, find unknown vulnerabilities, look for more advanced high-level code constructs, and much more. VulHunt is a powerful engine, and we can’t cover all its features in one blogpost, so make sure you follow us to be notified when we release a new article about it. Also, we plan to release an open source version of VulHunt in Q2. If you can’t wait, don’t hesitate to contact us to test our platform with your own data.

References

Introducing VulHunt: A High-Level Look at Binary Vulnerability Detection - https://www.binarly.io/blog/vulhunt-intro

RSync: Heap Buffer Overflow, Info Leak, Server Leaks, Path Traversal and Safe links Bypass - https://github.com/google/security-research/security/advisories/GHSA-p5pg-x43v-mvqj

weggli - https://github.com/weggli-rs/weggli