By Binarly REsearch
The accidental leakage of sensitive information such as API keys, passwords, and authentication tokens can cause major damage to organizations large and small. These leaks can happen in a variety of ways, but one of the most common sources of secret leakage is containers. To reduce the risk of exposure, each image should be thoroughly scanned for secrets before being published to a public registry, shipped privately to customers, or used internally within company infrastructure.
Although the concept sounds simple, there are several important challenges to consider. First, secrets come in multiple shapes and formats, making comprehensive detection difficult. Second, and closely related, false positive rates should be kept as close to zero as possible. This is particularly hard to achieve because secrets often look like random strings of characters. Finally, performance is a crucial factor, especially in CI/CD pipelines where any delay in delivering container images can directly affect users and services.
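To make the false-positive problem more concrete, below is a minimal sketch of a detection rule that pairs a format-specific regular expression with a Shannon-entropy check, so that strings matching a key format but lacking the randomness of real key material are filtered out. It uses the regex crate; both the AWS-style pattern and the 3.0-bit threshold are illustrative assumptions, not our production rules.

```rust
use regex::Regex;

/// Shannon entropy (bits per character) of a candidate string.
fn shannon_entropy(s: &str) -> f64 {
    let mut counts = [0usize; 256];
    for b in s.bytes() {
        counts[b as usize] += 1;
    }
    let len = s.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}

fn main() {
    // Illustrative rule: AWS-style access key IDs are 20 uppercase
    // alphanumeric characters starting with "AKIA".
    let rule = Regex::new(r"\bAKIA[0-9A-Z]{16}\b").unwrap();

    let candidates = [
        "AKIAIOSFODNN7EXAMPLE", // AWS's documented example key ID
        "AKIAAAAAAAAAAAAAAAAA", // matches the format, but far too repetitive
    ];

    for c in candidates {
        let matched = rule.is_match(c);
        let entropy = shannon_entropy(c);
        // A secondary entropy threshold (3.0 bits here, an assumption)
        // discards strings that look structurally valid but are not random
        // enough to be real key material.
        let reported = matched && entropy > 3.0;
        println!("{c}: regex={matched}, entropy={entropy:.2}, reported={reported}");
    }
}
```

The threshold is a trade-off: raising it suppresses more false positives but risks missing real keys that happen to have repeated characters, which is exactly why tuning rules like this requires a large, diverse dataset.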
Our Binarly Transparency Platform has included secret scanning capabilities since version 2.0. In this blog post, we demonstrate how we applied this technology to scan a large dataset of Docker containers, share some of the interesting findings from our experiments, and explain how these insights helped us improve the platform. We will also highlight the differences between our product and some of the most well-known and established solutions for Docker image secret scanning, such as ggshield, trivy, and trufflehog. During this research we used the open-source versions of these tools, and while we identified a few limitations, their commercial editions likely represent better and more comprehensive solutions.
We began this research by defining the scope of our dataset. We decided to focus primarily on popular repositories from the DockerHub library, as it is more likely that our customers will use the same (or similar) images in their deployment processes. In total, we collected over 80,000 unique Docker images from 54 organizations and 3,539 repositories. This set of images is very heterogeneous, covering a wide range of operating systems and processor architectures.
To minimize network latency and avoid any rate limiting from DockerHub, we stored all images in a private container registry, resulting in a total size of over 13 TB.
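For readers who want to build a similar mirror, the sketch below shows one way to copy an image from DockerHub into a private registry by shelling out to skopeo; the registry hostname is a placeholder, and this is only an illustration rather than the exact tooling we used.

```rust
use std::process::Command;

fn mirror_image(repository: &str, tag: &str) -> std::io::Result<()> {
    // Placeholder hostname for the private registry.
    let private_registry = "registry.internal.example";

    let src = format!("docker://docker.io/{repository}:{tag}");
    let dst = format!("docker://{private_registry}/{repository}:{tag}");

    // `skopeo copy --all` transfers the manifest and every layer
    // (for all architectures) without needing a local Docker daemon.
    let status = Command::new("skopeo")
        .args(["copy", "--all", src.as_str(), dst.as_str()])
        .status()?;

    if !status.success() {
        eprintln!("failed to mirror {repository}:{tag}");
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    mirror_image("library/nginx", "latest")
}
```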
After scanning all images in our dataset, the Binarly Transparency Platform identified 757 unique secrets. We manually reviewed these findings, filtered out false positives, and refined our extraction rules and detection logic. Following this round of improvements, we rescanned the dataset, which yielded 644 remaining secrets.
Here’s a breakdown of the top 10 most commonly detected secrets in our dataset:
Table 1: The top 10 most commonly detected secrets
The manual validation process was tedious and not something we wanted our Binarly Transparency Platform users to handle, especially when dealing with large numbers of findings or working to prioritize risks. To address this, we introduced a validation service that automatically checks whether a detected secret is still valid or not. With this new capability, 53 unique findings were highlighted as potentially high-risk secrets, warranting immediate investigation and prioritization:
Table 2: High-risk validated secrets detected in our dataset
By gathering all the necessary context of these findings (for example, by listing the permissions granted to an API token or examining the resources available on a password-protected FTP server), we determined that, fortunately, all of these validated secrets only provided access to public resources or were used for testing purposes.
An important consideration regarding validation is that if a secret fails verification, this does not necessarily mean that the finding is not high risk. The secret could have been valid in the past, and attackers might already have exploited it to gain access. This is especially relevant because many modern services use short-lived tokens that are only valid for a limited time. This means an attacker can still exploit a leaked secret during the brief window between the package release and secret revocation (see this advisory as an example). As a result, even a delay of a second can impact the effectiveness of security scanning tools.
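To illustrate what such a validation check can look like, here is a simplified sketch that probes a GitHub token against the https://api.github.com/user endpoint using the reqwest crate (blocking client): a 200 response means the token still authenticates, while 401 means it has been revoked or expired. This is only an example of the general idea, not the actual logic of our validation service.

```rust
use reqwest::blocking::Client;
use reqwest::StatusCode;

/// Rough validity states for a detected secret.
enum Validity {
    Valid,
    Invalid,
    Unknown,
}

/// Check whether a leaked GitHub token still authenticates.
fn validate_github_token(token: &str) -> Result<Validity, reqwest::Error> {
    let client = Client::new();
    let resp = client
        .get("https://api.github.com/user")
        .header("Authorization", format!("Bearer {token}"))
        .header("User-Agent", "secret-validation-sketch")
        .send()?;

    Ok(match resp.status() {
        StatusCode::OK => Validity::Valid,
        StatusCode::UNAUTHORIZED => Validity::Invalid,
        // Rate limiting, network issues, etc. leave the state unknown.
        _ => Validity::Unknown,
    })
}

fn main() {
    match validate_github_token("ghp_example_not_a_real_token") {
        Ok(Validity::Valid) => println!("token is still active: high risk"),
        Ok(Validity::Invalid) => println!("token rejected: may have been rotated"),
        Ok(Validity::Unknown) => println!("could not determine validity"),
        Err(e) => eprintln!("validation request failed: {e}"),
    }
}
```

Note that anything other than a clear 200 or 401 is deliberately treated as unknown rather than invalid, precisely because, as discussed above, a failed check does not prove a finding is harmless.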
Container images can contain a wide variety of files, especially when developers don’t use separate images for building and runtime purposes. With our scan results available, we were able to answer another important question: which file types are most likely to contain secrets?
You may notice that most of the entries represent different variations of source code files. Bytecode.python is by far the most common; the number of detected secrets in this file type alone exceeds the combined total for all other types. Surprisingly, none of the three alternative secret scanning solutions we tested were able to reliably detect secrets hidden within binary files like these, even though this is a relatively common issue (as an example, we found more than 300 secrets in ELF files alone).
For instance, one container in our dataset included an ELF binary named observability-pipelines-worker, over 113 MiB in size, containing various types of secrets embedded in its .rodata section. Although these secrets were not valid, they were in the correct format; therefore, without additional context, a security tool should have detected and reported them.
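Detecting secrets in files like this essentially boils down to recovering printable string candidates from the raw bytes, much like the Unix strings utility, and then applying the usual detection rules to each candidate. The sketch below illustrates that first step in plain Rust; the 16-character minimum length and the single AKIA check are illustrative placeholders, not how our engine actually parses ELF files.

```rust
use std::env;
use std::fs;

/// Extract printable ASCII runs of at least `min_len` bytes,
/// similar to the Unix `strings` utility.
fn printable_strings(data: &[u8], min_len: usize) -> Vec<String> {
    let mut results = Vec::new();
    let mut current = Vec::new();
    for &b in data {
        if (0x20..0x7f).contains(&b) {
            current.push(b);
        } else {
            if current.len() >= min_len {
                results.push(String::from_utf8_lossy(&current).into_owned());
            }
            current.clear();
        }
    }
    if current.len() >= min_len {
        results.push(String::from_utf8_lossy(&current).into_owned());
    }
    results
}

fn main() -> std::io::Result<()> {
    // Path to a binary extracted from a container layer (placeholder).
    let path = env::args().nth(1).expect("usage: scan <binary>");
    let data = fs::read(&path)?;

    for candidate in printable_strings(&data, 16) {
        // In a real scanner each candidate would be matched against the
        // full set of secret patterns; here we only flag one format.
        if candidate.contains("AKIA") {
            println!("possible AWS key material: {candidate}");
        }
    }
    Ok(())
}
```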
While running our experiments, we noticed another common mistake associated with Docker images. Developers sometimes clone an entire Git repository (including all source code and commit history) to build their target application. However, they often forget to use a different image for the runtime, which should contain only the necessary components.
This not only negatively impacts the image size but also introduces a serious security risk if a private repository is cloned and left inside a container. Such cases can lead to the leakage of intellectual property and make it easier for attackers to analyze the application’s logic. Additionally, private repositories often contain sensitive information within their Git configuration files and commit history.
Upon reviewing the results of scanning our dataset, we found that 2,473 images contained at least one Git repository. Of these, 277 were public and 25 were private. Scanning the Git history of these repositories yielded 34 additional valid findings, so we added this functionality to our default scanning pipeline in the Binarly Transparency Platform:
None of the three alternative secret scanning solutions we tested were able to detect secrets stored within the commits of Git repositories found in the container images. We agree that most leaks related to Git repositories should be identified before they ever appear in a container, ideally by blocking any commit that contains sensitive values. However, it is also possible for dangerous commits to be made during the deployment process, which can evade detection by any security tool working at the VCS level. In such cases, scanning the built Docker containers is the only option.
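For illustration, the sketch below walks the full commit history of a repository found inside an extracted container layer using the git2 crate and checks each blob against a single placeholder pattern; the repository path and the pattern are assumptions, and a real scanner would apply its complete rule set to every historical blob.

```rust
use git2::{Repository, TreeWalkMode, TreeWalkResult};

/// Walk every commit reachable from HEAD and scan each blob for a
/// placeholder pattern; a real scanner would apply its full rule set.
fn scan_repository_history(path: &str) -> Result<(), git2::Error> {
    let repo = Repository::open(path)?;
    let mut revwalk = repo.revwalk()?;
    revwalk.push_head()?;

    for oid in revwalk {
        let commit = repo.find_commit(oid?)?;
        let tree = commit.tree()?;

        tree.walk(TreeWalkMode::PreOrder, |_, entry| {
            if let Ok(object) = entry.to_object(&repo) {
                if let Some(blob) = object.as_blob() {
                    let content = String::from_utf8_lossy(blob.content());
                    // Placeholder check; substitute real detection rules here.
                    if content.contains("PRIVATE KEY") {
                        println!(
                            "possible secret in commit {} file {}",
                            commit.id(),
                            entry.name().unwrap_or("<non-utf8 name>")
                        );
                    }
                }
            }
            TreeWalkResult::Ok
        })?;
    }
    Ok(())
}

fn main() -> Result<(), git2::Error> {
    // Placeholder path to a repository extracted from a container layer.
    scan_repository_history("/tmp/extracted-layer/app/.git")
}
```

Walking every blob in every commit is more expensive than scanning the checked-out files alone, which is one reason this kind of history scan is worth running once at image-publication time rather than on every pull.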
When it comes to detecting secrets, scanning coverage is only one side of the coin. Another important factor is scanning performance, which needs to be fast enough to fit into a wide range of release pipelines.
As shown in the previous histogram, Docker container image sizes vary widely, with more than 50% of the images in our dataset exceeding 256 MiB, and a few surpassing 1 GiB. The average container size is 375 MiB, and thanks to our extremely fast scanning engine written in Rust, scanning such an image for leaked secrets typically takes less than 10 seconds. The largest sample in our dataset was an image from google/deepsomatic, which occupied almost 25 GiB of disk space. At Binarly, we pay attention to the full spectrum of scenarios, so we compared the performance of our tool on such large inputs against competing tools.
To run this comparison, we scanned the 25GiB image on the same virtual machine (Ubuntu 24.04, 8 CPUs, and 32 GB of RAM) five times in a row for all tools, collecting the best scan times. The total scanning time may be affected by factors such as network latency, for example during secret validation or when sending files to a remote server, as in the case of ggshield. The time required to download images from the registry was also not included in our measurements. Since not all tools provide metrics useful for performance evaluation, such as the list of scanned files and the total number of bytes processed, we modified them to collect and log this additional data. However, for the scanning speed evaluation, we used the unmodified versions of the tools to ensure that our instrumentation did not affect scanning time. We again want to highlight that we used the free, open-source versions of the tools, and their performance may differ from their commercial counterparts.
Table 3: Statistics extracted from scanning a large 25 GiB image. Potentially useful findings were calculated by triaging all findings generated by each tool and filtering out the false positives
As the previous table shows, our solution is not the fastest: trivy and ggshield both complete the scan in under five minutes. This difference is mainly due to the much larger volume of data processed by Binarly, which includes very large binary files; in total, our solution scanned 73x and 126x more data than these two tools, respectively.
On the other hand, trufflehog leads in the number of files processed. However, after comparing its scanned data with our solution, we found that many of these additional files are timezone data, terminfo databases, files used for hash integrity checks, and media resources like PNG, GIF, and ICO files. In our experience, these types of files rarely contain sensitive information and can be safely excluded from scanning to reduce overall scan time.
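A simple path-based exclusion filter is enough to skip most of this noise. The sketch below shows what such a filter might look like; the specific directories and extensions are illustrative and not Binarly's actual exclusion list.

```rust
use std::path::Path;

/// Decide whether a file is worth scanning for secrets, skipping
/// categories that in practice almost never contain sensitive data.
/// The lists below are illustrative, not Binarly's actual rules.
fn should_scan(path: &Path) -> bool {
    const SKIPPED_DIRS: &[&str] = &[
        "usr/share/zoneinfo", // timezone data
        "usr/share/terminfo", // terminfo databases
    ];
    const SKIPPED_EXTENSIONS: &[&str] = &["png", "gif", "ico", "jpg", "woff2"];

    let path_str = path.to_string_lossy();
    if SKIPPED_DIRS.iter().any(|dir| path_str.contains(dir)) {
        return false;
    }
    if let Some(ext) = path.extension().and_then(|e| e.to_str()) {
        if SKIPPED_EXTENSIONS.contains(&ext.to_ascii_lowercase().as_str()) {
            return false;
        }
    }
    true
}

fn main() {
    for p in ["etc/nginx/nginx.conf", "usr/share/zoneinfo/UTC", "app/logo.png"] {
        println!("{p}: scan = {}", should_scan(Path::new(p)));
    }
}
```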
The decision to build our entire technology stack in Rust really pays off when it comes to comparing scan speeds. Our product achieved the best result in this category by far, processing more than 39 MiB/s, over two times faster than the fastest competitor (trufflehog, which is implemented in Go).
Scanning containers for exposed secrets is a critical step in securing modern software delivery pipelines. Our research shows that secrets appear in a wide variety of file types, including source code, configuration files, and even large binary files, areas where many existing scanners fall short. Moreover, the presence of entire Git repositories inside container images represents a serious and often overlooked security risk.
The Binarly Transparency Platform not only offers broad scanning coverage but also delivers high performance thanks to its Rust-based engine. While it may not always be the fastest tool in raw scan time, it processes vastly more data and identifies more relevant findings with fewer false positives. It also includes a secret validation service to help highlight and prioritize the most important findings.
The Binarly Transparency Platform can identify more than 200 types of secrets in various input types. While this research focused on Docker containers, the platform can identify secrets in any target, including POSIX firmware images like BMC/xIoT.
Binarly Secret Scan in action: https://showcase.binarly.io/share/NQjUBD6TSDh7ndOD75zJ