Firmware Supply Chain is Hard(coded)
August 20, 2021 - Binarly Team

At Black Hat USA 2021, Binarly CEO Alex Matrosov and Nvidia security researchers Alex Tereshkin and Adam 'pi3' Zabrocki jointly presented their findings in the talk “Safeguarding UEFI Ecosystem: Firmware Supply Chain is Hard(coded)”, highlighting five high-severity vulnerabilities that affected the whole UEFI ecosystem.

Historically, security software vendors have overlooked the attack surfaces below the operating system, but attackers have not.

Figure1 https://www.platformsecuritysummit.com/2019/speaker/matrosov/

From a supply chain security point of view, the UEFI firmware ecosystem is very complex and involves multiple parties: Intel or AMD develop the firmware reference code, and companies such as AMI, Phoenix or Insyde develop their core frameworks for system firmware development.

The code produced by a hardware platform vendor and shipped to its customers represents less than 10% of the whole UEFI system firmware code base.

The reality is that vulnerabilities can be discovered not only in the platform vendor codebase, but also inside the reference code. In that case the impact is far worse, because it is felt across the whole UEFI ecosystem.

Given different patch cycles across vendors, these vulnerabilities can stay unpatched on endpoint devices for 6-9 months (sometimes even longer).

Moreover, vendors can patch the same vulnerability differently, making fix verification difficult and expensive. Due to the complexity of the firmware supply chain discussed in one of our previous blog posts, validating that a given vulnerability has been correctly patched across all vendors is even more difficult.

Most of the time, the threat models for different stages of the boot process are not connected with each other. In reality, however, security boundaries overlap and provide attackers with new opportunities. For example, developers frequently fail to sanitize untrusted data flows such as NVRAM variables, which can often be controlled by an attacker for malicious purposes.

Figure2

A look at NVRAM's attack surface from an attacker's perspective shows some interesting opportunities.

Figure3

The NVRAM region in the SPI flash, shown in the figure below, is not protected by Intel Boot Guard and can be abused by an attacker with physical access to the device (supply chain vector). An example of this type of vulnerability is the lack of sanitization of the PlatformLang NVRAM variable. Because PlatformLang is stored in the SPI flash and is restored from persistent NVRAM storage after each reboot, the vulnerability can be triggered every time the computer is rebooted, which makes it a persistent attack vector.

Figure4

Those attack vectors were already discussed in our research “efiXplorer: Hunting for UEFI Firmware Vulnerabilities at Scale with Automated Static Analysis” at Black Hat Europe 2020.

The picture below shows a classic example of a vulnerability that arises from incorrect initialization of the data size for an attacker-controlled NVRAM variable, which results in a stack buffer overflow during execution.

Figure5

The Binarly team is focused on helping the firmware industry eliminate such issues and reduce the security risks they pose. A vulnerability checker based on the efiXplorer plugin, which automatically detects potential misuse of the GetVariable() service, was released last year at Black Hat Europe 2020. An updated version of efiXplorer that includes a more fine-tuned vulnerability checker will be released soon. A sample output of the vulnerability checker is shown below.

Figure6

A year after the first release of the efiXplorer vulnerability checker, we are still discovering the same security issues. Now, let's focus our attention on several vulnerabilities from our research presented at Black Hat USA 2021 that were recently patched by Dell.

The Story of The Two Buffer Overflows

We started our Black Hat talk by introducing two buffer overflow vulnerabilities we found in Dell PowerEdge servers. Dell patched both issues in June and released the corresponding security advisory, DSA-2021-103.

Before we get into technical details of those vulnerabilities, let's discuss a potential attacker model. In most cases, firmware vulnerabilities are used as the second or third vulnerability in the attack chain. An attacker must already have privileged access to the system to successfully exploit such issues. As a result of exploitation, the attacker expects to gain persistence or to breach security boundaries around memory isolation (such as hardware-based virtualization).

Attacker Model:

Local attackers can trigger the vulnerability, allowing PEI/DXE stage code execution in System Management Mode (SMM), by gaining privileged access to the host operating system.

Potential Impact:

The execution of PEI/DXE code in SMM context enables the installation of persistent implants in the NVRAM SPI flash region. A malicious implant that persists across OS installations can further bypass Secure Boot and attack guest VMs in bare metal cloud deployments.

DSA-2021-103 (CVE-2021-21555)

The Dell PowerEdge firmware contains a heap buffer overflow vulnerability (CVE-2021-21555) in the CrystalRidge (C4EB3614-4986-42B9-8C0D-9FE118278908) DXE SMM module, which may lead to code execution in System Management Mode (SMM). This vulnerability affects server systems that support NVDIMM storage (Intel Optane persistent memory). This new type of persistent memory is very common in the modern server ecosystem.

A frequent misuse of the GetVariable() function involves its DataSize parameter, which is passed by reference. On return, DataSize holds the size in bytes of the NVRAM variable value, which is controlled by an attacker. If this returned size is not checked properly (for example, it is ignored when the buffer is allocated), the variable data can be written past the end of the buffer. Here, a bug hidden in the variable data parsing routine leads to a heap overflow, resulting in the execution of an attacker-controlled payload.

// mEraseRecordShare - buffer is allocated on heap. 
// AepErrorLog - NVRAM variable is controlled by the attacker. 

Tries = 3;
DataSize = 0; 
...
  do
  {
    // overflow occurs after the second try because variable length exceeds 964 bytes
    Status = gRT->GetVariable(L"AepErrorLog", &VendorGuid, 0, &DataSize, mEraseRecordShare);
    --Tries;
    v5 = Status < 0;
  }
  while ( Status < 0 && Tries );
...

The following screenshot shows a decompiled listing of the vulnerable function:

Figure7

DSA-2021-103 (CVE-2021-21556), INTEL-SA-00463 (CVE-2020-24486)

Another vulnerability we uncovered is a classic stack overflow caused by the misuse of the memset() function in the PEI module UncoreInitPeim (D71C8BA4-4AF2-4D0D-B1BA-F2409F0C20D3) in the Dell PowerEdge firmware. Interestingly, this vulnerability affects not only the Dell server ecosystem where we originally discovered it. According to Intel advisory INTEL-SA-00463 (CVE-2020-24486), it exists in any device based on the Intel reference code to which this module belongs.

In the case of a vulnerability affecting the Intel reference code, the coverage and impact are much wider than a single vendor issue. The advisory from hybrid cloud company NetApp confirms that multiple product lines were impacted by CVE-2020-24486. Another advisory issued by Siemens confirmed the impact on multiple industrial product lines. It's exactly the reason why we named our Black Hat USA 2021 talk "Safeguarding UEFI Ecosystem: Firmware Supply Chain is Hard(coded)".

Because shared code (such as the Intel/AMD reference code, EDKII, or common frameworks from AMI/Phoenix/Insyde) is used by multiple vendors, a single vulnerability in it creates a chain reaction across all of them.

Let's look deeper into the technical aspects of this vulnerability. When the length of the MirrorRequest NVRAM variable exceeds 5 bytes, a subsequent memset() overwrites the stack with zeroes during the PEI phase. An attacker controls the length of the overwritten region and can modify parts of saved return addresses to change the execution flow, which may lead to arbitrary code execution during the PEI phase. The following screenshot shows a decompiled listing of the vulnerable code pattern:

Figure8

As we can see, NVRAM variables offer a pretty large attack surface, and it's only getting bigger over time.

The result of exploitation (payload execution) for all vulnerabilities discussed in this blog post cannot be measured, and TPM PCRs will not be extended in a way that detects such threats. As a result, remote health attestation will not detect active exploitation on affected systems.