Securing JTAG-based At-Scale Debug

Security through obscurity is not a meaningful way to mitigate malicious attacks. With the greater forensic capabilities offered by At-Scale Debug (ASD), how are platforms protected?

Baseboard Management Controllers (BMCs) are the service management engines on virtually all of today’s servers. Among their capabilities, they provide out-of-band alerts, server status, power management, video redirection, and remote control, so that OS or hardware problems can be diagnosed remotely. Being “out-of-band” means that they are separate from the rest of the platform hardware and OS, and even have their own power rails. So, even if a server crashes, the BMC can continue to run independently. BMCs run a great deal of specialized firmware, and in fact this firmware often differentiates the server offerings of different OEMs.

By their very nature, BMCs are equipped with powerful privileges. They interact with the host system, and they are often network-accessible so that they can communicate with remote management consoles. Among their more specialized capabilities, they typically can:

  • Install software patches
  • Remotely receive and apply firmware updates
  • Remotely update BIOS settings
  • Push these same updates and BIOS settings to thousands of servers (cloning)
  • Reset, reboot, or even power off the host platform

Given their privileges, BMCs must present a minimal attack surface. Companies like Dell and HPE have locked down their BMCs with extensive protection against firmware attacks. HPE recently touted its servers as the first to implement a silicon-based root of trust: a link between custom HPE silicon and the HPE Integrated Lights-Out (iLO) firmware that ensures the BMC does not execute compromised firmware code (see HPE’s announcement, “HPE Unveils the World’s Most Secure Industry Standard Servers”).

How is this implemented? It is worthwhile to put this in the context of the general security requirements defined by the Common Vulnerability Scoring System (CVSS), which NIST’s National Vulnerability Database uses to score vulnerabilities, in the areas of:

  • Confidentiality
  • Integrity
  • Availability

Confidentiality: the binary image of a BMC may not be protected, so it should not contain secrets. Designers should assume, if not expect, that attackers can obtain, disassemble, and reverse-engineer the firmware binary.
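One practical consequence is that a release process can scan each firmware image for obvious embedded secrets before it ships. The Python sketch below is illustrative only: the pattern list and the scan_image_for_secrets function are hypothetical, and a real scan would need a much broader rule set. It also cannot catch secrets that are obfuscated or encrypted with a key stored elsewhere in the same image, which is why the rule remains "no secrets in the image" rather than "no detectable secrets".

```python
import re

# Hypothetical, non-exhaustive byte patterns that often betray embedded secrets.
SECRET_PATTERNS = {
    "pem_private_key": re.compile(rb"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "password_assignment": re.compile(rb"(?i)password\s*=\s*\S+"),
}


def scan_image_for_secrets(image: bytes) -> list[tuple[str, int]]:
    """Return (pattern_name, byte_offset) for each suspicious match, sorted by offset."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(image):
            findings.append((name, match.start()))
    return sorted(findings, key=lambda f: f[1])


# A toy "image" with a private key header embedded after a 6-byte prefix.
image = b"\x00\x01BOOT-----BEGIN RSA PRIVATE KEY-----\xff\xff"
print(scan_image_for_secrets(image))  # [('pem_private_key', 6)]
```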

Integrity: the kernel of the BMC OS must provide for the integrity of its code and data segments. This is accomplished via digital signatures, hashing, and CRCs (to detect accidental changes to data).
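The difference between a CRC and a cryptographic hash is worth making concrete. The Python sketch below uses only the standard library to check an image against a hypothetical manifest (ImageManifest and check_image are illustrative names, not a real BMC API). The CRC32 catches accidental corruption such as flash bit errors, while the SHA-256 digest catches deliberate modification, since an attacker can trivially recompute a CRC. It assumes the expected digest comes from a trusted source; in a real BMC that manifest would itself be covered by a digital signature verified against a hardware-anchored public key, a step omitted here because it needs an asymmetric crypto library.

```python
import hashlib
import hmac
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageManifest:
    """Hypothetical manifest shipped with a firmware image."""
    sha256_hex: str  # assumed to come from a trusted, signature-verified source
    crc32: int


def check_image(image: bytes, manifest: ImageManifest) -> str:
    # CRC32 is cheap and catches accidental corruption (e.g. flash bit errors),
    # but an attacker can simply recompute it after modifying the image.
    if zlib.crc32(image) != manifest.crc32:
        return "corrupt"
    # Finding a different image with the same SHA-256 digest is computationally
    # infeasible, so a mismatch that survived the CRC check points to tampering.
    digest = hashlib.sha256(image).hexdigest()
    if not hmac.compare_digest(digest, manifest.sha256_hex):
        return "tampered"
    return "ok"


image = b"bmc-firmware-v1" * 64
manifest = ImageManifest(hashlib.sha256(image).hexdigest(), zlib.crc32(image))
print(check_image(image, manifest))  # ok
```

A modified image shipped with a recomputed CRC passes the first check but fails the second, which is why a CRC on its own is not an integrity control against an attacker.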

Availability (which refers to the accessibility of the services provided by the BMC): the BMC must not cause the host to crash or become unavailable. Yet, by design, a BMC may be able to issue a non-maskable interrupt (NMI) to the host. Mind you, if the attacker has physical access to the platform, then BMC availability is not a consideration: with physical access, one can destroy the system with a hammer. And with unauthorized local access (such as with a USB stick) or network access, it is certainly possible to install malware on the host operating system, which might be a preferred target.

In many ways, a BMC is similar to the Intel Management Engine (ME), dedicated logic that resides within Intel chipsets. The ME is a computing platform with a dedicated processor, its own real-time operating system, code and data caches, direct memory access (DMA) to host memory, cryptography engines, SRAM, and other supporting devices. In his book Platform Embedded Security Technology Revealed, Intel’s Xiaoyu Ruan describes how the ME was designed from the ground up with security in mind. Given its power over the platform, including direct read/write access to host memory, the ME makes for an interesting attack target. To protect itself, it employs technologies including random number generation, secure hashing, symmetric and asymmetric key encryption, task isolation, and others. Although not perfect, the ME forms a hardware root of trust on Intel platforms. And successful remote attacks against hardware typically require highly advanced skills and equipment.

At the risk of getting sidetracked a little, I would like to point out that hardware (and even silicon!) can never be fully trusted, because it almost never constitutes a completely closed system, and because it often contains firmware that might be modified. The Google Cloud Platform Blog article “Fuzzing PCI Express” is an excellent read in that respect. Google offers NVIDIA and AMD GPU support within its cloud, and since these GPUs can directly access system memory, Google conducted a series of tests to ensure that such PCIe-connected devices were secure before letting them loose on its platform.

So, getting back to BMCs: equipping a platform for At-Scale Debug (ASD), which involves connecting BMC I/O to the JTAG chain of the main board CPU and adding new firmware, also increases the platform’s attack surface. ASD functionality, such as read/write access to host CPU registers, memory, and I/O, is a highly privileged capability that, if abused, can have serious security consequences.

Like the ME, modern BMCs support memory integrity check controllers, hash and crypto engines, mailbox registers for communication with the host, verified boot, and other supporting technologies. These need to be employed as part of a comprehensive architectural description that addresses the following points (from Platform Embedded Security Technology Revealed):

  • Security architecture: The architecture includes components of the products, functionalities of each component, internal and external interfaces, dependencies, flow diagrams, and so on. A product’s architecture is driven by its assets and functional requirements.
  • Assets: Assets are valuable data that must be protected by the product, for confidentiality, integrity, and/or anti-replay.
  • Security objectives: Security objectives are the goals that the product intends to meet for protection.
  • Threat analysis: Based on the in-scope security objectives, a list of possible threats that attackers could use to compromise the objectives and assets is documented and analyzed.
  • Mitigations against threats: The mitigation plans detail how the architecture is designed to deter threats, protect assets, and achieve security objectives. In most cases, effective mitigations are realized through well-known and proven cryptography and security approaches.

A good visual of this process from Mr. Ruan’s book is as follows:

Figure: Threat mitigation architecture
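The points above lend themselves to being captured as data, so that gaps are easy to spot during a design review. The Python sketch below is a hypothetical structure: the class names, fields, and the gaps() check are illustrative and are not taken from Mr. Ruan’s book. It flags any threat that has no mitigation, or that references an asset or security objective the model never declared. The example entries are made up for an ASD-equipped BMC.

```python
from dataclasses import dataclass, field


@dataclass
class Threat:
    name: str
    targets: set[str]     # assets this threat could compromise
    objectives: set[str]  # security objectives it would violate
    mitigations: list[str] = field(default_factory=list)


@dataclass
class ThreatModel:
    assets: set[str]
    objectives: set[str]
    threats: list[Threat]

    def gaps(self) -> list[str]:
        """List unmitigated threats and references to undeclared assets/objectives."""
        problems = []
        for t in self.threats:
            if not t.mitigations:
                problems.append(f"{t.name}: no mitigation")
            for asset in sorted(t.targets - self.assets):
                problems.append(f"{t.name}: unknown asset '{asset}'")
            for obj in sorted(t.objectives - self.objectives):
                problems.append(f"{t.name}: unknown objective '{obj}'")
        return problems


# Hypothetical entries for an ASD-equipped BMC.
model = ThreatModel(
    assets={"bmc_firmware", "host_memory"},
    objectives={"integrity", "confidentiality"},
    threats=[
        Threat("malicious firmware update", {"bmc_firmware"}, {"integrity"},
               ["signed images", "verified boot"]),
        Threat("unauthorized ASD memory read", {"host_memory"}, {"confidentiality"}),
    ],
)
print(model.gaps())  # ['unauthorized ASD memory read: no mitigation']
```

Keeping the threat list as data rather than prose makes it straightforward to re-run the check whenever new functionality, such as an additional ASD operation, is added.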

I should point out that when architecting any computing system, there are always tradeoffs between functionality and security. In any insecure environment, additional functionality must be added very carefully, to avoid introducing attack vectors. Within secure environments, new capabilities can be introduced safely to improve reliability, availability and serviceability. In both cases, designers must dive deeply into threat analysis and risk assessment.

In the environment of the BMC, as with the ME, security requirements are as important as, or even more important than, functional requirements.

Never rely on security solely through obscurity.

To learn more about ASSET’s version of At-Scale Debug, named ScanWorks Embedded Diagnostics (SED), see our technical brief (note: requires registration).

Alan Sguigna