Advanced firmware debugging of BMCs

Baseboard Management Controllers (BMCs) form the brains of any high-end Intel-based server platform: they perform system management functions ranging from fan speed control to remote user authentication. A new wave of these service processors is coming to market to support new capabilities needed on the next generation of Intel silicon. What are these unique features, and how will they be debugged?

A BMC is a specialized service processor of the kind typically found on server, storage, and telecom platforms. Because of its role in overall system management, a BMC is implemented as a system on a chip (SoC), with extensive built-in capabilities and external I/O support. The market leaders are ASPEED, Emulex (now Avago), and Nuvoton. A (somewhat dated) image of a BMC’s role on a server is below:

  [Image: the role of a BMC on a server]

As an example, a subset of features available on the Emulex Pilot 3 BMC is as follows:

  • Fully IPMI 2.0 and DCMI Compliant
  • Integrated 32-bit 400MHz ARM9 processor with MMU and 16KB I/D caches
  • Integrated 32-bit 200MHz RISC Second Service Processor (SSP) offloads real-time processing
  • Integrated 200MHz 8051 processor for BMC test infrastructure or general purpose
  • 16-bit DDR-2/3 memory interface (up to 800MHz)
  • Three independent SPI interfaces, one with 3 Chip Selects on Boot SPI
  • 8-bit NAND/Memory Address Data External bus interface supports up to three devices
  • SD/MMC card controller with DMA support
  • Direct PECI 3.0 interface
  • Two watchdog timers
  • Dedicated RTC counter that can be synced with system RTC
  • 16 direct, 108 shared GPIOs and 80 Serial GPIOs in and 80 Serial GPIOs out
  • Eight independent I2C/SMBus controllers
  • Two independent 10/100/1000 Ethernet Controllers with RMII/RGMII support
  • Three UARTs, one for ICMB support
  • 16 Mailbox Registers for communication between the host and BMC
  • Six general purpose timers
  • 16 10-bit analog-digital converters
  • 18 independent Fan Tach Inputs
  • Eight independent Pulse Width Modulators (PWM)
  • Chassis Intrusion Logic with battery backed general purpose register
  • Programmable second PCIe function for high speed Host to BMC communication
  • LED support with programmable blink rate controls on GPIOs
  • Programmable IO Port snooping, can be used to snoop on Port 80h
  • 32-bit Random Number generator
  • Unique Chip ID
  • Hardware MCTP protocol offload engine on PCIe interface
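To make the fan tach and PWM entries in the list above more concrete, here is a minimal sketch of the arithmetic behind fan speed control: converting tach pulses into RPM, and mapping a temperature onto a PWM duty cycle. Everything in it is an illustrative assumption rather than Pilot 3 behavior: the function names, the two-pulses-per-revolution default (common for server fans, but fan-specific), and the linear 40 to 80 °C curve are all hypothetical.

```python
def tach_to_rpm(pulse_count: int, window_s: float, pulses_per_rev: int = 2) -> float:
    """Convert tach pulses counted during a sampling window into RPM.

    pulses_per_rev=2 is an assumed default; check the fan's datasheet.
    """
    if window_s <= 0 or pulses_per_rev <= 0:
        raise ValueError("window_s and pulses_per_rev must be positive")
    revolutions = pulse_count / pulses_per_rev
    return revolutions / window_s * 60.0


def duty_for_temp(temp_c: float,
                  t_min: float = 40.0, t_max: float = 80.0,
                  duty_min: int = 30, duty_max: int = 100) -> int:
    """Map a temperature to a PWM duty cycle (percent) on a linear curve.

    Below t_min the fan idles at duty_min; above t_max it runs at duty_max.
    The thresholds are hypothetical values chosen for illustration.
    """
    if temp_c <= t_min:
        return duty_min
    if temp_c >= t_max:
        return duty_max
    fraction = (temp_c - t_min) / (t_max - t_min)
    return round(duty_min + fraction * (duty_max - duty_min))
```

With these assumptions, 100 pulses counted over one second corresponds to 3000 RPM, and a 60 °C reading sits halfway up the curve at a 65% duty cycle. Real firmware would add hysteresis and fan-failure detection on top of this.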

As can be seen, these devices are fairly complex. Things get even more interesting with the next generation of Intel silicon, which requires these ASICs to evolve further. In particular, the new Intel Enhanced Serial Peripheral Interface (eSPI), designed as a replacement for the Low Pin Count (LPC) bus, allows the BMC to have more control over the SPI flash boot process. This feature, along with others, is driving a need for more CPU horsepower, faster memory, and upgraded security in BMCs. As an example, Emulex’s support for these new capabilities on its new Pilot 4 BMC is described in the company’s Pilot 4 press release.

To accommodate the new SoC functionality, these devices are evolving toward more powerful ARM-based architectures, and they typically (but not always) run an embedded Linux OS to service the drivers and I/O on the chip. Porting existing BMC firmware libraries to the new platforms, and enabling the new features, is a large development effort for OEMs. New bugs are being introduced and found, and previously working code is now crashing. Fortunately, many of these new devices support some degree of ARM CoreSight Trace capability, which makes debugging them much easier.

Debugging Linux-based BMC firmware is both an art and a science. With a combination of static analysis, code analysis, OS-aware features, and trace analysis, the root cause of module crashes and code failures can be found more easily. An excellent treatise on the specifics of these techniques is our recent eBook, Trace accelerates debug analysis in complex Linux systems.
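As a rough illustration of why trace helps with crash analysis, the sketch below keeps a bounded window of the most recent trace records and reports what executed just before a fault. The record format (a timestamp paired with a function name, sorted by time) and the function name `last_calls_before_fault` are hypothetical simplifications for illustration; this is not the output format of CoreSight or of any particular debugger.

```python
from collections import deque


def last_calls_before_fault(records, fault_ts, depth=3):
    """Return the last `depth` function names executed at or before fault_ts.

    `records` is an iterable of (timestamp, function_name) pairs, assumed
    to be sorted by timestamp. This layout is a hypothetical stand-in for
    a decoded trace export.
    """
    window = deque(maxlen=depth)  # oldest entries fall out automatically
    for timestamp, function_name in records:
        if timestamp > fault_ts:
            break  # everything after the fault is irrelevant to root cause
        window.append(function_name)
    return list(window)


# Hypothetical decoded trace: the module faults at timestamp 4.
trace = [
    (1, "ipmi_recv"),
    (2, "parse_request"),
    (3, "lookup_sensor"),
    (4, "read_register"),
    (5, "panic_handler"),
]
print(last_calls_before_fault(trace, fault_ts=4, depth=2))
```

The bounded deque mirrors how a trace buffer holds only recent history: the interesting part of a crash is usually the short path leading up to it, not the whole run.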

Alan Sguigna