Using the Intel Innovation Engine for Embedded DDR4 Memory Test

Last week, I wrote about Intel’s public announcement of the Innovation Engine (IE), an Intel architecture processor and I/O sub-system embedded into their upcoming generations of server platforms. This article describes the use of the IE for JTAG boundary-scan testing of memories.

Studies by the International Electronics Manufacturing Initiative (iNEMI) show that testing memories is one of the top issues facing electronics manufacturing. To quote, “Structural connectivity test of external memory devices is a ‘crisis in waiting’ as memory devices get larger and faster”. This is especially true for DDR4 memory, where signal integrity concerns preclude placing test points on the nets for legacy structural testers like In-Circuit Test (ICT). DDR4 is a step up in speed and a step down in voltage compared to its predecessor, so it is much more susceptible to interference from test pad stubs. Functional test might be expected to pick up the slack, but it is limited by constraints imposed by the Intel Memory Reference Code (MRC).

On Intel platforms, the BIOS MRC initializes the memory controller and tunes read/write timing and voltage for best performance. The MRC is very complex: its job is to optimize multiple parallel buses operating at 2 GT/s and beyond and get them to act as “one system”. On DDR4, it does this with sophisticated techniques including on-die termination (ODT), read/write leveling (which compensates for the flight-time skew that the “fly-by” topology deliberately introduces to reduce simultaneous switching noise), Vref tuning, and CMD/CTL/ADDR timing training, among others.
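To make the idea of timing training more concrete, here is a toy sketch of one-dimensional window centering: sweep a delay setting, record which taps pass a read-back check, and settle on the middle of the widest passing run. This is only an illustration of the general principle, not Intel's MRC code; the function names, the 64-tap range, and the simulated lane are all assumptions made for the example.

```python
from typing import Callable, Optional, Tuple


def find_passing_window(passes: Callable[[int], bool],
                        num_taps: int) -> Optional[Tuple[int, int]]:
    """Return (first, last) tap of the widest contiguous passing run, or None."""
    best = None
    start = None
    # One extra iteration past the last tap closes a run that reaches the end.
    for tap in range(num_taps + 1):
        ok = tap < num_taps and passes(tap)
        if ok and start is None:
            start = tap
        elif not ok and start is not None:
            run = (start, tap - 1)
            if best is None or (run[1] - run[0]) > (best[1] - best[0]):
                best = run
            start = None
    return best


def center_tap(window: Tuple[int, int]) -> int:
    """Pick the middle of the passing window to maximize margin on both sides."""
    return (window[0] + window[1]) // 2


# Hypothetical lane: read-back succeeds only for delay taps 22 through 41.
def simulated_lane(tap: int) -> bool:
    return 22 <= tap <= 41


window = find_passing_window(simulated_lane, num_taps=64)
print(window, center_tap(window))
```

Note that a sweep like this only has to find a setting that works. A marginal net that merely narrows the passing window can still train successfully, which is one reason a trained system says little about structural defects.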

The MRC has a goal of “booting at all costs”: that is, it brings the memories up as efficiently as possible, finding the “sweet spots” of timing and voltage so that the memories are up and running quickly and the system can continue booting. It is supposed to perform its function rapidly because users of such gear want minimal boot times and “always-on” performance. Its behavior depends on the platform on which it is running. On a consumer device, a major fault on a DIMM (such as an open DQ) will “blue screen” the system, because laptop users need to know that something catastrophic has happened. On an enterprise system like a server, by contrast, the MRC will quietly disable the channel containing the defective DIMM and bring up the rest of the system, in pursuit of five-nines (99.999%) availability.

And therein lies the major challenge of functional test based upon the MRC: the trade-off between boot time, defect coverage, and fault isolation. In general, the MRC will minimize boot time, which means its defect coverage and fault isolation can be low. And, if a rank or DIMM has been mapped out, it will be invisible to any functional test subsequently run. This is especially important for “memory down” systems (where the memory has been soldered down to the board), where, if a rank is disabled, there could be eight or more suspect devices – clearly very difficult to debug.

This is where boundary scan comes to the rescue. It is a structural test, is independent of the MRC, and is capable of net-level diagnostics. Provided that the memory controller supports a full boundary scan implementation on these nets, shorts and opens coverage is available on address, data and control pins. In general, this can be accomplished through memory access verification (MAV), which writes to and reads from the memory through these pins, or, where available, through the new JEDEC DDR4 Connectivity Test (CT) mode.
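As a rough illustration of why a structural test can point at individual nets, the sketch below simulates an 8-bit data bus with injected faults and diagnoses it with all-ones, all-zeros, and walking-zero patterns. It is a software model only: a real boundary scan tool drives and captures these patterns through the memory controller's scan cells. The fault model (stuck nets, and shorts that behave as a wired-AND) and every name here are assumptions chosen for the example.

```python
WIDTH = 8  # assumed bus width for the example


def make_bus(stuck=None, shorts=()):
    """Return a function mapping driven bits to observed bits.

    stuck:  {net: 0 or 1}, a net that always reads the given value
    shorts: (a, b) pairs, modeled as wired-AND between the two nets
    """
    stuck = stuck or {}

    def transfer(driven):
        bits = [(driven >> i) & 1 for i in range(WIDTH)]
        for a, b in shorts:
            bits[a] = bits[b] = bits[a] & bits[b]
        for net, value in stuck.items():
            bits[net] = value
        return sum(bit << i for i, bit in enumerate(bits))

    return transfer


def diagnose(transfer):
    """Report stuck nets and shorted net pairs seen through `transfer`."""
    mask = (1 << WIDTH) - 1
    all_high = transfer(mask)
    all_low = transfer(0)
    stuck_low = {i for i in range(WIDTH) if not (all_high >> i) & 1}
    stuck_high = {i for i in range(WIDTH) if (all_low >> i) & 1}
    stuck = stuck_low | stuck_high

    shorts = set()
    for i in range(WIDTH):
        if i in stuck:
            continue
        seen = transfer(mask ^ (1 << i))  # walking zero: only net i driven low
        for j in range(WIDTH):
            # Any other healthy net that also reads low is shorted to net i.
            if j != i and j not in stuck and not (seen >> j) & 1:
                shorts.add((min(i, j), max(i, j)))

    return {
        "stuck_low": sorted(stuck_low),
        "stuck_high": sorted(stuck_high),
        "shorts": sorted(shorts),
    }


faulty = make_bus(stuck={5: 0}, shorts=[(1, 2)])
print(diagnose(faulty))
# {'stuck_low': [5], 'stuck_high': [], 'shorts': [(1, 2)]}
```

The report names specific nets rather than a whole rank or channel, which is the kind of isolation that matters on a memory-down board.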

On upcoming generations of Intel server platforms, the IE can be used to dramatically reduce the cost of JTAG-based memory structural test, through a new approach: embedding the tester within the system. Running on a lightweight OS like OpenRTOS, the boundary scan “action player” provides a containerized execution engine and diagnostics facility. This solves the test access issue by eliminating the need for external cables, fixtures, and hardware controllers, which is where the cost saving comes from. Provided that the test can be initiated independently of the mission mode of the memory controller, it “cuts the cord” and allows the JTAG test to act like a true functional test: useful at all stages of a product’s lifecycle, with the push of a button, but with none of the limitations of MRC-based functional test described above.

For an informative treatment of memory test, have a look at our eBook, How to Test High-Speed Memory with Non-intrusive Embedded Instruments.

Alan Sguigna