Fault Detection Mechanisms for COTS FPGA Systems Used in Low Earth Orbit

authored by
Tim Oberschulte, Jakob Marten, Holger Blume
Abstract

Field-programmable gate arrays (FPGAs) in space applications come with the drawback of radiation effects, which inevitably occur in devices with small process sizes. This also applies to the electronics of the Bose-Einstein Condensate and Cold Atom Laboratory (BECCAL) apparatus, which will operate on the International Space Station (ISS) for several years. More than 100 FPGAs distributed throughout the setup will be used for high-precision control of specialized sensors and actuators at nanosecond scale. On the ISS, radiation effects must be taken into account, the functionality of the electronics must be monitored, and errors must be handled properly. Due to the large number of devices in BECCAL, commercial off-the-shelf (COTS) FPGAs are used, which are not radiation hardened. This paper describes the methods and measures used to mitigate the effects of radiation in an application-specific COTS-FPGA-based communication network. Based on the firmware for a central communication network switch in BECCAL, the steps taken to integrate redundancy into the design are described, together with the optimizations that keep the firmware within the FPGA's resource constraints. A redundant integrity checker module is developed that can notify preceding network devices of data and configuration bit errors. The firmware is validated and evaluated by injecting faults into data and configuration registers, both in simulation and on real hardware. In the end, the FPGA resource usage of the firmware is reduced by more than half, enabling the use of dual modular redundancy (DMR) for the switching fabric. Together with the integrity checker, which is protected by triple modular redundancy (TMR), this combination completely prevents silent data corruption in the design while staying within the resource limits of a COTS FPGA, as shown in simulation and by injecting faults into hardware using the Intel Fault Injection FPGA IP Core.
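The difference between the two redundancy schemes named in the abstract can be illustrated with a minimal software sketch. It is not the authors' firmware, which runs on the FPGA; the function names `flip_bit`, `dmr_check`, and `tmr_vote` are hypothetical and chosen only for illustration. The sketch assumes a single bit flip in one copy of a data word: DMR can detect the mismatch but cannot tell which copy is correct, while TMR can mask the fault by bitwise majority voting.

```python
from typing import Tuple


def flip_bit(word: int, bit: int) -> int:
    """Model a single-event upset by inverting one bit of a word."""
    return word ^ (1 << bit)


def dmr_check(copy_a: int, copy_b: int) -> Tuple[int, bool]:
    """Dual modular redundancy: compare two copies.

    Returns copy_a together with an error flag. A mismatch is detected,
    but DMR alone cannot decide which copy is the faulty one.
    """
    return copy_a, copy_a != copy_b


def tmr_vote(a: int, b: int, c: int) -> Tuple[int, bool]:
    """Triple modular redundancy: bitwise majority vote over three copies.

    Returns the voted word and a flag that is set if any copy disagrees.
    """
    voted = (a & b) | (a & c) | (b & c)
    error = not (a == b == c)
    return voted, error


if __name__ == "__main__":
    word = 0b10110010
    faulty = flip_bit(word, 3)  # hypothetical upset in bit 3

    print(dmr_check(word, faulty))       # mismatch is flagged
    print(tmr_vote(word, faulty, word))  # fault is masked and flagged
```

In this reading, a DMR-protected data path only needs to raise an error flag so that a corruption never passes silently, whereas the module that reports the error has to keep working under a fault itself, which is why the stronger TMR scheme is a natural fit for it.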

Organisation(s)
Architectures and Systems Section
Type
Conference contribution
Pages
19-32
No. of pages
14
Publication date
2023
Publication status
Published
Peer reviewed
Yes
ASJC Scopus subject areas
Theoretical Computer Science, Computer Science (all)
Electronic version(s)
https://doi.org/10.1007/978-3-031-46077-7_2 (Access: Closed)