

# Improving the Processing Performance of a DSP for High Temperature Electronics using Circuit-Level Timing Speculation

Guillermo Payá-Vayá, Steffen Roskamp, Fritz Webering, and Holger Blume



Payá-Vayá et al.

Tensilica Day

16<sup>th</sup> February 2017

# **High-Temperature Electronics**

| Temperature Range | Category <sup>1</sup> |
|-------------------|-----------------------|
| 0 to 85 °C        | Commercial            |
| -40 to 100 °C     | Industrial            |
| -40 to 125 °C     | Extended              |
| -55 to 125 °C     | Military              |
| > 150 °C          | High-Temperature      |







# Some applications require integrated circuits that operate <u>reliably</u> across a large range of temperatures

<sup>1</sup> Altera Corporation, Enhanced Temperature Device Support

<sup>2</sup> http://www.analog.com/en/analog-dialogue/articles/high-temperature-electronic-pose-design-challenges.html




# Traditional Design Approach: Worst Case (I)

#### Two main goals:

- Ensure correct behaviour in all environments
- Clock frequency needs to account for variations



#### **Problem:** Speed of digital circuits changes with temperature!





### Traditional Design Approach: Worst Case (II)



**Solution:** Design for Worst Case, <u>i.e., Highest Temperature Range</u>
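As a rough illustration of worst-case design, the sketch below picks the clock frequency from the slowest temperature corner. The function name `worst_case_clock_mhz` and the delay values are hypothetical and chosen only for illustration; they are not measurements from this work.

```python
def worst_case_clock_mhz(delay_ns_by_corner):
    """Highest clock frequency (MHz) that meets timing in every corner."""
    slowest_ns = max(delay_ns_by_corner.values())
    return 1e3 / slowest_ns  # period in ns -> frequency in MHz


# Hypothetical critical-path delays per temperature corner (not measured data).
corner_delays_ns = {"room temperature": 70.0, "high temperature": 100.0}

f_safe_mhz = worst_case_clock_mhz(corner_delays_ns)
print(f"Worst-case clock: {f_safe_mhz:.1f} MHz")
```

Every corner other than the slowest one is then clocked below what it could sustain, which is the performance loss discussed on the following slides.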



## Traditional Design Approach: Worst Case (III)



# **Problem:** System working on a larger temperature range $\rightarrow$ <u>High performance loss</u>



## So How Much Performance Do We Lose? (I)

#### Arithmetic Unit:

- 24-bit unsigned multiplier
- 1 µm SOI technology (up to 250 °C)

#### Case Study:

- Gate-level timing simulation
- Two temperature corners
- 160 frequencies: 5 to 20 MHz
- 1 million random operations per frequency
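The structure of such a frequency sweep can be sketched in a few lines. This is not the gate-level timing simulation used in the case study: the delay model (a uniform distribution between half and the full critical-path delay) is a stand-in assumption, `error_rate` and `sweep` are hypothetical names, and the default of 1000 operations per frequency is reduced from one million to keep the sketch fast.

```python
import random


def error_rate(freq_mhz, max_delay_ns, n_ops, rng):
    """Fraction of random operations whose path delay exceeds the clock period."""
    period_ns = 1e3 / freq_mhz
    errors = 0
    for _ in range(n_ops):
        # Assumed delay model: each operation exercises a path whose delay
        # is uniformly distributed in [0.5, 1.0] * max_delay_ns.
        delay_ns = rng.uniform(0.5, 1.0) * max_delay_ns
        if delay_ns > period_ns:
            errors += 1
    return errors / n_ops


def sweep(max_delay_ns, n_freqs=160, f_lo=5.0, f_hi=20.0, n_ops=1000, seed=0):
    """Return (frequency in MHz, error rate) pairs for evenly spaced frequencies."""
    rng = random.Random(seed)
    step = (f_hi - f_lo) / (n_freqs - 1)
    results = []
    for i in range(n_freqs):
        freq_mhz = f_lo + i * step
        results.append((freq_mhz, error_rate(freq_mhz, max_delay_ns, n_ops, rng)))
    return results


# Hypothetical 100 ns critical path for one temperature corner.
results = sweep(max_delay_ns=100.0)
```

Running the sweep once per temperature corner, each with its own delay, gives one error-rate curve per corner that can be compared at the same clock frequency.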







### So How Much Performance Do We Lose? (II)



# Up to 40% of performance is lost in typical conditions when designing for the worst case
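One way to read this figure, assuming the loss is measured relative to the clock frequency achievable in typical conditions (the symbols $f_\text{wc}$, $f_\text{typ}$, $T_\text{wc}$ and $T_\text{typ}$ are introduced here for illustration and do not appear in the slides):

$$
\text{loss} = 1 - \frac{f_\text{wc}}{f_\text{typ}} = 1 - \frac{T_\text{typ}}{T_\text{wc}}
$$

Under that reading, a loss of 40% corresponds to $T_\text{typ} = 0.6\,T_\text{wc}$, i.e. the circuit could run at roughly $1/0.6 \approx 1.67$ times the worst-case clock frequency in typical conditions.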





Idea:

- Circuit-level error detection
- Single-cycle error correction





# Razor – Circuit-Level Timing Speculation (I)

Intentionally violate timing of critical paths



[1] Ernst, Dan, et al. "Razor: circuit-level correction of timing errors for low-power operation." IEEE Micro 24.6 (2004): 10-20.





# Razor – Circuit-Level Timing Speculation (II)

- Intentionally violate timing of critical paths
- On-line error detection







## Razor – Circuit-Level Timing Speculation (III)

- Intentionally violate timing of critical paths
- On-line error detection
- On-line error correction
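The three steps above can be illustrated with a small behavioural model of a Razor-style flip-flop: a main register samples at the clock edge, a shadow register samples a fixed delay later, and a mismatch between the two flags a timing error. This is a simplified sketch with hypothetical names (`RazorFlipFlop`, `shadow_delay_ns`); it ignores metastability and assumes the data always settles before the shadow register samples.

```python
from dataclasses import dataclass


@dataclass
class RazorFlipFlop:
    period_ns: float        # clock period; main register samples at this time
    shadow_delay_ns: float  # extra time before the shadow register samples
    main: int = 0
    shadow: int = 0

    def clock(self, new_value, arrival_ns):
        """Capture new_value, which settles arrival_ns after the launching edge.

        Returns True if a timing error was detected and corrected.
        """
        if arrival_ns > self.period_ns + self.shadow_delay_ns:
            # Outside the speculation window: not recoverable in this model.
            raise ValueError("data arrives after the shadow register samples")
        stale = self.main
        # Main register: sees the new value only if it arrived in time.
        self.main = new_value if arrival_ns <= self.period_ns else stale
        # Shadow register: samples later, so it holds the correct value.
        self.shadow = new_value
        error = self.main != self.shadow  # on-line error detection
        if error:
            self.main = self.shadow       # on-line error correction
        return error


ff = RazorFlipFlop(period_ns=100.0, shadow_delay_ns=50.0)
on_time = ff.clock(new_value=5, arrival_ns=80.0)   # meets timing
late = ff.clock(new_value=7, arrival_ns=120.0)     # violates timing, recovered
```

Note that a late arrival is only flagged when the stale and the new value differ, which mirrors a comparator that looks at the stored data rather than at the timing itself.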





# Razor – Circuit-Level Timing Speculation (IV)










## Razor – Fast-Path Problem (I)





#### Problem:

- Paths are not equally long
- Subsequent operations may corrupt the shadow register



## Razor – Fast-Path Problem (II)



#### Problem:

- Paths are not equally long
- Subsequent operations may corrupt the shadow register

#### Solution:

- Delay shorter paths
  - Increased area overhead
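The padding idea can be sketched as follows: any path that is shorter than the shadow-register delay gets enough extra delay so that the next operation cannot reach the shadow register before it has sampled. The function name `buffer_padding_ns` and the path delays are hypothetical, and hold-time margins are ignored in this simplified view.

```python
def buffer_padding_ns(path_delays_ns, shadow_delay_ns):
    """Extra delay (ns) each path needs so it is no shorter than the shadow delay."""
    return {
        name: max(0.0, shadow_delay_ns - delay_ns)
        for name, delay_ns in path_delays_ns.items()
    }


# Hypothetical minimum path delays into one Razor flip-flop.
paths_ns = {"fast_path": 10.0, "slow_path": 60.0}
padding = buffer_padding_ns(paths_ns, shadow_delay_ns=50.0)
print(padding)  # {'fast_path': 40.0, 'slow_path': 0.0}
```

Each nanosecond of padding is realised with additional buffer cells, which is where the area overhead comes from.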



# Case Study: Razor-CFX DSP (I)

#### RTL Description

Institute of Microelectronic Systems

- Full description of the Razor mechanism
- Pipeline stall implementation
- ASIC Implementation
  - Netlist generation (Synthesis)
  - Standard-cell placement and routing
  - Automatic buffer insertion (timing analysis)



[1] Roeven, Hans, Jeroen Coninx, and Marleen Ade. "CoolFlux DSP: The embedded ultra low power C-programmable DSP core." Proc. Intl. Signal Processing Conf. (GSPx), 2004.



## Case Study: Razor-CFX DSP (II)



Dual-Datapath architecture

- 2 Multipliers
- 2 ALUs
- 8 data registers
  - 4x 24-bit single width
  - 4x 56-bit accumulator
- Up to 4 parallel instructions
  - 2x Arithmetic
  - 2x Move
- Zero-Overhead loops





## Case Study: Razor-CFX DSP (III)

## Timing Analysis (Critical Path = 200 ns)





# Case Study: Razor-CFX DSP (IV)

## When an error is detected:

- Control unit sets restore signal
- Control unit stalls pipeline
- Correct result is recovered from shadow register
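The cost of this recovery scheme can be estimated with a simple model in which every detected error adds a fixed number of stall cycles. The sketch below assumes a one-cycle penalty per error, in line with the single-cycle correction idea stated earlier; the function names and the numbers in the example are hypothetical.

```python
def total_cycles(error_flags, stall_cycles=1):
    """Cycles needed to run a sequence of operations, given per-operation error flags."""
    return len(error_flags) + stall_cycles * sum(error_flags)


def effective_throughput_mops(freq_mhz, error_rate, stall_cycles=1):
    """Average operations per microsecond when a fraction of operations stalls."""
    return freq_mhz / (1.0 + error_rate * stall_cycles)


# Four operations, two of which trigger a timing error: 4 + 2 stall cycles.
cycles = total_cycles([False, True, False, True])

# Hypothetical comparison: overclocking to 12 MHz with a 20 % error rate
# delivers the same throughput as an error-free 10 MHz clock.
throughput = effective_throughput_mops(freq_mhz=12.0, error_rate=0.2)
```

Under this model, timing speculation only pays off while the gain in clock frequency outweighs the stall cycles spent on recovery.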







## Case Study: Razor-CFX DSP (V)

## Timing Analysis after Razor (Critical Path = 170 ns)
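To put the two critical-path figures side by side, the short calculation below converts them to clock frequencies. It assumes that the critical path directly sets the clock period and ignores any stall cycles spent on error recovery, so it is an upper bound on the gain rather than a measured result.

```python
def clock_mhz(period_ns):
    """Clock frequency in MHz for a given clock period in ns."""
    return 1e3 / period_ns


before_mhz = clock_mhz(200.0)              # 5.0 MHz
after_mhz = clock_mhz(170.0)               # about 5.88 MHz
speedup = after_mhz / before_mhz - 1.0     # about 0.176, i.e. roughly 17.6 %

print(f"{before_mhz:.2f} MHz -> {after_mhz:.2f} MHz (+{speedup:.1%})")
```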







## Case Study: Razor-CFX DSP (VI)

### Example application:

**OFDM Encoder for Powerline Communications** 










### Timing speculation (with Razor) can be used to improve performance, but more research is needed to reduce the hardware cost





# Thank you for your attention!