



Dr.-Ing. Jens Benndorf (DCT), Gregor Schewior (DCT)

A new Computer Vision Processor Chip Design for automotive ADAS CNN applications in 22nm FDSOI based on Cadence VP6 Technology

Tensilica Day 2017, 16th Feb. 2017, Leibniz University Hanover, Germany



# **DCT Company Profile**

# Dream Chip Technologies ...



- Positioned as a Fabless Microelectronic Engineering Company for medium to large SoC designs covering the whole range from Architecture, Specification, Design, Verification to GDSII
- Technologies: 130nm, 40nm, 28nm, 22nm FDX, 14/16nm FF
- 60 employees / 52 engineers with 10 to 20 years of SoC design experience
- Based in Hanover (HQ) and Hamburg, Germany
- Member of Silicon Saxony / Germany
- Cadence Design Center Partner for Tensilica



# Assisted Driving requires Cameras, Radar and Ultrasonic





## Use Case #1: Digital Mirroring





# Use Case #1: Digital Mirroring - The Multiview

Automotive multi-camera systems for Bird-View, Rear-View and Panorama-View are a major part of today's emerging technologies that make driving safer and more comfortable and move towards autonomous vehicles.





## Use Case #2: 360 deg Top View Camera





## Use Case #3: Pedestrian detection via CNNs





# **Partitioning of Algorithms**

Dream Chip Technologies Confidential



## Introduction – Image Sensor Processing Overview





## **Heterogeneous Cores for Image Sensor Processing**





## **Example: 360 deg Top View**

- Tasks: Fish-Eye Lens correction, Stitching, Warping, Photometric synchronization for ADAS surround sensors
- Mapping:





# **MPSoC Chip Architecture**








## **Vision P6 architecture**



| Feature               | Vision P6                                                                |
|-----------------------|--------------------------------------------------------------------------|
| VLIW & SIMD           | 5 issue slots<br>64-way 8-bit<br>32-way 16-bit<br>16-way 32-bit          |
| ALU Ops               | 64 32-bit<br>128 16-bit<br>256 8-bit                                     |
| Memory width          | 1024 bits<br>2 vector load/store units                                   |
| # of vector registers | 32                                                                       |
| SuperGather           | 32 non-contiguous<br>locations read/written per<br>instruction           |
| Bus interface         | AXI4                                                                     |
| iDMA                  | No alignment restrictions,<br>local memory to local<br>memory transfers  |
| Target frequency      | 800 MHz @ 28 nm<br>1.1 GHz @ 16 nm                                       |
| Optional              | Vector floating point, ECC                                               |
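The figures in the table can be cross-checked with a little arithmetic. The sketch below assumes that the "ALU Ops" row counts operations per clock cycle and that every cycle issues the full count, so the result is a theoretical upper bound rather than a measured throughput. The function and dictionary names are illustrative only.

```python
# Values copied from the Vision P6 table; names are hypothetical.
SIMD_LANES = {8: 64, 16: 32, 32: 16}           # element width (bits) -> lanes
ALU_OPS_PER_CYCLE = {8: 256, 16: 128, 32: 64}  # assumed to be per clock cycle
TARGET_HZ = {"28 nm": 800_000_000, "16 nm": 1_100_000_000}


def vector_width_bits(element_bits):
    """Width of one SIMD vector for the given element size."""
    return element_bits * SIMD_LANES[element_bits]


def peak_alu_ops_per_second(element_bits, node):
    """Upper bound: full ALU op count issued on every cycle."""
    return ALU_OPS_PER_CYCLE[element_bits] * TARGET_HZ[node]


for bits in SIMD_LANES:
    print(f"{bits}-bit: vector width {vector_width_bits(bits)} bits")

for node in TARGET_HZ:
    giga_ops = peak_alu_ops_per_second(8, node) / 1e9
    print(f"{node}: up to {giga_ops:.1f} G 8-bit ALU ops/s")
```

All three SIMD configurations describe the same 512-bit vector, and the 8-bit ALU figure scales linearly with the target clock frequency.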



# Convolutional neural networks (CNNs)




# Why CNNs?



[Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. "Deep Residual Learning for Image Recognition". CVPR 2016.]

• Human top-5 classification error on ImageNet: 5.1%



# Why CNNs? – cont.

- ADAS video applications using CNNs
  - Traffic sign recognition
  - Pedestrian detection
  - Image segmentation / Scene labeling
  - Object localization
  - Self driving cars



# What is a CNN?

- Special case of a neural network: a deep-learning-based approach for high-quality object detection
- Neural network:
  - System of interconnected artificial "neurons" inspired by biological neural systems
  - Neurons are the basic computation units of the brain, connected by synapses
  - Connections have numeric weights, tuned during the training process
  - A properly trained network will respond correctly when presented with an image or pattern to recognize
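The weighted connections described above can be illustrated with a single artificial neuron. This is a minimal sketch: the weights, bias and hard-threshold activation are made-up values chosen for illustration, not parameters from a trained network.

```python
def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


def step(value):
    """Hypothetical hard-threshold activation: fire (1) or stay silent (0)."""
    return 1 if value > 0 else 0


# Illustrative weights; in a real network they are tuned during training.
weights = [0.5, -1.0, 0.25]
bias = -0.5

print(neuron([1.0, 0.0, 1.0], weights, bias, step))  # pattern the neuron responds to
print(neuron([0.0, 1.0, 0.0], weights, bias, step))  # pattern it ignores
```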





## What is a CNN? – cont.

- Neural network organized in multiple layers
  - Fully connected layers



- CNNs have additional layers
  - Convolutional layers
  - Pooling / subsampling layers
  - Non-linear layers
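The three additional layer types can be sketched in plain Python on a single-channel image. This is a simplified illustration under several assumptions: one input channel, one kernel, stride 1, no padding, ReLU as the non-linearity and 2x2 max pooling. As in most CNN frameworks, the "convolution" is computed without flipping the kernel.

```python
def conv2d(image, kernel):
    """Slide a k x k kernel over the image (stride 1, no padding)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Non-linear layer: clamp negative values to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap):
    """Pooling / subsampling layer: keep the maximum of each 2x2 block."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]


# Made-up 5x5 image and 2x2 kernel, for illustration only.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 0],
    [2, 1, 0, 1, 2],
    [0, 3, 1, 0, 1],
]
kernel = [[1, -1], [-1, 1]]

features = max_pool_2x2(relu(conv2d(image, kernel)))
print(features)
```

The 5x5 input shrinks to 4x4 after the convolution and to 2x2 after pooling, which is the subsampling effect the pooling layer is named for.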





## What is a CNN? – cont.

- Convolutional layer:
  - Motivated by visual cortex
    - Contains cells responsible for detecting light in small, overlapping sub-regions of the visual field, called receptive fields
  - k x k x D multiply-accumulates (MACs) are required to create one element of one output feature map (k x k kernel, input depth D)
  - Convolution output is 3-dimensional
  - Multiple convolutional layers
  - Results in a lot of MACs per image (see next slide)
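The k x k x D rule scales up quickly once it is multiplied by the number of output elements. The sketch below counts the MACs of one convolutional layer. The layer dimensions in the example are hypothetical and not taken from any specific network, and the output-size formula assumes the usual stride and zero-padding convention.

```python
def conv_output_size(in_size, k, stride=1, padding=0):
    """Output width or height of a convolution along one dimension."""
    return (in_size + 2 * padding - k) // stride + 1


def conv_layer_macs(in_h, in_w, in_depth, k, num_filters, stride=1, padding=0):
    """Total MACs for one convolutional layer.

    Each output element costs k * k * in_depth MACs, and the 3-dimensional
    output has out_h * out_w * num_filters elements.
    """
    out_h = conv_output_size(in_h, k, stride, padding)
    out_w = conv_output_size(in_w, k, stride, padding)
    macs_per_element = k * k * in_depth
    return out_h * out_w * num_filters * macs_per_element


# Hypothetical layer: 32x32x3 input, sixteen 5x5 filters, stride 1, no padding.
print(conv_layer_macs(32, 32, 3, k=5, num_filters=16))
```

Even this small layer needs close to a million MACs, and a full network stacks many such layers with far deeper inputs.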









- 60M weights
- ~800M multiply-accumulates to process one 227x227x3 image
- Activation (trigger) function ReLU: f(x) = max(0, x)
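To put the weight count in perspective, the sketch below estimates the storage needed for 60M weights at different precisions and implements the ReLU formula directly. The 8-, 16- and 32-bit widths are an assumption borrowed from the SIMD element sizes of the Vision P6. The slides do not state which precision the network uses, and 1 MB is taken as 10^6 bytes.

```python
NUM_WEIGHTS = 60_000_000  # "60M weights" from the slide


def weight_footprint_mb(num_weights, bits_per_weight):
    """Storage for the weights in MB (10**6 bytes), ignoring any compression."""
    return num_weights * bits_per_weight / 8 / 1e6


def relu(x):
    """ReLU as given on the slide: f(x) = max(0, x)."""
    return max(0.0, x)


for bits in (8, 16, 32):  # assumed precisions, not stated in the source
    print(f"{bits}-bit weights: {weight_footprint_mb(NUM_WEIGHTS, bits):.0f} MB")

print([relu(v) for v in (-2.5, 0.0, 3.0)])
```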









# Assembling the SOM...

Dream Chip Technologies GmbH





### System on Module



#### Overview

- DCT ADAS Heterogeneous Multi-Core Chip (22nm FDSOI Global Foundries)
- Board-to-board header with chip interfaces
- Expandable flash storage
- Power management and measurement
- Real Time Clock (RTC)
- Chip power supplies included

#### Benefits

- Reduced application-specific baseboard complexity
- Interfaces customizable to application requirements
- Expandable application flash storage
- Power consumption measurement
- Only a single 12 V DC power supply required

#### System-on-Module features

- Embedded 4GB LP-DDR4 2400 RAM
- 128MB ARM Cortex-A53 storage
- 32MB ARM Cortex-R5 storage
- Gigabit Ethernet PHY
- Power Management IC
- Real Time Clock (RTC)

#### Interfaces

- Four 300MB/s video input interfaces
- One 300MB/s video output interface
- Gigabit Ethernet
- Dual Quad-SPI for application storage
- UART, I2C, SPI, and GPIO

#### Dimensions

• 194mm x 100mm



### **4xHDMI** Application Board



#### Overview

- DCT ADAS Quad-HDMI Base Board
- Four HDMI 1.4b inputs
- One HDMI 1.4 output
- Custom high-speed headers available
- Remote power management
- Periodic power measurement
- Gigabit Ethernet
- CAN 2.0B
- USB UART
- Video Genlock generation

#### **Benefits**

- Official reference design
- Prepared for custom sensor interfaces
- Remote system control

#### **Base board features**

- Four ADV7611 HDMI 1.4b receivers
- One ADV7511 HDMI 1.4 transmitter
- Video data rates up to 1080p60
- Two Intel MAX10 10M08DC FPGAs
- Four high-speed interface headers
- MCP2515 CAN 2.0B controller
- Gigabit Ethernet jack for SoM
- Micro-USB UART to SoM
- Micro-USB to System Controller
- Video Genlock generation
- Video input synchronization
- True output genlock possible
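A quick calculation relates the 1080p60 video rate of the base board to the 300MB/s video interfaces of the System on Module. The sketch below is only an estimate under stated assumptions: the slides do not specify the pixel format, so the bytes-per-pixel values are hypothetical, only active pixels are counted (blanking is ignored), and 1 MB is taken as 10^6 bytes.

```python
INTERFACE_LIMIT_MB_S = 300.0  # per video interface, from the SoM slide


def video_rate_mb_s(width, height, fps, bytes_per_pixel):
    """Active-pixel data rate in MB/s (10**6 bytes per second)."""
    return width * height * fps * bytes_per_pixel / 1e6


# Hypothetical pixel formats; the source does not state which one is used.
formats = {"16 bit/pixel (e.g. YUV 4:2:2)": 2, "24 bit/pixel (e.g. RGB)": 3}

for name, bytes_per_pixel in formats.items():
    rate = video_rate_mb_s(1920, 1080, 60, bytes_per_pixel)
    verdict = "fits within" if rate <= INTERFACE_LIMIT_MB_S else "exceeds"
    print(f"1080p60 at {name}: {rate:.1f} MB/s, {verdict} {INTERFACE_LIMIT_MB_S:.0f} MB/s")
```

Under these assumptions a 16 bit/pixel 1080p60 stream stays below the per-interface limit, while an uncompressed 24 bit/pixel stream would not.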

#### Dimensions

• 200mm x 180mm



## Software Development Kit

| Debug Access                   | ARM Cortex-A53<br>Multi-Core User Application |                           | Tensilica LX7-VP6<br>Multi-Core User Application          |             |
|--------------------------------|-----------------------------------------------|---------------------------|-----------------------------------------------------------|-------------|
| Secure TrustZone               | Linux Kernel                                  |                           |                                                           |             |
| PSCI API                       | Kernel Drivers                                | Video4Linux API           |                                                           | Mailbox API |
| ARM Cortex-R5<br>Lock-step CPU | Communication<br>Interfaces                   | Video Input<br>Interfaces | Video Output<br>Interface                                 |             |

#### Overview

- DCT ADAS Software Development Kit
- LEDE distribution with stable Linux 4.4.42
- 32-bit and 64-bit flavors available
- Tensilica LX7-VP6 development support
- Kernel API drivers

#### Benefits

- Official Software Development Kit
- Kernel drivers available
- Video buffer framework
- Multi-core processing examples

#### **SDK** features

- Complete ARM build environment
- LEDE distribution (lede-project.org)
- GNU ARM gcc 5.4.0
- Linux 4.4.42
- u-boot 2017.01
- musl libc 1.1.15
- 32-bit and 64-bit flavors
- All changes made against the respective mainline versions
- Kernel drivers for
  - QSPI, UART, I2C, Ethernet
  - Video framework
- Tensilica LX7-VP6 support
- Firmware control
- Debug access



### *Timeline, next steps*

|                                       | 2014      | 2015 | 2016             | 2017        | 2018                  |
|---------------------------------------|-----------|------|------------------|-------------|-----------------------|
| FDSOI Technology Development (ST, GF) | 28nm (ST) |      | 22nm (GF)        |             | 12nm (GF)             |
|                                       |           |      | Forerunner (DCT) |             |                       |
|                                       |           |      | Tape Out 1       | Tape Out 2  | Tape Out 3 (planning) |
|                                       |           |      |                  | Prototype 1 | Prototype 2           |









# Thank You



Please contact jens.benndorf@dreamchip.de

Dream Chip Technologies GmbH Steinriede 10 D-30827 Garbsen/Hannover Germany ++49-5131-90805-0
