

# Low-power, high-performance ASIP design ... with open source tools

Joonas Multanen, Customized Parallel Computing (CPC) research group, www.tuni.fi/cpc

14 Oct 2020





### Contents

1. Transport Triggered Architecture (TTA)
2. TTA-based Co-design Toolset (TCE)
3. Case studies using TCE tools

## Tampere University

### **Transport Triggered Architecture (TTA)**

- Internal datapaths exposed in the instruction set
  - An instruction can contain multiple parallel MOVEs
- Long instruction word (similar to VLIW)
- Function units (FUs) can store data in their input and output ports
- Enables TTA-specific optimizations such as software bypassing
- Allows a simpler register file (RF) than VLIW
- Flexible and modular
- Suitable for application specific processors and accelerators
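The programming model above can be illustrated with a minimal Python sketch. It is not TCE code: the `AddUnit` and `Machine` classes, the port names (`o`, `t`, `r`) and the single-cycle adder latency are all assumptions made for illustration. Each instruction is a list of parallel moves, and writing to a trigger port is what starts the operation.

```python
from dataclasses import dataclass


@dataclass
class AddUnit:
    """Hypothetical adder FU: operand port 'o', trigger port 't', result port 'r'."""
    operand: int = 0  # latched by a move to 'o'; starts nothing
    result: int = 0   # held in the FU until overwritten by the next operation

    def write(self, port, value):
        if port == "o":
            self.operand = value
        elif port == "t":
            # A move to the trigger port starts the operation
            # (assumed to complete within the same cycle).
            self.result = self.operand + value
        else:
            raise ValueError(f"unknown input port: {port}")


class Machine:
    """Toy TTA: one adder and a small register file, addressed as 'unit.port'."""

    def __init__(self, num_regs=4):
        self.rf = [0] * num_regs
        self.add = AddUnit()

    def read(self, src):
        if isinstance(src, int):  # immediate value
            return src
        unit, port = src.split(".")
        if unit == "rf":
            return self.rf[int(port)]
        if unit == "add" and port == "r":
            return self.add.result
        raise ValueError(f"unknown source: {src}")

    def write(self, dst, value):
        unit, port = dst.split(".")
        if unit == "rf":
            self.rf[int(port)] = value
        elif unit == "add":
            self.add.write(port, value)
        else:
            raise ValueError(f"unknown destination: {dst}")

    def step(self, instruction):
        # All moves of one instruction read their sources first ...
        moves = [(self.read(src), dst) for src, dst in instruction]
        # ... then write, with trigger ports last so that an operand moved
        # in the same cycle is already in place when the operation fires.
        moves.sort(key=lambda move: move[1].endswith(".t"))
        for value, dst in moves:
            self.write(dst, value)


# (5 + 7) + 3, written as moves. The intermediate sum never touches the RF.
PROGRAM = [
    [(5, "add.o"), (7, "add.t")],        # two parallel moves: 5 + 7
    [("add.r", "add.o"), (3, "add.t")],  # bypass: result goes straight back to the FU
    [("add.r", "rf.0")],                 # only the final value is stored
]


def run(program):
    machine = Machine()
    for instruction in program:
        machine.step(instruction)
    return machine
```

In this sketch the second instruction forwards the adder result directly to the adder's own operand port. Because the intermediate value is never written to the register file, fewer RF accesses and ports are needed, which is one way to read the "simpler RF" point above.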





### **TTA-based Co-design Toolset (TCE)**

- www.openasip.org
- Toolset to design TTA processors
  - Graphical processor designer, C/OpenCL compiler, instruction set simulator, RTL generator, ...
- In development for nearly 20 years, so the tools are mature and well tested
- Easy configurability
  - Number of FUs, FU operations, RFs, interconnection, address spaces, ...
  - Custom operations, platform integrator
- Experimental features:
  - SIMD instructions + RTL
  - L1 cache, loop buffer, instruction register file






























# Case studies using TCE tools



#### **TCE Case Studies: LordCore**

- Software defined radio (SDR) co-processor
- MIMO detection
  - MMSE & LORD written in OpenCL
- SIMD for computational performance
- Very high performance & energy-efficiency
  - Penalty for programmability very small







#### **TCE Case Studies: LoTTA cores**

- 1-3 stage pipeline
- Fast execution of control code + DSP capabilities
- Uses an instruction register file (IRF), a "software controlled cache", to improve instruction stream energy efficiency
- Up to 2.6 GHz @ 0.95 V on 28 nm FD-SOI
- Compared to Zero-riscy (RISC-V) with matched function units
  - Core only: 2.5x lower energy consumption
  - 2.1x higher clock frequency and 1.8x lower wall clock time at the maximum clock frequency design point, with similar energy-delay product
  - On average 14% and up to 68% better energy-delay product at the energy-optimized design point
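The energy-delay product (EDP) used in the comparison above is energy multiplied by execution time, so lower is better. The short sketch below shows how the quoted ratios relate to each other. The function names are hypothetical, the sample values in the usage note are placeholders rather than measured results, and "similar" EDP is treated as exactly equal for the sake of the arithmetic.

```python
def energy_delay_product(energy_j, delay_s):
    """EDP in joule-seconds; lower is better."""
    return energy_j * delay_s


def edp_improvement(baseline_edp, candidate_edp):
    """Fractional EDP reduction relative to the baseline (0.14 means 14% lower)."""
    return 1.0 - candidate_edp / baseline_edp


def implied_energy_ratio(edp_ratio, delay_ratio):
    """Energy ratio (candidate / baseline) implied by the EDP and delay ratios.

    Follows from EDP = E * t, so E_ratio = EDP_ratio / t_ratio.
    """
    return edp_ratio / delay_ratio
```

For example, `implied_energy_ratio(1.0, 1 / 1.8)` evaluates to about 1.8. Read literally, a 1.8x shorter wall clock time at an equal EDP would mean that the speed at the maximum clock frequency design point is paid for with roughly 1.8x more energy. Likewise, `edp_improvement(100.0, 86.0)` gives about 0.14 for placeholder EDP values of 100 and 86.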





# TCE tutorial download: <http://openasip.org/tutorial_files/tce_tutorials.tar.gz>



# Thank you!


Customized Parallel Computing (CPC) group, www.tuni.fi/cpc, www.openasip.org, joonas.multanen@tuni.fi