Systolic Array Accelerators for Transformers

Research Line

Accelerators

Transformer models are difficult to deploy on resource-constrained platforms because of their computational complexity and large parameter counts. We address this challenge by introducing tightly-coupled, small-scale systolic arrays (TiC-SATs), governed by dedicated ISA extensions, to accelerate execution. We complement them with software optimizations that maximize data reuse and lower miss rates across the cache hierarchy. Our TiC-SAT framework is available as open source.

Keywords
Systolic Array, Tightly-coupled Accelerators, Transformers

Team

  Ansaloni Giovanni
  Atienza Alonso David
  Medina Morillas Rafael

Research Partners

Logitech Europe SA

Sources of Funding

Fvllmonti
WiPLASH H2020
ACCESS
Edge Companions
SwissChips


Our project addresses the computational challenge posed by the massive size and large parameter counts of typical transformer implementations in artificial intelligence (AI) scenarios. Transformers, originally developed for natural language processing (NLP), are now widely used in applications such as question answering, sentiment analysis, image classification, clinical note analysis, and speech-to-text generation.

To accelerate the inference of transformer models, we propose a novel strategy called TiC-SAT (Tightly-Coupled Systolic Array Accelerators for Transformers). TiC-SATs are integrated into CPUs as custom functional units governed by dedicated instructions, avoiding the need for dedicated scratchpad memories and reducing resource consumption. Moreover, TiC-SATs leverage software optimizations that increase data locality, taking advantage of available resources in cache hierarchies without disrupting locality when transitioning from accelerated to non-accelerated computation segments.
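The data-locality optimizations mentioned above can be illustrated with a standard blocked (tiled) matrix multiplication. This is a conceptual sketch, not code from the TiC-SAT repository; the tile size and NumPy formulation are illustrative assumptions:

```python
import numpy as np

def tiled_matmul(a, b, tile=8):
    """Blocked matrix multiply: working on tile x tile sub-blocks
    keeps each block resident in the cache while it is reused,
    which is the locality effect TiC-SAT's software optimizations
    exploit (illustrative sketch only)."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((n, m), dtype=a.dtype)
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Each sub-block product reuses the same tiles of a and b
                # many times before they are evicted.
                c[i0:i0 + tile, j0:j0 + tile] += (
                    a[i0:i0 + tile, k0:k0 + tile] @ b[k0:k0 + tile, j0:j0 + tile]
                )
    return c
```

In a real deployment the tile size would be chosen to match the systolic array dimensions and the cache geometry, so that accelerated and non-accelerated segments touch the same working set.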

To validate our strategy, we implement TiC-SAT as a parametric module in the gem5-X full system simulation environment and conduct comprehensive explorations across various SA sizes and benchmark applications. Our contributions include showcasing how SA accelerators can be integrated into computing systems, enabling full-system and application-wide explorations, and highlighting how tightly-coupled lightweight SAs, such as TiC-SATs, can aptly exploit software optimizations to improve data locality and performance. We also assess the benefits of small-scale, tightly-coupled SAs for accelerating inference in transformer models, considering different TiC-SAT sizes and benchmark applications.
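For intuition, the computation performed by a weight-stationary systolic array can be captured by a small functional model. This is a behavioral sketch under our own assumptions (cycle-level skewing and dataflow timing are abstracted away), not the gem5-X TiC-SAT module:

```python
import numpy as np

def systolic_matmul(w, x):
    """Functional model of a weight-stationary systolic array.
    Each processing element (PE) at position (i, j) holds weight w[i, j];
    activation vectors x[:, t] stream through the array one per step,
    and partial sums accumulate down each PE column.
    Timing skew of a real array is abstracted: step t here bundles all
    the MACs that column j would accumulate for input t."""
    rows, cols = w.shape          # array dimensions = weight tile shape
    n = x.shape[1]                # number of streamed activation vectors
    out = np.zeros((cols, n), dtype=np.result_type(w, x))
    for t in range(n):
        acc = np.zeros(cols, dtype=out.dtype)
        for i in range(rows):
            acc += w[i, :] * x[i, t]   # one MAC per PE in row i
        out[:, t] = acc
    return out  # equals w.T @ x
```

A parametric simulator like the one described above sweeps `rows` and `cols` (the SA size) while running full applications, which is what exposes the interaction between array dimensions and cache behavior.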

Find us on GitHub:
https://github.com/gem5-X/TiC-SAT

Related Publications

HEEPstor: an Open-Hardware Co-design Framework for Quantized Machine Learning at the Edge
Palacios Almendros, Pedro; Medina Morillas, Rafael; Ansaloni, Giovanni; Atienza Alonso, David
2025, Computer Frontiers Workshop on Open-Source Hardware. Funded by Edge Companions, Fvllmonti (FETPROACT), ACCESS, and SwissChips (State Secretariat for Education, Research and Innovation).
Accelerator-driven Data Arrangement to Minimize Transformers Run-time on Multi-core Architectures
Amirshahi, Alireza; Ansaloni, Giovanni; Atienza Alonso, David
2024-01-18, Conference Paper. Funded by Fvllmonti (FETPROACT) and ACCESS.
TiC-SAT: Tightly-coupled Systolic Accelerator for Transformers
Amirshahi, Alireza; Klein, Joshua Alexander Harrison; Ansaloni, Giovanni; Atienza Alonso, David
2023-01-16, Conference Paper. Funded by WiPLASH H2020 (New on-chip wireless communication plane) and Fvllmonti (FETPROACT).