W05 OSSMPIC - Open Source Solutions for Massively Parallel Integrated Circuits
W05.1 Session 1: Invited Talks
This session includes two invited talks.
The Hidden Costs of Open-Source Hardware Research
Prof. Blaise Tine - UCLA
In this talk, Dr. Tine will delve into the challenges and trade-offs inherent in open-source hardware research. While open-source initiatives democratize hardware development and foster innovation, they also introduce hidden complexities that can impact adoption, collaboration, and long-term sustainability. Drawing from his extensive experience in open-source hardware and system design, Dr. Tine will shed light on these often-overlooked challenges and offer strategies for researchers and practitioners to navigate them effectively. Looking ahead, Dr. Tine will discuss emerging trends that could shape the future of open-source hardware research.
CGRAs for the Edge: Balancing Compute Efficiency and Flexibility
Prof. Henk Corporaal (Eindhoven University of Technology)
Driven by AI and advanced signal-processing developments, we observe a huge increase in computational requirements, not only in the cloud but even more at the edge. Performing computation locally at the edge has substantial advantages, such as less data traffic, computation close to the sensed data, reliability, real-time feedback, and data privacy. This drives a strong demand for smart edge computing. Edge compute devices have limited resources and therefore require highly energy- and area-efficient computing. This naturally calls for highly specialized processors. However, high specialization typically means high development costs and lower volume. Much worse, it makes them inflexible; they cannot adapt to (late) application changes and code updates, which are very common in our fast-moving (software) world. Coarse-Grained Reconfigurable Architectures (CGRAs) may be the solution; they aim to find a good balance between flexibility and compute efficiency. They can be easily tuned and scaled for application domains while staying flexible, especially when they are fully programmable. In this presentation, we give an overview of CGRAs and their recent developments. We define and characterize CGRAs more precisely. We also present a metric for flexibility. Designing CGRAs poses various challenges. We illustrate key concepts and challenges using the recent open-source R-Blocks CGRA as an example. Finally, we conclude by offering a glimpse into the future of CGRAs, exploring potential breakthroughs on the horizon.
W05.2 Session 2: Open Source GPU Applications
In the context of the Horizon Europe project METASAT, a hardware platform was developed as a prototype of future space systems.
The platform is based on a multiprocessor NOEL-V, an established space-grade processor, which is integrated with the SPARROW AI accelerator and connected to a GPU, Vortex. Both processing systems follow the RISC-V specification. This is a novel hardware architecture for the space domain: massively parallel processing units, such as GPUs, are starting to be considered for upcoming space missions because of the increased performance required by future space-related workloads, in particular those related to AI. However, such solutions are currently adopted only for New Space, since their limitations come not only from the hardware but also from the software, which needs to be qualified before being deployed on an institutional mission.
For this reason, the METASAT platform is one of the first endeavors towards enabling the use of high-performance hardware in a qualifiable environment for safety-critical systems. The software stack is based on bare metal, RTEMS and the XtratuM hypervisor, providing different options for applications of various degrees of criticality.
The platform has been tested with space-relevant AI workloads taking full advantage of the hardware resources, even when multiple tasks are sharing the GPU.
W05.2.1 GPGPUs on FPGAs: A Competitive Approach for Scientific Computing?
FPGA architectures include increasingly complex arithmetic operators and optimized hard IPs, such as memory subsystems and Networks-on-Chip (NoC). This evolution leads to higher compute density, coupled with high memory bandwidth. It represents an opportunity to tailor an architecture to niche application needs while remaining competitive with a costly ASIC implementation. More specifically, scientific computing requires high-precision (> 32 bits) floating-point computation. However, GPU vendors are progressively favoring low-precision performance for AI needs, and are even phasing out support for 64-bit floating-point compute. We present an analytical study motivating the need to investigate the implementation of an open-source 64-bit GPGPU architecture on a state-of-the-art FPGA, as an alternative to GPUs for scientific computing.
W05.3 Poster Session / Coffee Break
Evaluation of CGRA Toolchains
From Concept to Silicon: Rapid GPGPU Core Design and Integration with Open-Source ASIC Tools
Open-hardware GPUs as platforms for research: a feedback on the use of Vortex
Multiport Support for Vortex OpenGPU Memory Hierarchy
Benchmarking Floating Point Performance of Massively Parallel Dataflow Overlays on AMD Versal FPGA Compute Primitives
W05.4 Software and Tools
Hardware vs. Software Implementation of Warp-Level Features in Vortex RISC-V GPU
RISC-V GPUs present a promising path for supporting GPU applications. Traditionally, GPUs achieve high efficiency through the SPMD (Single Program Multiple Data) programming model. However, modern GPU programming increasingly relies on warp-level features, which diverge from the conventional SPMD paradigm. In this paper, we explore how RISC-V GPUs can support these warp-level features both through hardware implementation and via software-only approaches. Our evaluation shows that a hardware implementation achieves up to a 4x geomean IPC speedup in microbenchmarks, while software-based solutions provide a viable alternative for area-constrained scenarios.
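To make the term "warp-level features" concrete, the following minimal Python sketch models a warp as a list of per-lane values and emulates a shuffle and a vote in software by staging data in a scratch buffer (standing in for shared memory). It is an illustration only and is not taken from the paper or from Vortex: the function names `shfl_down_sw`, `vote_any_sw` and `warp_reduce_sum` are hypothetical, and the out-of-range behavior is assumed to mirror CUDA's `__shfl_down_sync`.

```python
# Illustrative model only: a "warp" is a plain list with one value per lane.
# All names below are hypothetical and do not come from Vortex or the paper.


def shfl_down_sw(values, delta):
    """Software-style shuffle: stage values in a scratch buffer, then let
    each lane read the value held by lane + delta."""
    scratch = list(values)  # phase 1: every lane stores its value
    # On a real GPU a barrier would separate the store and load phases.
    out = []
    for lane in range(len(values)):
        src = lane + delta
        # Assumption: out-of-range lanes keep their own value,
        # as CUDA's __shfl_down_sync does.
        out.append(scratch[src] if src < len(values) else values[lane])
    return out


def vote_any_sw(predicates):
    """Software-style vote: OR-reduce the per-lane predicates, then
    broadcast the result so every lane observes the same flag."""
    flag = any(bool(p) for p in predicates)
    return [flag] * len(predicates)


def warp_reduce_sum(values):
    """Tree reduction built from shuffles; lane 0 ends up holding the
    warp-wide sum (warp size assumed to be a power of two)."""
    vals = list(values)
    delta = len(vals) // 2
    while delta >= 1:
        shifted = shfl_down_sw(vals, delta)
        vals = [a + b for a, b in zip(vals, shifted)]
        delta //= 2
    return vals[0]
```

In this model each software shuffle costs a store, a synchronization and a load per lane, whereas a hardware implementation would expose the same exchange as a single instruction; that difference is the kind of trade-off the abstract refers to.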
Case Study on Combining Open-Source Tool Flows for Grids of Processing Cells
Massively parallel computer architectures based on identical microprocessor tiles are well known for their high scalability and performance. In this work, we introduce an open-source tool flow for scalable on-chip grids of RISC-V processor cells that seamlessly combines high-level SystemC modeling with the generation and simulation of hardware models at RTL, down to FPGA implementation, using the Chipyard framework. Our experimental evaluation quantifies the speed-accuracy trade-offs at different abstraction levels and compares them with their physical implementation on an FPGA.
W05.5 Invited Talks
X-HEEP + CGRAs + GPU work-in-progress activities
This talk presents an ongoing evaluation of a Very-Wide-Register Coarse-Grained Reconfigurable Array (CGRA) and a RISC-V GPU for edge computing nodes within the ESL EPFL. We are currently performing an analysis of these architectures in TSMC 16nm technology, aiming to identify optimal solutions for diverse computational workloads. This work is still in progress, but we will discuss preliminary findings and insights, including HW and SW extensions for the open-source Vortex GPGPU. Furthermore, we will present three more CGRA designs, two of which have also been fabricated in TSMC 65nm LP. We will present initial results, highlighting their performance characteristics and potential applications. All of these accelerators are being integrated within the X-HEEP platform, a versatile RISC-V microcontroller system. X-HEEP leverages a rich ecosystem of open-source IPs, including CPUs from the OpenHW Group, uncore IPs from the PULP platform and OpenTitan project, and custom IPs. X-HEEP enables seamless integration and rapid prototyping. We will discuss the integration process and demonstrate how X-HEEP facilitates the evaluation and deployment of custom accelerators.
ESP as an Open-Source Platform for Massively Parallel Integrated Circuits
Open-source hardware can play a unique role in sparking interdisciplinary research across computer architecture, programming languages, operating systems and computer-aided design. Further, it can enable collaborative engineering among researchers in academic, industrial and government labs. ESP is an open-source research platform for SoC design that combines a scalable tile-based architecture and a flexible system-level design methodology. With ESP, designers can rapidly prototype an SoC architecture with multiple RISC-V processor cores and dozens of loosely coupled accelerators, all interconnected with a multiplane network-on-chip. Conceived as a heterogeneous system integration platform, ESP can scale to support the realization of massively parallel integrated circuits and chiplet-based systems.