Accelerating Sparse Matrix Multiplication on FPGAs

Sparse general matrix-matrix multiplication (SpGEMM) is a fundamental operation in scientific computing, graph analytics, and machine learning. However, its irregular memory access patterns and low ratio of computation to data movement make it challenging to accelerate efficiently. In this project, I explored FPGA-based acceleration for SpGEMM using High-Level Synthesis (HLS), focusing on different dataflows and sparse storage formats to optimize performance.

Sparse matrix multiplication differs from dense multiplication because zero values must be skipped rather than computed. The major challenges include:

- Irregular, data-dependent memory access patterns that make poor use of memory bandwidth
- The bookkeeping overhead of decoding compressed sparse storage formats, as sketched below
- The unknown sparsity structure of the output, which makes buffer sizing and load balancing difficult

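To make the storage-format point concrete, here is a minimal CSR (Compressed Sparse Row) sketch in plain C++. It uses a sparse matrix-vector product for brevity, and the `CsrMatrix` type and `spmv` function are illustrative names rather than code from the project. The indexed read of `x` through `col_idx` is exactly the irregular access pattern described above.

```cpp
// Minimal CSR sketch: only nonzeros are stored, so the multiply loops
// touch exactly the entries that matter.
#include <cstdio>
#include <vector>

struct CsrMatrix {
    int rows;
    std::vector<int>   row_ptr; // rows + 1 entries; row i spans [row_ptr[i], row_ptr[i+1])
    std::vector<int>   col_idx; // column index of each nonzero
    std::vector<float> values;  // value of each nonzero
};

// Sparse matrix-vector product y = A * x, skipping all zero entries.
void spmv(const CsrMatrix& a, const std::vector<float>& x, std::vector<float>& y) {
    for (int i = 0; i < a.rows; ++i) {
        float acc = 0.0f;
        for (int k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k)
            acc += a.values[k] * x[a.col_idx[k]]; // irregular, indexed access into x
        y[i] = acc;
    }
}

int main() {
    // 3x3 matrix [[1 0 2], [0 0 3], [4 0 0]] holds only 4 nonzeros.
    CsrMatrix a{3, {0, 2, 3, 4}, {0, 2, 2, 0}, {1.f, 2.f, 3.f, 4.f}};
    std::vector<float> x{1.f, 1.f, 1.f}, y(3);
    spmv(a, x, y);
    for (float v : y) std::printf("%g\n", v); // prints 3 3 4
}
```
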
To address these challenges, I implemented and compared multiple SpGEMM dataflows on an FPGA; one classic formulation is sketched below.

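As an illustration, and not necessarily one of the exact dataflows evaluated in the project, here is a sketch of Gustavson's row-wise algorithm for C = A * B: each nonzero A(i,k) scales row k of B into a dense accumulator for row i of C, and the output row is compacted into CSR only once its sparsity is known.

```cpp
#include <vector>

// Same CSR layout as the previous snippet.
struct CsrMatrix {
    int rows;
    std::vector<int>   row_ptr; // rows + 1 entries; row i spans [row_ptr[i], row_ptr[i+1])
    std::vector<int>   col_idx; // column index of each nonzero
    std::vector<float> values;  // value of each nonzero
};

// Row-wise (Gustavson) SpGEMM: C = A * B, where B has b_cols columns.
CsrMatrix spgemm_rowwise(const CsrMatrix& a, const CsrMatrix& b, int b_cols) {
    CsrMatrix c{a.rows, {0}, {}, {}};
    std::vector<float> acc(b_cols, 0.0f);   // dense accumulator for one output row
    std::vector<bool>  used(b_cols, false); // columns touched in the current row

    for (int i = 0; i < a.rows; ++i) {
        for (int ka = a.row_ptr[i]; ka < a.row_ptr[i + 1]; ++ka) {
            const int   k  = a.col_idx[ka];
            const float av = a.values[ka];
            // Merge av * (row k of B) into the accumulator.
            for (int kb = b.row_ptr[k]; kb < b.row_ptr[k + 1]; ++kb) {
                acc[b.col_idx[kb]] += av * b.values[kb];
                used[b.col_idx[kb]] = true;
            }
        }
        // Only now is the sparsity of row i of C known: compact it into CSR.
        for (int j = 0; j < b_cols; ++j) {
            if (used[j]) {
                c.col_idx.push_back(j);
                c.values.push_back(acc[j]);
                acc[j]  = 0.0f;
                used[j] = false;
            }
        }
        c.row_ptr.push_back(static_cast<int>(c.col_idx.size()));
    }
    return c;
}
```

The dense accumulator sidesteps the unknown-output-size problem at the cost of on-chip memory proportional to the number of columns of B; where and how that merge happens is precisely what distinguishes the different dataflows.
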
I implemented SpGEMM on an FPGA using Vivado HLS, targeting the Xilinx Arty A7 board. While the implementation showed promising potential, a few challenges arose. As this was my first HLS project, I misunderstood how the AXI master interface works, which led to incorrect memory accesses. In addition, the Arty A7 has no hard SoC, so it was not possible to run a full host application alongside the accelerator on the board.
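
For context on the AXI issue, below is a hypothetical Vivado HLS top-level function showing the kind of interface setup involved; the function name, port bundles, and depths are assumptions for illustration, not the project's actual kernel, and a simplified SpMV-style loop stands in for the real computation. Each m_axi port becomes an AXI master that reads or writes DDR at a host-supplied base address, and misconfiguring the offset or depth settings is an easy way to generate incorrect memory accesses.

```cpp
// Hypothetical HLS top function: array arguments become AXI master ports,
// while scalars and the base addresses are exposed over an AXI-Lite
// control port for the host to program.
extern "C" void sparse_top(const int* row_ptr, const int* col_idx,
                           const float* values, const float* x,
                           float* y, int rows) {
#pragma HLS INTERFACE m_axi     port=row_ptr offset=slave bundle=gmem0 depth=1024
#pragma HLS INTERFACE m_axi     port=col_idx offset=slave bundle=gmem1 depth=4096
#pragma HLS INTERFACE m_axi     port=values  offset=slave bundle=gmem1 depth=4096
#pragma HLS INTERFACE m_axi     port=x       offset=slave bundle=gmem2 depth=1024
#pragma HLS INTERFACE m_axi     port=y       offset=slave bundle=gmem2 depth=1024
#pragma HLS INTERFACE s_axilite port=rows
#pragma HLS INTERFACE s_axilite port=return

    for (int i = 0; i < rows; ++i) {
        float acc = 0.0f;
        const int begin = row_ptr[i];
        const int end   = row_ptr[i + 1];
        for (int k = begin; k < end; ++k) {
#pragma HLS PIPELINE II=1
            acc += values[k] * x[col_idx[k]]; // each x read is a separate, irregular AXI access
        }
        y[i] = acc;
    }
}
```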

For a deeper dive into the design, implementation, and challenges, check out the full paper here. If you have any insights or suggestions, feel free to reach out—I’d love to discuss optimizations for sparse computations on FPGAs!