Enabling the Proxy Computing Paradigm on DPU-based FPGA Acceleration / Brilli, Gianluca; Capotondi, Alessandro; Burgio, Paolo; Marongiu, Andrea. - 1:(2025), pp. 80-83. (22nd ACM International Conference on Computing Frontiers 2025, CF 2025, Italy, 2025) [10.1145/3719276.3725192].
Enabling the Proxy Computing Paradigm on DPU-based FPGA Acceleration
Brilli, Gianluca; Capotondi, Alessandro; Burgio, Paolo; Marongiu, Andrea
2025
Abstract
FPGA-based heterogeneous systems are a popular choice for accelerating Deep Neural Networks (DNNs), but efficiently integrating and orchestrating hardware (HW) and software (SW) tasks remains challenging. FPGA overlay architectures have been proposed to simplify accelerator management, yet state-of-the-art solutions suffer from performance bottlenecks caused by frequent CPU-FPGA interactions. We introduce a novel overlay-based methodology that enables the Proxy Computing paradigm, leveraging a local orchestrator and shared memory to (i) reduce accelerator control overhead and (ii) minimize unnecessary data movements. As a case study, we integrate the AMD/Xilinx Deep Learning Processing Unit (DPU) with additional accelerators for the layers it does not support. Experiments show that our approach significantly reduces memory transfers, achieving up to a 4× speedup in the proposed case study.

The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.




