Publication

Parallel dot-products for deep learning on FPGA

Abstract(s)

Deep neural networks have recently shown great results in a vast set of image applications. The associated deep learning models are computationally very demanding, and several hardware solutions have therefore been proposed to accelerate their computation. FPGAs have recently shown very good performance for this kind of application and are therefore considered a promising platform for accelerating the execution of deep learning algorithms. A common operation in these algorithms is the multiply-accumulate (MACC), which is used to calculate dot products. Since many dot products can be calculated in parallel, as long as memory bandwidth is available, it is very important to implement this operation efficiently in order to increase the density of MACC units in an FPGA. In this paper, we propose an FPGA implementation of parallel MACC units for dot-product operations with very high performance/area ratios, using a mix of DSP blocks and LUTs. We consider 8-bit fixed-point representations, but the method can be applied to other bit widths. The method achieves TOPS-level performance, even on low-cost FPGAs.
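The density gain described in the abstract comes from fitting more than one narrow multiply into each wide multiplier (DSP block). A minimal software sketch of that idea, assuming unsigned 8-bit operands and a hypothetical helper name (this is an illustration of the general packing technique, not the paper's actual architecture):

```python
def packed_dual_macc(a1, a2, b, acc1=0, acc2=0):
    """Two 8-bit MACCs using a single wide multiplication.

    a1 and a2 are packed into one wide operand so that a single
    multiply by b produces both products at once, mimicking how two
    8-bit multiplies can share one FPGA DSP block. Unsigned operands
    only: each 8-bit product fits in 16 bits, so the two results
    occupy disjoint bit fields and do not interfere.
    """
    assert 0 <= a1 < 256 and 0 <= a2 < 256 and 0 <= b < 256
    packed = (a1 << 16) | a2       # one wide operand holding both inputs
    product = packed * b           # single multiply yields both products
    p2 = product & 0xFFFF          # low 16 bits: a2 * b
    p1 = product >> 16             # remaining high bits: a1 * b
    return acc1 + p1, acc2 + p2


def dual_dot(xs, ys, w):
    """Two dot products, xs.w and ys.w, sharing each multiplier."""
    acc1 = acc2 = 0
    for x, y, b in zip(xs, ys, w):
        acc1, acc2 = packed_dual_macc(x, y, b, acc1, acc2)
    return acc1, acc2
```

For signed fixed-point operands, a guard band and a correction term are needed between the packed fields; the unsigned case above is kept deliberately simple.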

Keywords

Multiply-accumulate; Deep learning; FPGA

Citation

VÉSTIAS, Mário, et al. – Parallel dot-products for deep learning on FPGA. In 2017 27th International Conference on Field Programmable Logic and Applications (FPL). Ghent, Belgium: IEEE, 2017. ISBN 978-9-0903-0428-1. pp. 1-4.


Publisher

Institute of Electrical and Electronics Engineers
