Repository logo
 
Loading...
Profile Picture
Person

Cláudio de Campos Neto, Horácio

Search Results

Now showing 1 - 4 of 4
  • Fast convolutional neural networks in low density FPGAs using zero-skipping and weight pruning
    Publication . Véstias, Mário; Duarte, Rui Policarpo; De Sousa, Jose; Cláudio de Campos Neto, Horácio
    Edge devices are becoming smarter with the integration of machine learning methods, such as deep learning, and are therefore used in many application domains where decisions have to be made without human intervention. Deep learning and, in particular, convolutional neural networks (CNN) are more efficient than previous algorithms for several computer vision applications such as security and surveillance, where image and video analysis are required. This better efficiency comes with a cost of high computation and memory requirements. Hence, running CNNs in embedded computing devices is a challenge for both algorithm and hardware designers. New processing devices, dedicated system architectures and optimization of the networks have been researched to deal with these computation requirements. In this paper, we improve the inference execution times of CNNs in low density FPGAs (Field-Programmable Gate Arrays) using fixed-point arithmetic, zero-skipping and weight pruning. The developed architecture supports the execution of large CNNs in FPGA devices with reduced on-chip memory and computing resources. With the proposed architecture, it is possible to infer an image in AlexNet in 2.9 ms in a ZYNQ7020 and 1.0 ms in a ZYNQ7045 with less than 1% accuracy degradation. These results improve previous state-of-the-art architectures for CNN inference.
  • Parallel dot-products for deep learning on FPGA
    Publication . Véstias, Mário; Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, Horácio
    Deep neural networks have recently shown great results in a vast set of image applications. The associated deep learning models are computationally very demanding and, therefore, several hardware solutions have been proposed to accelerate their computation. FPGAs have recently shown very good performances for these kind of applications and so it is considered a promising platform to accelerate the execution of deep learning algorithms. A common operation in these algorithms is multiply-accumulate (MACC) that is used to calculate dot-products. Since many dot products can be calculated in parallel, as long as memory bandwidth is available, it is very important to implement this operation very efficiently to increase the density of MACC units in an FPGA. In this paper, we propose an implementation of parallel MACC units in FPGA for dot-product operations with very high performance/area ratios using a mix of DSP blocks and LUTs. We consider fixed-point representations with 8 bits of size, but the method can be applied to other bit widths. The method allows us to achieve TOPs performances, even for low cost FPGAs.
  • Improving the area of fast parallel decimal multipliers
    Publication . Véstias, Mário; Cláudio de Campos Neto, Horácio
    Financial and commercial applications depend on decimal arithmetic because they must produce results that match exactly those obtained by human calculations. Decimal multiplication is a frequently used operation in these applications and also in the design of decimal floating-point units. In this paper we propose a new architecture for parallel decimal multiplication that improves the area of previous decimal multipliers while keeping the best performances. A decimal adder [1] based on a mixed BCD/excess-6 representation of the operands is utilized. A new partial product generation unit is proposed based on a 5221 recoding of the multiplier digits. With the proposed multiplier, we are able to improve on state-of-the-art parallel decimal multipliers targeting LUT-6 FPGAs. Compared to previous decimal multipliers, implementation results for 2, 4, 8, 16, 32 and 34-digits show that the proposed multiplier achieves over 20% better area without performance degradation.
  • Design of a Multiband Full-Rate Ultra-Wideband Receiver in FPGA
    Publication . Véstias, Mário; Cláudio de Campos Neto, Horácio; Sarmento, Helena
    MultiBand OFDM (MB-OFDM) UWB [1] is a short-range promising wireless technology for high data rate communications up to 480 Mbps. In this paper, we have designed and implemented in an Virtex-6 FPGA an MB-OFDM UWB receiver for the highest data rate of 480 Mbps. To test the system, we have also implemented an MB-OFDM transmitter and an AWGN generator in VHDL and determined the bit error rates at the receiver running in an FPGA.