Loading...
3 results
Search Results
Now showing 1 - 3 of 3
- Lite-CNN: a high-performance architecture to execute CNNs in low density FPGAsPublication . Véstias, Mário; Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, HorácioDue to the computational complexity of Convolutional Neural Networks (CNNs), high performance platforms are generally considered for their execution. However, CNNs are very useful in embedded systems and its execution right next to the source of data has many advantages, like avoiding the need for data communication. In this paper, we propose an architecture for CNN inference (Lite-CNN) that can achieve high performance in low density FPGAs. Lite-CNN adopts a fixed-point representation for both neurons and weights, which was already shown to be sufficient for most CNNs. Also, with a simple and known dot product reorganization, the number of multiplications is reduced to half. We show implementation results for 8 bit fixed-point in a ZYNQ7020 and extrapolate for other larger FPGAs. Lite-CNN achieves 410 GOPs in a ZYNQ7020.
- A survey of convolutional neural networks on edge with reconfigurable computingPublication . Véstias, MárioThe convolutional neural network (CNN) is one of the most used deep learning models for image detection and classification, due to its high accuracy when compared to other machine learning algorithms. CNNs achieve better results at the cost of higher computing and memory requirements. Inference of convolutional neural networks is therefore usually done in centralized high-performance platforms. However, many applications based on CNNs are migrating to edge devices near the source of data due to the unreliability of a transmission channel in exchanging data with a central server, the uncertainty about channel latency not tolerated by many applications, security and data privacy, etc. While advantageous, deep learning on edge is quite challenging because edge devices are usually limited in terms of performance, cost, and energy. Reconfigurable computing is being considered for inference on edge due to its high performance and energy efficiency while keeping a high hardware flexibility that allows for the easy adaption of the target computing platform to the CNN model. In this paper, we described the features of the most common CNNs, the capabilities of reconfigurable computing for running CNNs, the state-of-the-art of reconfigurable computing implementations proposed to run CNN models, as well as the trends and challenges for future edge reconfigurable platforms.
- A configurable architecture for running hybrid convolutional neural networks in low-density FPGAsPublication . Véstias, Mário; Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, HorácioConvolutional neural networks have become the state of the art of machine learning for a vast set of applications, especially for image classification and object detection. There are several advantages to running inference on these models at the edge, including real-time performance and data privacy. The high computing and memory requirements of convolutional neural networks have been major obstacles to the broader deployment of CNNs on edge devices. Data quantization is an optimization method that reduces the number of bits used to represent weights and activations of a network model, minimizing storage requirements and computing complexity. Quantization can be applied at the layer level, by using different bit widths in different layers: this is called hybrid quantization. This article proposes a new efficient and configurable architecture for running CNNs with hybrid quantization in low-density Field-Programmable Gate Arrays (FPGAs) targeting edge devices. The architecture has been implemented on the Xilinx ZYNQ7020/45 devices and is running the AlexNet and VGG16 networks. Running AlexNet, the architecture has a throughput up to 508 images per second on the ZYNQ7020 device, and 1639 images per second on the ZYNQ7045 device. Considering VGG16, the architecture delivers up to 43 images per second on the ZYNQ7020 device, and 81 images per second on the ZYNQ7045 device. The proposed hybrid architecture achieves up to 13.7 x improvement in performance compared to state-of-the-art solutions, with small accuracy degradation.