Loading...
5 results
Search Results
Now showing 1 - 5 of 5
- Smart embedded system for skin cancer classificationPublication . Durães, Pedro F. F.; Véstias, MárioThe very good results achieved with recent algorithms for image classification based on deep learning have enabled new applications in many domains. The medical field is one that can greatly benefit from these algorithms in order to help the medical professional elaborate on his/her diagnostic. In particular, portable devices for medical image classification are useful in scenarios where a full analysis system is not an option or is difficult to obtain. Algorithms based on deep learning models are computationally demanding; therefore, it is difficult to run them in low-cost devices with a low energy consumption and high efficiency. In this paper, a low-cost system is proposed to classify skin cancer images. Two approaches were followed to achieve a fast and accurate system. At the algorithmic level, a cascade inference technique was considered, where two models were used for inference. At the architectural level, the deep learning processing unit from Vitis-AI was considered in order to design very efficient accelerators in FPGA. The dual model was trained and implemented for skin cancer detection in a ZYNQ UltraScale+ MPSoC ZCU104 evaluation kit with a ZU7EV device. The core was integrated in a full system-on-chip solution and tested with the HAM10000 dataset. It achieves a performance of 13.5 FPS with an accuracy of 87%, with only 33k LUTs, 80 DSPs, 70 BRAMs and 1 URAM.
- A fast and scalable architecture to run convolutional neural networks in low density FPGAsPublication . Véstias, Mário; Duarte, Rui; De Sousa, Jose; Neto, Horácio CDeep learning and, in particular, convolutional neural networks (CNN) achieve very good results on several computer vision applications like security and surveillance, where image and video analysis are required. These networks are quite demanding in terms of computation and memory and therefore are usually implemented in high-performance computing platforms or devices. Running CNNs in embedded platforms or devices with low computational and memory resources requires a careful optimization of system architectures and algorithms to obtain very efficient designs. In this context, Field Programmable Gate Arrays (FPGA) can achieve this efficiency since the programmable hardware fabric can be tailored for each specific network. In this paper, a very efficient configurable architecture for CNN inference targeting any density FPGAs is described. The architecture considers fixed-point arithmetic and image batch to reduce computational, memory and memory bandwidth requirements without compromising network accuracy. The developed architecture supports the execution of large CNNs in any FPGA devices including those with small on-chip memory size and logic resources. With the proposed architecture, it is possible to infer an image in AlexNet in 4.3 ms in a ZYNQ7020 and 1.2 ms in a ZYNQ7045.
- Moving deep learning to the edgePublication . Véstias, Mário; Duarte, Rui Policarpo; De Sousa, Jose; Neto, Horácio CDeep learning is now present in a wide range of services and applications, replacing and complementing other machine learning algorithms. Performing training and inference of deep neural networks using the cloud computing model is not viable for applications where low latency is required. Furthermore, the rapid proliferation of the Internet of Things will generate a large volume of data to be processed, which will soon overload the capacity of cloud servers. One solution is to process the data at the edge devices themselves, in order to alleviate cloud server workloads and improve latency. However, edge devices are less powerful than cloud servers, and many are subject to energy constraints. Hence, new resource and energy-oriented deep learning models are required, as well as new computing platforms. This paper reviews the main research directions for edge computing deep learning algorithms.
- Efficient design of pruned convolutional neural networks on FPGAPublication . Véstias, MárioConvolutional Neural Networks (CNNs) have improved several computer vision applications, like object detection and classification, when compared to other machine learning algorithms. Running these models in edge computing devices close to data sources is attracting the attention of the community since it avoids high-latency data communication of private data for cloud processing and permits real-time decisions turning these systems into smart embedded devices. Running these models is computationally very demanding and requires a large amount of memory, which are scarce in edge devices compared to a cloud center. In this paper, we proposed an architecture for the inference of pruned convolutional neural networks in any density FPGAs. A configurable block pruning method is proposed together with an architecture that supports the efficient execution of pruned networks. Also, pruning and batching are studied together to determine how they influence each other. With the proposed architecture, we run the inference of a CNN with an average performance of 322 GOPs for 8-bit data in a XC7Z020 FPGA. The proposed architecture running AlexNet processes 240 images/s in a ZYNQ7020 and 775 images/s in a ZYNQ7045 with only 1.2% accuracy degradation.
- A configurable architecture for running hybrid convolutional neural networks in low-density FPGAsPublication . Véstias, Mário; Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, HorácioConvolutional neural networks have become the state of the art of machine learning for a vast set of applications, especially for image classification and object detection. There are several advantages to running inference on these models at the edge, including real-time performance and data privacy. The high computing and memory requirements of convolutional neural networks have been major obstacles to the broader deployment of CNNs on edge devices. Data quantization is an optimization method that reduces the number of bits used to represent weights and activations of a network model, minimizing storage requirements and computing complexity. Quantization can be applied at the layer level, by using different bit widths in different layers: this is called hybrid quantization. This article proposes a new efficient and configurable architecture for running CNNs with hybrid quantization in low-density Field-Programmable Gate Arrays (FPGAs) targeting edge devices. The architecture has been implemented on the Xilinx ZYNQ7020/45 devices and is running the AlexNet and VGG16 networks. Running AlexNet, the architecture has a throughput up to 508 images per second on the ZYNQ7020 device, and 1639 images per second on the ZYNQ7045 device. Considering VGG16, the architecture delivers up to 43 images per second on the ZYNQ7020 device, and 81 images per second on the ZYNQ7045 device. The proposed hybrid architecture achieves up to 13.7 x improvement in performance compared to state-of-the-art solutions, with small accuracy degradation.