Repository logo
 

Search Results

Now showing 1 - 3 of 3
  • Fast convolutional neural networks in low density FPGAs using zero-skipping and weight pruning
    Publication . Véstias, Mário; Duarte, Rui Policarpo; De Sousa, Jose; Cláudio de Campos Neto, Horácio
    Edge devices are becoming smarter with the integration of machine learning methods, such as deep learning, and are therefore used in many application domains where decisions have to be made without human intervention. Deep learning and, in particular, convolutional neural networks (CNN) are more efficient than previous algorithms for several computer vision applications such as security and surveillance, where image and video analysis are required. This better efficiency comes with a cost of high computation and memory requirements. Hence, running CNNs in embedded computing devices is a challenge for both algorithm and hardware designers. New processing devices, dedicated system architectures and optimization of the networks have been researched to deal with these computation requirements. In this paper, we improve the inference execution times of CNNs in low density FPGAs (Field-Programmable Gate Arrays) using fixed-point arithmetic, zero-skipping and weight pruning. The developed architecture supports the execution of large CNNs in FPGA devices with reduced on-chip memory and computing resources. With the proposed architecture, it is possible to infer an image in AlexNet in 2.9 ms in a ZYNQ7020 and 1.0 ms in a ZYNQ7045 with less than 1% accuracy degradation. These results improve previous state-of-the-art architectures for CNN inference.
  • ZX Fusion: A ZX Spectrum Implementation on an FPGA with Modern Peripherals.
    Publication . Jacinto, Gustavo; Duarte, Rui Policarpo
    The ZX Spectrum was a popular 8-bit home computer by Sinclair Research in the 1980s. Even though some of these computers may still work, the audio tapes, the TV with an analog tuner, and the micro-switch joystick that were used with the original ZX Spectrum are outdated and hard to find in good working order or to replicate. As many other old closed systems are also very difficult to update to support modern peripherals there is a necessity to provide a methodology to adapt such systems to support new peripherals while being compatible with existing software. This implementation is a means by which to validate the methodology before applying it to a physical system. The work proposed in this paper focused on recreating a ZX Spectrum+/48K computer and interfacing it with modern peripherals on an FPGA. This was accomplished by adding a co-processor to assist with the control of the more complex peripherals. Otherwise, the original system would require complex architectural changes and would perform poorly due to the low performance of the Z80 CPU. This work distanced itself from previous works on emulating a ZX Spectrum, as it focused on the use of different upgraded peripherals and the use of a NIOS II soft processor as a co-processor to manage the SD card accesses and save-state functionality. A demonstration of the proposed modernized architecture was made by successfully running a diagnostics ROM and playing original ZX Spectrum games from an SD card for games with a PS/2 keyboard and a pair of joysticks.
  • Configurable hardware core for IoT object detection
    Publication . Miranda, Pedro R.; Pestana, Daniel; D. Lopes, João; Duarte, Rui Policarpo; Véstias, Mário; Neto, Horácio C; De Sousa, Jose
    Object detection is an important task for many applications, like transportation, security, and medical applications. Many of these applications are needed on edge devices to make local decisions. Therefore, it is necessary to provide low-cost, fast solutions for object detection. This work proposes a configurable hardware core on a field-programmable gate array (FPGA) for object detection. The configurability of the core allows its deployment on target devices with diverse hardware resources. The object detection accelerator is based on YOLO, for its good accuracy at moderate computational complexity. The solution was applied to the design of a core to accelerate the Tiny-YOLOv3, based on a CNN developed for constrained environments. However, it can be applied to other YOLO versions. The core was integrated into a full system-on-chip solution and tested with the COCO dataset. It achieved a performance from 7 to 14 FPS in a low-cost ZYNQ7020 FPGA, depending on the quantization, with an accuracy reduction from 2.1 to 1.4 points of mAP50.