Rui Policarpo Duarte

Search Results

Now showing 1 - 10 of 16

Hyperspectral compressive sensing - a low power consumption approach
Publication . Nascimento, Jose; Véstias, Mário; Duarte, Rui
Hyperspectral imaging instruments allow data collection in hundreds of spectral bands for the same area on the surface of the Earth. The resulting multidimensional data cube typically comprises several GBs per light. Due to the extremely large volumes of data collected by imaging spectrometers, hyperspectral data compression, dimensionality reduction and Compressive Sensing (CS) techniques has received considerable interest in recent years. These data are usually acquired by a satellite or an airbone instrument and sent to a ground station on Earth for subsequent processing. Usually the bandwidth connection between the satellite/airborne platform and the ground station is reduced, which limits the amount of data that can be transmitted. As a result, there is a clear need for (either lossless or lossy) hyperspectral data compression techniques that can be applied on-board the imaging instrument. This paper, presents a study of the power and time consumption and accuracy of a parallel implementation for a spectral compressive acquisition method on a Jetson TX2 platform, which is well suited to perform vector operations such as dot products. This implementation exploits the architecture at low level, using shared memory and coalesced accesses to memory. The conducted experiments have been performed to demonstrate the applicability, in terms of accuracy, time consuming and power consumption of these methods for onboard processing. The results show that by using this low power consumption GPU is it possible to obtain real-time performance with a very limited power requirement.
2018Conference object Restricted access Show more
A review of synthetic-aperture radar image formation algorithms and implementations: a computational perspective
Publication . Cruz, Helena; Véstias, Mário; Monteiro, J; Cláudio de Campos Neto, Horácio; Duarte, Rui
Designing synthetic-aperture radar image formation systems can be challenging due to the numerous options of algorithms and devices that can be used. There are many SAR image formation algorithms, such as backprojection, matched-filter, polar format, Range–Doppler and chirp scaling algorithms. Each algorithm presents its own advantages and disadvantages considering efficiency and image quality; thus, we aim to introduce some of the most common SAR image formation algorithms and compare them based on these two aspects. Depending on the requisites of each individual system and implementation, there are many device options to choose from, for in stance, FPGAs, GPUs, CPUs, many-core CPUs, and microcontrollers. We present a review of the state of the art of SAR imaging systems implementations. We also compare such implementations in terms of power consumption, execution time, and image quality for the different algorithms used.
2022-03-04Journal article Open access Show more
Hybrid dot-product calculation for convolutional neural networks in FPGA
Publication . Véstias, Mário; Duarte, Rui Policarpo; De Sousa, Jose; Cláudio de Campos Neto, Horácio
Convolutional Neural Networks (CNN) are quite useful in edge devices for security, surveillance, and many others. Running CNNs in embedded devices is a design challenge since these models require high computing power and large memory storage. Data quantization is an optimization technique applied to CNN to reduce the computing and memory requirements. The method reduces the number of bits used to represent weights and activations, which consequently reduces the size of operands and of the memory. The method is more effective if hybrid quantization is considered in which data in different layers may have different bit widths. This article proposes a new hardware module to calculate dot-products of CNNs with hybrid quantization. The module improves the implementation of CNNs in low density FPGAs, where the same module runs dot-products of different layers with different data quantizations. We show implementation results in ZYNQ7020 and compare with state-of-the-art works. Improvements in area and performance are achieved with the new proposed module.
2019-11-07Conference object Restricted access Show more
Onboard processing of synthetic aperture radar backprojection algorithm in FPGA
Publication . Mota, David; Cruz, Helena; Miranda, Pedro R.; Duarte, Rui Policarpo; De Sousa, Jose; Cláudio de Campos Neto, Horácio; Véstias, Mário
Synthetic aperture radar is a microwave technique to extracting image information of the target. Electromagnetic waves that are reflected from the target are acquired by the aircraft or satellite receivers and sent to a ground station to be processed by applying computational demanding algorithms. Radar data streams are acquired by an aircraft or satellite and sent to a ground station to be processed in order to extract images from the data since these processing algorithms are computationally demanding. However, novel applications require real-time processing for real-time analysis and decisions and so onboard processing is necessary. Running computationally demanding algorithms on onboard embedded systems with limited energy and computational capacity is a challenge. This article proposes a configurable hardware core for the execution of the backprojection algorithm with high performance and energy efficiency. The original backprojection algorithm is restructured to expose computational parallelism and then optimized by replacing floating-point with fixed-point arithmetic. The backprojection core was integrated into a system-onchip architecture and implemented in a field-programmable gate array. The proposed solution runs the optimized backprojection algorithm over images of sizes 512 x 512 and 1024 x 1024 in 0.14 s (0.41 J) and 1.11 s (3.24 J), respectively. The architecture is 2.6x faster and consumes 13x less energy than an embedded Jetson TX2 GPU. The solution is scalable and, therefore, a tradeoff exists between performance and utilization of resources.
2022-04-25Journal article Open access Show more
Lite-CNN: a high-performance architecture to execute CNNs in low density FPGAs
Publication . Véstias, Mário; Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, Horácio
Due to the computational complexity of Convolutional Neural Networks (CNNs), high performance platforms are generally considered for their execution. However, CNNs are very useful in embedded systems and its execution right next to the source of data has many advantages, like avoiding the need for data communication. In this paper, we propose an architecture for CNN inference (Lite-CNN) that can achieve high performance in low density FPGAs. Lite-CNN adopts a fixed-point representation for both neurons and weights, which was already shown to be sufficient for most CNNs. Also, with a simple and known dot product reorganization, the number of multiplications is reduced to half. We show implementation results for 8 bit fixed-point in a ZYNQ7020 and extrapolate for other larger FPGAs. Lite-CNN achieves 410 GOPs in a ZYNQ7020.
2018-08Conference object Open access Show more
Energy-efficient and real-time wearable for wellbeing-monitoring IoT system based on SoC-FPGA
Publication . Frutuoso, Maria Inês; Cláudio de Campos Neto, Horácio; Véstias, Mário; Duarte, Rui Policarpo
Wearable devices used for personal monitoring applications have been improved over the last decades. However, these devices are limited in terms of size, processing capability and power consumption. This paper proposes an efficient hardware/software embedded system for monitoring bio-signals in real time, including a heart rate calculator using PPG and an emotion classifier from EEG. The system is suitable for outpatient clinic applications requiring data transfers to external medical staff. The proposed solution contributes with an effective alternative to the traditional approach of processing bio-signals offline by proposing a SoC-FPGA based system that is able to fully process the signals locally at the node. Two sub-systems were developed targeting a Zynq 7010 device and integrating custom hardware IP cores that accelerate the processing of the most complex tasks. The PPG sub-system implements an autocorrelation peak detection algorithm to calculate heart rate values. The EEG sub-system consists of a KNN emotion classifier of preprocessed EEG features. This work overcomes the processing limitations of microcontrollers and general-purpose units, presenting a scalable and autonomous wearable solution with high processing capability and real-time response.
2023-03-04Journal article Open access Show more
Fast convolutional neural networks in low density FPGAs using zero-skipping and weight pruning
Publication . Véstias, Mário; Duarte, Rui Policarpo; De Sousa, Jose; Cláudio de Campos Neto, Horácio
Edge devices are becoming smarter with the integration of machine learning methods, such as deep learning, and are therefore used in many application domains where decisions have to be made without human intervention. Deep learning and, in particular, convolutional neural networks (CNN) are more efﬁcient than previous algorithms for several computer vision applications such as security and surveillance, where image and video analysis are required. This better efﬁciency comes with a cost of high computation and memory requirements. Hence, running CNNs in embedded computing devices is a challenge for both algorithm and hardware designers. New processing devices, dedicated system architectures and optimization of the networks have been researched to deal with these computation requirements. In this paper, we improve the inference execution times of CNNs in low density FPGAs (Field-Programmable Gate Arrays) using ﬁxed-point arithmetic, zero-skipping and weight pruning. The developed architecture supports the execution of large CNNs in FPGA devices with reduced on-chip memory and computing resources. With the proposed architecture, it is possible to infer an image in AlexNet in 2.9 ms in a ZYNQ7020 and 1.0 ms in a ZYNQ7045 with less than 1% accuracy degradation. These results improve previous state-of-the-art architectures for CNN inference.
2019-11-09Journal article Open access Show more
Parallel dot-products for deep learning on FPGA
Publication . Véstias, Mário; Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, Horácio
Deep neural networks have recently shown great results in a vast set of image applications. The associated deep learning models are computationally very demanding and, therefore, several hardware solutions have been proposed to accelerate their computation. FPGAs have recently shown very good performances for these kind of applications and so it is considered a promising platform to accelerate the execution of deep learning algorithms. A common operation in these algorithms is multiply-accumulate (MACC) that is used to calculate dot-products. Since many dot products can be calculated in parallel, as long as memory bandwidth is available, it is very important to implement this operation very efficiently to increase the density of MACC units in an FPGA. In this paper, we propose an implementation of parallel MACC units in FPGA for dot-product operations with very high performance/area ratios using a mix of DSP blocks and LUTs. We consider fixed-point representations with 8 bits of size, but the method can be applied to other bit widths. The method allows us to achieve TOPs performances, even for low cost FPGAs.
2017-10-05Conference object Restricted access Show more
Coarse-grained reconfigurable computing with the versat architecture
Publication . D. Lopes, João; Véstias, Mário; Duarte, Rui Policarpo; Neto, Horácio C; De Sousa, Jose
Reconfigurable computing architectures allow the adaptation of the underlying datapath to the algorithm. The granularity of the datapath elements and data width determines the granularity of the architecture and its programming flexibility. Coarse-grained architectures have shown the right balance between programmability and performance. This paper provides an overview of coarse-grained reconfigurable architectures and describes Versat, a Coarse-Grained Reconfigurable Array (CGRA) with self-generated partial reconfiguration, presented as a case study for better understanding these architectures. Unlike most of the existing approaches, which mainly use pre-compiled configurations, a Versat program can generate and apply myriads of on-the-fly configurations. Partial reconfiguration plays a central role in this approach, as it speeds up the generation of incrementally different configurations. The reconfigurable array has a complete graph topology, which yields unprecedented programmability, including assembly programming. Besides being useful for optimising programs, assembly programming is invaluable for working around post-silicon hardware, software, or compiler issues. Results on core area, frequency, power, and performance running different codes are presented and compared to other implementations.
2021-03Journal article Open access Show more
ZX Fusion: A ZX Spectrum Implementation on an FPGA with Modern Peripherals.
Publication . Jacinto, Gustavo; Duarte, Rui Policarpo
The ZX Spectrum was a popular 8-bit home computer by Sinclair Research in the 1980s. Even though some of these computers may still work, the audio tapes, the TV with an analog tuner, and the micro-switch joystick that were used with the original ZX Spectrum are outdated and hard to find in good working order or to replicate. As many other old closed systems are also very difficult to update to support modern peripherals there is a necessity to provide a methodology to adapt such systems to support new peripherals while being compatible with existing software. This implementation is a means by which to validate the methodology before applying it to a physical system. The work proposed in this paper focused on recreating a ZX Spectrum+/48K computer and interfacing it with modern peripherals on an FPGA. This was accomplished by adding a co-processor to assist with the control of the more complex peripherals. Otherwise, the original system would require complex architectural changes and would perform poorly due to the low performance of the Z80 CPU. This work distanced itself from previous works on emulating a ZX Spectrum, as it focused on the use of different upgraded peripherals and the use of a NIOS II soft processor as a co-processor to manage the SD card accesses and save-state functionality. A demonstration of the proposed modernized architecture was made by successfully running a diagnostics ROM and playing original ZX Spectrum games from an SD card for games with a PS/2 keyboard and a pair of joysticks.
2024-01-22Journal article Open access Show more

Duarte, Rui

Filters

Author

Subject

Date

Entity

Settings

Sort By

Results per page

Search Results