Loading...
4 results
Search Results
Now showing 1 - 4 of 4
- Scalable unified transform architecture for advanced video coding embedded systemsPublication . Dias, Tiago; Lopez, Sebastian; Roma, Nuno; Sousa, LeonelA novel high throughput and scalable unified architecture for the computation of the transform operations in video codecs for advanced standards is presented in this paper. This structure can be used as a hardware accelerator in modern embedded systems to efficiently compute all the two-dimensional 4 x 4 and 2 x 2 transforms of the H.264/AVC standard. Moreover, its highly flexible design and hardware efficiency allows it to be easily scaled in terms of performance and hardware cost to meet the specific requirements of any given video coding application. Experimental results obtained using a Xilinx Virtex-5 FPGA demonstrated the superior performance and hardware efficiency levels provided by the proposed structure, which presents a throughput per unit of area relatively higher than other similar recently published designs targeting the H.264/AVC standard. Such results also showed that, when integrated in a multi-core embedded system, this architecture provides speedup factors of about 120x concerning pure software implementations of the transform algorithms, therefore allowing the computation, in real-time, of all the above mentioned transforms for Ultra High Definition Video (UHDV) sequences (4,320 x 7,680 @ 30 fps).
- High throughput and scalable architecture for unified transform coding in embedded H.264/AVC video coding systemsPublication . Dias, Tiago; Lopez, Sebastian; Roma, Nuno; Sousa, LeonelAn innovative high throughput and scalable multi-transform architecture for H.264/AVC is presented in this paper. This structure can be used as a hardware accelerator in modern embedded systems to efficiently compute the 4×4 forward/inverse integer DCT, as well as the 2-D 4×4 / 2×2 Hadamard transforms. Moreover, its highly flexible design and hardware efficiency allows it to be easily scaled in terms of performance and hardware cost to meet the specific requirements of any given video coding application. Experimental results obtained using a Xilinx Virtex-4 FPGA demonstrate the superior performance and hardware efficiency levels provided by the proposed structure, which presents a throughput per unit of area at least 1.8× higher than other similar recently published designs. Furthermore, such results also showed that this architecture can compute, in realtime, all the above mentioned H.264/AVC transforms for video sequences with resolutions up to UHDV.
- A Flexible Architecture for the Computation of Direct and Inverse Transforms in H.264/AVC Video CodecsPublication . Dias, Tiago; Lopez, S.; Roma, N.; Sousa, L.A new high throughput and scalable architecture for unified transform coding in H.264/AVC is proposed in this paper. Such flexible structure is capable of computing all the 4x4 and 2x2 transforms for Ultra High Definition Video (UHDV) applications (4320x7680@ 30fps) in real-time and with low hardware cost. These significantly high performance levels were proven with the implementation of several different configurations of the proposed structure using both FPGA and ASIC 90 nm technologies. In addition, such experimental evaluation also demonstrated the high area efficiency of theproposed architecture, which in terms of Data Throughput per Unit of Area (DTUA) is at least 1.5 times more efficient than its more prominent related designs(1).
- Hardware/software co-design of H.264/AVC encoders for multi-core embedded systemsPublication . Dias, Tiago; Roma, Nuno; Sousa, LeonelThis paper presents a multi-core H.264/AVC encoder suitable for implementations in small and medium complexity embedded systems. The proposed structure results from an efficient hardware/software co-design methodology, where the encoder software application is highly optimized and structured in a very modular and efficient manner, so as to allow its most complex and time consuming operations to be offloaded to dedicated hardware accelerators. The considered methodology adopts a simple and efficient core interconnection mechanism to easily allow the inclusion and the removal of such optimized processing cores. Experimental results obtained with the implementation in a Virtex4 FPGA of an H.264/AVC encoder using an ASIP IP core as a ME hardware accelerator have proven the advantages of this methodology. For the considered system, speedup factors greater than 15 were obtained with a very modest increase of the involved hardware resources.