Publication

Algorithm-oriented design of efficient many-core architectures applied to dense matrix multiplication

dc.contributor.author: José, Wilson M.
dc.contributor.author: Silva, Ana Rita
dc.contributor.author: Véstias, Mário
dc.contributor.author: Neto, Horácio
dc.date.accessioned: 2016-02-25T15:07:51Z
dc.date.available: 2016-02-25T15:07:51Z
dc.date.issued: 2015-01
dc.description.abstract: Recent integrated circuit technologies have made it possible to design parallel architectures with hundreds of cores on a single chip. The design space of these parallel architectures is huge, with many architectural options. Exploring the design space gets even more difficult if, beyond performance and area, we also consider extra metrics like performance and area efficiency, where the designer tries to design the architecture with the best performance per chip area and the best sustainable performance. In this paper we present an algorithm-oriented approach to design a many-core architecture. Instead of doing the design space exploration of the many-core architecture based on the experimental execution results of a particular benchmark of algorithms, our approach is to make a formal analysis of the algorithms considering the main architectural aspects and to determine how each particular architectural aspect is related to the performance of the architecture when running an algorithm or set of algorithms. The architectural aspects considered include the number of cores, the local memory available in each core, the communication bandwidth between the many-core architecture and the external memory, and the memory hierarchy. To exemplify the approach we did a theoretical analysis of a dense matrix multiplication algorithm and determined an equation that relates the number of execution cycles with the architectural parameters. Based on this equation a many-core architecture has been designed. The results obtained indicate that a 100 mm² integrated circuit design of the proposed architecture, using a 65 nm technology, is able to achieve 464 GFLOPs (double precision floating-point) for a memory bandwidth of 16 GB/s. This corresponds to a performance efficiency of 71 %. Considering a 45 nm technology, a 100 mm² chip attains 833 GFLOPs, which corresponds to 84 % of peak performance. These figures are better than those obtained by previous many-core architectures, except for the area efficiency, which is limited by the lower memory bandwidth considered. The results achieved are also better than those of previous state-of-the-art many-core architectures designed specifically to achieve high performance for matrix multiplication. [pt_PT]
dc.identifier.citation: JOSÉ, Wilson M.; [et al] - Algorithm-oriented design of efficient many-core architectures applied to dense matrix multiplication. Analog Integrated Circuits and Signal Processing. ISSN 0925-1030. Vol. 82, N.º 1 (2015), pp. 147-158
dc.identifier.doi: 10.1007/s10470-014-0441-7 [pt_PT]
dc.identifier.issn: 0925-1030
dc.identifier.issn: 1573-1979
dc.identifier.uri: http://hdl.handle.net/10400.21/5739
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.publisher: Springer [pt_PT]
dc.relation.ispartofseries: SI;
dc.relation.publisherversion: http://link.springer.com/article/10.1007%2Fs10470-014-0441-7 [pt_PT]
dc.subject: Matrix multiplication [pt_PT]
dc.subject: Many-core [pt_PT]
dc.subject: High-performance [pt_PT]
dc.subject: Algorithm-oriented design [pt_PT]
dc.subject: Application-specific integrated circuit [pt_PT]
dc.title: Algorithm-oriented design of efficient many-core architectures applied to dense matrix multiplication [pt_PT]
dc.type: journal article
dspace.entity.type: Publication
oaire.citation.endPage: 158 [pt_PT]
oaire.citation.issue: 1 [pt_PT]
oaire.citation.startPage: 147 [pt_PT]
oaire.citation.volume: 82 [pt_PT]
person.familyName: Véstias
person.givenName: Mário
person.identifier.ciencia-id: 4717-C2C7-3F2C
person.identifier.orcid: 0000-0001-8556-4507
person.identifier.rid: H-9953-2012
person.identifier.scopus-author-id: 14525867300
rcaap.rights: closedAccess [pt_PT]
rcaap.type: article [pt_PT]
relation.isAuthorOfPublication: a7d22b29-c961-45ac-bc09-cd5e1002f1e8
relation.isAuthorOfPublication.latestForDiscovery: a7d22b29-c961-45ac-bc09-cd5e1002f1e8

Files

Original bundle
Name: Algorithm-oriented design of efficient many-core architectures applied to dense matrix multiplication.pdf
Size: 949.68 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission