Publication

An n-gram cache for large-scale parallel extraction of multiword relevant expressions with LocalMaxs

dc.contributor.author: Gonçalves, Carlos
dc.contributor.author: Silva, Joaquim F.
dc.contributor.author: Cunha, José C.
dc.date.accessioned: 2019-03-06T10:17:22Z
dc.date.available: 2019-03-06T10:17:22Z
dc.date.issued: 2017-03-06
dc.description.abstract: LocalMaxs extracts relevant multiword terms based on their cohesion but is computationally intensive, which is a critical issue for very large natural language corpora. The corpus properties concerning n-gram distribution determine the algorithm's complexity and were empirically analyzed for corpora of up to 982 million words. A parallel LocalMaxs implementation exhibits almost linear relative efficiency, speedup, and sizeup when executed with up to 48 cloud virtual machines and a distributed key-value store. To reduce remote data communication, we present a novel n-gram cache with cooperative-based warm-up, which reduces the miss ratio and the time penalty. An analytical cache model, based on empirical corpus data, is used to estimate the performance of the cohesion calculation for n-gram expressions. The model's estimates agree with the real execution results. [pt_PT]
dc.description.version: info:eu-repo/semantics/publishedVersion [pt_PT]
dc.identifier.citation: GONÇALVES, Carlos; SILVA, Joaquim F.; CUNHA, José C. – An n-gram cache for large-scale parallel extraction of multiword relevant expressions with LocalMaxs. In Proceedings of the 2016 IEEE 12th International Conference on e-Science (e-Science). Baltimore, MD, USA: IEEE, 2016. ISBN 978-1-5090-4273-9. pp. 120-129. [pt_PT]
dc.identifier.doi: 10.1109/eScience.2016.7870892 [pt_PT]
dc.identifier.isbn: 978-1-5090-4273-9
dc.identifier.isbn: 978-1-5090-4272-2
dc.identifier.isbn: 978-1-5090-4274-6
dc.identifier.issn: 2325-372X
dc.identifier.uri: http://hdl.handle.net/10400.21/9637
dc.language.iso: eng [pt_PT]
dc.publisher: Institute of Electrical and Electronics Engineers [pt_PT]
dc.relation.publisherversion: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7870892&tag=1 [pt_PT]
dc.subject: Large corpora [pt_PT]
dc.subject: Statistical extraction [pt_PT]
dc.subject: Multiword terms [pt_PT]
dc.subject: Parallel processing [pt_PT]
dc.subject: n-gram cache [pt_PT]
dc.subject: Performance evaluation [pt_PT]
dc.subject: Cloud computing [pt_PT]
dc.title: An n-gram cache for large-scale parallel extraction of multiword relevant expressions with LocalMaxs [pt_PT]
dc.type: conference object
dspace.entity.type: Publication
oaire.awardURI: info:eu-repo/grantAgreement/FCT/5876/UID%2FCEC%2F04516%2F2013/PT
oaire.citation.conferencePlace: 23-27 Oct. 2016 - Baltimore, MD, USA [pt_PT]
oaire.citation.endPage: 129 [pt_PT]
oaire.citation.startPage: 120 [pt_PT]
oaire.citation.title: 12th International Conference on e-Science (e-Science) [pt_PT]
oaire.fundingStream: 5876
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.name: Fundação para a Ciência e a Tecnologia
rcaap.rights: closedAccess [pt_PT]
rcaap.type: conferenceObject [pt_PT]
relation.isProjectOfPublication: cbce3bca-c959-4bd5-a02c-0f1de598f8e0
relation.isProjectOfPublication.latestForDiscovery: cbce3bca-c959-4bd5-a02c-0f1de598f8e0

Files

Original bundle
Name: CGoncalves.pdf
Size: 882.67 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission