Publication

Feature transformation and reduction for text classification

dc.contributor.author: J. Ferreira, Artur
dc.contributor.author: Figueiredo, Mario
dc.date.accessioned: 2024-11-18T09:39:54Z
dc.date.available: 2024-11-18T09:39:54Z
dc.date.issued: 2010
dc.description.abstract: Text classification is an important tool for many applications, in supervised, semi-supervised, and unsupervised scenarios. In order to be processed by machine learning methods, a text (document) is usually represented as a bag-of-words (BoW). A BoW is a large vector of features (usually stored as floating point values), which represent the relative frequency of occurrence of a given word/term in each document. Typically, we have a large number of features, many of which may be non-informative for classification tasks, and thus the need for feature transformation, reduction, and selection arises. In this paper, we propose two efficient algorithms for feature transformation and reduction for BoW-like representations. The proposed algorithms rely on simple statistical analysis of the input pattern, exploiting the BoW and its binary version. The algorithms are evaluated with support vector machine (SVM) and AdaBoost classifiers on standard benchmark datasets. The experimental results show the adequacy of the reduced/transformed binary features for text classification problems as well as the improvement on the test set error rate, using the proposed methods. [pt_PT]
dc.description.version: info:eu-repo/semantics/publishedVersion [pt_PT]
dc.identifier.citation: Ferreira, A., Figueiredo, M. – Feature Transformation and Reduction for Text Classification. In 10th International Workshop on Pattern Recognition in Information Systems - PRIS 2010, in conjunction with ICEIS 2010. Funchal, Portugal: SciTePress, 2010, ISBN 978-989-8425-14-0. Pp. 72-81 [pt_PT]
dc.identifier.isbn: 978-989-8425-14-0
dc.identifier.uri: http://hdl.handle.net/10400.21/17914
dc.language.iso: eng [pt_PT]
dc.publisher: SciTePress [pt_PT]
dc.subject: text classification [pt_PT]
dc.subject: bag-of-words (BoW) [pt_PT]
dc.title: Feature transformation and reduction for text classification [pt_PT]
dc.type: conference object
dspace.entity.type: Publication
oaire.awardURI: info:eu-repo/grantAgreement/FCT/3599-PPCDT/PTDC%2FEEA-TEL%2F72572%2F2006/PT
oaire.awardURI: info:eu-repo/grantAgreement/FCT/PIDDAC/SFRH%2FBD%2F45176%2F2008/PT
oaire.citation.conferencePlace: June 2010 - Funchal, Portugal [pt_PT]
oaire.citation.endPage: 81 [pt_PT]
oaire.citation.startPage: 72 [pt_PT]
oaire.citation.title: 10th International Workshop on Pattern Recognition in Information Systems - PRIS 2010 [pt_PT]
oaire.fundingStream: 3599-PPCDT
oaire.fundingStream: PIDDAC
person.familyName: Ferreira
person.familyName: Figueiredo
person.givenName: Artur
person.givenName: Mario
person.identifier: 1049438
person.identifier: 3015485
person.identifier.ciencia-id: 091A-96FB-A88C
person.identifier.ciencia-id: ED1E-A787-3569
person.identifier.orcid: 0000-0002-6508-0932
person.identifier.orcid: 0000-0002-0970-7745
person.identifier.rid: AAL-4377-2020
person.identifier.rid: C-5428-2008
person.identifier.scopus-author-id: 35315359300
person.identifier.scopus-author-id: 34769730500
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.name: Fundação para a Ciência e a Tecnologia
project.funder.name: Fundação para a Ciência e a Tecnologia
rcaap.rights: openAccess [pt_PT]
rcaap.type: conferenceObject [pt_PT]
relation.isAuthorOfPublication: 734bfe75-0c68-4cdf-8a87-2aef3564f5bd
relation.isAuthorOfPublication: d3d068dc-5887-4ecd-bf7f-067ef06e1943
relation.isAuthorOfPublication.latestForDiscovery: d3d068dc-5887-4ecd-bf7f-067ef06e1943
relation.isProjectOfPublication: ee17ccab-f933-470e-a1c5-fb46c3c436cf
relation.isProjectOfPublication: 743e2656-2bd7-4c53-aa32-4f38226bb52a
relation.isProjectOfPublication.latestForDiscovery: ee17ccab-f933-470e-a1c5-fb46c3c436cf
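
The abstract in this record describes a BoW as a vector of relative term frequencies and mentions a binary version of it. As an illustration only, the following minimal sketch builds both representations for a toy corpus. It is not the paper's feature transformation or reduction algorithm, which this record does not specify. The function name bow_vectors, the whitespace tokenization, and the sample documents are all assumptions made for the example.

```python
from collections import Counter


def bow_vectors(docs):
    """Return (vocabulary, relative-frequency rows, binary rows).

    Hypothetical helper: tokenization is a plain lowercase whitespace
    split, which is an assumption and not taken from the paper.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # Vocabulary: every distinct term in the corpus, in sorted order.
    vocab = sorted({tok for toks in tokenized for tok in toks})

    freq_rows, binary_rows = [], []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        # Relative frequency of each vocabulary term in this document.
        freq_rows.append([counts[term] / total for term in vocab])
        # Binary version: 1 if the term occurs at all, else 0.
        binary_rows.append([1 if counts[term] else 0 for term in vocab])
    return vocab, freq_rows, binary_rows


# Toy corpus (made up for illustration).
docs = ["the cat sat on the mat", "the dog barked"]
vocab, freq, binary = bow_vectors(docs)
```

Each row of freq sums to 1 for a non-empty document, while the matching row of binary records only which vocabulary terms are present.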

Files

Original bundle
Name: Feature_AJFerreira.pdf
Size: 111.12 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission