Publication

Feature selection for clustering categorical data with an embedded modelling approach

dc.contributor.author: Silvestre, Cláudia
dc.contributor.author: Cardoso, Margarida
dc.contributor.author: Figueiredo, Mário
dc.date.accessioned: 2014-12-12T12:07:00Z
dc.date.available: 2014-12-12T12:07:00Z
dc.date.issued: 2014-09-23
dc.description.abstract: Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data from official statistics shows its usefulness. [en]
dc.identifier.citation: Silvestre C., Cardoso M. G. M. S. and Figueiredo M. (2014), Feature selection for clustering categorical data with an embedded modelling approach, Expert Systems [por]
dc.identifier.doi: 10.1111/exsy.12082
dc.identifier.uri: http://hdl.handle.net/10400.21/4046
dc.language.iso: eng [por]
dc.peerreviewed: yes [por]
dc.publisher: John Wiley & Sons, Ltd [por]
dc.relation.publisherversion: http://onlinelibrary.wiley.com/doi/10.1111/exsy.12082/pdf [por]
dc.subject: Cluster analysis [por]
dc.subject: Finite mixture models [por]
dc.subject: EM-MML algorithm [por]
dc.subject: Feature selection [por]
dc.subject: Categorical features [por]
dc.title: Feature selection for clustering categorical data with an embedded modelling approach [por]
dc.type: journal article
dspace.entity.type: Publication
oaire.citation.issue: xxxx 2014, Vol. 00, No. 00 [por]
oaire.citation.title: Expert Systems [por]
person.familyName: Silvestre
person.givenName: Cláudia
person.identifier.ciencia-id: DA12-EF3F-C7CD
person.identifier.orcid: 0000-0002-8850-4304
rcaap.rights: openAccess [por]
rcaap.type: article [por]
relation.isAuthorOfPublication: 08fbc1bf-3387-4137-8c03-c4664dd43375
relation.isAuthorOfPublication.latestForDiscovery: 08fbc1bf-3387-4137-8c03-c4664dd43375

Files

Main
Name: RESUMO_ExpS2014.doc
Size: 39.5 KB
Format: Microsoft Word
License
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections