Publication

Categorical data clustering using a minimum message length criterion

dc.contributor.authorSilvestre, Cláudia
dc.contributor.authorCardoso, Margarida
dc.contributor.authorFigueiredo, Mário
dc.date.accessioned2014-12-12T12:22:05Z
dc.date.available2014-12-12T12:22:05Z
dc.date.issued2012-10
dc.description.abstractResearch on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number from a set of pre-estimated candidate models. The performance of our approach is compared with that of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The results illustrate the capacity of the proposed algorithm to recover the true number of clusters while outperforming BIC and ICL in speed, which is especially relevant when dealing with large data sets.en
dc.identifier.citationSilvestre, Cláudia; Cardoso, Margarida; Figueiredo, Mário - Categorical Data Clustering Using a Minimum Message Length Criterion. In The Eleventh International Symposium on Intelligent Data Analysis (IDA 2012), Helsinki (Finland), 25–27 October 2012. Posterpor
dc.identifier.urihttp://hdl.handle.net/10400.21/4047
dc.language.isoengpor
dc.peerreviewedyespor
dc.relation.publisherversionhttp://ida2012.org/program.pdfpor
dc.subjectCluster analysisen
dc.subjectCategorical dataen
dc.subjectExpectation-maximization algorithmen
dc.subjectMML - Minimum Message Length - criterionen
dc.titleCategorical data clustering using a minimum message length criterionpor
dc.typeconference object
dspace.entity.typePublication
oaire.citation.conferencePlaceHelsinki (Finland)por
oaire.citation.titleThe Eleventh International Symposium on Intelligent Data Analysis (IDA 2012)por
person.familyNameSilvestre
person.givenNameCláudia
person.identifier.ciencia-idDA12-EF3F-C7CD
person.identifier.orcid0000-0002-8850-4304
rcaap.rightsclosedAccesspor
rcaap.typeconferenceObjectpor
relation.isAuthorOfPublication08fbc1bf-3387-4137-8c03-c4664dd43375
relation.isAuthorOfPublication.latestForDiscovery08fbc1bf-3387-4137-8c03-c4664dd43375
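
The idea summarized in the abstract, estimating a multinomial mixture and its number of clusters in one run, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the weight update commonly attributed to Figueiredo and Jain (2002), in which each component's expected count is reduced by half its number of free parameters and components whose weight reaches zero are dropped. As a simplification, all components are updated at once rather than one at a time. The function name `fit_mml_mixture`, its parameters, the smoothing constant, and the synthetic data are hypothetical choices made for illustration.

```python
import math
import random


def fit_mml_mixture(data, n_levels, k_max=6, n_iter=200, seed=0):
    """Fit a mixture of independent categorical variables, pruning components.

    data: list of rows, each a list of category indices (0-based).
    n_levels[j]: number of categories of variable j.
    Returns (weights, theta) for the surviving components.
    """
    rng = random.Random(seed)
    # Free parameters per component (assumed: one less than the levels, per variable).
    n_params = sum(c - 1 for c in n_levels)
    weights = [1.0 / k_max] * k_max
    # Random, strictly positive initial category probabilities.
    theta = []
    for _ in range(k_max):
        comp = []
        for c in n_levels:
            raw = [rng.random() + 0.5 for _ in range(c)]
            total = sum(raw)
            comp.append([r / total for r in raw])
        theta.append(comp)

    for _ in range(n_iter):
        # E-step: posterior probability of each surviving component for each row.
        resp = []
        for row in data:
            logp = [
                math.log(weights[k])
                + sum(math.log(theta[k][j][x]) for j, x in enumerate(row))
                for k in range(len(weights))
            ]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            s = sum(p)
            resp.append([v / s for v in p])

        # M-step for the weights: penalized expected counts, truncated at zero.
        counts = [sum(r[k] for r in resp) for k in range(len(weights))]
        raw = [max(0.0, c - n_params / 2.0) for c in counts]
        total = sum(raw)
        if total == 0.0:
            break  # every component would vanish; keep the previous estimate
        keep = [k for k, r in enumerate(raw) if r > 0.0]
        weights = [raw[k] / total for k in keep]

        # M-step for the category probabilities of the surviving components.
        new_theta = []
        for k in keep:
            comp = []
            for j, c in enumerate(n_levels):
                cell = [1e-6] * c  # tiny smoothing (assumption) to avoid log(0)
                for row, r in zip(data, resp):
                    cell[row[j]] += r[k]
                s = sum(cell)
                comp.append([v / s for v in cell])
            new_theta.append(comp)
        theta = new_theta

    return weights, theta


# Synthetic example: two groups of 200 rows, 4 variables with 3 categories each.
gen = random.Random(1)


def draw(probs):
    return gen.choices(range(3), weights=probs)[0]


data = [[draw([0.8, 0.1, 0.1]) for _ in range(4)] for _ in range(200)]
data += [[draw([0.1, 0.1, 0.8]) for _ in range(4)] for _ in range(200)]

weights, theta = fit_mml_mixture(data, n_levels=[3, 3, 3, 3])
print(len(weights), [round(w, 3) for w in weights])
```

Starting from `k_max` components and letting the penalized update remove the weak ones means a single run can return both the clustering and the number of clusters, whereas BIC or ICL would need one fitted model per candidate number.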

Files

Original bundle
Name:
RESUMO_ida2012 CS MC MF.doc
Size:
29 KB
Format:
Microsoft Word
License bundle
Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission