Model selection in discrete clustering: the EM-MML algorithm

dc.contributor.author: Silvestre, Cláudia
dc.contributor.author: Cardoso, Margarida
dc.contributor.author: Figueiredo, Mário
dc.date.accessioned: 2017-12-12T14:48:58Z
dc.date.available: 2017-12-12T14:48:58Z
dc.date.issued: 2016-12
dc.description.abstract [en]: Finite mixture models are widely used for cluster analysis in several areas of application. They are commonly estimated through likelihood maximization (using variants of the expectation-maximization algorithm), and the number of components (or clusters) is determined using information criteria: the EM algorithm is run several times and then one of the pre-estimated candidate models is selected (e.g., using the BIC criterion). We propose a new approach, the EM-MML algorithm, that clusters categorical data (quite common in the social sciences) and simultaneously identifies the number of clusters. This approach assumes that the data come from a finite mixture of multinomials and uses a variant of EM to estimate the model parameters and a minimum message length (MML) criterion to estimate the number of clusters. EM-MML thus seamlessly integrates estimation and model selection in a single algorithm. EM-MML is compared with traditional EM approaches that use alternative information criteria. Comparisons rely on synthetic datasets and on a real dataset (data from the European Social Survey). The results illustrate the parsimony of the EM-MML solutions as well as their cluster cohesion-separation and stability. EM-MML also has a clear advantage in computation time.
dc.description.version [pt_PT]: N/A
dc.identifier.citation [pt_PT]: SILVESTRE, Cláudia; CARDOSO, Margarida; FIGUEIREDO, Mário - Model selection in discrete clustering: the EM-MML algorithm. In: International Conference of the ERCIM WG on Computational and Methodological Statistics, 9th, Seville, Spain (Universidad de Sevilla), 2016 (9-11 December)
dc.identifier.uri: http://hdl.handle.net/10400.21/7688
dc.language.iso [pt_PT]: eng
dc.peerreviewed [pt_PT]: yes
dc.publisher [pt_PT]: CMStatistics
dc.relation.publisherversion [en]: http://cmstatistics.org/CMStatistics2016/fullprogramme.php
dc.relation.publisherversion [en]: http://cmstatistics.org/RegistrationsV2/CMStatistics2016/viewSubmission.php?in=1307&token=q02p26s7q3nq46oo2422167os6856p69
dc.rights.uri [pt_PT]: http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject [en]: Finite mixture models
dc.subject [en]: EM-MML algorithm
dc.subject [en]: Number of clusters
dc.title [en]: Model selection in discrete clustering: the EM-MML algorithm
dc.type: conference object
dspace.entity.type: Publication
oaire.citation.conferencePlace [pt_PT]: Universidad de Sevilla, Seville, Spain
oaire.citation.title [en]: International Conference of the ERCIM WG on Computational and Methodological Statistics, 9th
person.familyName: Silvestre
person.givenName: Cláudia
person.identifier.ciencia-id: DA12-EF3F-C7CD
person.identifier.orcid: 0000-0002-8850-4304
rcaap.rights [pt_PT]: restrictedAccess
rcaap.type [pt_PT]: conferenceObject
relation.isAuthorOfPublication: 08fbc1bf-3387-4137-8c03-c4664dd43375
relation.isAuthorOfPublication.latestForDiscovery: 08fbc1bf-3387-4137-8c03-c4664dd43375
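
Illustrative sketch (not part of the deposited record)

The abstract contrasts EM-MML with the traditional procedure: run EM once per candidate number of clusters, then pick a model with an information criterion such as BIC. The Python sketch below illustrates only that traditional baseline for a mixture of independent categorical features. It is not the authors' EM-MML algorithm, whose details are not given in this record. The function names (fit_em, select_by_bic), the random initialization, the smoothing constants, and the number of restarts are all assumptions made for illustration.

```python
import numpy as np


def _log_joint(onehot, weights, thetas):
    # log p(x_i, component m) for every observation i and component m: shape (n, k).
    return np.log(weights)[None, :] + sum(
        oh @ np.log(th).T for oh, th in zip(onehot, thetas))


def fit_em(X, n_levels, k, n_iter=300, tol=1e-8, seed=0):
    """Plain EM for a k-component mixture of independent categorical features.

    X is an (n, d) integer array with X[i, j] in {0, ..., n_levels[j] - 1}.
    Returns (log_likelihood, weights, thetas). Hypothetical helper, not EM-MML.
    """
    rng = np.random.default_rng(seed)
    # One-hot encode each feature once; onehot[j] has shape (n, n_levels[j]).
    onehot = [np.eye(c)[X[:, j]] for j, c in enumerate(n_levels)]
    weights = np.full(k, 1.0 / k)
    # thetas[j][m, c] = P(feature j takes level c | component m); random start.
    thetas = [rng.dirichlet(np.ones(c), size=k) for c in n_levels]
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: posterior responsibilities of each component.
        lj = _log_joint(onehot, weights, thetas)
        log_norm = np.logaddexp.reduce(lj, axis=1)
        resp = np.exp(lj - log_norm[:, None])
        # M-step: weighted relative frequencies (small floors avoid log(0)).
        weights = np.clip(resp.mean(axis=0), 1e-12, None)
        weights /= weights.sum()
        thetas = [resp.T @ oh + 1e-10 for oh in onehot]
        thetas = [t / t.sum(axis=1, keepdims=True) for t in thetas]
        loglik = log_norm.sum()
        if loglik - prev < tol:
            break
        prev = loglik
    # Log-likelihood of the parameters actually returned.
    final = np.logaddexp.reduce(_log_joint(onehot, weights, thetas), axis=1).sum()
    return final, weights, thetas


def select_by_bic(X, n_levels, k_max, n_restarts=3):
    """Fit k = 1..k_max separately and return (best_k, {k: BIC})."""
    n = X.shape[0]
    bic = {}
    for k in range(1, k_max + 1):
        best = max(fit_em(X, n_levels, k, seed=s)[0] for s in range(n_restarts))
        # Free parameters: (k - 1) mixing weights plus (levels - 1) per feature and component.
        n_params = (k - 1) + k * sum(c - 1 for c in n_levels)
        bic[k] = -2.0 * best + n_params * np.log(n)
    return min(bic, key=bic.get), bic
```

Each candidate k needs its own EM runs in this baseline, which is the repeated cost the abstract says EM-MML avoids by integrating estimation and model selection in a single algorithm.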

Files

Original bundle
Name: Model selection in discrete clustering- international conference.pdf
Size: 175.31 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission