Publication

An MML embedded approach for estimating the number of clusters

dc.contributor.author: Silvestre, Cláudia
dc.contributor.author: Cardoso, Maria Margarida G. M. S.
dc.contributor.author: Figueiredo, Mário
dc.date.accessioned: 2023-12-13T11:45:56Z
dc.date.available: 2023-12-13T11:45:56Z
dc.date.issued: 2023-12-08
dc.description.abstract: Assuming that the data originate from a finite mixture of multinomial distributions, we study the performance of an integrated Expectation Maximization (EM) algorithm that uses the Minimum Message Length (MML) criterion to select the number of mixture components. Rather than selecting one among a set of pre-estimated candidate models (which requires running EM several times), this EM-MML approach seamlessly integrates estimation and model selection in a single algorithm. Comparisons are provided with EM combined with well-known information criteria, e.g. the Bayesian Information Criterion. We resort to synthetic data examples and a real application. The computation time of EM-MML is a clear advantage of this method; also, the solution it provides for the real data is more parsimonious, which reduces the risk of model order overestimation and improves interpretability. [pt_PT]
dc.description.version: info:eu-repo/semantics/publishedVersion [pt_PT]
dc.identifier.citation: Silvestre, C., Cardoso, M.G.M.S., Figueiredo, M. (2023). An MML embedded approach for estimating the number of clusters. In P. Brito, J.G. Dias, B. Lausen, A. Montanari, & R. Nugent (eds), Classification and data science in the Digital Age. IFCS 2022. Studies in Classification, Data Analysis, and Knowledge Organization (pp. 353-361), Springer. https://doi.org/10.1007/978-3-031-09034-9_38 [pt_PT]
dc.identifier.doi: https://doi.org/10.1007/978-3-031-09034-9_38 [pt_PT]
dc.identifier.isbn: 978-3-031-09034-9
dc.identifier.isbn: 978-3-031-09033-2 (print)
dc.identifier.uri: http://hdl.handle.net/10400.21/16694
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.publisher: Springer [pt_PT]
dc.relation.publisherversion: https://link.springer.com/chapter/10.1007/978-3-031-09034-9_38 [pt_PT]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [pt_PT]
dc.subject: Finite mixture model [pt_PT]
dc.subject: EM algorithm [pt_PT]
dc.subject: Model selection [pt_PT]
dc.subject: Minimum message length [pt_PT]
dc.subject: Categorical data [pt_PT]
dc.title: An MML embedded approach for estimating the number of clusters [pt_PT]
dc.type: book part
dspace.entity.type: Publication
oaire.citation.conferencePlace: Cham [pt_PT]
oaire.citation.endPage: 361 [pt_PT]
oaire.citation.startPage: 353 [pt_PT]
oaire.citation.title: Classification and data science in the Digital Age. IFCS 2022. Studies in Classification, Data Analysis, and Knowledge Organization [pt_PT]
person.familyName: Silvestre
person.familyName: Cardoso
person.givenName: Cláudia
person.givenName: Maria Margarida
person.identifier.ciencia-id: DA12-EF3F-C7CD
person.identifier.ciencia-id: 3E1B-1DAD-9287
person.identifier.orcid: 0000-0002-8850-4304
person.identifier.orcid: 0000-0001-6239-7283
person.identifier.scopus-author-id: 21233265300
rcaap.rights: openAccess [pt_PT]
rcaap.type: bookPart [pt_PT]
relation.isAuthorOfPublication: 08fbc1bf-3387-4137-8c03-c4664dd43375
relation.isAuthorOfPublication: 069c85c4-ebe3-4293-b88a-dd066bc288de
relation.isAuthorOfPublication.latestForDiscovery: 069c85c4-ebe3-4293-b88a-dd066bc288de
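
The central idea in the abstract, estimating the mixture and choosing the number of components within a single EM run, can be illustrated with a minimal sketch. It assumes the mixing-weight update from Figueiredo and Jain's MML-based EM for mixtures, in which a component is dropped once its total responsibility does not exceed half its parameter count. This record does not state the chapter's own update rule, so the function names, the local-independence parameter count, and the numbers below are illustrative assumptions only.

```python
def free_params(categories_per_variable):
    """Free parameters of one component.

    Assumption: each component is a product of independent categorical
    distributions, so variable j with c_j categories contributes c_j - 1.
    """
    return sum(c - 1 for c in categories_per_variable)


def mml_weight_update(resp_sums, params_per_component):
    """One MML-style M-step for the mixing weights (Figueiredo-Jain form).

    resp_sums[k] is the sum over observations of the E-step
    responsibilities for component k. A component whose sum does not
    exceed half the per-component parameter count gets weight 0,
    which removes it from the model.
    """
    penalty = params_per_component / 2.0
    shrunk = [max(0.0, s - penalty) for s in resp_sums]
    total = sum(shrunk)
    if total == 0.0:
        raise ValueError("all components were annihilated")
    return [s / total for s in shrunk]


# Hypothetical data: 3 categorical variables with 3, 4 and 2 categories,
# 100 observations, and 4 starting components.
n_params = free_params([3, 4, 2])
weights = mml_weight_update([55.0, 40.0, 2.5, 2.5], n_params)
surviving = [k for k, w in enumerate(weights) if w > 0.0]
print(n_params, surviving, weights)
```

In this example the two weakly supported components receive weight zero, so the run would continue with two components instead of refitting a separate model for every candidate number of clusters.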

Files

Original bundle
Name: An MML Embedded Approach for Estimating the Number of Clusters.pdf
Size: 330.95 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
Description: