An MML embedded approach for estimating the number of clusters

Abstract

Assuming that the data originate from a finite mixture of multinomial distributions, we study the performance of an integrated Expectation Maximization (EM) algorithm that uses the Minimum Message Length (MML) criterion to select the number of mixture components. Rather than selecting one among a set of pre-estimated candidate models (which requires running EM several times), this EM-MML approach seamlessly integrates estimation and model selection in a single algorithm. We compare it with EM combined with well-known information criteria, e.g. the Bayesian Information Criterion. We resort to synthetic data examples and a real application. Computation time is a clear advantage of EM-MML; in addition, the solution it provides for the real data is more parsimonious, which reduces the risk of model order overestimation and improves interpretability.
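
The multi-run baseline that the abstract contrasts with EM-MML can be sketched as follows. This is a minimal illustration in Python, not the authors' code. It assumes each observation is a tuple of categorical variables that are independent within a component. The names fit_em, n_free_params and select_by_bic, the random initialisation, the smoothing constant eps and the toy data are all hypothetical. The sketch fits one EM run per candidate number of components and keeps the candidate with the lowest BIC; the EM-MML algorithm itself is not reproduced here.

```python
import math
import random


def fit_em(data, n_cats, k, n_iter=100, seed=0, eps=1e-9):
    """EM for a k-component mixture of independent categorical variables.

    data   : list of tuples of category indices, one tuple per observation
    n_cats : number of categories of each variable
    Returns (log_likelihood, weights, theta).
    """
    rng = random.Random(seed)
    n, d = len(data), len(n_cats)
    weights = [1.0 / k] * k
    # theta[c][j][v] = P(variable j = v | component c), random starting values
    theta = []
    for _ in range(k):
        comp = []
        for c_j in n_cats:
            raw = [rng.random() + 0.5 for _ in range(c_j)]
            comp.append([r / sum(raw) for r in raw])
        theta.append(comp)

    def e_step():
        """Return (log-likelihood, responsibilities) at the current parameters."""
        loglik, resp = 0.0, []
        for x in data:
            logp = [
                math.log(weights[c])
                + sum(math.log(theta[c][j][x[j]]) for j in range(d))
                for c in range(k)
            ]
            m = max(logp)  # log-sum-exp for numerical stability
            s = sum(math.exp(lp - m) for lp in logp)
            loglik += m + math.log(s)
            resp.append([math.exp(lp - m) / s for lp in logp])
        return loglik, resp

    for _ in range(n_iter):
        _, resp = e_step()
        # M-step: responsibility-weighted relative frequencies (eps avoids log(0))
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            weights[c] = max(n_c / n, 1e-300)
            for j in range(d):
                counts = [eps] * n_cats[j]
                for x, r in zip(data, resp):
                    counts[x[j]] += r[c]
                total = sum(counts)
                theta[c][j] = [v / total for v in counts]
    loglik, _ = e_step()
    return loglik, weights, theta


def n_free_params(n_cats, k):
    """(k - 1) mixing weights plus (C_j - 1) probabilities per variable and component."""
    return (k - 1) + k * sum(c - 1 for c in n_cats)


def select_by_bic(data, n_cats, k_values):
    """Run EM once per candidate k and return (best k, BIC per k)."""
    n = len(data)
    bic = {}
    for k in k_values:
        loglik, _, _ = fit_em(data, n_cats, k)
        bic[k] = -2.0 * loglik + n_free_params(n_cats, k) * math.log(n)
    return min(bic, key=bic.get), bic


# Hypothetical toy data: two groups of 150 rows, four variables with 3 categories each.
group_a = [(0, 0, 0, 0)] * 120 + [(1, 0, 0, 0)] * 10 + [(0, 1, 0, 0)] * 10 + [(0, 0, 0, 1)] * 10
group_b = [(2, 2, 2, 2)] * 120 + [(2, 2, 1, 2)] * 10 + [(2, 1, 2, 2)] * 10 + [(1, 2, 2, 2)] * 10
data = group_a + group_b
best_k, bic = select_by_bic(data, n_cats=[3, 3, 3, 3], k_values=[1, 2, 3])
print(best_k, bic)
```

Each candidate number of components costs a full EM run here, which is the repeated work that the abstract describes the integrated approach as avoiding.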

Keywords

Finite mixture model; EM algorithm; Model selection; Minimum message length; Categorical data

Citation

Silvestre, C., Cardoso, M.G.M.S., Figueiredo, M. (2023). An MML embedded approach for estimating the number of clusters. In P. Brito, J.G. Dias, B. Lausen, A. Montanari, & R. Nugent (eds), Classification and data science in the Digital Age. IFCS 2022. Studies in Classification, Data Analysis, and Knowledge Organization (pp. 353-361), Springer. https://doi.org/10.1007/978-3-031-09034-9_38
