Publication

Model selection in discrete clustering: the EM-MML algorithm

Abstract

Finite mixture models are widely used for cluster analysis in several areas of application. They are commonly estimated through likelihood maximization (using diverse variants of the expectation-maximization (EM) algorithm), and the number of components (or clusters) is determined using information criteria: the EM algorithm is run several times, and one of the pre-estimated candidate models is then selected (e.g., using the BIC criterion). We propose a new clustering approach, the EM-MML algorithm, to deal with the clustering of categorical data (quite common in the social sciences) while simultaneously identifying the number of clusters. This approach assumes that the data come from a finite mixture of multinomials and uses a variant of EM to estimate the model parameters and a minimum message length (MML) criterion to estimate the number of clusters. EM-MML thus seamlessly integrates estimation and model selection in a single algorithm. EM-MML is compared with traditional EM approaches that use alternative information criteria. The comparisons rely on synthetic datasets and on a real dataset (data from the European Social Survey). The results illustrate the parsimony of the EM-MML solutions, as well as the cohesion-separation and stability of their clusters. EM-MML also has a clear advantage in computation time.
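
The abstract describes EM-MML only at a high level. The Python sketch below illustrates the general idea under explicit assumptions: each cluster is modelled as a product of independent categorical (multinomial) distributions, one per variable; fitting starts from a deliberately large number of components `k_max`; and the mixing-weight M-step uses the component-annihilating update associated with Figueiredo and Jain's MML criterion for finite mixtures, alpha_m proportional to max(0, sum_i w_im - N/2), where N is the number of free parameters per component, so a component whose expected support drops below N/2 is removed. The function name `em_mml_sketch`, the pseudo-count `pseudo`, the random initialization and the stopping rule are hypothetical choices made for illustration. This is not the authors' implementation, and it omits refinements such as component-wise updates and a search over decreasing numbers of components.

```python
import math
import random


def _logsumexp(values):
    # Numerically stable log(sum(exp(v))) over a list of log-values.
    mx = max(values)
    return mx + math.log(sum(math.exp(v - mx) for v in values))


def _log_joint(x, alpha, theta):
    # log(alpha_m) + sum_d log theta_m[d][x_d] for every current component m
    # (assumption: variables are conditionally independent given the cluster).
    return [math.log(a) + sum(math.log(t[d][c]) for d, c in enumerate(x))
            for a, t in zip(alpha, theta)]


def em_mml_sketch(X, n_categories, k_max=10, pseudo=1e-3,
                  max_iter=500, tol=1e-8, seed=0):
    """Hypothetical sketch: EM for a mixture of categorical variables with an
    MML-style (Figueiredo-Jain) weight update that annihilates weak components.

    X[i][d] is the category index (0 .. n_categories[d]-1) of case i on
    variable d. Returns (alpha, theta, message_length).
    """
    rng = random.Random(seed)
    n = len(X)
    # Free parameters per component: (C_d - 1) for each categorical variable.
    N = sum(c - 1 for c in n_categories)

    # Random initial category probabilities and uniform initial weights.
    theta = []
    for _ in range(k_max):
        comp = []
        for C in n_categories:
            raw = [rng.random() + 0.1 for _ in range(C)]
            s = sum(raw)
            comp.append([r / s for r in raw])
        theta.append(comp)
    alpha = [1.0 / k_max] * k_max

    for _ in range(max_iter):
        # E-step: posterior membership probabilities (responsibilities).
        W = []
        for x in X:
            lj = _log_joint(x, alpha, theta)
            lse = _logsumexp(lj)
            W.append([math.exp(v - lse) for v in lj])

        # M-step for weights: subtract N/2 from each component's expected
        # support and clip at zero; components that reach zero are removed.
        n_m = [sum(w[m] for w in W) for m in range(len(alpha))]
        support = [max(0.0, s - N / 2) for s in n_m]
        total = sum(support)
        if total == 0.0:
            raise ValueError("all components annihilated; try a smaller k_max")
        new_alpha = [s / total for s in support]
        delta = max(abs(a - b) for a, b in zip(new_alpha, alpha))
        keep = [m for m, a in enumerate(new_alpha) if a > 0.0]

        # M-step for category probabilities of surviving components
        # (weighted frequencies plus a small pseudo-count to avoid log(0)).
        new_theta = []
        for m in keep:
            comp = []
            for d, C in enumerate(n_categories):
                counts = [pseudo] * C
                for x, w in zip(X, W):
                    counts[x[d]] += w[m]
                tot = sum(counts)
                comp.append([c / tot for c in counts])
            new_theta.append(comp)

        alpha = [new_alpha[m] for m in keep]
        theta = new_theta
        if delta < tol:
            break

    # Message length of the final model, in the Figueiredo-Jain form:
    # N/2 * sum log(n*alpha_m/12) + k/2 log(n/12) + k(N+1)/2 - log-likelihood.
    k_nz = len(alpha)
    loglik = sum(_logsumexp(_log_joint(x, alpha, theta)) for x in X)
    length = (N / 2 * sum(math.log(n * a / 12) for a in alpha)
              + k_nz / 2 * math.log(n / 12)
              + k_nz * (N + 1) / 2
              - loglik)
    return alpha, theta, length
```

For example, with five binary variables, `em_mml_sketch(X, [2] * 5, k_max=4)` returns the weights and category probabilities of the components that survive pruning, together with the message length that the MML criterion seeks to minimize. The design choice illustrated here is the one the abstract highlights: the number of clusters is not chosen by refitting EM for each candidate k and comparing an information criterion such as BIC, but emerges from a single run in which redundant components lose their support.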

Keywords

Finite mixture models; EM-MML algorithm; Number of clusters

Citation

SILVESTRE, Cláudia; CARDOSO, Margarida; FIGUEIREDO, Mário - Model selection in discrete clustering: the EM-MML algorithm. In: International Conference of the ERCIM WG on Computational and Methodological Statistics, 9th, Seville, Spain (Universidad de Sevilla), 2016 (9-11 December)

Publisher

CMStatistics