Publication
Model selection in discrete clustering: the EM-MML algorithm
| dc.contributor.author | Silvestre, Cláudia | |
| dc.contributor.author | Cardoso, Margarida | |
| dc.contributor.author | Figueiredo, Mário | |
| dc.date.accessioned | 2017-12-12T14:48:58Z | |
| dc.date.available | 2017-12-12T14:48:58Z | |
| dc.date.issued | 2016-12 | |
| dc.description.abstract | Finite mixture models are widely used for cluster analysis in several areas of application. They are commonly estimated through likelihood maximization (using diverse variants of the expectation-maximization (EM) algorithm), and the number of components (or clusters) is determined by resorting to information criteria: the EM algorithm is run several times and then one of the pre-estimated candidate models is selected (e.g., using the BIC criterion). We propose a new approach, the EM-MML algorithm, which clusters categorical data (quite common in the social sciences) and simultaneously identifies the number of clusters. This approach assumes that the data come from a finite mixture of multinomials and uses a variant of EM to estimate the model parameters and a minimum message length (MML) criterion to estimate the number of clusters. EM-MML thus seamlessly integrates estimation and model selection in a single algorithm. EM-MML is compared with traditional EM approaches that use alternative information criteria. Comparisons rely on synthetic datasets and on a real dataset (data from the European Social Survey). The results illustrate the parsimony of the EM-MML solutions as well as the cohesion-separation and stability of their clusters. A further clear advantage of EM-MML is its computation time. | en |
| dc.description.version | N/A | pt_PT |
| dc.identifier.citation | SILVESTRE, Cláudia; CARDOSO, Margarida; FIGUEIREDO, Mário - Model selection in discrete clustering: the EM-MML algorithm. In: International Conference of the ERCIM WG on Computational and Methodological Statistics, 9th, Seville, Spain (Universidad de Sevilla), 2016 (9-11 December) | pt_PT |
| dc.identifier.uri | http://hdl.handle.net/10400.21/7688 | |
| dc.language.iso | eng | pt_PT |
| dc.peerreviewed | yes | pt_PT |
| dc.publisher | CMStatistics | pt_PT |
| dc.relation.publisherversion | http://cmstatistics.org/CMStatistics2016/fullprogramme.php | en |
| dc.relation.publisherversion | http://cmstatistics.org/RegistrationsV2/CMStatistics2016/viewSubmission.php?in=1307&token=q02p26s7q3nq46oo2422167os6856p69 | en |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | pt_PT |
| dc.subject | Finite mixture models | en |
| dc.subject | EM-MML algorithm | en |
| dc.subject | Number of clusters | en |
| dc.title | Model selection in discrete clustering: the EM-MML algorithm | en |
| dc.type | conference object | |
| dspace.entity.type | Publication | |
| oaire.citation.conferencePlace | Universidad de Sevilla, Seville, Spain | pt_PT |
| oaire.citation.title | International Conference of the ERCIM WG on Computational and Methodological Statistics, 9th | en |
| person.familyName | Silvestre | |
| person.givenName | Cláudia | |
| person.identifier.ciencia-id | DA12-EF3F-C7CD | |
| person.identifier.orcid | 0000-0002-8850-4304 | |
| rcaap.rights | restrictedAccess | pt_PT |
| rcaap.type | conferenceObject | pt_PT |
| relation.isAuthorOfPublication | 08fbc1bf-3387-4137-8c03-c4664dd43375 | |
| relation.isAuthorOfPublication.latestForDiscovery | 08fbc1bf-3387-4137-8c03-c4664dd43375 | |
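
The abstract describes fitting a finite mixture of multinomials with an EM variant that also selects the number of clusters. The record does not give the actual EM-MML update rules, so the following is only a minimal sketch under stated assumptions: the function name `em_mml_sketch`, its parameters, and the synthetic data are hypothetical, and the pruning step (subtracting half the per-component parameter count from each component's expected size, a rule used in some MML-based mixture fitting) is an assumed stand-in for the authors' criterion.

```python
import random


def em_mml_sketch(data, n_levels, k_max=5, n_iter=100, seed=0):
    """Illustrative EM for a mixture of independent categorical variables.

    data     : list of tuples of category indices (one tuple per observation)
    n_levels : number of categories of each variable
    Starts with k_max components and drops those whose support is too weak.
    """
    rng = random.Random(seed)
    # Free parameters per component: (levels - 1) for each variable.
    n_params = sum(levels - 1 for levels in n_levels)
    weights = [1.0 / k_max] * k_max
    # Random initial category probabilities, normalised per variable.
    theta = []
    for _ in range(k_max):
        comp = []
        for levels in n_levels:
            raw = [rng.random() + 0.5 for _ in range(levels)]
            total = sum(raw)
            comp.append([r / total for r in raw])
        theta.append(comp)

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        # Plain products are fine for a handful of variables; use logs for many.
        resp = []
        for x in data:
            joint = []
            for w, comp in zip(weights, theta):
                lik = w
                for j, value in enumerate(x):
                    lik *= comp[j][value]
                joint.append(lik)
            total = sum(joint)
            resp.append([p / total for p in joint])

        # M-step for the weights, with the ASSUMED pruning rule:
        # expected component size minus half the parameter count, floored at 0.
        sizes = [sum(r[k] for r in resp) for k in range(len(weights))]
        shrunk = [max(0.0, size - n_params / 2.0) for size in sizes]
        if sum(shrunk) == 0.0:
            # Never drop every component: keep the largest one.
            best = max(range(len(sizes)), key=sizes.__getitem__)
            shrunk[best] = 1.0
        keep = [k for k, s in enumerate(shrunk) if s > 0.0]
        total = sum(shrunk[k] for k in keep)
        weights = [shrunk[k] / total for k in keep]

        # M-step for the category probabilities of the surviving components.
        new_theta = []
        for k in keep:
            comp = []
            for j, levels in enumerate(n_levels):
                counts = [1e-6] * levels  # tiny smoothing avoids zero probabilities
                for x, r in zip(data, resp):
                    counts[x[j]] += r[k]
                total = sum(counts)
                comp.append([c / total for c in counts])
            new_theta.append(comp)
        theta = new_theta

    return weights, theta


def draw(rng, mode):
    """Hypothetical generator: each of 3 variables equals `mode` about 90% of the time."""
    return tuple(mode if rng.random() < 0.9 else rng.randrange(3) for _ in range(3))


rng = random.Random(1)
data = [draw(rng, 0) for _ in range(100)] + [draw(rng, 2) for _ in range(100)]
weights, theta = em_mml_sketch(data, n_levels=[3, 3, 3])
print(len(weights), [round(w, 3) for w in weights])
```

The point of the sketch is the design contrast drawn in the abstract: a single run starts from an upper bound on the number of components and lets weak components vanish, instead of fitting one model per candidate number of clusters and comparing them afterwards with a criterion such as BIC.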
Files
Main
- Name:
- Model selection in discrete clustering- international conference.pdf
- Size:
- 175.31 KB
- Format:
- Adobe Portable Document Format
License
- Name:
- license.txt
- Size:
- 1.71 KB
- Format:
- Item-specific license agreed to upon submission
- Description:
