Publication

An MML embedded approach for estimating the number of clusters

dc.contributor.author: Silvestre, Cláudia
dc.contributor.author: Cardoso, Margarida
dc.contributor.author: Figueiredo, Mário
dc.date.accessioned: 2023-01-05T15:57:14Z
dc.date.available: 2023-01-05T15:57:14Z
dc.date.issued: 2022-07-19
dc.description.abstract: Assuming that the data originate from a finite mixture of multinomial distributions, we study the performance of an integrated Expectation Maximization (EM) algorithm that uses the Minimum Message Length (MML) criterion to select the number of mixture components. Rather than selecting one among a set of pre-estimated candidate models (which requires running EM several times), this EM-MML approach integrates estimation and model selection in a single algorithm. Comparisons are provided with EM combined with well-known information criteria, e.g., the Bayesian Information Criterion. We use synthetic data examples and a real-data application. The computation time of EM-MML is a clear advantage of this method; also, the real-data solution it provides is more parsimonious, which reduces the risk of model order overestimation and improves interpretability. [pt_PT]
dc.description.version: N/A [pt_PT]
dc.identifier.citation: Silvestre, C., Cardoso, M. & Figueiredo, M. (2022, July 22-23). An MML embedded approach for estimating the number of clusters. Paper presented at the 17th Conference of the IFCS 2022 – International Federation of Classification Societies: Classification and Data Science in the Digital Age. Porto, Portugal. [pt_PT]
dc.identifier.uri: http://hdl.handle.net/10400.21/15258
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.publisher: CLAD - Associação Portuguesa de Classificação e Análise de Dados [pt_PT]
dc.publisher: Faculdade de Economia, Universidade do Porto [pt_PT]
dc.relation.publisherversion: https://clad.pt/en/2022/02/03/ifcs-2022-xvii-congresso-da-ifcs-3/ [pt_PT]
dc.relation.publisherversion: https://ifcs2022.fep.up.pt/wp-content/uploads/2022/07/IFCS2022_Book_Abstracts_v1.pdf [pt_PT]
dc.subject: Finite mixture model [pt_PT]
dc.subject: EM algorithm [pt_PT]
dc.subject: Model selection [pt_PT]
dc.subject: Minimum message length [pt_PT]
dc.subject: Categorical data [pt_PT]
dc.title: An MML embedded approach for estimating the number of clusters [pt_PT]
dc.type: conference object
dspace.entity.type: Publication
oaire.citation.conferencePlace: Faculdade de Economia, Universidade do Porto, Porto [pt_PT]
oaire.citation.title: 17th Conference of the IFCS 2022 – International Federation of Classification Societies: Classification and Data Science in the Digital Age. [pt_PT]
person.familyName: Silvestre
person.givenName: Cláudia
person.identifier.ciencia-id: DA12-EF3F-C7CD
person.identifier.orcid: 0000-0002-8850-4304
rcaap.rights: closedAccess [pt_PT]
rcaap.type: conferenceObject [pt_PT]
relation.isAuthorOfPublication: 08fbc1bf-3387-4137-8c03-c4664dd43375
relation.isAuthorOfPublication.latestForDiscovery: 08fbc1bf-3387-4137-8c03-c4664dd43375
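
Illustrative sketch (not part of the deposited record)

The abstract describes an EM algorithm that selects the number of mixture components while it estimates the model, instead of fitting one model per candidate number of clusters. The record does not give the update equations, so the Python sketch below is only an assumption about how such a scheme could look: it starts from a deliberately large number of components (k_max) and, in each M-step, shrinks every component's soft count by half the number of its free parameters, dropping components whose shrunk count reaches zero (an MML-style penalty in the spirit of Figueiredo and Jain's mixture-fitting work). The function name em_mml, its parameters, the smoothing constant eps, the batch (rather than component-wise) update, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


def em_mml(data, n_levels, k_max, n_iter=100, seed=0, eps=1e-6):
    """Sketch of EM with an MML-style weight update for categorical mixtures.

    data     : list of tuples of category indices, one tuple per observation
    n_levels : n_levels[j] is the number of categories of variable j
    k_max    : initial (deliberately too large) number of components
    """
    rng = random.Random(seed)
    n_vars = len(n_levels)
    # Free parameters per component: (C_j - 1) probabilities per variable.
    n_params = sum(c - 1 for c in n_levels)

    # Uniform mixing weights and random category probabilities to start.
    weights = [1.0 / k_max] * k_max
    theta = []
    for _ in range(k_max):
        comp = []
        for c in n_levels:
            raw = [rng.random() + 0.5 for _ in range(c)]
            total = sum(raw)
            comp.append([r / total for r in raw])
        theta.append(comp)

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in data:
            joint = [
                w * math.prod(t[j][x[j]] for j in range(n_vars))
                for w, t in zip(weights, theta)
            ]
            total = sum(joint)
            resp.append([p / total for p in joint])

        # M-step for the weights (assumed MML-style rule): subtract half the
        # per-component parameter count from each soft count, clip at zero.
        counts = [sum(r[k] for r in resp) for k in range(len(weights))]
        shrunk = [max(0.0, nk - n_params / 2.0) for nk in counts]
        if sum(shrunk) == 0.0:
            # Guard for this batch sketch: never drop every component at once.
            best = max(range(len(counts)), key=counts.__getitem__)
            shrunk[best] = counts[best]
        keep = [k for k, s in enumerate(shrunk) if s > 0.0]
        norm = sum(shrunk[k] for k in keep)
        weights = [shrunk[k] / norm for k in keep]

        # M-step for the category probabilities of the surviving components,
        # with a small smoothing constant so no probability is exactly zero.
        new_theta = []
        for k in keep:
            comp = []
            for j, c in enumerate(n_levels):
                soft = [eps] * c
                for x, r in zip(data, resp):
                    soft[x[j]] += r[k]
                total = sum(soft)
                comp.append([s / total for s in soft])
            new_theta.append(comp)
        theta = new_theta

    return weights, theta


# Synthetic example: two latent groups, four variables with three categories.
rng = random.Random(1)
profiles = [[0.9, 0.05, 0.05], [0.05, 0.05, 0.9]]
data = [
    tuple(rng.choices(range(3), weights=profiles[i % 2])[0] for _ in range(4))
    for i in range(200)
]
weights, theta = em_mml(data, n_levels=[3, 3, 3, 3], k_max=6)
print(len(weights), [round(w, 3) for w in weights])
```

A single run returns both the surviving components and their parameters, which is the contrast the abstract draws with criteria such as BIC, where EM is run once per candidate number of components and the fits are compared afterwards. Whether a given run recovers the generating number of groups depends on the data and the initialization; the sketch makes no claim about the results reported by the authors.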

Files

Original bundle
Name: IFCS2022-Programme.pdf
Size: 181.73 KB
Format: Adobe Portable Document Format