
Determining the number of clusters in categorical data

dc.contributor.author: Silvestre, Cláudia
dc.contributor.author: Cardoso, Margarida
dc.contributor.author: Figueiredo, Mário
dc.date.accessioned: 2014-12-12T13:11:24Z
dc.date.available: 2014-12-12T13:11:24Z
dc.date.issued: 2013-07
dc.description.abstract: Cluster analysis for categorical data has been an active area of research. A well-known problem in this area is determining the number of clusters, which is unknown and must be inferred from the data. To estimate the number of clusters, one often resorts to information criteria such as BIC (Bayesian information criterion), MML (minimum message length, proposed by Wallace and Boulton, 1968), and ICL (integrated classification likelihood). In this work, we adopt the approach developed by Figueiredo and Jain (2002) for clustering continuous data, which uses an MML criterion to select the number of clusters and a variant of the EM algorithm to estimate the model parameters. This EM variant integrates model estimation and selection in a single algorithm. To cluster categorical data, we assume a finite mixture of multinomial distributions and implement a new EM algorithm, following a previous version (Silvestre et al., 2008). Results obtained with synthetic datasets are encouraging. The main advantage of the proposed approach over the criteria mentioned above is its speed of execution, which is especially relevant for large datasets. [en]
dc.identifier.citation: Silvestre, Cláudia; Cardoso, Margarida; Figueiredo, Mário - Determining the Number of Clusters in Categorical Data. In Conference of the International Federation of Classification Societies - IFCS-2013, Tilburg (Netherlands), 14-17 July 2013 [por]
dc.identifier.uri: http://hdl.handle.net/10400.21/4048
dc.language.iso: eng [por]
dc.peerreviewed: yes [por]
dc.relation.publisherversion: http://spitswww.uvt.nl/fsw/mto/IFCS2013/Book%20of%20Abstracts.IFCS2013_Final.pdf [por]
dc.subject: Cluster analysis [en]
dc.subject: Model selection [en]
dc.subject: Categorical variables [en]
dc.title: Determining the number of clusters in categorical data [por]
dc.type: conference object
dspace.entity.type: Publication
oaire.citation.conferencePlace: Tilburg (Netherlands) [por]
oaire.citation.title: Conference of the International Federation of Classification Societies - IFCS-2013 [por]
person.familyName: Silvestre
person.givenName: Cláudia
person.identifier.ciencia-id: DA12-EF3F-C7CD
person.identifier.orcid: 0000-0002-8850-4304
rcaap.rights: restrictedAccess [por]
rcaap.type: conferenceObject [por]
relation.isAuthorOfPublication: 08fbc1bf-3387-4137-8c03-c4664dd43375
relation.isAuthorOfPublication.latestForDiscovery: 08fbc1bf-3387-4137-8c03-c4664dd43375
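
The abstract describes an EM variant that estimates a finite mixture of multinomial distributions and selects the number of clusters in the same run. The idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes one independent categorical distribution per variable within each cluster, and it updates all components simultaneously, whereas Figueiredo and Jain (2002) use a component-wise EM. The function name `fit_mixture` and the parameters `n_levels`, `k_max`, `n_iter`, and `seed` are hypothetical names chosen for this illustration. The weight update subtracts half the number of free parameters per component from each expected cluster size and drops any component whose weight reaches zero.

```python
import math
import random


def fit_mixture(data, n_levels, k_max=6, n_iter=100, seed=0):
    """EM for a mixture of independent categorical variables (illustrative).

    data     : list of tuples of category codes (ints starting at 0)
    n_levels : number of categories of each variable
    Returns (weights, theta) for the components that survive.
    """
    rng = random.Random(seed)
    # Free parameters per component: (levels - 1) for every variable.
    n_par = sum(c - 1 for c in n_levels)
    weights = [1.0 / k_max] * k_max
    # Random starting probabilities: theta[k][j][v] = P(x_j = v | component k).
    theta = []
    for _ in range(k_max):
        comp = []
        for c in n_levels:
            raw = [rng.random() + 0.5 for _ in range(c)]
            comp.append([r / sum(raw) for r in raw])
        theta.append(comp)

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each row,
        # computed in log space to avoid underflow.
        resp = []
        for x in data:
            logp = [
                math.log(w) + sum(math.log(theta[k][j][v]) for j, v in enumerate(x))
                for k, w in enumerate(weights)
            ]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            resp.append([pk / total for pk in p])

        # M-step for the weights: subtract half the parameter count from each
        # expected cluster size and clip at zero (MML-style penalty).
        n_k = [sum(r[k] for r in resp) for k in range(len(weights))]
        shrunk = [max(0.0, v - n_par / 2.0) for v in n_k]
        keep = [k for k, v in enumerate(shrunk) if v > 0.0]
        if not keep:  # never drop every component: keep the largest one
            best = max(range(len(n_k)), key=lambda k: n_k[k])
            shrunk[best] = 1.0
            keep = [best]
        total = sum(shrunk[k] for k in keep)
        weights = [shrunk[k] / total for k in keep]

        # M-step for the category probabilities of surviving components.
        # The tiny pseudo-count keeps every probability strictly positive.
        theta = []
        for k in keep:
            comp = []
            for j, c in enumerate(n_levels):
                counts = [1e-6] * c
                for x, r in zip(data, resp):
                    counts[x[j]] += r[k]
                s = sum(counts)
                comp.append([ct / s for ct in counts])
            theta.append(comp)
    return weights, theta
```

Calling `fit_mixture(rows, [3, 3, 3, 3], k_max=6)` on a list of 4-tuples with codes 0 to 2 returns the mixing weights and category probabilities of the surviving components, and `len(weights)` is then the estimated number of clusters. Because the run starts from `k_max` components and prunes them along the way, no separate fit per candidate number of clusters is needed, which is the source of the speed advantage the abstract mentions.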

Files

Original bundle
Name: RESUMO_2013 IFCS CS MC MF resumo.doc
Size: 26.5 KB
Format: Microsoft Word
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission