Browsing by Author "Cardoso, Margarida G. M. S."
- Comparing clustering solutions: the use of adjusted paired indices. Publication. Amorim, Maria José de Pina da Cruz; Cardoso, Margarida G. M. S. In the present paper we compare clustering solutions using indices of paired agreement. We propose a new method, IADJUST, to correct indices of paired agreement by excluding agreement by chance. This new method overcomes previous limitations known in the literature, as it permits the correction of any index. We illustrate its use in external clustering validation, to measure the accordance between clusters and an a priori known structure. The adjusted indices are intended to provide a realistic measure of clustering performance that excludes agreement by chance with ground truth. We use simulated data sets under a range of scenarios, considering diverse numbers of clusters, cluster overlaps and balances, to discuss the pertinence and the precision of our proposal. Precision is established through comparisons with the analytical approach to correction; the specific indices that can be corrected analytically are used for this purpose. The pertinence of the proposed correction is discussed through a detailed comparison between the performance of two classical clustering approaches, namely the Expectation-Maximization (EM) and K-Means (KM) algorithms. Eight indices of paired agreement are studied and new corrected indices are obtained.
- O critério Minimum Message Length na estimação de modelos de mistura sobre dados mistos [The Minimum Message Length criterion in the estimation of mixture models on mixed data]. Publication. Silvestre, Cláudia; Cardoso, Margarida G. M. S.; Figueiredo, Mário A. T. In this work we propose a new variant of the Expectation-Maximization algorithm for clustering mixed data that simultaneously estimates the number of clusters. We rely on finite mixture models, assuming that categorical data are modeled by multinomial distributions and metric data by Gaussian distributions. To estimate the number of mixture components we rely on the Minimum Message Length criterion. The performance of the proposed algorithm, named EM-MML-mix, is compared with that of other criteria frequently used for mixture model selection. In this comparative analysis, carried out on simulated data and on a real data set from the European Social Survey, the short computation time needed to obtain the solution with the proposed methodology stands out.
- Mapping atmospheric pollutants emissions in European countries. Publication. Martins, Ana Alexandra; Cardoso, Margarida G. M. S.; Pinto, Iola. In this paper we present a methodology which enables the graphical representation, in a bi-dimensional Euclidean space, of atmospheric pollutant emissions in European countries. This approach relies on the use of Multidimensional Unfolding (MDU), an exploratory multivariate data analysis technique. This technique illustrates both the relationships among the emitted gases and the relationships between the gases and their geographical origins. The main contribution of this work concerns the evaluation of MDU solutions. We use simulated data to define thresholds for the model fitting measures, allowing the quality of the MDU output to be evaluated. The quality of the model fit is thus assessed as a step before interpreting the results for gas types and geographical origins. Analysis of the MDU maps generates useful insights, yields an immediate substantive result, and enables the formulation of hypotheses for further analysis and modeling.
- Paired indices for clustering evaluation correction for agreement by chance. Publication. Amorim, Maria José de Pina da Cruz; Cardoso, Margarida G. M. S. In the present paper we focus on the performance of clustering algorithms, using indices of paired agreement to measure the accordance between clusters and an a priori known structure. We specifically propose a method to correct all indices considered for agreement by chance; the adjusted indices are meant to provide a realistic measure of clustering performance. The proposed method enables the correction of virtually any index, overcoming previous limitations known in the literature, and provides very precise results. We use simulated data sets under diverse scenarios and discuss the pertinence of our proposal, which is particularly relevant when poorly separated clusters are considered. Finally, we compare the performance of the EM and K-Means algorithms within each of the simulated scenarios and conclude that EM generally yields the best results.
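
The idea of correcting a paired agreement index for agreement by chance, as described in the two clustering abstracts listed here, can be illustrated with a minimal Python sketch. It is not the IADJUST method, whose details are not given in this listing. It assumes a generic simulation-based correction: the expected value of the index under chance is estimated by randomly permuting one labelling, and the index is rescaled as (observed - expected) / (1 - expected), assuming the index has a maximum of 1. The function names `rand_index` and `chance_adjusted` are hypothetical, chosen only for this illustration.

```python
import random
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Rand index: fraction of object pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two objects
    in the same cluster, or both put them in different clusters.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


def chance_adjusted(index_fn, labels_a, labels_b, n_perm=500, seed=0):
    """Rescale a paired index so that chance-level agreement maps to about 0.

    Assumption (not from the listed papers): chance is modelled by randomly
    permuting labels_b, which keeps cluster sizes fixed, and the index is
    assumed to have a maximum value of 1.
    """
    rng = random.Random(seed)
    observed = index_fn(labels_a, labels_b)
    shuffled = list(labels_b)
    total = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        total += index_fn(labels_a, shuffled)
    expected = total / n_perm
    if expected == 1.0:
        # Degenerate case (e.g. a single cluster): no room above chance.
        return 0.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    found = [0, 0, 1, 1, 1, 1, 2, 2, 0]
    print("Rand index:", rand_index(truth, found))
    print("Adjusted:  ", chance_adjusted(rand_index, truth, found))
```

Because `chance_adjusted` takes the index as a function argument, the same wrapper can be applied to any other paired index with a maximum of 1, which mirrors the abstracts' point that a simulation-based correction is not tied to indices with a known analytical expectation.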
