
Search Results

  • An MML embedded approach for estimating the number of clusters
    Publication. Silvestre, Cláudia; Cardoso, Maria Margarida G. M. S.; Figueiredo, Mário
    Assuming that the data originate from a finite mixture of multinomial distributions, we study the performance of an integrated Expectation-Maximization (EM) algorithm that uses the Minimum Message Length (MML) criterion to select the number of mixture components. Rather than selecting one among a set of pre-estimated candidate models (which requires running EM several times), this EM-MML approach seamlessly integrates estimation and model selection in a single algorithm. Comparisons are provided with EM combined with well-known information criteria, e.g., the Bayesian Information Criterion (BIC). We resort to synthetic data examples and a real application. The shorter computation time of EM-MML is a clear advantage of this method; in addition, the solution it provides for the real data is more parsimonious, which reduces the risk of overestimating the model order and improves interpretability.
  • Electricity market price analysis using time series clustering
    Publication. Martins, Ana Alexandra; Lagarto, João; Cardoso, Maria Margarida
    The creation of an internal electricity market has long been a goal of the European Union, which has established common rules for it through Directive 2009/72/EC. In this context, analyzing how the electricity markets of the countries that will form the internal market operate is of the utmost importance. In this work, we use clustering techniques to analyze 26 time series of day-ahead electricity prices from European markets between 2015 and 2018 in order to identify different price patterns. The proposed clustering technique combines three dissimilarity measures for time series: Euclidean, Pearson-correlation-based and periodogram-based. Results show a clear distinction between Northern markets, especially Nord Pool, and Southern markets, MIBEL and Italy. Moreover, although some market prices behave similarly, a fully integrated European electricity market has yet to be achieved.
  • Clustering stability and ground truth: numerical experiments
    Publication. Amorim, Maria José; Cardoso, Maria Margarida
    Stability has been considered an important property for evaluating clustering solutions. Nevertheless, there are no conclusive studies on the relationship between this property and the capacity to recover the clusters inherent in the data ("ground truth"). This study focuses on this relationship, resorting to synthetic data generated under diverse scenarios (controlling relevant factors). Stability is evaluated using a weighted cross-validation procedure. Indices of agreement (corrected for agreement by chance) are used to assess both stability and external validity. The results reveal a perspective not previously reported in the literature. Although stability and external validity are clearly related when a broad range of scenarios is considered, the within-scenario conclusions deserve special attention: when faced with a specific clustering problem (as happens in practice), there is no significant relationship between stability and the ability to recover the data clusters.
  • Short-term load forecasting using time series clustering
    Publication. Martins, Ana Alexandra; Lagarto, João; Canacsinh, Hiren; Reis, Francisco; Cardoso, Maria Margarida
    Short-term load forecasting plays a major role in energy planning: its accuracy directly affects how power systems are operated and managed. We propose a new Clustering-based Similar Pattern Forecasting (CSPF) algorithm for short-term load forecasting. It uses the K-Medoids clustering algorithm to identify load patterns, with the COMB distance capturing differences between time series. The cluster labels are then used to identify similar sequences of days. Temperature information is also considered in the day-ahead forecast, using a K-Nearest Neighbor approach. The CSPF algorithm is intended to provide the aggregate forecast of Portugal's national load for the next day, at a 15-min resolution, based on data from the Portuguese Transmission System Operator (TSO). As measured by the RMSE, MAE and MAPE metrics, CSPF outperforms three alternative/baseline methods, suggesting that the proposed approach is promising for similar applications.
  • Picturing agreement between clustering solutions using multidimensional unfolding: An application to greenhouse gas emissions data
    Publication. Martins, Ana Alexandra; Cardoso, Maria Margarida
    When evaluating a clustering solution, we often have to compare alternative solutions, e.g., to address clustering stability or external validity. Each comparison essentially relies on a contingency table that cross-classifies a pair of (crisp) clustering solutions. These data are commonly used to: (1) solve an assignment problem that matches the clusters of the two partitions; (2) compute several indices of agreement; (3) represent the two partitions in a two-dimensional map using Correspondence Analysis. We propose using the Multidimensional Unfolding (MDU) technique to picture the cross-classification data between two partitions, complementing a clustering evaluation analysis and overcoming some limitations of the traditional approaches (1) to (3). This approach relies on a new similarity measure that excludes agreement between clusters due to chance alone. The resulting MDU map is easy to interpret: it pictures the agreement between clustering solutions, so that the less two clusters from the two partitions agree, the larger the (Euclidean) distance between the points that represent them. Two applications illustrate the relevance of this approach: one to a data set from the UCI Machine Learning Repository, to assess clustering external validity; and one to greenhouse gas emissions data, to address the temporal stability of clustering solutions by comparing clusters of European countries with homogeneous sources of pollutant emissions over three years.
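The first result integrates model selection into EM through the MML criterion. The sketch below illustrates the general idea for a mixture of multinomials, following the component-annihilation scheme of Figueiredo and Jain (2002) as we understand it; it is not the authors' implementation. Assumptions: simultaneous (rather than component-wise) EM updates, random Dirichlet initialisation, a tiny pseudo-count to avoid log(0), and the hypothetical function name `em_mml_multinomial`. During EM, a component whose expected support falls below half the number of free parameters per component gets zero weight and is removed; the weakest survivor is then forced out, and the fit with the smallest message length is kept.

```python
import numpy as np


def _loglik(X, alpha, theta):
    # Mixture log-likelihood without the multinomial coefficients
    # (they are identical for every model, so they do not affect comparisons).
    log_p = X @ np.log(theta).T + np.log(alpha)
    return float(np.logaddexp.reduce(log_p, axis=1).sum())


def _em(X, alpha, theta, n_par, n_iter=1000, tol=1e-8):
    """EM with MML-style annihilation of weakly supported components."""
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities computed in log space for stability.
        log_p = X @ np.log(theta).T + np.log(alpha)               # (n, k)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        w = np.exp(log_p - log_norm)
        # MML-modified weight update: support below n_par / 2 -> weight 0.
        support = np.maximum(0.0, w.sum(axis=0) - n_par / 2.0)
        keep = support > 0
        if not keep.any():
            raise ValueError("no component survived; lower k_max")
        alpha = support[keep] / support[keep].sum()
        # M-step for category probabilities of the surviving components.
        counts = w[:, keep].T @ X + 1e-10                          # avoid log(0)
        theta = counts / counts.sum(axis=1, keepdims=True)
        loglik = float(log_norm.sum())
        if abs(loglik - prev) <= tol * abs(loglik):
            break
        prev = loglik
    return alpha, theta


def _message_length(X, alpha, theta, n_par):
    # Message length of Figueiredo & Jain (2002) for k non-zero components.
    n, k = X.shape[0], len(alpha)
    return (n_par / 2.0 * np.sum(np.log(n * alpha / 12.0))
            + k / 2.0 * np.log(n / 12.0)
            + k * (n_par + 1) / 2.0
            - _loglik(X, alpha, theta))


def em_mml_multinomial(X, k_max=6, seed=0):
    """Return (weights, category probabilities) of the shortest-message fit."""
    X = np.asarray(X, dtype=float)           # (n, d) matrix of counts
    n_par = X.shape[1] - 1                   # free parameters per component
    rng = np.random.default_rng(seed)
    alpha = np.full(k_max, 1.0 / k_max)
    theta = rng.dirichlet(np.ones(X.shape[1]), size=k_max)
    best = (np.inf, None, None)
    while True:
        alpha, theta = _em(X, alpha, theta, n_par)
        length = _message_length(X, alpha, theta, n_par)
        if length < best[0]:
            best = (length, alpha.copy(), theta.copy())
        if len(alpha) == 1:
            break
        # Force out the weakest component and restart EM from the rest.
        keep = np.arange(len(alpha)) != np.argmin(alpha)
        alpha = alpha[keep] / alpha[keep].sum()
        theta = theta[keep]
    return best[1], best[2]
```

Because every candidate number of components is visited within one run, no separate EM fit per candidate model is needed, which is the practical point the abstract makes about computation time.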
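The electricity-price study combines three dissimilarities between time series. A minimal sketch of such a combination follows, assuming equal weights, max-normalisation of each matrix, the correlation distance sqrt(2(1 - rho)), and a Euclidean distance between periodograms normalised to sum to one. The paper's exact definitions and weights may differ, and `comb_dissimilarity` is a hypothetical name.

```python
import numpy as np


def _pairwise_euclidean(A):
    # Euclidean distance between every pair of rows of A.
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def periodogram(x):
    # Raw periodogram at the non-zero Fourier frequencies (mean removed).
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return (np.abs(np.fft.rfft(x)) ** 2 / len(x))[1:]


def comb_dissimilarity(series, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of three max-normalised dissimilarity matrices.

    series -- array of shape (n_series, n_time), equal-length, non-constant.
    """
    S = np.asarray(series, dtype=float)
    d_euc = _pairwise_euclidean(S)
    rho = np.corrcoef(S)
    d_cor = np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))
    np.fill_diagonal(d_cor, 0.0)             # remove rounding noise
    P = np.array([periodogram(s) for s in S])
    P = P / P.sum(axis=1, keepdims=True)     # compare spectral shape only
    d_per = _pairwise_euclidean(P)
    mats = [m / m.max() if m.max() > 0 else m for m in (d_euc, d_cor, d_per)]
    return sum(w * m for w, m in zip(weights, mats))
```

Normalising each matrix before adding them keeps the Euclidean term, which depends on price levels, from dominating the two shape-based terms. The result can be fed to any clustering method that accepts a precomputed dissimilarity matrix.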
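The stability study relies on agreement indices corrected for chance. One widely used index of this kind is the Adjusted Rand Index (Hubert and Arabie, 1985); the sketch below computes it from the pair counts of two partitions. Whether this is the index used in the study is an assumption, and the weighted cross-validation procedure itself is not reproduced.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two crisp partitions of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same objects")
    n = len(labels_a)
    # Pairs of objects placed together in both partitions (contingency cells).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each partition separately (row/column margins).
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)   # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:               # degenerate case, e.g. one cluster each
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

The index is 1 for identical partitions (up to relabelling) and close to 0 for unrelated ones, so the same function can score agreement between resampled solutions (stability) or between a solution and known classes (external validity).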
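The load-forecasting study uses cluster labels to find similar sequences of days. The sketch below shows a pattern-sequence step of that kind under stated assumptions: a cluster label per day is already available (the K-Medoids/COMB step and the temperature-based K-Nearest Neighbor adjustment are omitted), the forecast is the plain average of the days that followed earlier occurrences of the last `window` labels, and `similar_pattern_forecast` is a hypothetical name. It also includes the three error metrics named in the abstract, using their standard definitions.

```python
import numpy as np


def similar_pattern_forecast(labels, profiles, window):
    """Forecast the next day's load curve from earlier similar sequences.

    labels   -- cluster label of each past day
    profiles -- array (n_days, n_steps), e.g. 96 steps for 15-min data
    window   -- number of most recent labels used as the search pattern
    """
    labels = list(labels)
    profiles = np.asarray(profiles, dtype=float)
    pattern = labels[-window:]
    # Days that immediately followed an earlier occurrence of `pattern`
    # (the final occurrence is skipped: its next day is the one to forecast).
    followers = [i + window for i in range(len(labels) - window)
                 if labels[i:i + window] == pattern]
    if not followers:
        raise ValueError("pattern never occurred before; try a shorter window")
    return profiles[followers].mean(axis=0)


def rmse(actual, forecast):
    e = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))


def mae(actual, forecast):
    e = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(e)))


def mape(actual, forecast):
    # Percentage error; assumes no actual value is zero.
    a = np.asarray(actual, dtype=float)
    return float(100.0 * np.mean(np.abs((a - np.asarray(forecast, dtype=float)) / a)))
```

A longer `window` gives more specific matches but fewer of them; in practice it would be tuned on held-out days.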
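The last abstract lists traditional uses of the contingency table between two partitions. The sketch below builds that table from two label vectors, solves approach (1) by brute-force search over cluster matchings (adequate only for a handful of clusters), and computes observed-minus-expected counts under independence as one simple way of discounting chance agreement. This chance correction is an illustrative assumption, not the similarity measure proposed in the paper; the MDU map itself is not computed, and all function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def contingency(labels_a, labels_b):
    # Cross-classification counts; labels must be sortable within each partition.
    rows, cols = sorted(set(labels_a)), sorted(set(labels_b))
    counts = Counter(zip(labels_a, labels_b))
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]


def chance_excess(table):
    # Observed count minus the count expected if the partitions were independent.
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    return [[table[i][j] - row[i] * col[j] / n for j in range(len(col))]
            for i in range(len(row))]


def best_matching(table):
    # Approach (1): match clusters to maximise the total matched count.
    k_r, k_c = len(table), len(table[0])
    if k_r > k_c:
        raise ValueError("put the partition with fewer clusters in the rows")
    best = max(permutations(range(k_c), k_r),
               key=lambda p: sum(table[i][p[i]] for i in range(k_r)))
    return list(enumerate(best))
```

For larger numbers of clusters, `scipy.optimize.linear_sum_assignment` (applied to the negated table) solves the same matching problem without enumerating permutations.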