ISEL - Eng. Elect. Tel. Comp. - Comunicações
Recent Submissions
- Efficient feature selection for intrusion detection systems with priority queue-based GRASP. Quincozes, Vagner E.; Quincozes, Silvio E.; Albuquerque, Célio; Passos, Diego; Massé, Daniel. The Greedy Randomized Adaptive Search Procedure for Feature Selection (GRASP-FS) is a recently proposed metaheuristic that optimizes the feature selection process for Intrusion Detection Systems (IDS) by combining exploration and refinement techniques for more assertive intrusion detection. However, GRASP-FS may be time- and resource-consuming for large datasets. In this work, we propose GRASPQ-FS, an extended version of GRASP-FS that uses priority queues to reduce resource consumption and processing time. As an additional contribution, we provide a comprehensive analysis of the most suitable parameters for GRASPQ-FS. Our results reveal that GRASPQ-FS can speed up feature selection by up to 90% over GRASP-FS without compromising the F1-Score. We also observed that a priority queue with 50 solutions saved 50% in execution time while increasing the F1-Score by 4.5%. (A priority-queue sketch of this idea appears after this list.)
- Feature transformation and reduction for text classification. J. Ferreira, Artur; Figueiredo, Mario. Text classification is an important tool for many applications, in supervised, semi-supervised, and unsupervised scenarios. In order to be processed by machine learning methods, a text (document) is usually represented as a bag-of-words (BoW). A BoW is a large vector of features (usually stored as floating point values), which represent the relative frequency of occurrence of a given word/term in each document. Typically, we have a large number of features, many of which may be non-informative for classification tasks; thus the need for feature transformation, reduction, and selection arises. In this paper, we propose two efficient algorithms for feature transformation and reduction for BoW-like representations. The proposed algorithms rely on simple statistical analysis of the input pattern, exploiting the BoW and its binary version. The algorithms are evaluated with support vector machine (SVM) and AdaBoost classifiers on standard benchmark datasets. The experimental results show the adequacy of the reduced/transformed binary features for text classification problems, as well as the improvement in the test set error rate obtained with the proposed methods. (A BoW sketch appears after this list.)
- Boosting of (very) weak learners. J. Ferreira, Artur; Figueiredo, Mário. In this paper we apply boosting to weak (binary) learners. The main idea is to combine the output of several simple learners in order to obtain a better classifier. As weak learners, we consider generative classifiers and radial basis function classifiers. Our tests on synthetic data show that the proposed algorithm has good convergence properties. On benchmark data, boosting of these weak learners attains results close to the well-known Real AdaBoost algorithm (with decision trees) and support vector machines, constituting a low-complexity, competitive choice. (A boosting sketch appears after this list.)
- Long distance real-time echo cancellation. J. Ferreira, Artur; Marques, Paulo; Carvalho, Hélder. This paper describes an implementation of a long distance echo canceller, operating in full-duplex, hands-free mode and in real-time on a single Digital Signal Processor (DSP). The proposed solution is based on short-length adaptive filters centered on the positions of the most significant echoes, which are tracked by time delay estimators. To deal with double-talk situations, a speech detector is employed. The floating-point DSP TMS320C6713 from Texas Instruments is used, with software written in the C++ programming language using some compiler optimizations to reduce execution time. The resulting algorithm enables long distance echo cancellation with low computational requirements. It reaches greater echo return loss enhancement and shows faster convergence speed than the conventional solution. Our experimental results also approach the CCITT G.165 recommendation for echo cancellers. (A delay-estimation and adaptive-filter sketch appears after this list.)
- On the use of suffix arrays for memory-efficient Lempel-Ziv data compression. J. Ferreira, Artur; Oliveira, Arlindo L.; Figueiredo, Mario. The Lempel-Ziv 77 (LZ77) and LZ-Storer-Szymanski (LZSS) text compression algorithms use a sliding window over the sequence of symbols, with two sub-windows: the dictionary (symbols already encoded) and the look-ahead buffer (LAB) (symbols not yet encoded). Binary search trees and suffix trees (ST) have been used to speed up the search of the LAB over the dictionary, at the expense of high memory usage [1]. A suffix array (SA) is a simpler, more compact data structure which uses (much) less memory [2,3] to hold the same information. The SA for a length-m string is an array of integers (a[1], ..., a[k], ..., a[m]) that stores the lexicographic order of the suffixes of the string; sub-string searching, as used in LZ77/LZSS, is done by searching the SA. (A suffix array sketch appears after this list.)
- Modifications and Improvements on Iris Recognition. J. Ferreira, Artur; Lourenço, André; Pinto, Bárbara; Tendeiro, Jorge. Iris recognition is a well-known biometric technique. John Daugman has proposed a method for iris recognition, which is divided into four steps: segmentation, normalization, feature extraction, and matching. In this paper, we evaluate, modify, and extend John Daugman's method. We study the images of the CASIA and UBIRIS databases to establish some modifications and extensions to Daugman's algorithm. The major modification is on the computationally demanding segmentation stage, for which we propose a template matching approach. The extensions to the algorithm address the important issue of pre-processing, which depends on the image database and is especially important when we have a non-infrared camera (e.g., a webcam). For this typical scenario, we propose several methods for reflection removal and for pupil enhancement and isolation. The tests, carried out by our C# application on grayscale CASIA and UBIRIS images, show that our template matching-based segmentation method is accurate and faster than the one proposed by Daugman. Our fast pre-processing algorithms efficiently remove reflections in images taken by non-infrared cameras. (A template matching sketch appears after this list.)
- Suffix Arrays: A competitive choice for fast Lempel-Ziv Compression. J. Ferreira, Artur; Oliveira, Arlindo L.; Figueiredo, Mario. Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used in a variety of applications. The LZ encoder and decoder exhibit a high asymmetry regarding time and memory requirements, with the former being much more demanding. Several techniques have been used to speed up the encoding process; among them is the use of suffix trees. In this paper, we explore the use of a simple data structure, named suffix array, to hold the dictionary of the LZ encoder, and propose an algorithm to search the dictionary. A comparison with the suffix tree based LZ encoder is carried out, showing that the compression ratios are roughly the same. The amount of memory required by the suffix array is fixed, being much lower than the variable memory requirements of the suffix tree encoder, which depend on the text to encode. We conclude that suffix arrays are a very interesting option regarding the tradeoff between time, memory, and compression ratio when compared with suffix trees, making them preferable in some compression scenarios. (A longest-match sketch appears after this list.)
- Hybrid generative/discriminative training of radial basis function networks. J. Ferreira, Artur; Figueiredo, Mario. We propose a new training algorithm for radial basis function networks (RBFN), which incorporates both generative (mixture-based) and discriminative (logistic) criteria. Our algorithm combines steps from the classical expectation-maximization algorithm for mixtures of Gaussians with a logistic regression step to update (in a discriminative way) the output weights. We also describe an incremental version of the algorithm, which is robust regarding initial conditions. Comparison of our approach with existing training algorithms, on (both synthetic and real) binary classification problems, shows that it achieves better performance. (An RBFN sketch appears after this list.)
- A machine learning driven methodology for alarm prediction towards self-healing in wireless networks. Mata, Luís; Sousa, Marco; Vieira, Pedro; Queluz, Maria Paula; Rodrigues, António. Although Artificial Intelligence (AI) is already used by 5th Generation (5G) networks to support specific network functions, the increased complexity of the 6th Generation (6G) will demand the adoption of extended AI capabilities to enhance network efficiency. Moreover, high network performance and availability at a sustainable cost will be crucial for emerging applications, such as autonomous vehicles and smart cities. In this context, operators are expected to implement Self-Healing Operations (SHOs) to transition from reactive handling of network faults to a preventive approach, relying on statistical learning of network data. This paper proposes a Machine Learning (ML)-driven methodology to predict network faults using generic Fault Management (FM) data, enabling the implementation of preventive actions to avoid service degradation or failure. The evaluation of this methodology using live network data revealed statistical associations among certain network faults, considering both time and root-cause factors. Therefore, FM data and two ML models, namely Logistic Regression (LR) and Light Gradient Boosting Model (LGBM), were used to predict network faults, achieving a 93% success rate within a 60-minute anticipation period. (A fault-prediction sketch appears after this list.)
- A mutual information based discretization-selection technique. Ferreira, Artur; Figueiredo, Mário. In machine learning (ML) and data mining (DM), one often has to resort to data pre-processing techniques to achieve adequate data representations. Among these techniques, we find feature discretization (FD) and feature selection (FS), with many available methods for each one. The use of FD and FS techniques improves the data representation for ML and DM tasks. However, these techniques are usually applied independently, that is, we may use an FD technique but not an FS technique, or the opposite. Using both FD and FS techniques in sequence may not produce the most adequate results. In this paper, we propose a supervised discretization-selection technique; the discretization step follows an incremental approach and keeps information regarding the features and the number of bits allocated per feature. Then, we apply a selection criterion based upon the discretization bins, yielding a discretized and dimensionality-reduced dataset. We evaluate our technique on different types of data and, in most cases, the discretized and reduced version of the data is the most suited version, achieving better classification performance as compared to the use of the original features. (A discretization-selection sketch appears after this list.)
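
For the priority queue-based GRASP item, the Python fragment below is a minimal sketch of the queue idea: candidate feature subsets produced by a greedy-randomized construction phase are kept in a bounded heap, so that only the most promising ones would reach the refinement phase. The subset evaluation, queue size, and loop structure are illustrative assumptions, not the published GRASPQ-FS algorithm or its IDS evaluation pipeline.

```python
# Hypothetical sketch: a bounded priority queue of candidate feature subsets
# inside a GRASP-style loop. Subset evaluation is a stand-in, not the paper's IDS pipeline.
import heapq
import random

def evaluate_subset(features):
    # Placeholder score; a real IDS study would train a classifier and return its F1-Score.
    return random.random()

def graspq_sketch(n_features=20, queue_size=50, iterations=200, subset_size=5):
    queue = []  # min-heap of (score, subset): the worst of the kept solutions is evicted first
    for _ in range(iterations):
        # Construction phase: greedy-randomized pick of a candidate subset.
        subset = tuple(sorted(random.sample(range(n_features), subset_size)))
        score = evaluate_subset(subset)
        if len(queue) < queue_size:
            heapq.heappush(queue, (score, subset))
        elif score > queue[0][0]:
            heapq.heapreplace(queue, (score, subset))  # keep only the top `queue_size` solutions
    # Local-search/refinement would then run only on the queued (promising) subsets.
    return sorted(queue, reverse=True)

if __name__ == "__main__":
    best = graspq_sketch()
    print("best score:", best[0][0], "subset:", best[0][1])
```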
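For the feature transformation and reduction item, the sketch below builds a relative-frequency BoW and its binary version and then applies a crude document-frequency filter. The toy documents and the filtering rule are assumptions chosen for illustration; the paper's two algorithms rely on their own statistical criteria.

```python
# Sketch: BoW and binary BoW representations with a simple document-frequency filter.
# Illustrative only; the paper proposes its own statistical reduction criteria.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the cat sat on the mat",
    "the dog barked at the cat",
    "stocks fell as markets reacted to the news",
    "investors sold stocks after the news",
]

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs).toarray().astype(float)   # term counts
bow /= bow.sum(axis=1, keepdims=True)                          # relative frequencies
binary_bow = (bow > 0).astype(int)                             # binary BoW

# Keep only terms that occur in more than one document (a crude reduction step).
doc_freq = binary_bow.sum(axis=0)
keep = doc_freq > 1
reduced = bow[:, keep]
print("original features:", bow.shape[1], "-> reduced:", reduced.shape[1])
```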
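For the boosting of weak learners item, here is a compact sketch of discrete AdaBoost over a generic weak learner. Decision stumps stand in for the generative and RBF weak learners studied in the paper, and the synthetic dataset and number of rounds are arbitrary assumptions.

```python
# Sketch of discrete AdaBoost; stumps are a stand-in weak learner.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
y = 2 * y - 1                      # labels in {-1, +1}
w = np.full(len(y), 1.0 / len(y))  # uniform sample weights
learners, alphas = [], []

for _ in range(50):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = np.sum(w * (pred != y)) / np.sum(w)
    if err >= 0.5:                 # weak learner no better than chance: stop
        break
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    w *= np.exp(-alpha * y * pred) # up-weight misclassified samples
    w /= w.sum()
    learners.append(stump)
    alphas.append(alpha)

F = sum(a * h.predict(X) for a, h in zip(alphas, learners))
print("training accuracy:", np.mean(np.sign(F) == y))
```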
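For the echo cancellation item, the sketch below mimics the overall idea in plain NumPy: estimate the dominant echo delay by cross-correlation, then run a short NLMS adaptive filter on the reference signal delayed to that position. The signal model, filter length, and step size are assumptions; the actual system runs in C++ on a TMS320C6713 DSP and includes a double-talk detector not shown here.

```python
# Sketch: locate the dominant echo delay, then cancel it with a short NLMS filter
# centered on that delay (NumPy stand-in for the real-time DSP implementation).
import numpy as np

rng = np.random.default_rng(0)
n, delay = 8000, 1200
far = rng.standard_normal(n)                    # far-end (reference) signal
mic = 0.01 * rng.standard_normal(n)             # microphone: noise ...
mic[delay:] += 0.6 * far[:-delay]               # ... plus a single delayed echo

# Time-delay estimation: pick the lag that maximizes the cross-correlation.
max_lag = 4000
scores = [np.dot(mic[lag:], far[:n - lag]) for lag in range(max_lag)]
d_hat = int(np.argmax(scores))

# Short NLMS adaptive filter applied to the reference delayed by the estimated echo position.
L, mu, eps = 32, 0.5, 1e-6
w = np.zeros(L)
ref = np.concatenate([np.zeros(d_hat), far])[:n]
err = np.zeros(n)
for i in range(L, n):
    x = ref[i - L + 1:i + 1][::-1]              # most recent L reference samples
    e = mic[i] - w @ x                          # echo-cancelled output
    w += mu * e * x / (x @ x + eps)             # normalized LMS update
    err[i] = e

print("estimated delay:", d_hat)
print("echo attenuation (dB):",
      10 * np.log10(np.mean(mic[n // 2:] ** 2) / np.mean(err[n // 2:] ** 2)))
```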
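For the two suffix array items, the first sketch shows the data structure itself: suffixes are sorted to build the SA, and a pattern (for instance, a LAB prefix) is located by binary search. The quadratic construction is for clarity only; practical encoders would use faster SA builders.

```python
# Minimal sketch: build a suffix array by sorting suffixes, then locate a pattern
# by binary search over the array.
def suffix_array(text):
    # O(m^2 log m) construction, enough to illustrate the idea; linear-time builders exist.
    return sorted(range(len(text)), key=lambda k: text[k:])

def find(text, sa, pattern):
    # Binary search for the first suffix that starts with `pattern`.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sa) and text[sa[lo]:].startswith(pattern):
        return sa[lo]            # position of one match in the dictionary
    return -1

text = "abracadabra"
sa = suffix_array(text)
print(sa)                        # [10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]
print(find(text, sa, "bra"))     # 8 (one valid occurrence; position 1 also matches "bra"? no: 8 and 1)
```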
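For the iris recognition item, the sketch below shows plain normalized cross-correlation template matching on synthetic arrays, which is the generic operation underlying a template-based segmentation stage. It is not the paper's segmentation method and does not use real iris images.

```python
# Sketch of template matching by normalized cross-correlation on synthetic data.
import numpy as np

def match_template(image, template):
    th, tw = template.shape
    t = template - template.mean()
    best, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            patch = image[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p * p).sum() * (t * t).sum()) + 1e-12
            score = (p * t).sum() / denom          # normalized cross-correlation
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos, best

rng = np.random.default_rng(1)
image = rng.random((64, 64))
template = image[20:30, 35:45].copy()              # plant a known template
print(match_template(image, template))             # ((20, 35), ~1.0)
```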
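The second suffix array sketch shows how an LZSS-style encoder could read the longest dictionary match for the look-ahead buffer off the suffixes that are lexicographically adjacent to it. Rebuilding the SA and materializing all suffixes on every call is illustrative only; an efficient encoder, including the one in the paper, would not proceed this way.

```python
# Sketch: longest-match search over a dictionary using a suffix array; only the two
# suffixes adjacent (in lexicographic order) to the look-ahead buffer need to be checked.
import bisect

def longest_match(dictionary, lab):
    sa = sorted(range(len(dictionary)), key=lambda k: dictionary[k:])
    suffixes = [dictionary[k:] for k in sa]
    i = bisect.bisect_left(suffixes, lab)
    best_pos, best_len = -1, 0
    for j in (i - 1, i):                      # only the two lexicographic neighbours matter
        if 0 <= j < len(sa):
            s = suffixes[j]
            l = 0
            while l < min(len(s), len(lab)) and s[l] == lab[l]:
                l += 1
            if l > best_len:
                best_pos, best_len = sa[j], l
    return best_pos, best_len                 # -> (offset, length) token if best_len > 0

print(longest_match("abracadabra", "abrax"))  # (0, 4): "abra" matches at offset 0
```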
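For the hybrid RBFN training item, the sketch below only illustrates the generative/discriminative combination: a Gaussian mixture fitted by EM supplies the hidden units, and logistic regression fits the output weights. The dataset, number of components, and the two-stage (rather than joint or incremental) procedure are assumptions, not the paper's algorithm.

```python
# Sketch of a generative/discriminative RBFN: GMM hidden units + logistic output weights.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

gmm = GaussianMixture(n_components=8, covariance_type="spherical", random_state=0).fit(X)
centers, widths = gmm.means_, np.sqrt(gmm.covariances_)

def rbf_features(X):
    # One Gaussian basis function per mixture component.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * widths ** 2))

clf = LogisticRegression(max_iter=1000).fit(rbf_features(X), y)
print("training accuracy:", clf.score(rbf_features(X), y))
```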
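For the alarm prediction item, the sketch below trains the two model families named in the abstract, Logistic Regression and LightGBM, on a synthetic, imbalanced stand-in for windowed Fault Management features. The feature construction, labels, and hyperparameters are assumptions, not the paper's methodology or data.

```python
# Sketch: LR and LGBM classifiers on a synthetic stand-in for windowed FM features.
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Stand-in for aggregated Fault Management features over a sliding time window.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

for name, model in [("LR", LogisticRegression(max_iter=1000)),
                    ("LGBM", LGBMClassifier(n_estimators=200))]:
    model.fit(X_tr, y_tr)
    print(name, "F1:", round(f1_score(y_te, model.predict(X_te)), 3))
```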
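For the discretization-selection item, the sketch below chains equal-frequency discretization with a mutual-information ranking of the discretized features. The binning scheme, the dataset, and the fixed number of selected features are assumptions; the paper's incremental, bit-allocation-aware criterion is more elaborate.

```python
# Sketch: equal-frequency discretization followed by mutual-information feature ranking.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import mutual_info_score

X, y = load_breast_cancer(return_X_y=True)
n_bins = 8

def discretize(col, n_bins):
    # Equal-frequency binning: bin edges at the empirical quantiles.
    edges = np.quantile(col, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(col, edges)

X_disc = np.column_stack([discretize(X[:, j], n_bins) for j in range(X.shape[1])])
mi = np.array([mutual_info_score(X_disc[:, j], y) for j in range(X.shape[1])])

k = 10
selected = np.argsort(mi)[::-1][:k]            # keep the k most informative discretized features
print("selected feature indices:", selected)
X_reduced = X_disc[:, selected]
```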