Search Results

Now showing 1 - 10 of 24
  • Long distance real-time echo cancellation
    Publication . J. Ferreira, Artur; Marques, Paulo; Carvalho, Hélder
    This paper describes an implementation of a long distance echo canceller, operating in full-duplex, hands-free mode and in real time on a single Digital Signal Processor (DSP). The proposed solution is based on short-length adaptive filters centered on the positions of the most significant echoes, which are tracked by time delay estimators. To deal with double-talk situations, a speech detector is employed. The floating-point DSP TMS320C6713 from Texas Instruments is used, with software written in the C++ programming language using some compiler optimizations to reduce execution time. The resulting algorithm enables long distance echo cancellation with low computational requirements. It reaches greater echo return loss enhancement and shows faster convergence speed than the conventional solution. Our experimental results also approach the CCITT G.165 recommendation for echo cancellers.
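The short-adaptive-filter idea above can be illustrated with a minimal sketch: one normalized-LMS filter placed at a known echo delay, cancelling that echo from the microphone signal. This is an assumption-laden toy (the function name `nlms_echo_canceller`, the NLMS update, and the fixed known delay are illustrative choices, not the paper's DSP implementation, which also includes delay tracking and a double-talk detector):

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, delay, taps=32, mu=0.5, eps=1e-8):
    """Cancel the echo of `far_end` present in `mic` with a short NLMS
    adaptive filter centered at the estimated echo delay."""
    w = np.zeros(taps)               # short filter placed at the echo position
    out = np.empty(len(mic))
    for n in range(len(mic)):
        start = n - delay            # far-end sample aligned with mic[n]
        x = far_end[max(start - taps + 1, 0):max(start + 1, 0)][::-1]
        x = np.pad(x, (0, taps - len(x)))   # newest sample first, zero-padded
        e = mic[n] - w @ x           # residual = microphone minus echo estimate
        w += mu * e * x / (x @ x + eps)     # normalized LMS update
        out[n] = e
    return out
```

Because the filter spans only a short window around the echo position, the per-sample cost stays low even for very long echo paths, which is the point of the paper's approach.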
  • On the suitability of suffix arrays for Lempel-Ziv data compression
    Publication . J. Ferreira, Artur; Oliveira, Arlindo L.; Figueiredo, Mario
    Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used nowadays. Regarding time and memory requirements, LZ encoding is much more demanding than decoding. In order to speed up the encoding process, efficient data structures, like suffix trees, have been used. In this paper, we explore the use of suffix arrays to hold the dictionary of the LZ encoder, and propose an algorithm to search over it. We show that the resulting encoder attains roughly the same compression ratios as those based on suffix trees. However, the amount of memory required by the suffix array is fixed, and much lower than the variable amount of memory used by encoders based on suffix trees (which depends on the text to encode). We conclude that suffix arrays, when compared to suffix trees in terms of the trade-off among time, memory, and compression ratio, may be preferable in scenarios (e.g., embedded systems) where memory is at a premium and high speed is not critical.
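The core operation described above, finding the longest dictionary match via a suffix array, can be sketched as follows. This is a didactic version (naive suffix-array construction and materialized suffixes; a real encoder would build the array in linear time and binary-search without copying):

```python
import bisect

def suffix_array(s):
    """Suffix array of s: starting positions of its suffixes in sorted order."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def longest_match(text, pattern):
    """Longest prefix of `pattern` occurring in `text`, located by binary
    search over the sorted suffixes; returns (position, length)."""
    sa = suffix_array(text)
    suffixes = [text[i:] for i in sa]   # materialized for clarity only
    best_pos, best_len = -1, 0
    for k in range(1, len(pattern) + 1):
        p = pattern[:k]
        j = bisect.bisect_left(suffixes, p)
        if j < len(suffixes) and suffixes[j].startswith(p):
            best_pos, best_len = sa[j], k
        else:
            break
    return best_pos, best_len
```

The memory argument in the abstract follows from the structure itself: the suffix array is a fixed-size array of integer positions, whereas a suffix tree allocates a variable number of nodes depending on the text.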
  • Suffix Arrays: A competitive choice for fast Lempel-Ziv Compression
    Publication . J. Ferreira, Artur; Oliveira, Arlindo L.; Figueiredo, Mario
    Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used in a variety of applications. The LZ encoder and decoder exhibit a high asymmetry, regarding time and memory requirements, with the former being much more demanding. Several techniques have been used to speed up the encoding process; among them is the use of suffix trees. In this paper, we explore the use of a simple data structure, named suffix array, to hold the dictionary of the LZ encoder, and propose an algorithm to search the dictionary. A comparison with the suffix tree based LZ encoder is carried out, showing that the compression ratios are roughly the same. The amount of memory required by the suffix array is fixed, being much lower than the variable memory requirements of the suffix tree encoder, which depend on the text to encode. We conclude that suffix arrays are a very interesting option regarding the trade-off between time, memory, and compression ratio, when compared with suffix trees, making them preferable in some compression scenarios.
  • An unsupervised approach to feature discretization and selection
    Publication . J. Ferreira, Artur; Figueiredo, Mário A. T.
    Many learning problems require handling high-dimensional data sets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional data sets. The experimental results on several standard data sets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.
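A minimal sketch of the combination described above, unsupervised discretization followed by unsupervised selection, is shown below. The specific choices (equal-width binning and a variance-based relevance proxy, with the names `equal_width_discretize` and `select_by_dispersion`) are illustrative assumptions, not the techniques proposed in the paper:

```python
import numpy as np

def equal_width_discretize(X, bins=4):
    """Unsupervised equal-width discretization of each feature (column)."""
    Xd = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        edges = np.linspace(X[:, j].min(), X[:, j].max(), bins + 1)
        Xd[:, j] = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, bins - 1)
    return Xd

def select_by_dispersion(Xd, k):
    """Keep the k discretized features with the largest variance,
    a simple unsupervised relevance proxy (no class labels needed)."""
    order = np.argsort(Xd.var(axis=0))[::-1]
    return order[:k]
```

Discretizing first and then scoring the discrete representation is what lets features with no variation (hence no information) be discarded without ever seeing a label.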
  • On the improvement of feature selection techniques: the fitness filter
    Publication . J. Ferreira, Artur; Figueiredo, Mario
    The need for feature selection (FS) techniques is central in many machine learning and pattern recognition problems. FS is a vast research field, and many FS techniques have been proposed in the literature and applied to quite different problems. Some of these FS techniques follow the relevance-redundancy (RR) framework to select the best subset of features. In this paper, we propose a supervised filter FS technique, named the fitness filter, that follows the RR framework and uses data discretization. This technique can be used directly on low or medium dimensional data, or it can be applied as a post-processing technique to other FS techniques. Specifically, when used as a post-processing technique, it further reduces the dimensionality of the feature space found by common FS techniques and often improves the classification accuracy.
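The relevance-redundancy framework mentioned above can be sketched with a generic greedy filter, in the style of mRMR-type methods: each step picks the feature with the highest relevance to the class minus its mean redundancy with the features already selected. This is a generic RR illustration, not the fitness filter itself (the names `mutual_information` and `rr_filter` are assumptions):

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Empirical MI between two discrete variables, in nats."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * np.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def rr_filter(Xd, y, k):
    """Greedy relevance-redundancy selection on discrete features: at each
    step pick the feature maximizing relevance MI(x_j; y) minus its mean
    redundancy MI(x_j; x_s) over the already-selected features."""
    rel = [mutual_information(Xd[:, j], y) for j in range(Xd.shape[1])]
    selected, remaining = [], list(range(Xd.shape[1]))
    while len(selected) < k and remaining:
        def score(j):
            red = (np.mean([mutual_information(Xd[:, j], Xd[:, s])
                            for s in selected]) if selected else 0.0)
            return rel[j] - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

The redundancy term is what makes such a filter skip a duplicated feature even when its individual relevance is high.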
  • Feature Discretization with Relevance and Mutual Information Criteria
    Publication . J. Ferreira, Artur; Figueiredo, Mário A. T.
    Feature discretization (FD) techniques often yield adequate and compact representations of the data, suitable for machine learning and pattern recognition problems. These representations usually decrease the training time, yielding higher classification accuracy while allowing humans to better understand and visualize the data, as compared to the use of the original features. This paper proposes two new FD techniques. The first one is based on the well-known Linde-Buzo-Gray quantization algorithm, coupled with a relevance criterion, being able to perform unsupervised, supervised, or semi-supervised discretization. The second technique works in supervised mode, being based on the maximization of the mutual information between each discrete feature and the class label. Our experimental results on standard benchmark datasets show that these techniques scale up to high-dimensional data, attaining in many cases better accuracy than existing unsupervised and supervised FD approaches, while using fewer discretization intervals.
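The second idea above, discretizing so as to maximize the mutual information between the quantized feature and the class label, can be sketched with a simple greedy cut-point search. The candidate set (sample quantiles), the greedy scheme, and the names `mi` and `mi_discretize` are illustrative assumptions rather than the paper's exact algorithm:

```python
import numpy as np
from collections import Counter
from math import log

def mi(x, y):
    """Empirical mutual information between discrete variables, in nats."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def mi_discretize(x, y, n_bins):
    """Greedy supervised discretization: repeatedly add the candidate cut
    point that most increases MI between the quantized feature and the
    class label y."""
    candidates = list(np.quantile(x, np.linspace(0.05, 0.95, 19)))
    cuts = []
    while len(cuts) < n_bins - 1 and candidates:
        best = max(candidates,
                   key=lambda c: mi(np.digitize(x, sorted(cuts + [c])), y))
        cuts.append(best)
        candidates.remove(best)
    return sorted(cuts)
```

On a feature whose class boundary sits at a single threshold, such a criterion places the first cut at that threshold, which is why MI-guided discretization can use fewer intervals than unsupervised binning.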
  • Hybrid generative/discriminative training of radial basis function networks
    Publication . J. Ferreira, Artur; Figueiredo, Mario
    We propose a new training algorithm for radial basis function networks (RBFN), which incorporates both generative (mixture-based) and discriminative (logistic) criteria. Our algorithm combines steps from the classical expectation-maximization algorithm for mixtures of Gaussians with a logistic regression step to update (in a discriminative way) the output weights. We also describe an incremental version of the algorithm, which is robust regarding initial conditions. Comparison of our approach with existing training algorithms, on both synthetic and real binary classification problems, shows that it achieves better performance.
  • Exploiting the bin-class histograms for feature selection on discrete data
    Publication . J. Ferreira, Artur; Figueiredo, Mário A. T.
    In machine learning and pattern recognition tasks, the use of feature discretization techniques may have several advantages. The discretized features may hold enough information for the learning task at hand, while ignoring minor fluctuations that are irrelevant or harmful for that task. The discretized features have more compact representations that may yield both better accuracy and lower training time, as compared to the use of the original features. However, in many cases, mainly with medium and high-dimensional data, the large number of features usually implies that there is some redundancy among them. Thus, we may further apply feature selection (FS) techniques on the discrete data, keeping the most relevant features, while discarding the irrelevant and redundant ones. In this paper, we propose relevance and redundancy criteria for supervised feature selection techniques on discrete data. These criteria are applied to the bin-class histograms of the discrete features. The experimental results, on public benchmark data, show that the proposed criteria can achieve better accuracy than widely used relevance and redundancy criteria, such as mutual information and the Fisher ratio.
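The bin-class histogram named above is simply a count matrix indexed by (feature bin, class label); a relevance criterion is then a function of that matrix. The sketch below builds the histogram and scores it with a chi-square statistic, which is one illustrative choice of criterion and not necessarily the one proposed in the paper:

```python
import numpy as np

def bin_class_histogram(q, y, n_bins, n_classes):
    """H[b, c] = number of samples whose discretized feature falls in
    bin b and whose class label is c."""
    H = np.zeros((n_bins, n_classes), dtype=int)
    for b, c in zip(q, y):
        H[b, c] += 1
    return H

def chi2_relevance(H):
    """Chi-square statistic of the bin-class histogram: large when the
    class distribution varies strongly across bins, i.e., when the
    discretized feature is informative about the class."""
    n = H.sum()
    expected = np.outer(H.sum(axis=1), H.sum(axis=0)) / n
    mask = expected > 0
    return float(((H - expected)[mask] ** 2 / expected[mask]).sum())
```

Because the histogram is all that such criteria consume, they run in time proportional to the number of samples plus bins-times-classes, regardless of how the feature was discretized.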
  • Explainable machine learning for malware detection on Android applications
    Publication . Palma, Catarina; J. Ferreira, Artur; Figueiredo, Mário
    The presence of malicious software (malware), for example, in Android applications (apps), has harmful or irreparable consequences to the user and/or the device. Despite the protections app stores provide to avoid malware, it keeps growing in sophistication and diffusion. In this paper, we explore the use of machine learning (ML) techniques to detect malware in Android apps. The focus is on the study of different data pre-processing, dimensionality reduction, and classification techniques, assessing the generalization ability of the learned models using public domain datasets and specifically developed apps. We find that the classifiers that achieve better performance for this task are support vector machines (SVM) and random forests (RF). We emphasize the use of feature selection (FS) techniques to reduce the data dimensionality and to identify the most relevant features in Android malware classification, leading to explainability on this task. Our approach can identify the most relevant features to classify an app as malware. Namely, we conclude that permissions play a prominent role in Android malware detection. The proposed approach reduces the data dimensionality while achieving high accuracy in identifying malware in Android apps.
  • Incremental filter and wrapper approaches for feature discretization
    Publication . J. Ferreira, Artur; Figueiredo, Mário Alexandre Teles de
    Discrete data representations are necessary, or at least convenient, in many machine learning problems. While feature selection (FS) techniques aim at finding relevant subsets of features, the goal of feature discretization (FD) is to find concise (quantized) data representations, adequate for the learning task at hand. In this paper, we propose two incremental methods for FD. The first method belongs to the filter family, in which the quality of the discretization is assessed by a (supervised or unsupervised) relevance criterion. The second method is a wrapper, where discretized features are assessed using a classifier. Both methods can be coupled with any static (unsupervised or supervised) discretization procedure and can be used to perform FS as pre-processing or post-processing stages. The proposed methods attain efficient representations suitable for binary and multi-class problems with different types of data, being competitive with existing methods. Moreover, using well-known FS methods with the features discretized by our techniques leads to better accuracy than with the features discretized by other methods or with the original features. (C) 2013 Elsevier B.V. All rights reserved.
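The incremental filter idea above can be sketched as a greedy bit-allocation loop: start with one bit per feature and repeatedly grant an extra bit to the feature whose refinement most improves a relevance criterion. The bit-allocation formulation, the equal-width base quantizer, and the names `discretize`, `incremental_filter_fd`, and `mi` are assumptions made for illustration; the paper's methods admit any static discretizer and also a wrapper (classifier-based) variant:

```python
import numpy as np
from collections import Counter
from math import log

def mi(x, y):
    """Empirical mutual information between discrete variables, in nats."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def discretize(x, bits):
    """Equal-width scalar quantizer with 2**bits levels."""
    levels = 2 ** bits
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros(len(x), dtype=int)
    return np.clip(((x - lo) / (hi - lo) * levels).astype(int), 0, levels - 1)

def incremental_filter_fd(X, y, total_bits, relevance):
    """Incremental filter FD: starting from 1 bit per feature, repeatedly
    grant one extra bit to the feature whose refinement most increases
    the supplied relevance criterion."""
    bits = np.ones(X.shape[1], dtype=int)
    while bits.sum() < total_bits:
        gains = [relevance(discretize(X[:, j], bits[j] + 1), y)
                 - relevance(discretize(X[:, j], bits[j]), y)
                 for j in range(X.shape[1])]
        bits[int(np.argmax(gains))] += 1
    return bits
```

Replacing `relevance` with cross-validated accuracy of a classifier trained on the discretized data turns this filter sketch into a wrapper, mirroring the filter/wrapper split described in the abstract.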