Loading...
5 results
Search Results
Now showing 1 - 5 of 5
- On the suitability of suffix arrays for lempel-ziv data compressionPublication . J. Ferreira, Artur; Oliveira, Arlindo L.; Figueiredo, MarioLossless compression algorithms of the Lempel-Ziv (LZ) family are widely used nowadays. Regarding time and memory requirements, LZ encoding is much more demanding than decoding. In order to speed up the encoding process, efficient data structures, like suffix trees, have been used. In this paper, we explore the use of suffix arrays to hold the dictionary of the LZ encoder, and propose an algorithm to search over it. We show that the resulting encoder attains roughly the same compression ratios as those based on suffix trees. However, the amount of memory required by the suffix array is fixed, and much lower than the variable amount of memory used by encoders based on suffix trees (which depends on the text to encode). We conclude that suffix arrays, when compared to suffix trees in terms of the trade-off among time, memory, and compression ratio, may be preferable in scenarios (e.g., embedded systems) where memory is at a premium and high speed is not critical.
- Suffix Arrays: A competitive choice for fast Lempel-Ziv CompressionPublication . J. Ferreira, Artur; Oliveira, Arlindo L.; Figueiredo, MarioLossless compression algorithms of the Lempel-Ziv (LZ) family are widely used in a variety of applications. The LZ encoder and decoder exhibit a high asymmetry, regarding time and memory requirements, with the former being much more demanding. Several techniques have been used to speed up the encoding process; among them is the use of suffix trees. In this paper, we explore the use of a simple data structure, named suffix array, to hold the dictionary of the LZ encoder, and propose an algorithm to search the dictionary. A comparison with the suffix tree based LZ encoder is carried out, showing that the compression ratios are roughly the same. The ammount of memory required by the suffix array is fixed, being much lower than the variable memory requirements of the suffix tree encoder, which depends on the text to encode. We conclude that suffix arrays are a very interesting option regarding the tradeoff between time, memory, and compression ratio, when compared with suffix trees, that make them preferable in some compression scenarios.
- Hybrid generative/discriminative training of radial basis function networksPublication . J. Ferreira, Artur; Figueiredo, MarioWe propose a new training algorithm for radial basics function networks (RBFN), which incorporates both generative (mixture-based) and discriminative (logistic) criteria. Our algorithm incorporates steps from the classical expectation-maximization algorithm for mixtures of Gaussians with a logistic regression step to update (in a discriminative way) the output weights. We also describe an incremental version of the algorithm, which is robust regarding initial conditions. Comparison of our approach with existing training algorithms, on (both synthetic and real) binary classification problems, shows that it achieves better performance.
- Boosting of (very) weak learnersPublication . J. Ferreira, Artur; Figueiredo, MárioIn this paper we apply boosting to weak (binary) learners. The main idea is to combine the output of several simple learners in order to obtain a better classifier. As weak learners we consider generative classifiers and radial basis function classifiers.Our tests on synthetic data show that the proposed algorithm has good convergence properties. On benchmark data, boosting of these weak learners attains results close to the well-known Real AdaBoost algorithm (with decision trees) and support vector machines, constituting a low complexity competitive choice.
- On the use of suffix arrays for memory-efficient Lempel-Ziv data compressionPublication . J. Ferreira, Artur; Oliveira, Arlindo L.; Figueiredo, MarioThe Lempel-Ziv 77 (LZ77) and LZ-Storer-Szymanski (LZSS) text compression algorithms use a sliding window over the sequence of symbols, with two sub-windows: the dictionary (symbols already encoded) and the look-ahead-buffer (LAB) (symbols not yet encoded). Binary search trees and suffix trees (ST) have been used to speedup the search of the LAB over the dictionary, at the expense of high memory usage [1]. A suffix array (SA) is a simpler, more compact data structure which uses (much) less memory [2,3] to hold the same information. The SA for a length m string is an array of integers ([1], ...[k], ...a[m]) that stores the lexicographic order of suffix k of the string; sub-string searching, as used in LZ77/LZSS, is done by searching the SA.