Search Results
- Towards distance-based phylogenetic inference in average-case linear-time. Publication. Crochemore, Maxime; Francisco, Alexandre P.; Pissis, Solon; Vaz, Cátia. Computing genetic evolution distances among a set of taxa dominates the running time of many phylogenetic inference methods. Most genetic evolution distance definitions rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles. We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well-known phylogenetic inference method.
- PHYLOViZ Online: web-based tool for visualization, phylogenetic inference, analysis and sharing of minimum spanning trees. Publication. Ribeiro-Gonçalves, Bruno; Francisco, Alexandre P.; Vaz, Cátia; Ramirez, Mário; Carrico, Joao. High-throughput sequencing methods have generated allele and single nucleotide polymorphism information for thousands of bacterial strains that are publicly available in online repositories, and have created the possibility of generating similar information for hundreds to thousands more strains in a single study. Minimum spanning tree analysis of allelic data offers a scalable and reproducible methodological alternative to traditional phylogenetic inference approaches, useful in epidemiological investigations and population studies of bacterial pathogens. PHYLOViZ Online was developed to allow users to do these analyses without software installation and to enable easy access to and sharing of data and analysis results from any Internet-enabled computer. PHYLOViZ Online also offers a RESTful API for programmatic access to data and algorithms, allowing it to be seamlessly integrated into any third-party web service or software.
- Fast phylogenetic inference from typing data. Publication. Carrico, Joao; Crochemore, Maxime; Francisco, Alexandre; Pissis, Solon; Ribeiro-Gonçalves, Bruno; Vaz, Cátia. Background: Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This led to the development of several novel methods and to several databases being made available for many microbial species. With the mainstream use of High Throughput Sequencing, the amount of data being accumulated in these databases is huge, storing thousands of different profiles. On the other hand, computing genetic evolutionary distances among a set of typing profiles or taxa dominates the running time of many phylogenetic inference methods. It is also important to note that most genetic evolution distance definitions rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles. Results: We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well-known phylogenetic inference method, and how it can be used to speed up querying local phylogenetic patterns over large typing databases.
- Dynamic phylogenetic inference for sequence-based typing data. Publication. Francisco, Alexandre; Nascimento, Marta; Vaz, Cátia. Typing methods are widely used in the surveillance of infectious diseases, outbreak investigation and studies of the natural history of an infection. Their use is becoming standard, in particular with the introduction of High Throughput Sequencing (HTS). On the other hand, the data being generated is massive and many algorithms have been proposed for phylogenetic analysis of typing data, such as the goeBURST algorithm. These algorithms must, however, be rerun from scratch whenever new data becomes available. We address this issue by proposing a dynamic version of the goeBURST algorithm. Experimental results show that this new version is efficient at integrating new data and updating inferred evolutionary patterns, improving the update running time by at least one order of magnitude.
- Distance-based phylogenetic inference from typing data: a unifying view. Publication. Vaz, Cátia; Nascimento, Marta; Carrico, Joao; Rocher, Tatiana; Francisco, Alexandre P. Typing methods are widely used in the surveillance of infectious diseases, outbreak investigation and studies of the natural history of an infection. Moreover, their use is becoming standard, in particular with the introduction of high-throughput sequencing. On the other hand, the data being generated are massive and many algorithms have been proposed for phylogenetic analysis of typing data, addressing both correctness and scalability issues. Most of the distance-based algorithms for inferring phylogenetic trees follow the closest pair joining scheme, which is one of the approaches used in hierarchical clustering. Moreover, although phylogenetic inference algorithms may seem rather different, the main difference among them lies in how cluster proximity is defined and in which optimization criterion is used. Both cluster proximity and optimization criteria often rely on a model of evolution. In this work, we review these algorithms and provide a unified view of them. This is an important step not only to better understand such algorithms but also to identify possible computational bottlenecks and improvements, which are important when dealing with large data sets.
- Dynamic phylogenetic inference for sequence-based typing data. Publication. Francisco, Alexandre P.; Nascimento, Marta; Vaz, Cátia. Typing methods are widely used in the surveillance of infectious diseases, outbreak investigation and studies of the natural history of an infection. Their use is becoming standard, in particular with the introduction of High Throughput Sequencing (HTS). On the other hand, the data being generated is massive and many algorithms have been proposed for phylogenetic analysis of typing data, such as the goeBURST algorithm. These algorithms must, however, be rerun from scratch whenever new data becomes available. We address this issue by proposing a dynamic version of the goeBURST algorithm. Experimental results show that this new version is efficient at integrating new data and updating inferred evolutionary patterns, improving the update running time by at least one order of magnitude.
- phyloDB: a framework for large-scale phylogenetic analysis of sequence based typing data. Publication. Lourenço, Bruno; Vaz, Cátia; Coimbra, Miguel E.; Francisco, Alexandre P. PHYLODB is a modular and extensible framework for large-scale phylogenetic analyses of sequence-based typing data, which are essential for understanding the evolution of epidemics. It relies on the Neo4j graph database for data storage and processing, providing a schema and an API for representing and querying phylogenetic data. Custom algorithms are also supported, allowing users to perform heavy computations directly over the data and to store results in the database. Multiple computation results are stored as multilayer networks, promoting and facilitating comparative analyses, as well as avoiding unnecessary ab initio computations. The experimental evaluation results show that PHYLODB is efficient and scalable with respect to both API operations and algorithm execution.
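
Several of the abstracts above rest on computing pairwise Hamming distances among typing profiles under a given threshold. As a rough illustration of that problem only, the sketch below shows the naive quadratic baseline with early termination. It is not the average-case linear-time algorithm proposed in the listed publications; the function names (`hamming_within`, `pairs_within`) and the toy profiles are hypothetical and chosen purely for this example.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple


def hamming_within(a: Sequence[int], b: Sequence[int], k: int) -> Optional[int]:
    """Return the Hamming distance between a and b, or None if it exceeds k."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of loci")
    distance = 0
    for x, y in zip(a, b):
        if x != y:
            distance += 1
            if distance > k:
                # Stop early: this pair is already beyond the threshold.
                return None
    return distance


def pairs_within(profiles: Dict[str, Sequence[int]], k: int) -> List[Tuple[str, str, int]]:
    """Naive baseline: test every pair of profiles against the threshold k."""
    result = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(profiles.items(), 2):
        d = hamming_within(prof_a, prof_b, k)
        if d is not None:
            result.append((name_a, name_b, d))
    return result


# Hypothetical allelic profiles with four loci each.
profiles = {
    "ST1": [1, 1, 1, 1],
    "ST2": [1, 1, 1, 2],
    "ST3": [2, 2, 2, 2],
}
print(pairs_within(profiles, 1))
```

This baseline compares every pair of profiles, so its cost grows quadratically with the number of profiles; the early exit only bounds the work spent on pairs that are far apart.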