Search Results

  • PHYLOViZ Online: web-based tool for visualization, phylogenetic inference, analysis and sharing of minimum spanning trees
    Publication . Ribeiro-Gonçalves, Bruno; Francisco, Alexandre P.; Vaz, Cátia; Ramirez, Mário; Carrico, Joao
    High-throughput sequencing methods have generated allele and single nucleotide polymorphism information for thousands of bacterial strains that are publicly available in online repositories, and have created the possibility of generating similar information for hundreds to thousands more strains in a single study. Minimum spanning tree analysis of allelic data offers a scalable and reproducible methodological alternative to traditional phylogenetic inference approaches, useful in epidemiological investigations and population studies of bacterial pathogens. PHYLOViZ Online was developed to allow users to do these analyses without software installation and to enable easy access to and sharing of data and analysis results from any Internet-enabled computer. PHYLOViZ Online also offers a RESTful API for programmatic access to data and algorithms, allowing it to be seamlessly integrated into any third-party web service or software.
  • Fast phylogenetic inference from typing data
    Publication . Carrico, Joao; Crochemore, Maxime; Francisco, Alexandre; Pissis, Solon; Ribeiro-Gonçalves, Bruno; Vaz, Cátia
    Background: Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This led to the development of several novel methods and to several databases being made available for many microbial species. With the mainstream use of High Throughput Sequencing, the amount of data being accumulated in these databases is huge, storing thousands of different profiles. On the other hand, computing genetic evolutionary distances among a set of typing profiles or taxa dominates the running time of many phylogenetic inference methods. It is also important to note that most genetic evolution distance definitions rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles. Results: We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well-known phylogenetic inference method, and how it can be used to speed up querying local phylogenetic patterns over large typing databases.
  • Distance-based phylogenetic inference from typing data: a unifying view
    Publication . Vaz, Cátia; Nascimento, Marta; Carrico, Joao; Rocher, Tatiana; Francisco, Alexandre P.
    Typing methods are widely used in the surveillance of infectious diseases, outbreak investigation and studies of the natural history of an infection. Moreover, their use is becoming standard, in particular with the introduction of high-throughput sequencing. On the other hand, the data being generated are massive, and many algorithms have been proposed for the phylogenetic analysis of typing data, addressing both correctness and scalability issues. Most of the distance-based algorithms for inferring phylogenetic trees follow the closest pair joining scheme. This is one of the approaches used in hierarchical clustering. Moreover, although phylogenetic inference algorithms may seem rather different, the main difference among them resides in how one defines cluster proximity and in which optimization criterion is used. Both cluster proximity and optimization criteria often rely on a model of evolution. In this work, we review these algorithms and provide a unified view of them. This is an important step not only to better understand such algorithms but also to identify possible computational bottlenecks and improvements, which is important for dealing with large data sets.
  • GrapeTree: visualization of core genomic relationships among 100,000 bacterial pathogens
    Publication . Zhou, Zhemin; Alikhan, Nabil-Fareed; Sergeant, Martin; Luhmann, Nina; Vaz, Cátia; Francisco, Alexandre; Carrico, Joao; Achtman, Mark
    Current methods struggle to reconstruct and visualize the genomic relationships of large numbers of bacterial genomes. GrapeTree facilitates the analyses of large numbers of allelic profiles by a static "GrapeTree Layout" algorithm that supports interactive visualizations of large trees within a web browser window. GrapeTree also implements a novel minimum spanning tree algorithm (MSTree V2) to reconstruct genetic relationships despite high levels of missing data. GrapeTree is a stand-alone package for investigating phylogenetic trees plus associated metadata and is also integrated into EnteroBase to facilitate cutting edge navigation of genomic relationships among bacterial pathogens.
  • PHYLOViZ 2.0: providing scalable data integration and visualization for multiple phylogenetic inference methods
    Publication . Nascimento, Marta; Sousa, Adriano; Ramirez, Mario; Francisco, Alexandre; Carrico, Joao; Vaz, Cátia
    High Throughput Sequencing provides a cost-effective means of generating high-resolution data for hundreds or even thousands of strains, and is rapidly superseding methodologies based on a few genomic loci. The wealth of genomic data deposited on public databases such as Sequence Read Archive/European Nucleotide Archive provides a powerful resource for evolutionary analysis and epidemiological surveillance. However, many of the analysis tools currently available do not scale well to these large datasets, nor provide the means to fully integrate ancillary data. Here we present PHYLOViZ 2.0, an extension of the PHYLOViZ tool, a platform-independent Java tool that allows phylogenetic inference and data visualization for large datasets of sequence-based typing methods, including Single Nucleotide Polymorphism (SNP) and whole genome/core genome Multilocus Sequence Typing (wg/cgMLST) analysis. PHYLOViZ 2.0 incorporates new data analysis algorithms and new visualization modules, as well as the capability of saving projects for subsequent work or for dissemination of results.
  • PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods
    Publication . Francisco, Alexandre P.; Vaz, Cátia; Monteiro, Pedro T.; Melo-Cristino, José; Ramirez, Mario; Carrico, Joao
    Background: With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide reproducible and comparable results needed for a global scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated since no user-friendly tool exists to analyze and explore it. Results: PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions: PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.
  • An ontology and a REST API for sequence based microbial typing data
    Publication . Almeida, João; Tiple, João; Ramirez, Mário; Melo-Cristino, José; Vaz, Cátia; Francisco, Alexandre P.; Carrico, Joao
    In the microbial typing field, the need to have a common understanding of the concepts described and the ability to share results within the community is an increasingly important requisite for the continued development of portable and accurate sequence-based typing methods. These methods are used for bacterial strain identification and are fundamental tools in Clinical Microbiology and Bacterial Population Genetics studies. In this article we propose an ontology designed for the microbial typing field, focusing on the widely used Multi Locus Sequence Typing methodology, and a RESTful API for accessing information systems based on the proposed ontology. This constitutes an important first step to accurately describe, analyze, curate, and manage information for microbial typing methodologies based on sequence-based typing, and allows for future integration with data analysis Web services.
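
Several of the abstracts above rest on two building blocks: pairwise Hamming distances between allelic profiles, and minimum spanning trees built from those distances. The following is a minimal illustrative sketch of both, not the method of any listed paper. It uses a naive quadratic scan rather than the average-case linear-time algorithm, and plain Prim's algorithm rather than goeBURST or MSTree V2. The function names (`hamming`, `pairwise_within`, `minimum_spanning_tree`) and the toy profiles are assumptions made for illustration, and missing data is not handled.

```python
from itertools import combinations


def hamming(a, b):
    """Number of loci at which two equal-length allelic profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of loci")
    return sum(x != y for x, y in zip(a, b))


def pairwise_within(profiles, threshold):
    """Naive O(n^2) scan: map (i, j) to the distance for pairs within the threshold."""
    return {
        (i, j): d
        for (i, a), (j, b) in combinations(enumerate(profiles), 2)
        if (d := hamming(a, b)) <= threshold
    }


def minimum_spanning_tree(profiles):
    """Prim's algorithm on the complete graph weighted by Hamming distance.

    Returns a list of (parent_index, child_index, distance) edges.
    Ties are broken by the smaller profile index.
    """
    n = len(profiles)
    if n == 0:
        return []
    # best[v] = (distance from v to the tree so far, nearest tree vertex)
    best = {v: (hamming(profiles[0], profiles[v]), 0) for v in range(1, n)}
    edges = []
    while best:
        v = min(best, key=lambda u: (best[u][0], u))
        d, u = best.pop(v)
        edges.append((u, v, d))
        # Relax the remaining vertices against the newly added vertex.
        for w in best:
            dw = hamming(profiles[v], profiles[w])
            if dw < best[w][0]:
                best[w] = (dw, v)
    return edges


# Hypothetical 4-locus allelic profiles, invented for this example.
profiles = [(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 2, 2), (3, 3, 3, 3)]
close_pairs = pairwise_within(profiles, threshold=1)
tree = minimum_spanning_tree(profiles)
```

On these toy profiles, `close_pairs` contains only the pairs that differ at a single locus, and `tree` links each profile to its nearest neighbour already in the tree. The quadratic cost of the distance step is the bottleneck that the thresholded algorithm described above is meant to reduce.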