Search Results

  • NGSPipes: fostering reproducibility and scalability in biosciences
    Publication . Dantas, Bruno; Fleitas, Camenelias; Almeida, Alexandre; Forja, João; Francisco, Alexandre; Simão, José; Vaz, Cátia
    Biosciences have been revolutionised by NGS technologies in recent years, leading to new perspectives in medical, industrial and environmental applications. Although our motivation comes from biosciences, the following is true for many areas of science: published results are usually hard to reproduce, delaying the adoption of new methodologies and hindering innovation. Even if data and tools are freely available, pipelines for data analysis are in general barely described and their setup is far from trivial. NGSPipes addresses these issues by reducing the effort necessary to define, build and deploy pipelines, either at a local workstation or in the cloud. The NGSPipes framework is freely available at http://ngspipes.github.io/.
  • Fast phylogenetic inference from typing data
    Publication . Carrico, Joao; Crochemore, Maxime; Francisco, Alexandre; Pissis, Solon; Ribeiro-Gonçalves, Bruno; Vaz, Cátia
    Background: Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This led to the development of several novel methods and to several databases being made available for many microbial species. With the mainstream use of High Throughput Sequencing, the amount of data being accumulated in these databases is huge, storing thousands of different profiles. On the other hand, computing genetic evolutionary distances among a set of typing profiles or taxa dominates the running time of many phylogenetic inference methods. It is also important to note that most genetic evolutionary distance definitions rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles. Results: We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well-known phylogenetic inference method, and how it can be used to speed up querying local phylogenetic patterns over large typing databases.
  • Dynamic phylogenetic inference for sequence-based typing data
    Publication . Francisco, Alexandre; Nascimento, Marta; Vaz, Cátia
    Typing methods are widely used in the surveillance of infectious diseases, outbreak investigation and studies of the natural history of an infection. Their use is becoming standard, in particular with the introduction of High Throughput Sequencing (HTS). On the other hand, the data being generated is massive, and many algorithms have been proposed for phylogenetic analysis of typing data, such as the goeBURST algorithm. However, these algorithms must be rerun from scratch whenever new data becomes available. We address this issue by proposing a dynamic version of the goeBURST algorithm. Experimental results show that this new version is efficient at integrating new data and updating inferred evolutionary patterns, improving the update running time by at least one order of magnitude.
  • GrapeTree: visualization of core genomic relationships among 100,000 bacterial pathogens
    Publication . Zhou, Zhemin; Alikhan, Nabil-Fareed; Sergeant, Martin; Luhmann, Nina; Vaz, Cátia; Francisco, Alexandre; Carrico, Joao; Achtman, Mark
    Current methods struggle to reconstruct and visualize the genomic relationships of large numbers of bacterial genomes. GrapeTree facilitates the analyses of large numbers of allelic profiles by a static "GrapeTree Layout" algorithm that supports interactive visualizations of large trees within a web browser window. GrapeTree also implements a novel minimum spanning tree algorithm (MSTree V2) to reconstruct genetic relationships despite high levels of missing data. GrapeTree is a stand-alone package for investigating phylogenetic trees plus associated metadata and is also integrated into EnteroBase to facilitate cutting edge navigation of genomic relationships among bacterial pathogens.
  • PHYLOViZ 2.0: providing scalable data integration and visualization for multiple phylogenetic inference methods
    Publication . Nascimento, Marta; Sousa, Adriano; Ramirez, Mario; Francisco, Alexandre; Carrico, Joao; Vaz, Cátia
    High Throughput Sequencing provides a cost-effective means of generating high-resolution data for hundreds or even thousands of strains, and is rapidly superseding methodologies based on a few genomic loci. The wealth of genomic data deposited in public databases such as the Sequence Read Archive/European Nucleotide Archive provides a powerful resource for evolutionary analysis and epidemiological surveillance. However, many of the analysis tools currently available do not scale well to these large datasets, nor do they provide the means to fully integrate ancillary data. Here we present PHYLOViZ 2.0, an extension of the PHYLOViZ tool, a platform-independent Java tool that allows phylogenetic inference and data visualization for large datasets of sequence-based typing methods, including Single Nucleotide Polymorphism (SNP) and whole genome/core genome Multilocus Sequence Typing (wg/cgMLST) analysis. PHYLOViZ 2.0 incorporates new data analysis algorithms and new visualization modules, as well as the capability of saving projects for subsequent work or for dissemination of results.
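
Several of the abstracts above rest on computing pairwise Hamming distances among allelic profiles, and "Fast phylogenetic inference from typing data" restricts that computation to pairs under a given distance threshold. As an illustration of the problem only, here is a minimal sketch of the naive quadratic baseline with an early exit once the threshold is exceeded. It is not the average-case linear-time algorithm proposed in that paper. The function names, the profile identifiers and the integer encoding of alleles are assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple


def hamming_within(a: Sequence[int], b: Sequence[int], threshold: int) -> Optional[int]:
    """Return the Hamming distance between two profiles,
    or None as soon as it is known to exceed the threshold."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of loci")
    distance = 0
    for x, y in zip(a, b):
        if x != y:
            distance += 1
            if distance > threshold:
                return None  # early exit: this pair is too far apart
    return distance


def pairs_within_threshold(
    profiles: Dict[str, Sequence[int]], threshold: int
) -> List[Tuple[str, str, int]]:
    """Naive baseline: compare every pair of profiles (quadratic in the
    number of profiles) and keep those at distance <= threshold."""
    result = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(profiles.items(), 2):
        d = hamming_within(prof_a, prof_b, threshold)
        if d is not None:
            result.append((name_a, name_b, d))
    return result


# Hypothetical allelic profiles: one integer allele identifier per locus.
example_profiles = {
    "ST1": [1, 1, 1, 1],
    "ST2": [1, 1, 1, 2],
    "ST3": [2, 2, 2, 2],
}
close_pairs = pairs_within_threshold(example_profiles, threshold=1)
```

With these hypothetical profiles only ST1 and ST2 differ at a single locus, so they are the only pair retained at threshold 1. The baseline compares every pair, which is the cost the abstract identifies as dominating the running time of many phylogenetic inference methods on large typing databases.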