Repository logo
 

Search Results

Now showing 1 - 10 of 20
  • Computing RF Tree Distance over Succinct Representations
    Publication . Branco, António Pedro; Vaz, Cátia; Francisco, Alexandre P.
    There are several tools available to infer phylogenetic trees, which depict the evolutionary relationships among biological entities such as viral and bacterial strains in infectious outbreaks or cancerous cells in tumor progression trees. These tools rely on several inference methods available to produce phylogenetic trees, with resulting trees not being unique. Thus, methods for comparing phylogenies that are capable of revealing where two phylogenetic trees agree or differ are required. An approach is then proposed to compute a similarity or dissimilarity measure between trees, with the Robinson–Foulds distance being one of the most used, and which can be computed in linear time and space. Nevertheless, given the large and increasing volume of phylogenetic data, phylogenetic trees are becoming very large with hundreds of thousands of leaves. In this context, space requirements become an issue both while computing tree distances and while storing trees. We propose then an efficient implementation of the Robinson–Foulds distance over tree succinct representations. Our implementation also generalizes the Robinson–Foulds distances to labelled phylogenetic trees, i.e., trees containing labels on all nodes, instead of only on leaves. Experimental results show that we are able to still achieve linear time while requiring less space. Our implementation in C++ is available as an open-source tool.
  • TypOn: the microbial typing ontology
    Publication . Vaz, Cátia; Francisco, Alexandre P.; Silva, Mickael; Jolley, Keith A.; Bray, James E.; Pouseele, Hannes; Rothganger, Joerg; Ramirez, Mário; Carriço, João A.
    Bacterial identification and characterization at subspecies level is commonly known as Microbial Typing. Currently, these methodologies are fundamental tools in Clinical Microbiology and bacterial population genetics studies to track outbreaks and to study the dissemination and evolution of virulence or pathogenicity factors and antimicrobial resistance. Due to advances in DNA sequencing technology, these methods have evolved to become focused on sequence-based methodologies. The need to have a common understanding of the concepts described and the ability to share results within the community at a global level are increasingly important requisites for the continued development of portable and accurate sequence-based typing methods, especially with the recent introduction of Next Generation Sequencing (NGS) technologies. In this paper, we present an ontology designed for the sequence-based microbial typing field, capable of describing any of the sequence-based typing methodologies currently in use and being developed, including novel NGS based methods. This is a fundamental step to accurately describe, analyze, curate, and manage information for microbial typing based on sequence based typing methods.
  • On the analysis of compensation correctness
    Publication . Vaz, Cátia; Ferreira, Carla
    One fundamental idea of service-oriented computing is that applications should be developed by composing already available services. Due to the long running nature of service interactions, a main challenge in service composition is ensuring correctness of transaction recovery. In this paper, we use a process calculus suitable for modelling long running transactions with a recovery mechanism based on compensations. Within this setting, we discuss and formally state correctness criteria for compensable processes compositions, assuming that each process is correct with respect to transaction recovery. Under our theory, we formally interpret self-healing compositions, that can detect and recover from faults, as correct compositions of compensable processes. Moreover, we develop an automated verification approach and we apply it to an illustrative case study.
  • Towards distance-based phylogenetic inference in average-case linear-time
    Publication . Crochemore, Maxime; Francisco, Alexandre P.; Pissis, Solon; Vaz, Cátia
    Computing genetic evolution distances among a set of taxa dominates the running time of many phylogenetic inference methods. Most of genetic evolution distance definitions rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles. We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well known phylogenetic inference method.
  • NGSPipes: fostering reproducibility and scalability in biosciences
    Publication . Dantas, Bruno; Fleitas, Camenelias; Almeida, Alexandre; Forja, João; Francisco, Alexandre; Simão, José; Vaz, Cátia
    Biosciences have been revolutionised by NGS technologies in last years, leading to new perspectives in medical, industrial and environmental applications. And although our motivation comes from biosciences, the following is true for many areas of science: published results are usually hard to reproduce, delaying the adoption of new methodologies and hindering innovation. Even if data and tools are freely available, pipelines for data analysis are in general barely described and their setup is far from trivial. NGSPipes addresses these issues reducing the efforts necessary to define, build and deploy pipelines, either at a local workstation or in the cloud. NGSPipes framework is freely available at http://ngspipes.github.io/.
  • PHYLOViZ Online: web-based tool for visualization, phylogenetic inference, analysis and sharing of minimum spanning trees
    Publication . Ribeiro-Gonçalves, Bruno; Francisco, Alexandre P.; Vaz, Cátia; Ramirez, Mário; Carrico, Joao
    High-throughput sequencing methods generated allele and single nucleotide polymorphism information for thousands of bacterial strains that are publicly available in online repositories and created the possibility of generating similar information for hundreds to thousands of strains more in a single study. Minimum spanning tree analysis of allelic data offers a scalable and reproducible methodological alternative to traditional phylogenetic inference approaches, useful in epidemiological investigations and population studies of bacterial pathogens. PHYLOViZ Online was developed to allow users to do these analyses without software installation and to enable easy accessing and sharing of data and analyses results from any Internet enabled computer. PHYLOViZ Online also offers a RESTful API for programmatic access to data and algorithms, allowing it to be seamlessly integrated into any third party web service or software.
  • Fast phylogenetic inference from typing data
    Publication . Carrico, Joao; Crochemore, Maxime; Francisco, Alexandre; Pissis, Solon; Ribeiro-Gonçalves, Bruno; Vaz, Cátia
    Background: Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profle data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This led to development of several novel methods and several databases being made available for many microbial species. With the mainstream use of High Throughput Sequencing, the amount of data being accumulated in these databases is huge, storing thousands of diferent profles. On the other hand, computing genetic evolution ary distances among a set of typing profles or taxa dominates the running time of many phylogenetic inference methods. It is important also to note that most of genetic evolution distance defnitions rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profles. Results: We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well known phylogenetic inference method, and how it can be used to speedup querying local phylogenetic patterns over large typing databases.
  • On the expressive power of primitives for compensation handling
    Publication . Lanese, Ivan; Vaz, Cátia; Ferreira, Carla
    Modern software systems have frequently to face unexpected events, reacting so to reach a consistent state. In the field of concurrent and mobile systems (e.g., for web services) the problem is usually tackled using long running transactions and compensations: activities programmed to recover partial executions of long running transactions. We compare the expressive power of different approaches to the specification of those compensations. We consider (i) static recovery, where the compensation is statically defined together with the transaction, (ii) parallel recovery, where the compensation is dynamically built as parallel composition of compensation elements and (iii) general dynamic recovery, where more refined ways of composing compensation elements are provided. We define an encoding of parallel recovery into static recovery enjoying nice compositionality properties, showing that the two approaches have the same expressive power. We also show that no such encoding of general dynamic recovery into static recovery is possible, i.e. general dynamic recovery is strictly more expressive.
  • Towards compensation correctness in interactive systems
    Publication . Vaz, Cátia; Ferreira, Carla
    One fundamental idea of service-oriented computing is that applications should be developed by composing already available services. Due to the long running nature of service interactions, a main challenge in service composition is ensuring correctness of failure recovery. In this paper, we use a process calculus suitable for modelling long running transactions with a recovery mechanism based on compensations. Within this setting, we discuss and formally state correctness criteria for compensable processes compositions, assuming that each process is correct with respect to failure recovery. Under our theory, we formally interpret self-healing compositions, that can detect and recover from failures, as correct compositions of compensable processes.
  • Dynamic phylogenetic inference for sequence-based typing data
    Publication . Francisco, Alexandre; Nascimento, Marta; Vaz, Cátia
    Typing methods are widely used in the surveillance of infectious diseases, outbreaks investigation and studies of the natural history of an infection. And their use is becoming standard, in particular with the introduction of High Throughput Sequencing (HTS). On the other hand, the data being generated is massive and many algorithms have been proposed for phylogenetic analysis of typing data, such as the goeBURST algorithm. These algorithms must however be run whenever new data becomes available starting from scratch. We address this issue proposing a dynamic version of goeBURST algorithm. Experimental results show that this new version is efficient on integrating new data and updating inferred evolutionary patterns, improving the update running time by at least one order of magnitude.