Size | Format
---|---
128.93 KB | Adobe PDF
Abstract(s)
Motivation and challenges: Next-Generation Sequencing (NGS) technologies are greatly increasing the amount of genomic data, revolutionizing the biosciences and driving the development of more complex NGS Data Analysis techniques [2]. These techniques, known as pipelines or workflows, consist of running and refining a series of interdependent computational analysis and visualization tasks on large amounts of data. A pipeline uses multiple software tools and data resources in stages, with the output of one tool passed as input to the next. To simplify the design and execution of biomedical workflows by end users, a number of scientific workflow systems have been developed over the past decade, such as Galaxy [1] and Swift [3]. However, most of these systems cannot be easily deployed and are often available only to users with access to specialized IT support. Two main issues must be addressed in the design of an execution environment for these pipelines. First, because configuring and parametrizing pipelines is complex, NGS Data Analysis techniques are difficult to use for someone without IT knowledge. Second, because input data can reach terabytes or even petabytes, pipeline execution generally requires a large amount of computational resources.
Keywords
NGS Data Analysis techniques; Pipelines
Citation
FORJA, João; [et al.] – NGS4Cloud : Cloud-based NGS Data Processing. In Inforum 2016. Lisboa, Portugal: Inforum, 2016. Pp. 1-2.