Journal article. Computational and Structural Biotechnology Journal. Year: 2023

Developing and reusing bioinformatics data analysis pipelines using scientific workflow systems

Abstract

Data analysis pipelines are now established as an effective means of specifying and executing bioinformatics data analyses and experiments. Scripting languages and tools, particularly Python, R and notebooks, are popular and sufficient for developing small-scale pipelines, which are often intended for a single user. It is now widely recognized, however, that they are not enough to support the development of large-scale, shareable, maintainable and reusable pipelines capable of handling large volumes of data and running on high-performance computing clusters. This review outlines the key requirements for building large-scale data pipelines and maps existing solutions to the requirements they fulfill. We then highlight the benefits of using scientific workflow systems to obtain modular, reproducible and reusable bioinformatics data analysis pipelines. Finally, we discuss current workflow reuse practices, based on an empirical study we performed on a large collection of workflows.
Origin: Publication funded by an institution
Licence: CC BY-NC-ND - Attribution - NonCommercial - NoDerivatives

Dates and versions

hal-04037221, version 1 (20-03-2023)
hal-04037221, version 2 (31-03-2023)

Cite

Marine Djaffardjy, George Marchment, Clémence Sebe, Raphael Blanchet, Khalid Belhajjame, et al. Developing and reusing bioinformatics data analysis pipelines using scientific workflow systems. Computational and Structural Biotechnology Journal, 2023, 21, pp.2075-2085. ⟨10.1016/j.csbj.2023.03.003⟩. ⟨hal-04037221v2⟩