Nextflow is both a reactive workflow framework and a domain-specific language (DSL). It is gaining a lot of traction in bioinformatics, thanks in large part to the nf-core open-source community, which develops and publishes reusable workflows for many use cases.
To start learning Nextflow, I worked through Andrew Severin’s excellent Creating a NextFlow workflow tutorial. (The tutorial follows the older DSL1 specification of Nextflow, but only a few small modifications were needed to run it under DSL2.)
The DSL2 code I wrote is here and these are notes I took while working through the tutorial:
To make a variable a pipeline parameter, prepend it with `params.`, then override it on the command line:

`main.nf`:

    #!/usr/bin/env nextflow
    params.query = "file.fasta"
    println "Querying file $params.query"

shell command:

    nextflow run main.nf --query other_file.fasta
The `-log` argument directs logging to the specified file:

    nextflow -log nextflow.log run main.nf
To clean up intermediate files automatically upon workflow completion, use the `cleanup` parameter within a profile:

    profiles {
        standard {
            cleanup = true
        }
        debug {
            cleanup = false
        }
    }

- By convention, the `standard` profile is used implicitly when the user specifies no other profile.
- Cleaning up intermediate files precludes the use of `-resume`.
The `nextflow.config` file sets the global parameters, e.g.
- process
- manifest
- executor
- profiles
- docker
- singularity
- timeline
- report
- etc
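As a rough sketch of how several of these scopes can sit side by side in one `nextflow.config`, the fragment below combines a few of them. The values are placeholders chosen for illustration, not settings taken from the tutorial.

```groovy
// nextflow.config (illustrative sketch; all values are placeholders)

process {
    executor = 'local'      // where tasks run unless a profile overrides it
}

profiles {
    standard {
        cleanup = true      // applied when no -profile is given
    }
}

docker {
    enabled = false         // switch to true to run tasks in containers
}

manifest {
    mainScript = 'main.nf'  // entry point of the pipeline
}
```

Each block is a separate scope, so a setting can be changed in one scope without touching the others.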
Contents of the `work` folder for a Nextflow task:
- `.command.begin` is the begin script, if you have one.
- `.command.err` holds the task's standard error, which is useful when it crashes.
- `.command.run` is the wrapper script that Nextflow generated to run the task; this is helpful when troubleshooting a Nextflow error rather than a script error.
- `.command.sh` shows the task script that was run.
- `.exitcode` contains the task's exit code.
Displaying help messages

`main.nf`:

    def helpMessage() {
        log.info """
        Usage:
        The typical command for running the pipeline is as follows:
        nextflow run main.nf --query QUERY.fasta --dbDir "blastDatabaseDirectory" --dbName "blastPrefixName"

        Mandatory arguments:
        --query    Query fasta file of sequences you wish to BLAST
        --dbDir    BLAST database directory (full path required)
        [...]
        """
    }

    // Show help message
    if (params.help) {
        helpMessage()
        exit 0
    }

shell command:

    nextflow run main.nf --help
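The help check reads `params.help`, so it can be useful to give that parameter (and the others named in the usage text) a default value. A minimal sketch, assuming the defaults live in a `params` block in `nextflow.config`; every value shown is a placeholder, not the tutorial's actual setting:

```groovy
// nextflow.config: default values for the pipeline parameters (placeholders)
params {
    help   = false                        // becomes true when --help is passed
    query  = "${projectDir}/file.fasta"   // placeholder query file
    dbDir  = "${projectDir}/DB"           // placeholder BLAST database directory
    dbName = 'blastDB'                    // placeholder database prefix
    outdir = 'out_dir'                    // placeholder output directory
}
```

Any of these can still be overridden on the command line, e.g. with `--query other_file.fasta`.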
The `publishDir` directive accepts arguments like `mode` and `pattern` to fine-tune its behavior, e.g.

    // directives such as publishDir go before the input/output sections
    publishDir "${params.outdir}/BUSCOResults/${label}/", mode: 'copy', pattern: "${label}/short_summary.specific.*.txt"

    output:
    file("${label}/short_summary.specific.*.txt")
DSL2 allows piping, e.g.

    workflow {
        res = Channel
            .fromPath(params.query)
            .splitFasta(by: 1, file: true) | runBlast
        res.collectFile(name: 'blast_output_combined.txt', storeDir: params.outdir)
    }
Add a timeline report to the output (in `nextflow.config`) with

    timeline {
        enabled = true
        file = "$params.outdir/timeline.html"
    }

Add a detailed execution report (in `nextflow.config`) with

    report {
        enabled = true
        file = "$params.outdir/report.html"
    }

Include a profile-specific configuration file:
`nextflow.config`:

    profiles {
        slurm { includeConfig './configs/slurm.config' }
    }

`configs/slurm.config`:

    process {
        executor = 'slurm'
        clusterOptions = '-N 1 -n 16 -t 24:00:00'
    }

and use it via

    nextflow run main.nf -profile slurm
Similarly, refer to a test profile, specified in a separate file:

`nextflow.config`:

    // inside the profiles { } block
    test { includeConfig './configs/test.config' }
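The notes do not show what `configs/test.config` contains. A plausible sketch, assuming the test profile only swaps in a small input file and runs locally; the file path and output directory are hypothetical:

```groovy
// configs/test.config (hypothetical contents)
params {
    query  = "${projectDir}/test-data/small.fasta"   // placeholder small input file
    outdir = 'test_out'                               // placeholder output directory
}

process {
    executor = 'local'   // run the test on the local machine
}
```

It would then be selected with `nextflow run main.nf -profile test`.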
Adding a manifest to `nextflow.config`:

    manifest {
        name = 'isugifNF/tutorial'
        author = 'Andrew Severin'
        homePage = 'www.bioinformaticsworkbook.org'
        description = 'nextflow bash'
        mainScript = 'main.nf'
        version = '1.0.0'
    }
Using a `label` for a process allows granular control of a process’ configuration.

`main.nf`:

    process runBlast {
        label 'blast'
    }

`nextflow.config`:

    process {
        executor = 'slurm'
        clusterOptions = '-N 1 -n 16 -t 02:00:00'
        withLabel: blast { module = 'blast-plus' }
    }

- The `label` has to be placed before the `input` section.
Loading a `module` specifically for a process:

    process runBlast {
        module 'blast-plus'
        publishDir "${params.outdir}/blastout"

        input:
        path queryFile   // DSL2: the channel is passed when the process is called, so there is no `from` clause
        .
        .
        . // these three dots mean I didn't paste the whole process.
    }
Enabling `docker` in the `nextflow.config`:

    docker {
        enabled = true
    }

- The docker container can be specified in the process, e.g. `container 'ncbi/blast'` or `container 'quay.io/biocontainers/blast:2.2.31--pl526he19e7b1_5'`.
- We can include additional options to pass to the container as well: `containerOptions "--bind $launchDir/$params.outdir/config:/augustus/config"`
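The container can also be assigned from `nextflow.config` rather than inside the process. A minimal sketch, assuming the process is named `runBlast` as elsewhere in these notes and using Nextflow's `withName` process selector:

```groovy
// nextflow.config
docker {
    enabled = true
}

process {
    // Applies only to the process named runBlast.
    withName: runBlast {
        container = 'ncbi/blast'
    }
}
```

Keeping the image name in the config file means the workflow script itself does not have to change when switching images.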
`projectDir` refers to the directory where the main workflow script is located. (It used to be called `baseDir`.)

Referring to local directories from within a docker container: create a channel
- Working in containers, we need a way to pass the database file location directly into the `runBlast` process without relying on the local path.
Repeating a process over each element of a channel with `each`: input repeaters

Turning a queue channel into a value channel, which can be used multiple times.
- A value channel is implicitly created by a process when it is invoked with a simple value.
- A value channel is also implicitly created as output for a process whose inputs are all value channels.
- A queue channel can be converted into a value channel by returning a single value, using e.g. `first`, `last`, `collect`, `count`, `min`, `max`, `reduce`, `sum`, etc.

For example, the `runBlast` process below receives three inputs:
- the `queryFile_ch` queue channel, with multiple sequences
- the `dbDir_ch` value channel, created by calling `.first()`, which is reused for all elements of `queryFile_ch`
- the `dbName_ch` value channel, which is also reused for all elements of `queryFile_ch`

`main.nf`:

    workflow {
        channel.fromPath(params.dbDir).first()
            .set { dbDir_ch }
        channel.from(params.dbName).first()
            .set { dbName_ch }
        queryFile_ch = channel
            .fromPath(params.query)
            .splitFasta(by: 1, file: true)
        res = runBlast(queryFile_ch, dbDir_ch, dbName_ch)
        res.collectFile(name: 'blast_output_combined.txt', storeDir: params.outdir)
    }
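A minimal sketch of the `each` input repeater mentioned earlier. It assumes a hypothetical `params.evalues` list and replaces the real BLAST call with an `echo` placeholder so that it runs without BLAST installed; neither is taken from the tutorial.

```groovy
// Hypothetical parameters, for illustration only
params.query   = 'file.fasta'
params.evalues = ['1e-5', '1e-10']

process runBlast {
    input:
    path queryFile
    each evalue      // input repeater: one task per (queryFile, evalue) combination

    output:
    path "${queryFile.baseName}_${evalue}.txt"

    script:
    """
    echo "would BLAST ${queryFile} with e-value ${evalue}" > ${queryFile.baseName}_${evalue}.txt
    """
}

workflow {
    queryFile_ch = channel
        .fromPath(params.query)
        .splitFasta(by: 1, file: true)

    // The list is repeated for every element of queryFile_ch.
    runBlast(queryFile_ch, params.evalues)
}
```

With two e-values, every sequence chunk in `queryFile_ch` produces two tasks, one per e-value.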
Additional resources
This work is licensed under a Creative Commons Attribution 4.0 International License.