Nextflow is both a reactive workflow framework and a domain-specific language (DSL). It is gaining a lot of traction in bioinformatics, thanks in large part to the nf-core open source community that develops and publishes reusable workflows for many use cases.
To start learning Nextflow, I worked through Andrew Severin’s excellent Creating a NextFlow workflow tutorial. (The tutorial follows the older DSL1 specification of Nextflow, but only a few small modifications were needed to run it under DSL2.)
The DSL2 code I wrote is here and these are notes I took while working through the tutorial:
To make a variable a pipeline parameter, prepend it with `params.`, then specify it on the command line:

main.nf:

```
#! /usr/bin/env nextflow

params.query = "file.fasta"
println "Querying file $params.query"
```

shell command:

```
nextflow run main.nf --query other_file.fasta
```

The `-log` argument directs logging to the specified file:

```
nextflow -log nextflow.log run main.nf
```

To clean up intermediate files automatically upon workflow completion, use the `cleanup` parameter within a profile:

```
profiles {
  standard { cleanup = true }
  debug    { cleanup = false }
}
```

- By convention the `standard` profile is implicitly used when no other profile is specified by the user.
- Cleaning up intermediate files precludes the use of `-resume` (an example follows below).
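For instance, with the profiles above, switching to the `debug` profile keeps the intermediate files so a failed run can be resumed (a hypothetical session, not from the tutorial):

```
# cleanup is disabled in the debug profile, so work/ is preserved
nextflow run main.nf -profile debug --query other_file.fasta

# after fixing the problem, resume from the cached tasks
nextflow run main.nf -profile debug --query other_file.fasta -resume
```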
The `nextflow.config` file sets the global parameters, e.g.

- process
- manifest
- executor
- profiles
- docker
- singularity
- timeline
- report
- etc
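A minimal sketch of what such a `nextflow.config` could look like (the values are placeholders, not the tutorial's):

```
// nextflow.config -- illustrative values only
params.outdir = './results'

process {
  cpus   = 2
  memory = '4 GB'
}

executor {
  queueSize = 10                      // max tasks submitted at once
}

timeline {
  enabled = true
  file    = "$params.outdir/timeline.html"
}

manifest {
  mainScript = 'main.nf'
  version    = '0.1.0'
}
```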
Contents of the `work` folder for a nextflow task:

- `.command.begin` is the begin script, if you have one.
- `.command.err` is useful when the task crashes.
- `.command.run` is the full nextflow pipeline that was run; this is helpful when troubleshooting a nextflow error rather than a script error.
- `.command.sh` shows what was run.
- `.exitcode` will have the exit code in it.
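For example, to poke around a failed task's work directory (the hash path below is made up; the real one appears in the error message or in the output of `nextflow log`):

```
ls -a work/3f/9d2a1c.../            # the hidden .command.* files live here
cat work/3f/9d2a1c.../.command.sh   # the exact script that was executed
cat work/3f/9d2a1c.../.command.err  # stderr of the failing task
cat work/3f/9d2a1c.../.exitcode     # the task's exit status
```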
Displaying help messages
main.nf:

```
def helpMessage() {
  log.info """
  Usage:

  The typical command for running the pipeline is as follows:
    nextflow run main.nf --query QUERY.fasta --dbDir "blastDatabaseDirectory" --dbName "blastPrefixName"

  Mandatory arguments:
    --query   Query fasta file of sequences you wish to BLAST
    --dbDir   BLAST database directory (full path required)
    [...]
  """
}

// Show help message
if (params.help) {
  helpMessage()
  exit 0
}
```

shell command:

```
nextflow run main.nf --help
```
The `publishDir` directive accepts arguments like `mode` and `pattern` to fine-tune its behavior, e.g.

```
output:
file("${label}/short_summary.specific.*.txt")

publishDir "${params.outdir}/BUSCOResults/${label}/", mode: 'copy', pattern: "${label}/short_summary.specific.*.txt"
```

DSL2 allows piping, e.g.
```
workflow {
  res = Channel
    .fromPath(params.query)
    .splitFasta(by: 1, file: true)
    | runBlast

  res.collectFile(name: 'blast_output_combined.txt', storeDir: params.outdir)
}
```

Add a timeline report to the output with

```
timeline {
  enabled = true
  file    = "$params.outdir/timeline.html"
}
```

(in `nextflow.config`).

Add a detailed execution report with

```
report {
  enabled = true
  file    = "$params.outdir/report.html"
}
```

(in `nextflow.config`).

Include a profile-specific configuration file:
nextflow.config:

```
profiles {
  slurm { includeConfig './configs/slurm.config' }
}
```

configs/slurm.config:

```
process {
  executor = 'slurm'
  clusterOptions = '-N 1 -n 16 -t 24:00:00'
}
```

and use it via

```
nextflow run main.nf -profile slurm
```

Similarly, refer to a test profile, specified in a separate file:
nextflow.config:

```
profiles {
  test { includeConfig './configs/test.config' }
}
```
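The included file would typically pin small test inputs; a hypothetical sketch (none of these paths come from the tutorial):

```
// configs/test.config -- made-up contents for a quick smoke test
params {
  query  = "$projectDir/test-data/tiny.fasta"
  dbDir  = "$projectDir/test-data/blastDB"
  dbName = "tinyDB"
}
```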
Adding a manifest to `nextflow.config`:

```
manifest {
  name = 'isugifNF/tutorial'
  author = 'Andrew Severin'
  homePage = 'www.bioinformaticsworkbook.org'
  description = 'nextflow bash'
  mainScript = 'main.nf'
  version = '1.0.0'
}
```
Using a `label` for a process allows granular control of a process’ configuration:

main.nf:

```
process runBlast {
  label 'blast'
}
```

nextflow.config:

```
process {
  executor = 'slurm'
  clusterOptions = '-N 1 -n 16 -t 02:00:00'

  withLabel: blast {
    module = 'blast-plus'
  }
}
```

- The `label` has to be placed before the `input` section.
Loading a `module` specifically for a process:

```
process runBlast {
  module = 'blast-plus'
  publishDir "${params.outdir}/blastout"

  input:
  path queryFile

  . . . // these three dots mean I didn't paste the whole process.
}
```

Enabling
`docker` in the `nextflow.config`:

```
docker {
  enabled = true
}
```

- The docker container can be specified in the process, e.g. `container = 'ncbi/blast'` or `container = 'quay.io/biocontainers/blast:2.2.31--pl526he19e7b1_5'`
- We can include additional options to pass to the container as well:
  `containerOptions = "--bind $launchDir/$params.outdir/config:/augustus/config"`
- `projectDir` refers to the directory where the main workflow script is located. (It used to be called `baseDir`.)
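For instance, `projectDir` is handy for anchoring paths to files shipped with the pipeline, regardless of where the run is launched from (the paths below are made-up examples):

```
// main.nf
println "Pipeline directory: $projectDir"

// e.g. a default input bundled with the repository (hypothetical path)
params.adapters = "$projectDir/assets/adapters.fa"
```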
Referring to local directories from within a docker container: create a channel

- Working in containers, we need a way to pass the database file location directly into the `runBlast` process, without relying on the local path (see the sketch below).
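A partial sketch of the idea, using the same names as the workflow shown further below: the database directory is declared as a `path` input, so Nextflow stages it into the task's work directory where the container can see it (the `blastn` command line here is only illustrative, not the tutorial's):

```
// in the workflow: turn the database directory into a reusable value channel
dbDir_ch = channel.fromPath(params.dbDir).first()

process runBlast {
  container 'ncbi/blast'      // image mentioned above

  input:
  path queryFile
  path dbDir                  // staged into the work dir, visible inside the container
  val  dbName

  output:
  path 'blast_output.txt'

  script:
  """
  blastn -db $dbDir/$dbName -query $queryFile -outfmt 6 -out blast_output.txt
  """
}
```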
Repeating a process over each element of a channel with `each`: input repeaters (a small sketch below).
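A minimal made-up example of an `each` input repeater: the process runs once for every combination of an incoming file and each value in the list (the process name and values are hypothetical):

```
process align {
  input:
  path fastaFile
  each mode                  // input repeater: one task per value

  output:
  path "${fastaFile.baseName}.${mode}.txt"

  script:
  """
  echo "aligning $fastaFile in $mode mode" > ${fastaFile.baseName}.${mode}.txt
  """
}

workflow {
  files_ch = channel.fromPath(params.query).splitFasta(by: 1, file: true)
  align(files_ch, ['fast', 'sensitive'])
}
```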
Turning a queue channel into a value channel, which can be used multiple times.
- A value channel is implicitly created by a process when it is invoked with a simple value.
- A value channel is also implicitly created as output for a process whose inputs are all value channels.
- A queue channel can be converted into a value channel by returning a single value, using e.g. `first`, `last`, `collect`, `count`, `min`, `max`, `reduce`, `sum`, etc.

For example, the `runBlast` process receives three inputs in the following example:

- the `queryFile_ch` queue channel, with multiple sequences.
- the `dbDir_ch` value channel, created by calling `.first()`, which is reused for all elements of `queryFile_ch`.
- the `dbName_ch` value channel, which is also reused for all elements of `queryFile_ch`.
```
workflow {
  channel.fromPath(params.dbDir).first()
    .set { dbDir_ch }

  channel.from(params.dbName).first()
    .set { dbName_ch }

  queryFile_ch = channel
    .fromPath(params.query)
    .splitFasta(by: 1, file: true)

  res = runBlast(queryFile_ch, dbDir_ch, dbName_ch)
  res.collectFile(name: 'blast_output_combined.txt', storeDir: params.outdir)
}
```
Additional resources

This work is licensed under a Creative Commons Attribution 4.0 International License.