
Advanced configuration

Overview

Teaching: 15 min
Exercises: 5 min
Questions
  • What functionality and modularity does Nextflow provide?

Objectives
  • Understand how to provide a timeline report.

  • Understand how to obtain a detailed report.

  • Understand configuration of the executor.

  • Understand how to use Slurm.

We have got to a point where we hopefully have a working pipeline. There are now some configuration options that allow us to explore and modify the behaviour of the pipeline.

The nextflow.config was used earlier to set parameters for the Nextflow script. We can also use it to set a number of different options.

Timeline

To obtain a detailed timeline report, add the following to the nextflow.config file:

timeline {
  enabled = true
  file = "$params.outdir/timeline.html"
}

Notice the use of $params.outdir, which can be given a default value in the params section, such as $PWD/out_dir.
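
For example, a minimal params block along these lines could be added to nextflow.config to set that default (this just mirrors the $PWD/out_dir suggestion above; adjust the path to suit):

params {
  // default output directory, can be overridden on the command line with --outdir
  outdir = "$PWD/out_dir"
}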

The timeline will look something like:

[Image: timeline of the pipeline]

An example timeline can be found at timeline.html

Report

A detailed execution report can be created using:

report {
  enabled = true
  file = "$params.outdir/report.html"
}

An example report can be found at report.html

Executors

If we are using a job scheduler where user limits are in place, we can define these limits to stop Nextflow from overloading the scheduler. For example, to limit Nextflow to 100 queued jobs at a time and to submit at most 10 jobs per second, we would define the executor block as:

executor {
  queueSize = 100
  submitRateLimit = '10 sec'
}

Profiles

To use the executor block described above, a profile can be used to select a job scheduler. Within the nextflow.config file define:

profiles {
  slurm { includeConfig './configs/slurm.config' }
}

and within the ./configs/slurm.config define the Slurm settings to use:

process {
  executor = 'slurm'
  clusterOptions = '-A scwXXXX'
}

Where scwXXXX is the project code to use.

The profile can then be selected on the command line:

$ nextflow run main.nf -profile slurm

Alternatively, the executor and number of cpus can be given as directives within the process definition itself:

  executor 'slurm'
  cpus 2

You can also combine a profile given on the command line with settings in the process itself, for example running with -profile slurm while setting the cpus directive in the process. Note that for MPI codes you would need to add '-n 16' to clusterOptions to request 16 tasks for MPI. Be careful not to override options, such as the clusterOptions value that defines the project code, when doing so.
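
As a rough sketch of how these might be combined (the process name doMPI and the program run here are illustrative only), keeping the project code in clusterOptions so it is not lost when overriding it at the process level:

process doMPI {
  // cpus requested per task; combined with the slurm profile selected via -profile slurm
  cpus 2
  // keep the project code (-A) when adding the MPI task count, otherwise the
  // clusterOptions set in ./configs/slurm.config would be replaced entirely
  clusterOptions '-A scwXXXX -n 16'

  // illustrative MPI command
  """
  srun ./my_mpi_program
  """
}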

Manifest

A manifest can describe the workflow and provide a GitHub location. For example:

manifest {
  name = 'ARCCA/intro_nextflow_example'
  author = 'Thomas Green'
  homePage = 'www.cardiff.ac.uk/arcca'
  description = 'Nextflow tutorial'
  mainScript = 'main.nf'
  version = '1.0.0'
}

Where name is the repository location on GitHub and mainScript is the main script file to run (the default is main.nf).

Try:

$ nextflow run ARCCA/intro_nextflow_example --help

To update from remote locations you can run:

$ nextflow pull ARCCA/intro_nextflow_example

To list the remote projects already downloaded:

$ nextflow list

Finally, to print information about a remote project you can run:

$ nextflow info ARCCA/intro_nextflow_example

Labels

Labels allow settings in nextflow.config, or in our case the options in the Slurm profile in ./configs/slurm.config, to be applied only to selected processes:

process {
  executor = 'slurm'
  clusterOptions = '-A scw1001'
  withLabel: python { module = 'python' }
}

Defining a process with the label 'python' will then load the python module.
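
For example, a minimal sketch of a process carrying that label (the process name and command are illustrative):

process doSomething {
  // matches the withLabel: python selector in ./configs/slurm.config
  label 'python'

  """
  python3 --version
  """
}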

Modules

Modules can also be defined in the process (rather than written in the script) with the module directive.

process doSomething {
  module 'python'

  """
  python3 --version
  """
}

Conda

Conda is a useful software installation and management system, commonly used across many fields. There are a number of ways to use it with Nextflow.

Specify the packages directly:

process doSomething {
  conda 'bwa samtools multiqc'

  '''
  bwa ...
  '''
}

Specify an environment file:

process doSomething {
  conda '/some/path/my-env.yaml'

  '''
  command ...
  '''
}

Specify a pre-existing installed environment:

process doSomething {
  conda '/some/path/conda/environment'

  '''
  command ...
  '''
}

It is recommended to use Conda inside a profile because there may be other ways to access the software, such as via Docker or Singularity.

profiles {
  conda {
    process.conda = 'samtools'
  }

  docker {
    process.container = 'biocontainers/samtools'
    docker.enabled = true
  }
}
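
Either profile can then be chosen at run time, for example (using the main.nf script from earlier):

$ nextflow run main.nf -profile conda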

Generate DAG

A directed acyclic graph (DAG) showing the dependencies between processes can be generated from the command line. Run Nextflow with:

$ nextflow run main.nf -with-dag flowchart.png

The flowchart.png will be created and can be viewed.

Hopefully this page has helped you understand the options available to dig deeper into your pipeline, and maybe make it more portable by using labels to select what to do on a given platform. Let's move on to running Nextflow on Hawk.

Key Points

  • Much functionality is available, but it has to be turned on before you can use it.