scbirlab/nf-promotermap

scbirlab/nf-promotermap maps Illumina sequences to bacterial genomes and calls peaks.

Table of contents

Processing steps
Requirements
Quick start
Inputs
Outputs
Issues, problems, suggestions
Further help

Processing steps

The pipeline carries out the following steps, given a sample sheet (see below):

Download the reference genome and annotations from NCBI.
Trim adapter sequences from Illumina reads using cutadapt.
Align reads to the reference genome using either bowtie2 or minimap2.
Plot gene start coverage with deeptools.
Call peaks across all samples with MACS3.
Annotate peaks with nearest genes.
Generate FASTA of peak sequences.
Calculate coverage of each peak for each bin.
Calculate coverage variance across bins.
Calculate per-base coverage within each peak for each bin, and its mean and variance across bins.
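
The coverage steps above reduce to summary statistics over bins. As a rough sketch only (the pipeline's actual normalisation and variance estimator are not described here), the per-peak mean and variance across bins could be computed as follows. The function name summarise_peak_coverage, the peak IDs, the coverage values, and the choice of population variance are all assumptions for illustration.

```python
from statistics import mean, pvariance


def summarise_peak_coverage(coverage_by_bin):
    """Summarise coverage per peak.

    coverage_by_bin maps peak ID -> {bin ID -> coverage value}.
    Returns peak ID -> {"mean": ..., "variance": ...} across bins.
    """
    summary = {}
    for peak_id, per_bin in coverage_by_bin.items():
        values = list(per_bin.values())
        summary[peak_id] = {
            "mean": mean(values),
            # Population variance is an assumption; the pipeline may differ.
            "variance": pvariance(values),
        }
    return summary


# Made-up coverage values for two peaks in two bins
example = {
    "peak_1": {"U": 10.0, "Red1": 30.0},
    "peak_2": {"U": 5.0, "Red1": 5.0},
}
summary = summarise_peak_coverage(example)
for peak_id, stats in summary.items():
    print(peak_id, stats["mean"], stats["variance"])
```

A peak with high variance across bins, such as peak_1 here, is one whose coverage changes between bins, while peak_2 is constant.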

Work in progress

Identify elements associated with strength and variance.
Identify common sequence motifs in those elements.

Other steps

Get FASTQ quality metrics with fastqc.
Calculate coverage and other metrics with samtools.
Compile the logs of processing steps into an HTML report with multiqc.

Requirements

Software

You need to have Nextflow and either Anaconda, Singularity, or Docker installed on your system.

Crick users (and other HPC users)

If you're at the Crick or your shared cluster has it already installed, try:

module load Nextflow Singularity

Everyone else: installing Nextflow

Otherwise, if it's your first time using Nextflow on your system and you have Conda installed, you can install it using conda:

conda install -c bioconda nextflow

You may need to set the NXF_HOME environment variable. For example,

mkdir -p ~/.nextflow
export NXF_HOME=~/.nextflow

To make this a permanent change, you can do something like the following:

mkdir -p ~/.nextflow
echo "export NXF_HOME=~/.nextflow" >> ~/.bash_profile
source ~/.bash_profile

Quick start

Make a sample sheet (see below) and, optionally, a nextflow.config file in the directory where you want the pipeline to run. Then run Nextflow.

nextflow run scbirlab/nf-promotermap -latest

If you want to run a particular tagged version of the pipeline, such as v0.0.3, you can do so using

nextflow run scbirlab/nf-promotermap -r v0.0.3

For help, use nextflow run scbirlab/nf-promotermap --help.

The first time you run the pipeline, the software dependencies in environment.yml will be installed. This may take several minutes.

Inputs

The following parameters are required:

sample_sheet: path to a CSV with information about the samples and FASTQ files to be processed
fastq_dir: path to where FASTQ files are stored
control_label: the bin ID (from sample sheet) of background controls

The following parameters have default values which can be overridden if necessary.

inputs = "inputs" : The folder containing your inputs.
outputs = "outputs" : The folder that will contain the pipeline outputs.
trim_qual = 5: Minimum base-call quality for trimming.
min_length = 9: Discard reads shorter than this number of bases after trimming.
mapper = "bowtie2": Alignment tool.

The parameters can be provided either in the nextflow.config file or on the nextflow run command.

Here is an example of the nextflow.config file:

params {
    sample_sheet = "/path/to/sample-sheet.csv"
    inputs = "/path/to/inputs"
    fastq_dir = "/path/to/fastq"
    control_label = "U" // bin_id of your background control

    mapper = "minimap2"
}

Alternatively, you can provide the parameters on the command line:

nextflow run scbirlab/nf-promotermap \
    --sample_sheet /path/to/sample-sheet.csv \
    --inputs /path/to/inputs \
    --fastq_dir /path/to/fastq \
    --control_label U \
    --mapper minimap2

Sample sheet

The sample sheet is a CSV file providing information about which FASTQ files belong to which sample.

The file must have a header with the column names below (in any order), and one line per sample to be processed. You can have additional columns with extra information if you like.

expt_id: Unique name of a peak-calling experiment. Peaks will be called across all samples with the same experiment ID.
sample_id: Unique name of the sample within an experiment. FASTQ files under the same sample ID will be combined.
bin_id: Unique name of a bin within an experiment. Sample IDs under the same bin will be pooled before coverage analysis.
fastq_pattern: Partial filename that matches both the R1 and R2 FASTQ files for a sample in the fastq_dir (defined above).
genome_accession: The NCBI assembly accession number for the genome for alignment and annotation. This number starts with "GCF_" or "GCA_".
adapter_read1_3prime: the 3' adapter on the forward read to trim. The adapter itself and sequences downstream will be removed.
adapter_read2_3prime: the 3' adapter on the reverse read to trim. The adapter itself and sequences downstream will be removed.
adapter_read1_5prime: the 5' adapter on the forward read to trim. The adapter itself and sequences upstream will be removed.
adapter_read2_5prime: the 5' adapter on the reverse read to trim. The adapter itself and sequences upstream will be removed.
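
To illustrate how a fastq_pattern selects files, here is a minimal sketch that assumes plain substring matching against the filenames in fastq_dir. This README does not state the exact matching rule the pipeline uses, and match_fastqs and the filenames below are hypothetical.

```python
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def match_fastqs(fastq_pattern, filenames):
    """Return FASTQ filenames containing the pattern (assumed substring match)."""
    return sorted(
        name
        for name in filenames
        if fastq_pattern in name and name.endswith(FASTQ_SUFFIXES)
    )


# Hypothetical contents of fastq_dir
filenames = [
    "G5512A22_R1.fastq.gz",
    "G5512A22_R2.fastq.gz",
    "G5512A23_R1.fastq.gz",
    "G5512A23_R2.fastq.gz",
]
matched = match_fastqs("G5512A22_R", filenames)
print(matched)
```

A pattern that matches only one file, or none at all, is worth catching before you launch a run.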

Here is an example of the sample sheet:

expt_id	sample_id	bin_id	fastq_pattern	genome_accession	adapter_read1_3prime	adapter_read2_3prime	adapter_read1_5prime	adapter_read2_5prime
expt-01	01-Unsorted	U	G5512A22_R	GCF_904425475.1	ATTAACCTCCTAATCGTGCGT	CTACCGCCTTGCTGCTGCGT	ACGCAGCAGCAAGGCGG	ACGCACGATTAGGA
expt-01	01-Red1	Red1	G5512A23_R	GCF_904425475.1	ATTAACCTCCTAATCGTGCGT	CTACCGCCTTGCTGCTGCGT	ACGCAGCAGCAAGGCGG	ACGCACGATTAGGA
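
Before launching a run, it can help to check the sample sheet yourself. The following sketch is not part of the pipeline: check_sample_sheet is a hypothetical helper that checks the required column names and the accession prefix described above, using the example rows written out as CSV.

```python
import csv
import io

REQUIRED_COLUMNS = {
    "expt_id", "sample_id", "bin_id", "fastq_pattern", "genome_accession",
    "adapter_read1_3prime", "adapter_read2_3prime",
    "adapter_read1_5prime", "adapter_read2_5prime",
}


def check_sample_sheet(handle):
    """Read a sample sheet and raise ValueError if it looks malformed."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    rows = list(reader)
    # Data rows start on line 2, after the header
    for line_number, row in enumerate(rows, start=2):
        if not row["genome_accession"].startswith(("GCF_", "GCA_")):
            raise ValueError(f"Line {line_number}: unexpected genome_accession")
    return rows


example_csv = """\
expt_id,sample_id,bin_id,fastq_pattern,genome_accession,adapter_read1_3prime,adapter_read2_3prime,adapter_read1_5prime,adapter_read2_5prime
expt-01,01-Unsorted,U,G5512A22_R,GCF_904425475.1,ATTAACCTCCTAATCGTGCGT,CTACCGCCTTGCTGCTGCGT,ACGCAGCAGCAAGGCGG,ACGCACGATTAGGA
expt-01,01-Red1,Red1,G5512A23_R,GCF_904425475.1,ATTAACCTCCTAATCGTGCGT,CTACCGCCTTGCTGCTGCGT,ACGCAGCAGCAAGGCGG,ACGCACGATTAGGA
"""

rows = check_sample_sheet(io.StringIO(example_csv))
bins = sorted({row["bin_id"] for row in rows})
print(len(rows), "samples in bins", bins)
```

To check a real file, pass an open file handle instead of the io.StringIO object.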

Cutadapt format
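
The four adapter columns correspond naturally to cutadapt's paired-end adapter options: -a and -A for 3' adapters on reads 1 and 2, and -g and -G for 5' adapters on reads 1 and 2. The sketch below builds such a command from one sample sheet row. It assumes the pipeline maps the columns, trim_qual, and min_length onto these options, which this README does not confirm; build_cutadapt_command and all file names are hypothetical.

```python
import shlex


def build_cutadapt_command(row, fastq_r1, fastq_r2, trim_qual=5, min_length=9):
    """Build an illustrative paired-end cutadapt command as an argument list."""
    return [
        "cutadapt",
        "-a", row["adapter_read1_3prime"],  # 3' adapter, read 1
        "-A", row["adapter_read2_3prime"],  # 3' adapter, read 2
        "-g", row["adapter_read1_5prime"],  # 5' adapter, read 1
        "-G", row["adapter_read2_5prime"],  # 5' adapter, read 2
        "-q", str(trim_qual),               # quality trimming cutoff
        "--minimum-length", str(min_length),
        "-o", "trimmed_R1.fastq.gz",        # hypothetical output names
        "-p", "trimmed_R2.fastq.gz",
        fastq_r1,
        fastq_r2,
    ]


row = {
    "adapter_read1_3prime": "ATTAACCTCCTAATCGTGCGT",
    "adapter_read2_3prime": "CTACCGCCTTGCTGCTGCGT",
    "adapter_read1_5prime": "ACGCAGCAGCAAGGCGG",
    "adapter_read2_5prime": "ACGCACGATTAGGA",
}
command = build_cutadapt_command(row, "G5512A22_R1.fastq.gz", "G5512A22_R2.fastq.gz")
print(shlex.join(command))
```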

Example inputs

You can find some examples in the test directory of this repository.

Outputs

Outputs are saved in the directory specified by --outputs (outputs by default). They are organised into these directories:

bigwig: Coverage bigwig files
coverage: Coverage of peaks per bin
genome: Reference genomes and annotations
mapped: BAM files of mapped Illumina reads
multiqc: HTML reports from the outputs of intermediate steps
peaks: peak calls
samtools: Coverage and other metrics
trimmed: Trimming logs and FASTQ files.
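
After a run, a quick check that every expected subdirectory exists can catch incomplete runs. This is a sketch outside the pipeline: missing_output_dirs is a hypothetical helper, and the demonstration uses a temporary directory rather than real pipeline outputs.

```python
import tempfile
from pathlib import Path

# Subdirectory names as listed in this README
EXPECTED_DIRS = [
    "bigwig", "coverage", "genome", "mapped",
    "multiqc", "peaks", "samtools", "trimmed",
]


def missing_output_dirs(outputs_root):
    """Return the expected subdirectories that are absent under outputs_root."""
    root = Path(outputs_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


# Demonstration: create every directory except the last one, then check
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED_DIRS[:-1]:
        (Path(tmp) / name).mkdir()
    missing = missing_output_dirs(tmp)
print(missing)
```

For a real run, call missing_output_dirs("outputs"), or pass whatever directory you gave to --outputs.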

Issues, problems, suggestions

If you run into problems not covered here, please add them to the issue tracker.

Further help

Here are the help pages of the software used by this pipeline.
