I have 100 files, and I want to parallelise my submissions to save time instead of running jobs one by one. How can I change this script into a job array in LSF using the bsub submission system, running 10 jobs at a time?
#BSUB -J ExampleJob1 #Set the job name to "ExampleJob1"
#BSUB -L /bin/bash #Uses the bash login shell to initialize the job's execution environment.
#BSUB -W 2:00 #Set the wall clock limit to 2hr
#BSUB -n 1 #Request 1 core
#BSUB -R "span[ptile=1]" #Request 1 core per node.
#BSUB -R "rusage[mem=5000]" #Request 5000MB per process (CPU) for the job
#BSUB -M 5000 #Set the per process enforceable memory limit to 5000MB.
#BSUB -o Example1Out.%J #Send stdout and stderr to "Example1Out.[jobID]"
path=./home/
for each in *.bam
do
    samtools coverage "${each}" -o "${each}_coverage.txt"
done
Thank you for your time; any help is appreciated. I am new to LSF and quite confused.
You tagged your question with nextflow, so I will provide a minimal (untested) solution using Nextflow with the LSF executor enabled. By using Nextflow, we can abstract away the underlying job submission system and focus on writing the pipeline, however trivial it may be. I think this approach is preferable, but it does add a dependency on Nextflow. I think it's a small one, and maybe it's overkill for your current requirements, but Nextflow comes with other benefits, like being able to modify and resume your pipeline when those requirements inevitably change.
Contents of `main.nf`:
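A minimal sketch: the glob in `params.bams` and the `params.outdir` results directory are placeholders for wherever your 100 BAM files actually live, so adjust them to suit.

```nextflow
nextflow.enable.dsl = 2

// Placeholders; point params.bams at your BAM files.
params.bams   = './home/*.bam'
params.outdir = './results'

process samtools_coverage {

    tag { bam.name }

    publishDir "${params.outdir}/coverage", mode: 'copy'

    input:
    path bam

    output:
    path "${bam.baseName}_coverage.txt"

    script:
    """
    samtools coverage -o "${bam.baseName}_coverage.txt" "${bam}"
    """
}

workflow {
    bams = Channel.fromPath( params.bams, checkIfExists: true )

    samtools_coverage( bams )
}
```

Nextflow will submit one job per BAM file, so there is no need to write the loop (or the job array) yourself.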
Contents of `nextflow.config`:
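A sketch that mirrors the resource requests from your bsub script; the queue name is an assumption, so substitute your cluster's queue or delete that line to use the default. The `executor.queueSize` setting is what limits Nextflow to 10 concurrent jobs:

```groovy
process {
    executor = 'lsf'
    queue    = 'normal'    // hypothetical queue name; use your cluster's
    cpus     = 1
    memory   = 5000.MB
    time     = 2.h
}

executor {
    // Submit at most 10 jobs at any one time.
    queueSize = 10
}
```

Run using:

```bash
nextflow run main.nf
```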
Note also:
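Nextflow's LSF executor assumes by default that your cluster is configured with per-core memory limits. If your cluster instead enforces per-job memory limits (something only your admins can confirm), you can tell Nextflow by adding the following to your `nextflow.config`:

```groovy
executor {
    perJobMemLimit = true
}
```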