How to run a rule one sample at a time in Snakemake


I have a Snakemake pipeline that runs a very memory-intensive script in an HPC environment. To avoid clogging up the cluster, I would like to run this script one sample at a time.

Right now this is what I have in the rules, and the script is launched for all samples at once.

Is there a workaround to call the script for one sample at a time?

Edited to add: I have other rules in the pipeline, such as fastqc, which I can run on all samples together without issues. It's just this one rule that has to run one sample at a time.

rule all:
    input:
        expand(join(RESULTSDIR, "out", "{sample}", "aligned/inter.hic"), sample=SAMPLES),

rule call_script:
    input:
        R1=join(RESULTSDIR, "out_trim", "{sample}_trim.R1.fastq.gz"),
        R2=join(RESULTSDIR, "out_trim", "{sample}_trim.R2.fastq.gz"),
    params: # all the parameters
    output:
        hic=join(RESULTSDIR, "out", "{sample}", "aligned/inter.hic"),
    shell:
        # call script - but the script is launched for every {sample} at the
        # same time; need it to be called on one sample at a time

rule fastqc:
    # okay to run on all samples in parallel

2 Answers

Answer by kEks

You can specify the number of threads a rule requires. If the number exceeds what is provided by --cores, it will be scaled down.

So you could set it to a value higher than the number of cores you have available, ensuring that only one instance of the rule is running at a given time.

rule call_script:
    threads: 100  # higher than any realistic --cores value, so only one instance fits at a time
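For example, with the rule written like this (a minimal sketch; the script name, its arguments, and the core count below are placeholders, not from the original pipeline):

rule call_script:
    input:
        R1=join(RESULTSDIR, "out_trim", "{sample}_trim.R1.fastq.gz"),
        R2=join(RESULTSDIR, "out_trim", "{sample}_trim.R2.fastq.gz"),
    output:
        hic=join(RESULTSDIR, "out", "{sample}", "aligned/inter.hic"),
    threads: 100  # scaled down to the --cores value at runtime
    shell:
        "memory_intensive_script.sh {input.R1} {input.R2} {output.hic}"  # placeholder command

# snakemake --cores 16
# call_script's 100 threads are capped at 16, so the scheduler can never fit
# two instances at once; other rules still run between call_script jobs.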
Answer by Troy Comi

One issue with using threads to limit the number of jobs: in an HPC environment, once you start submitting through a scheduler, that rule will create a job requesting 100 cores. Limiting the total number of jobs instead would throttle the rest of your pipeline. I think the easiest approach is a custom resource that limits how many instances you want running. I use this frequently for things like downloading data or creating giant temp files.

rule call_script:
    resources:
        script_limit=1

A better name would probably indicate why you are limiting execution, e.g. large intermediate files. Then, when you execute, you set how many units of the resource are available at once:

snakemake --resources script_limit=1

Other rules can share this resource (say another rule should also be limited), and you can come up with more complicated shares: maybe one rule is twice as "expensive" as another, so it declares script_limit=2, as sketched below. If you want to release more jobs, you can raise the limit on a subsequent run: snakemake --resources script_limit=3.
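A minimal sketch of two rules sharing the pool (the second rule's name, the costs, and both shell commands are illustrative assumptions, not from the original pipeline):

rule call_script:
    resources:
        script_limit=1  # each instance consumes 1 unit of the pool
    shell:
        "run_script.sh {input} {output}"  # placeholder command

rule make_giant_temp:
    resources:
        script_limit=2  # twice as "expensive" as call_script
    shell:
        "make_temp.sh {input} {output}"  # placeholder command

# snakemake --resources script_limit=2
# allows at most two call_script instances OR one make_giant_temp at a time.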

Finally, note that if you do not specify a limit on the command line or in a profile, the resource is effectively ignored (i.e. treated as unlimited).
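If you prefer a profile, the limit can live in its config.yaml instead of the command line (a sketch assuming the standard Snakemake profile layout, where each key mirrors a command-line option; the profile name and path are placeholders):

# myprofile/config.yaml
resources:
  - script_limit=1

# then run:
# snakemake --profile myprofile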