I have a Snakemake pipeline that runs a very memory-intensive script in an HPC environment. To avoid clogging up the HPC, I would like to run this script one sample at a time.
Right now this is what I have in the rules, and the script is launched for all samples at once. Is there a workaround to call the script for one sample at a time?
Edited to add: I have other rules in the pipeline, such as fastqc, which I can run on all the samples together without issues. It's just this one rule that has to run one sample at a time.
rule all:
    input:
        expand(join(RESULTSDIR, "out", "{sample}", "aligned/inter.hic"), sample=SAMPLES),

rule call_script:
    input:
        R1=join(RESULTSDIR, "out_trim", "{sample}_trim.R1.fastq.gz"),
        R2=join(RESULTSDIR, "out_trim", "{sample}_trim.R2.fastq.gz"),
    params: # all the parameters
    output:
        hic=join(RESULTSDIR, "out", "{sample}", "aligned/inter.hic"),
    shell:
        # call script - but it is launched for every {sample} at the same time;
        # I need it to be called on one sample at a time

rule fastqc:
    # okay to run on all samples in parallel
You can specify the number of threads a rule requires with the threads directive. If that number exceeds what is provided via --cores, Snakemake scales it down to the available core count.
So you can set threads to a value higher than the number of cores you have available, which guarantees that only one instance of the rule runs at any given time, while other rules such as fastqc still parallelize normally.
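
A minimal sketch of your rule with this trick applied (my_script.sh and its arguments are placeholders, since the original shell command wasn't shown):

rule call_script:
    input:
        R1=join(RESULTSDIR, "out_trim", "{sample}_trim.R1.fastq.gz"),
        R2=join(RESULTSDIR, "out_trim", "{sample}_trim.R2.fastq.gz"),
    output:
        hic=join(RESULTSDIR, "out", "{sample}", "aligned/inter.hic"),
    # Request more threads than --cores can ever provide; Snakemake caps the
    # value at --cores, so each job of this rule claims every available core
    # and the scheduler can only run one instance at a time.
    threads: 9999
    shell:
        # my_script.sh stands in for the actual memory-intensive command
        "my_script.sh {input.R1} {input.R2} {output.hic}"

For example, with snakemake --cores 8, this rule's threads value is scaled down to 8, so each job occupies the whole core pool and no second instance of call_script can start alongside it, while lighter rules like fastqc fill the pool freely whenever this rule isn't running.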