What is the best way to use expand() with one unknown variable in Snakemake?

52 Views Asked by At

I am currently using Snakemake for a bioinformatics project. Given a human reference genome (hg19) and a bam file, I want to be able to specify that there will be multiple output files with the same name but different extensions. Here is my code

rule gridss_preprocess:
        input:
                ref=config['ref'],
                bam=config['bamdir'] + "{sample}.dedup.downsampled.bam",
                bai=config['bamdir'] + "{sample}.dedup.downsampled.bam.bai"
        output:
                expand(config['bamdir'] + "{sample}.dedup.downsampled.bam{ext}", ext = config['workreq'], sample = "{sample}")

Currently config['workreq'] is a list of extensions that start with "."

For example, I want to be able to use expand to indicate the following files

S1.dedup.downsampled.bam.cigar_metrics
S1.dedup.downsampled.bam.computesamtags.changes.tsv
S1.dedup.downsampled.bam.coverage.blacklist.bed
S1.dedup.downsampled.bam.idsv_metrics

I want to be able to do this for multiple sample files, S_. Currently I am not getting an error when I try to do a dry run. However, I am not sure if this will run properly.

Am I doing this right?

1

There are 1 best solutions below

1
Eric C. On

expand() defines a list of files. If you're using two parameters, the cartesian product will be used. Thus, your rule will define as output ALL files with your extension list for ALL samples. Since you define a wildcard in your input, I think that what you want is all files with your extension for ONE sample. And this rule will be executed as many times as the number of samples.

You're mixing up wildcards and placeholders for the expand() function. You can define a wildcard inside an expand() by doubling the brackets:

rule all:
    input:  expand(config['bamdir'] + "{sample}.dedup.downsampled.bam{ext}", ext = config['workreq'], sample=SAMPLELIST)

rule gridss_preprocess:
    input:
            ref=config['ref'],
            bam=config['bamdir'] + "{sample}.dedup.downsampled.bam",
            bai=config['bamdir'] + "{sample}.dedup.downsampled.bam.bai"
    output:
            expand(config['bamdir'] + "{{sample}}.dedup.downsampled.bam{ext}", ext = config['workreq'])

This expand function will expand in list

  • {sample}.dedup.downsampled.bam.cigar_metrics
  • {sample}.dedup.downsampled.bam.computesamtags.changes.tsv
  • {sample}.dedup.downsampled.bam.coverage.blacklist.bed
  • {sample}.dedup.downsampled.bam.idsv_metrics

and thus define the wildcard sample to match the files in the input.