Snakemake: Handle paired reads in a directory

Question

Snakemake: Handle paired reads in a directory

37 Views Asked by Anna At 06 March 2024 at 07:41

There is a directory with paired reads, the names and number of pairs are unknown, the possible file extensions are .fastq or .fastq.gz.

I need advice on how best to get paired reads in a directory for processing in the snakemake pipeline.

Original Q&A

There are 1 best solutions below

**Cade** · Accepted Answer · 2024-03-06T18:09:32.570000

There are multiple ways to go about this. My preferred method is to use a CSV to specify read paths. An added benefit of this is that you can add metadata to each set of reads that you can then access in your workflow.

Here is an example for mapping reads for three samples: A, B, C

Directory structure:

.
├── Snakefile
├── data
│   ├── genome
│   │   └── ref.fa
│   └── reads
│       ├── a_R1.fastq.gz
│       ├── a_R2.fastq.gz
│       ├── b_R1.fastq
│       ├── b_R2.fastq
│       ├── c_R1.fq.gz
│       └── c_R2.fq.gz
└── sample_reads.csv

sample_reads.csv:

sample_id,read1,read2
a,data/reads/a_R1.fastq.gz,data/reads/a_R2.fastq.gz
b,data/reads/b_R1.fastq,data/reads/b_R2.fastq
c,data/reads/c_R1.fq.gz,data/reads/c_R2.fq.gz

Snakefile:

import pandas as pd

sample_reads = pd.read_csv("./sample_reads.csv")


rule all:
    """
    Get all sample_ids from dataframe and use in expand to make sam 
    files for each sample.
    """
    input:
        expand(
            "results/alignments/{sample}.sam",
            sample=sample_reads["sample_id"].tolist(),
        ),


rule align_reads:
    """
    Use sample wildcard to lookup read paths in pandas dataframe.
    Need to use input function to access sample wildcard value.
    """
    input:
        read_1=lambda wc: sample_reads[sample_reads["sample_id"] == wc.sample][
            "read1"
        ].item(),
        read_2=lambda wc: sample_reads[sample_reads["sample_id"] == wc.sample][
            "read2"
        ].item(),
        ref="data/genome/ref.fa",
    output:
        sam="results/alignments/{sample}.sam",
    shell:
        "bwa mem {input.ref} {input.read_1} {input.read_2} > {output.sam}"

Snakemake: Handle paired reads in a directory

There are 1 best solutions below

Related Questions in BIOINFORMATICS

Related Questions in SNAKEMAKE

Trending Questions

Popular # Hahtags

Popular Questions