I am trying to create a Snakemake rule that takes my FASTQ files as input and returns one .sam file per FASTQ file as output.
I have a file like this:
FILE TYPE SM LB ID PU PL
xfgh.fastq.gz Single IND1 IND1 IND1 Platform Illumina
IND2.fastq.gz Single IND2 IND2 IND2 Platform Illumina
zfgv.fastq.gz Single IND3 IND3 IND3 Platform Illumina
IND4_P1.fastq.gz Single IND4 IND4 IND4 Platform Illumina
So I did something like this.
I load the file into a dataframe with pandas:
pd.read_csv("info_file.txt"), and I store the FILE, SM and ID columns in lists.
Then I create my rules:
rule all:
    input:
        sam_file = expand("ALIGNEMENT/{sm}/{id}.sam", sm = info_df["SM"], id = info_df["ID"])

rule alignement:
    input:
        fastq_files = "PATH/TO/{fastq}"
    output:
        sam_file = "ALIGNEMENT/{sm}/{id}.sam"
I know the input and output need to share the same wildcards, but is there a way to take my input from the "FILE" column of my file.txt and write the output to a path like "ALIGNEMENT/{sm}/{id}.sam", where {sm} and {id} come from the SM and ID columns of my file.txt?
I also want the rule to run once per file.
If anyone can help, thank you.
From the above, it seems to me that you want to add zip to the expand function in rule all. With zip you pair wildcards as they appear in your input lists; without it you get all combinations of {id} and {sm}.
Then, to get the input fastq file in rule alignement, you need to query the info dataframe to get the FILE corresponding to a given id. You can do this with a lambda function or write a dedicated function to use as input.
Here's my take on it:
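As a rough illustration of the two ideas above, here is a minimal sketch in plain Python rather than a full Snakefile. It assumes the sample sheet is whitespace-separated exactly as shown in the question, and it parses it with the standard library instead of pandas so that it runs on its own. The names `rows`, `paired`, `all_combinations` and `fastq_for` are hypothetical, and `SimpleNamespace` only stands in for the wildcards object that Snakemake passes to an input function.

```python
from itertools import product
from types import SimpleNamespace

# Sample sheet copied from the question (assumed whitespace-separated).
INFO = """\
FILE TYPE SM LB ID PU PL
xfgh.fastq.gz Single IND1 IND1 IND1 Platform Illumina
IND2.fastq.gz Single IND2 IND2 IND2 Platform Illumina
zfgv.fastq.gz Single IND3 IND3 IND3 Platform Illumina
IND4_P1.fastq.gz Single IND4 IND4 IND4 Platform Illumina
"""

# One dict per sample, keyed by the header names.
header, *records = [line.split() for line in INFO.splitlines()]
rows = [dict(zip(header, rec)) for rec in records]

sm = [r["SM"] for r in rows]
ids = [r["ID"] for r in rows]

# What expand() does by default: every combination of sm and id.
all_combinations = [f"ALIGNEMENT/{s}/{i}.sam" for s, i in product(sm, ids)]

# What expand(..., zip, ...) does: pair the lists element by element.
paired = [f"ALIGNEMENT/{s}/{i}.sam" for s, i in zip(sm, ids)]


def fastq_for(wildcards):
    """Return the FASTQ path of the row matching the sm and id wildcards."""
    matches = [
        r["FILE"]
        for r in rows
        if r["SM"] == wildcards.sm and r["ID"] == wildcards.id
    ]
    if len(matches) != 1:
        raise ValueError(
            f"expected one FILE for sm={wildcards.sm}, id={wildcards.id}, "
            f"found {len(matches)}"
        )
    # "PATH/TO" is the placeholder directory used in the question.
    return f"PATH/TO/{matches[0]}"


print(len(all_combinations), len(paired))
# SimpleNamespace mimics the attribute access of Snakemake's wildcards.
print(fastq_for(SimpleNamespace(sm="IND1", id="IND1")))
```

In the Snakefile itself, the same pairing would be written as `expand("ALIGNEMENT/{sm}/{id}.sam", zip, sm=info_df["SM"], id=info_df["ID"])` in rule all, and a function like `fastq_for` (querying `info_df` instead of `rows`) would be passed as `fastq_files = fastq_for` in the input of rule alignement. Because the output path still carries `{sm}` and `{id}`, Snakemake runs the rule once per row of the sample sheet.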