Is there any way to generate a PSSM matrix from PSI-BLAST using the Python package Biopython? I have 8000 sequences in a .fasta file, and each sequence is long.
I am using the code below:
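from Bio import AlignIO
from Bio.Align import AlignInfo

for fasta in files:
    # Each file must hold an alignment (sequences of equal length)
    alignment = AlignIO.read(fasta, "fasta")
    summary_align = AlignInfo.SummaryInfo(alignment)
    consensus = summary_align.dumb_consensus()
    my_pssm = summary_align.pos_specific_score_matrix(consensus, chars_to_ignore=['N', '-'])
    file_pssm = fasta + "pssm"
    # Open for writing and convert the PSSM object to text before writing
    with open(file_pssm, "w") as f:
        f.write(str(my_pssm))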
Is there a better way to do it? The matrix that is produced consists only of 0s and 1s. I need the actual PSSM scores (in normalized form).
I used the following code to extract PSSM matrices from a fasta file:
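The original code is not shown here, so what follows is only a minimal sketch of one way to do it, not the exact script. It assumes that each sequence is written to its own query file and passed to `psiblast` with `-out_ascii_pssm`. The function names (`read_fasta`, `psiblast_command`, `generate_pssms`), the `seq_<index>` file naming, and the default iteration count and E-value are all hypothetical choices made for illustration. FASTA parsing uses only the standard library so that the sketch is self-contained.

```python
import subprocess
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file (stdlib only)."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def psiblast_command(query, db, pssm_out, iterations=3, evalue=0.001):
    """Build the psiblast argument list for one query (defaults are assumptions)."""
    return [
        "psiblast",
        "-query", str(query),
        "-db", str(db),
        "-num_iterations", str(iterations),
        "-evalue", str(evalue),
        "-out_ascii_pssm", str(pssm_out),
        # Regular alignment report, written next to the PSSM file
        "-out", str(Path(pssm_out).with_suffix(".out")),
    ]


def generate_pssms(fasta_path, db, out_dir):
    """Run PSI-BLAST once per sequence, requesting one .pssm file per query."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for index, (header, sequence) in enumerate(read_fasta(fasta_path)):
        # Hypothetical naming scheme: seq_0.fasta, seq_1.fasta, ...
        query = out_dir / f"seq_{index}.fasta"
        query.write_text(f">{header}\n{sequence}\n")
        cmd = psiblast_command(query, db, out_dir / f"seq_{index}.pssm")
        subprocess.run(cmd, check=True)
```

A call such as `generate_pssms("sequences.fasta", "path/to/blast_db", "pssm_out")` would then process the file one sequence at a time. The database path is a placeholder for a protein database you have formatted yourself. With 8000 long sequences this is likely to take a long time, so running it in batches may be worthwhile.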
This should generate a set of files in the output directory, each containing the PSSM of the corresponding sequence plus some other information. You can use the code below to extract the sequence and the PSSM from each file and return the matrix as a NumPy array:
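The extraction code is also missing here, so this is a hedged sketch rather than the original. It assumes the ASCII PSSM layout that `psiblast -out_ascii_pssm` typically writes, where each data row starts with the position and the residue, followed by 20 integer scores and then the percentage columns. It also assumes the columns can be split on whitespace. `parse_ascii_pssm` and `load_pssm` are hypothetical names.

```python
def parse_ascii_pssm(text):
    """Return (sequence, rows); rows holds the 20 scores for each residue."""
    residues, rows = [], []
    for line in text.splitlines():
        fields = line.split()
        # Data rows: <position> <residue> <20 scores> <20 percentages> ...
        # Header, blank, and statistics lines fail this check and are skipped.
        if len(fields) >= 22 and fields[0].isdigit() and len(fields[1]) == 1:
            residues.append(fields[1])
            rows.append([int(value) for value in fields[2:22]])
    return "".join(residues), rows


def load_pssm(path):
    """Read one PSSM file and return (sequence, array of shape (length, 20))."""
    import numpy as np  # imported lazily so the parser above needs only the stdlib

    with open(path) as handle:
        sequence, rows = parse_ascii_pssm(handle.read())
    return sequence, np.array(rows, dtype=int)
```

The array returned by `load_pssm` holds the raw integer scores. If you need normalized values, you would apply a scaling of your choice to it afterwards.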
Note that you need the BLAST+ command-line tools installed, with their bin directory on your PATH, to be able to run the code. Also note that PSI-BLAST sometimes generates two PSSM files for a sequence, so you need to keep track of the sequences you have already extracted.
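Both caveats can be handled with a few lines. This is a rough sketch: `psiblast_available` and `first_occurrences` are hypothetical helper names. Keeping the first PSSM seen for each sequence is just one possible policy, and you may prefer to keep the last one instead.

```python
import shutil


def psiblast_available():
    """Return True if the psiblast executable can be found on PATH."""
    return shutil.which("psiblast") is not None


def first_occurrences(pairs):
    """Keep only the first PSSM seen for each sequence.

    `pairs` is an iterable of (sequence, pssm) tuples.
    """
    seen = set()
    unique = []
    for sequence, pssm in pairs:
        if sequence in seen:
            continue  # a second PSSM file for an already extracted sequence
        seen.add(sequence)
        unique.append((sequence, pssm))
    return unique
```

Checking `psiblast_available()` before starting a long batch run avoids failing partway through because the executable cannot be found.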