How to deal with peptide sequences that contain atypical amino acids?

Asked by S.EB on 05 March 2023 at 11:33

I am not a bioinformatician and my question may sound basic.

I have an issue with RDKit: some of my antimicrobial peptide sequences contain the letter X, and RDKit does not seem to be able to process them. For example, for the sequences seq = ['HFXGTLVNLAKKIL', 'HFLGXLVNLAKKIL', 'HFLGTLVNXAKKIL', 'fPVXLfPXXL', 'SRWPSPGRPRPFPGRPKPIFRPRPXNXYAPPXPXDRW', ...], Chem.MolFromSequence(seq[i]) returns None.

My question is: how do I deal with this kind of sequence?



Answer by Tarquinius on 06 March 2023 at 08:34

Let me explain why the output is None.

As you can see in this list of abbreviations for peptide sequences, the letter "X" stands for "unknown": the actual amino acid at that position could not be determined. RDKit therefore cannot create a Mol object from your data, because part of the structure is unknown.
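To see exactly where the unknown residues are before deciding what to do with them, the following stdlib-only sketch lists the 0-based positions of every character that is not one of the 20 standard one-letter codes. The function names are hypothetical, and comparing letters case-insensitively (so the lowercase `f` in `fPVXLfPXXL` is not flagged) is an assumption of this sketch, not a statement about how RDKit interprets lowercase letters.

```python
from typing import Dict, List

# The 20 standard one-letter amino acid codes; anything else (e.g. 'X') is flagged.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def unknown_positions(seq: str) -> List[int]:
    """Return 0-based indices of residues outside the 20 standard codes.

    Assumption: the check is case-insensitive, so lowercase letters such as
    the 'f' in 'fPVXLfPXXL' are not flagged.
    """
    return [i for i, aa in enumerate(seq) if aa.upper() not in STANDARD_AA]


def report_unknowns(seqs: List[str]) -> Dict[str, List[int]]:
    """Map each sequence that contains non-standard residues to their positions."""
    report: Dict[str, List[int]] = {}
    for seq in seqs:
        positions = unknown_positions(seq)
        if positions:
            report[seq] = positions
    return report


if __name__ == "__main__":
    seqs = ["HFXGTLVNLAKKIL", "HFLGXLVNLAKKIL", "HFLGTLVNXAKKIL", "fPVXLfPXXL"]
    for seq, positions in report_unknowns(seqs).items():
        print(seq, positions)
```

Counting the affected sequences this way gives a sense of how much data each preprocessing strategy would change.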

The RDKit documentation for MolFromSequence states:

RETURNS:

a Mol object, None on failure.
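Because failure is signalled by a None return rather than an exception, one way to handle a whole list is to call the converter on every sequence and split the results into successes and failures. This is a minimal sketch: `convert_sequences` is a hypothetical helper, and the converter is passed in as an argument so the logic can be tried without RDKit installed.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def convert_sequences(
    seqs: List[str], converter: Callable[[str], Optional[T]]
) -> Tuple[List[Tuple[str, T]], List[str]]:
    """Apply `converter` to every sequence and split the results.

    `converter` is expected to follow the same convention as
    Chem.MolFromSequence: return an object on success and None on failure.
    """
    converted: List[Tuple[str, T]] = []
    failed: List[str] = []
    for seq in seqs:
        result = converter(seq)
        if result is None:
            failed.append(seq)  # keep the input so it can be inspected later
        else:
            converted.append((seq, result))
    return converted, failed
```

With RDKit installed, you would call it as `ok, failed = convert_sequences(seq, Chem.MolFromSequence)` and then inspect `failed` to see which sequences need preprocessing.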

Since RDKit's handling of this case is reasonable, the question you have to answer yourself is: "How do I deal with unknown amino acids?" You need to preprocess those sequences, for example by replacing the "X" with something else or by removing the affected sequences from your dataframe entirely. Which option is right depends on your use case.
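The two preprocessing options mentioned above could look like the sketch below. Both helper names are hypothetical, and the replacement residue is an assumption you must choose yourself: substituting, say, alanine for an unknown residue produces a molecule that may not match the real peptide.

```python
from typing import List


def drop_unknown(seqs: List[str], unknown: str = "X") -> List[str]:
    """Strategy 1: discard every sequence that contains an unknown residue."""
    return [s for s in seqs if unknown not in s]


def replace_unknown(seqs: List[str], replacement: str, unknown: str = "X") -> List[str]:
    """Strategy 2: substitute a placeholder residue for every unknown one.

    The choice of `replacement` (e.g. 'A' or 'G') is an assumption that
    changes the resulting molecule; pick it deliberately for your use case.
    """
    if len(replacement) != 1:
        raise ValueError("replacement must be a single one-letter code")
    return [s.replace(unknown, replacement) for s in seqs]
```

If the sequences live in a pandas DataFrame, the dropping strategy corresponds to something like `df[~df["sequence"].str.contains("X", regex=False)]`, where the column name `sequence` is a placeholder for your own.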
