I'm having trouble getting my program to work under all circumstances, and anyone with expertise in both biology and coding should be able to tell me where I'm going wrong. I am trying to write a program that asks a few questions about a biological molecule. First, it asks the user whether the DNA/mRNA strand is given in the 5' to 3' direction. Then it asks whether the molecule is DNA or RNA. If it is DNA, it asks whether we are reading the template or the coding strand in order to find the resulting mRNA. The program then takes the mRNA molecule, reads it in the 5' to 3' direction, and determines the amino acid sequence. The problem is that the program seems to work for mRNA regardless of direction, but breaks when reading the DNA molecule in certain directions. I have attached an image showing a few examples so you can see where it fails. I am trying to get the correct MetPheIle amino acid sequence from all 6 conditions. I will also attach a rough picture made in Paint of the overview in case the code is confusing.
Here is my code:
#rules for converting any DNA strand to its complementary RNA strand
def complement_base(base):
    if base == 'A':
        return 'U'
    elif base == 'T':
        return 'A'
    elif base == 'C':
        return 'G'
    elif base == 'G':
        return 'C'
    else:
        return ''
#converts DNA strands to mRNA so they can be translated
def convert_to_mrna(dna_strand, is_template_strand, reverse_sequence=False):
    if is_template_strand:
        mrna_strand = ''.join(complement_base(base) for base in dna_strand)
    else:
        mrna_strand = dna_strand.replace('T', 'U')
    return mrna_strand
#takes the mRNA sequence and sets rules for start and stop points and how to read the strand
def translate_mrna_to_amino_acid(mrna_strand, codon_table):
    start_codon = "AUG"
    stop_codons = {"UAA", "UAG", "UGA"}
    amino_acid_sequence = ""
    translating = False
    index = 0
    while index < len(mrna_strand):
        codon = mrna_strand[index:index + 3]
        if codon == start_codon:
            translating = True
        if translating:
            if codon in stop_codons:
                break
            amino_acid = codon_table.get(codon, "-")
            amino_acid_sequence += amino_acid
        index += 3
    return amino_acid_sequence
#function for getting the mRNA sequence by input type, and reversing it if it is initially in the 3' to 5' direction because the RNA is translated in the 5' to 3' direction
def get_mrna_sequence(reverse_sequence=False):
    valid_bases = {'A', 'U', 'C', 'G'}
    while True:
        option = input("Enter '1' to input mRNA sequence manually, '2' to upload a file: ")
        if option == "1":
            mrna_sequence = input("Enter the mRNA sequence (only A, U, C, G): ").upper()
            if all(base in valid_bases for base in mrna_sequence):
                if reverse_sequence:
                    mrna_sequence = mrna_sequence[::-1]  # Reverse the sequence
                return mrna_sequence
            else:
                print("Invalid sequence! Please use only A, U, C, and G.")
        elif option == "2":
            file_name = input("Enter the file name with the mRNA sequence: ")
            try:
                with open(file_name, "r") as file:
                    mrna_sequence = file.read().replace("\n", "").upper()
                    if all(base in valid_bases for base in mrna_sequence):
                        if reverse_sequence:
                            mrna_sequence = mrna_sequence[::-1]  # Reverse the sequence
                        return mrna_sequence
                    else:
                        print("Invalid sequence in file! Please use only A, U, C, and G.")
            except FileNotFoundError:
                print("File not found!")
        else:
            print("Invalid option!")
    return mrna_sequence
#similar to the code above but is the DNA sequence that will be converted to mRNA when thymine is replaced with uracil
def get_dna_sequence():
    valid_bases = {'A', 'T', 'C', 'G'}
    while True:
        option = input("Enter '1' to input DNA sequence manually, '2' to upload a file: ")
        if option == "1":
            dna_sequence = input("Enter the DNA sequence (only A, T, C, G): ").upper()
            if all(base in valid_bases for base in dna_sequence):
                return dna_sequence
            else:
                print("Invalid sequence! Please use only A, T, C, and G.")
        elif option == "2":
            file_name = input("Enter the file name with the DNA sequence: ")
            try:
                with open(file_name, "r") as file:
                    dna_sequence = file.read().replace("\n", "").upper()
                    if all(base in valid_bases for base in dna_sequence):
                        return dna_sequence
                    else:
                        print("Invalid sequence in file! Please use only A, T, C, and G.")
            except FileNotFoundError:
                print("File not found!")
        else:
            print("Invalid option!")
#the main function and codon table to translate sequence. It asks a few questions, 1)Is the molecule on the 5' to 3' direction? 2) is it RNA or DNA? 3) If DNA, is it the coding or template strand? 4) will the sequence be entered manually or a text file? 5) enter the dna sequence. And then is attempting to give the amino acid sequence from this.
def main():
    direction = input("Is the molecule in the 5' to 3' direction? (yes/no): ").lower()
    molecule_type = input("Is the molecule DNA or RNA? ").lower()
    if molecule_type == 'dna':
        sequence_type = input("Is it the template strand or the coding strand? ").lower()
        if sequence_type == 'template strand':
            dna_sequence = get_dna_sequence()
            mrna_sequence = convert_to_mrna(dna_sequence, is_template_strand=True, reverse_sequence=(direction == 'no'))
        elif sequence_type == 'coding strand':
            dna_sequence = get_dna_sequence()
            mrna_sequence = convert_to_mrna(dna_sequence, is_template_strand=False)
        else:
            print("Invalid input!")
    elif molecule_type == 'rna':
        mrna_sequence = get_mrna_sequence(reverse_sequence=(direction == 'no'))
    else:
        print("Invalid input!")
    # Example codon table mapping
    codon_table = {
        "UUU": "Phe", "UUC": "Phe", "UUA": "Leu", "UUG": "Leu",
        "CUU": "Leu", "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
        "AUU": "Ile", "AUC": "Ile", "AUA": "Ile", "AUG": "Met",
        # ... other codons and their respective amino acids
    }
    if mrna_sequence:
        resulting_amino_acids = translate_mrna_to_amino_acid(mrna_sequence, codon_table)
        print("Resulting amino acid sequence:", resulting_amino_acids)
if __name__ == "__main__":
    main()
Tests:
5'3' template AAATCAGATAAACAT -> MetPheIle FAIL
3'5' template TACAAATAGACTAAA -> MetPheIle PASS
5'3' coding ATGTTTATCTGATTT -> MetPheIle PASS
3'5' coding TTTAGTCTATTTGTA -> MetPheIle FAIL
5'3' mRNA AUGUUUAUCUGAUUU -> MetPheIle PASS
3'5' mRNA UUUAGUCUAUUUGUA -> MetPheIle PASS
The mRNA tests work, so there must be some problem with reversing the DNA sequences. The 5' to 3' coding strand and the 5' to 3' mRNA strand should be the same, with T replaced by U. The 3' to 5' coding strand should be reversed, with T replaced by U, and something isn't right in my code: either I'm not reversing the strand correctly or I'm calling the wrong function at the wrong time. I am new to this, so I may be having trouble with how to reverse and translate. The 5' to 3' template gives a resulting mRNA molecule in the 3' to 5' direction, so I should have to reverse the resulting mRNA strand, and you can see this one failed too. The 3' to 5' template should give a 5' to 3' mRNA strand, and this one passed, so I have deduced it's a problem with the reversing step, but I'm not sure where to put it. I have tried to reverse it in the get_mrna_sequence function but failed. I know this is a lot, but help would be greatly appreciated. If there is any problem with my understanding of DNA or RNA, pointing that out would be appreciated too. Thank you!
The problem is in both the convert_to_mrna function and the main function.
In the main function, the two conditions that check for 'template strand' and 'coding strand' can't tell the difference between a coding strand in the 5' -> 3' direction and a coding strand in the 3' -> 5' direction, because only the template branch passes the direction on. To differentiate between the two, you can merge these two conditions into one branch that passes both the strand type and the direction to convert_to_mrna.
In the convert_to_mrna function, you currently aren't using the reverse_sequence argument at all.
For 3' -> 5' coding strands, or 5' -> 3' template strands, the sequence has to be reversed to get a 5' -> 3' mRNA sequence ready for translation; for a template strand this amounts to a reverse complement. That means you have to reverse the sequence before or after performing the base substitution. The easiest way to decide this is with an exclusive or (XOR) check on what you know about the given DNA sequence: [whether to reverse the sequence] = [DNA is a template strand] XOR [DNA is in the 3' -> 5' direction]. XOR on two boolean operands is written in Python with the ^ operator, so you only need to add two lines. The reversal can be done before converting to mRNA, or just as well after the conversion.
Anyway, I would rethink how you name some of these parameters and variables, as naming variables properly helps a lot when you want to focus on developing a correct algorithm.
For example, I wouldn't name that parameter of convert_to_mrna reverse_sequence; whether or not you actually reverse the sequence also depends on whether the DNA sequence is a template or coding strand. You should name it something like is_3_to_5 instead.
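A minimal sketch of the XOR rule, with the parameter renamed to is_3_to_5 as suggested. The dictionary-based complement_base and the cases list are my own shorthand for illustration, not the original code; the four sequences are the DNA test cases from the question.

```python
def complement_base(base):
    """Return the RNA base that pairs with a DNA base ('' if unknown)."""
    return {"A": "U", "T": "A", "C": "G", "G": "C"}.get(base, "")


def convert_to_mrna(dna_strand, is_template_strand, is_3_to_5=False):
    """Return the 5' -> 3' mRNA for a DNA strand given in either direction."""
    # Reverse exactly when one of the two flags is set (XOR on booleans):
    # a 5' -> 3' template strand or a 3' -> 5' coding strand.
    if is_template_strand ^ is_3_to_5:
        dna_strand = dna_strand[::-1]
    if is_template_strand:
        # Template strand: pair every base with its RNA complement.
        return "".join(complement_base(base) for base in dna_strand)
    # Coding strand: same sequence as the mRNA, with T replaced by U.
    return dna_strand.replace("T", "U")


# The four DNA cases from the question: (sequence, is_template_strand, is_3_to_5)
cases = [
    ("AAATCAGATAAACAT", True, False),   # 5' -> 3' template
    ("TACAAATAGACTAAA", True, True),    # 3' -> 5' template
    ("ATGTTTATCTGATTT", False, False),  # 5' -> 3' coding
    ("TTTAGTCTATTTGTA", False, True),   # 3' -> 5' coding
]
for sequence, is_template, is_3_to_5 in cases:
    print(convert_to_mrna(sequence, is_template, is_3_to_5))
```

Each of the four cases should print AUGUUUAUCUGAUUU, which the existing translate_mrna_to_amino_acid function can then read. In main, the two DNA branches could collapse into one call along the lines of convert_to_mrna(dna_sequence, is_template_strand=(sequence_type == 'template strand'), is_3_to_5=(direction == 'no')), once sequence_type has been validated.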