perl check for valid DNA sequence with regex

1.1k Views Asked by ic23oluk At 14 June 2017 at 09:41

I want to write a subroutine that takes a FASTA file as an argument and prints out the sequence (without the header). The subroutine should check whether the sequence contains any letters other than the DNA bases (A, T, G, C).

Here's my code:

scalar_sequence ("sequence.fa");

sub scalar_sequence {
    my $file = $_[0];
    my $sequence;
    open (READ, $file) || die "Cannot open $file: $!.\n";
    while (<READ>){
        if (/^>/){
            next;
        } 
        if (/^[ATCG]/){
            $sequence .= $_;
        } else {
            die "invalid sequence\n";
        }
    }
    print $sequence, "\n";
}

When I run this code, I get 'invalid sequence' as output. When I leave the 'else' out, it prints out the sequence even when the sequence contains another letter.

What's the problem?

Thanks in advance!


There is 1 best solution below

mkHun On 14 June 2017 at 09:46 BEST ANSWER

The problem is the pattern /^[ATCG]/. It should be /^[ATCG]+$/.

The body of your while loop should be:

chomp;
next if /^>/;       # skip the header line
next if /^\s*$/;    # skip empty lines
if (/^[ATCG]+$/) {
    $sequence .= $_;
} else {
    die "invalid sequence\n";
}

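Your pattern only checks that the line starts with A, T, G or C, so anything after the first character is accepted. Anchoring the pattern at both ends with ^ and $ and adding + requires every character on the line to be a base. Your original pattern also fails on empty lines, since they do not start with a base, which is one likely reason for the 'invalid sequence' message; that is why empty lines are skipped here. chomp removes the trailing newline so that the lines are joined into one continuous sequence.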
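Putting the pieces together, the whole subroutine might look like the sketch below. It is one possible arrangement, not the only one: the lexical filehandle, the three-argument open, the line number in the error message and returning the sequence instead of printing it are my own choices, not part of the question. It also assumes that only uppercase A, T, C and G are valid, so lowercase bases or N would be rejected.

```perl
use strict;
use warnings;

# Read a FASTA file and return its sequence as one string.
# Dies if any non-header line contains a character other than A, T, C or G.
# Assumption: only uppercase bases are valid (no lowercase, no N).
sub scalar_sequence {
    my ($file) = @_;
    my $sequence = '';

    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;        # strip the newline and any trailing whitespace
        next if $line =~ /^>/;     # skip header lines
        next if $line eq '';       # skip empty lines
        # \A and \z anchor both ends, so every character must be a base.
        $line =~ /\A[ATCG]+\z/
            or die "invalid sequence at line $.: $line\n";
        $sequence .= $line;
    }
    close $fh;
    return $sequence;
}
```

To get the behaviour from the question, call it as print scalar_sequence("sequence.fa"), "\n";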
