Removing text from a fasta gene name between two characters

185 Views Asked by Alexis Brown At 01 July 2022 at 17:44

I have a large codon alignment that has a variety of gene names in the headers. The headers are in the following format:

>ENST00000357033.DMD.-1 | CODON | REFERENC

I want to modify all of the headers in the fasta to remove everything from the first "." up to (but not including) the space before the first "|". Desired outcome:

>ENST00000357033 | CODON | REFERENC

I've tried a few sed commands, no dice. Any advice? I'm averse to using awk, since I'd like to keep the formatting of the alignment and awk scares me.

Thank you!


There are 2 best solutions below

Pierre On 01 July 2022 at 22:30 BEST ANSWER

sed '/^>/s/\.[^ ]* / /'

For each line starting with '>', this replaces the first dot, any non-space characters that follow it, and the next space with a single space. Lines that do not start with '>' (the sequence lines) are left untouched.
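A quick way to check the command before running it on the real alignment is to pipe a small sample through it. The sketch below uses the header from the question plus a made-up sequence line (the sequence is an assumption, only there to show that non-header lines pass through unchanged).

```shell
# Sample input: the header from the question and a hypothetical sequence line.
result=$(printf '%s\n' \
  '>ENST00000357033.DMD.-1 | CODON | REFERENC' \
  'ATGCTTTGG---' |
  sed '/^>/s/\.[^ ]* / /')

# Print the edited records; only the ">" line should have changed.
printf '%s\n' "$result"
```

On a real file, the same sed expression can take the file name as an argument, with the output redirected to a new file, so the original alignment stays intact.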

RARE Kpop Manifesto On 02 July 2022 at 19:08

No need to be scared by awk:

mawk NF=NF FS='[.][^ ]+' OFS=    

>ENST00000357033 | CODON | REFERENC
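If the goal is to touch only the header lines, a variant along these lines may be easier to read. This is a sketch, not the answerer's command: it uses plain POSIX `awk` with `sub()` restricted to lines starting with '>', and the sample sequence lines (including the blank one) are assumptions added to show that everything else is printed as-is.

```shell
# Sample input: the header from the question, a hypothetical sequence line
# containing a dot, and an empty line.
result=$(printf '%s\n' \
  '>ENST00000357033.DMD.-1 | CODON | REFERENC' \
  'ATG.CTT' \
  '' |
  awk '/^>/ { sub(/\.[^ ]*/, "") } 1')   # edit headers only; "1" prints every line

printf '%s\n' "$result"
```

Because the substitution is guarded by `/^>/`, dots inside sequence lines are preserved, and the trailing `1` prints every record, including empty ones.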
