Removing text from a fasta gene name between two characters


I have a large codon alignment that has a variety of gene names in the headers. The headers are in the following format:

>ENST00000357033.DMD.-1 | CODON | REFERENC

I want to modify all of the headers in the fasta to remove everything from the first "." up to (but not including) the space before the first "|". Desired outcome:

>ENST00000357033 | CODON | REFERENC

I've tried a few sed commands, no dice. Any advice? I'm averse to using awk, since I'd like to keep the formatting of the alignment and awk scares me.

Thank you!


There are 2 best solutions below

Pierre (BEST ANSWER)
sed '/^>/s/\.[^ ]* / /'

For each line starting with '>', this replaces the first dot, the run of non-space characters after it, and the following space with a single space. Sequence lines are left untouched.
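A minimal sketch of how the command could be run on a file and checked. The file names, the second header, and the sequences are made up for illustration; only the first header comes from the question.

```shell
# Build a tiny example alignment (hypothetical file name and sequences).
tmpdir=$(mktemp -d)
cat > "$tmpdir/example.fasta" <<'EOF'
>ENST00000357033.DMD.-1 | CODON | REFERENC
ATGCTT---TGG
>SEQ2.GENE2.1 | CODON | QUERY
ATGCTTAAATGG
EOF

# Only header lines (starting with '>') are edited; sequence lines pass through.
sed '/^>/s/\.[^ ]* / /' "$tmpdir/example.fasta" > "$tmpdir/renamed.fasta"

# Show the result so the headers can be inspected.
cat "$tmpdir/renamed.fasta"
```

Writing to a new file keeps the original alignment intact. GNU sed also accepts `-i` to edit in place, but checking the output on a copy first is probably the safer route.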

RARE Kpop Manifesto

No need to be scared of awk:

mawk NF=NF FS='[.][^ ]+' OFS=

>ENST00000357033 | CODON | REFERENC
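One caveat with the one-liner above: it is applied to every line, not just headers, so a "." inside a sequence line would also be treated as a separator, and empty lines are not printed because NF is 0 there. A possible header-only variant, as a sketch (the sequence line and the variable name `headers_only` are made up for illustration):

```shell
# Feed one header and one sequence line through awk.
# sub() removes the first ".xxx" run on header lines only; every line is printed.
headers_only=$(
  printf '%s\n' '>ENST00000357033.DMD.-1 | CODON | REFERENC' 'ATG---TGG' |
    awk '/^>/ { sub(/\.[^ ]*/, "") } { print }'
)

printf '%s\n' "$headers_only"
```

This uses only standard awk features, so it should behave the same in mawk, gawk, or any POSIX awk.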