Better splitting of multiallelic sites than bcftools norm -m -any


I am trying to split the multiallelic sites in my VCF. I used bcftools norm -m -any, but the result does not seem reasonable to me. Here is an example.

Let's say, I have this multiallelic site:

REF     ALT     GT1     GT2     GT3
A       C,G     1/2     0/2     0/1

After splitting I get these two:

REF     ALT     GT1     GT2     GT3
A       C       1/0     0/0     0/1
A       G       0/1     0/1     0/0

So, in each row, any allele that refers to the other ("unused") ALT is simply set to REF. Is there a way to change this behavior? I don't think it is reasonable, at least for my analysis. I would like my result to look more like this:

REF     ALT     GT1     GT2     GT3          GT1     GT2     GT3
A       C       1/.     0/.     0/1    or    ./.     ./.     0/1
A       G       ./1     0/1     0/.          ./.     0/1     ./.

Or something similar. At the very least, I don't want REF where there was an ALT before.
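To make the requested recoding concrete, here is a minimal Python sketch. It is not a bcftools feature and does not parse VCF files; it only works on plain GT strings, and the names `split_genotype` and `split_site` are hypothetical helpers introduced for illustration. Passing `other="0"` mimics the output I currently get, while `other="."` gives the per-allele missing output I would like. Other FORMAT fields (AD, PL, and so on) are ignored here and would need their own handling.

```python
def split_genotype(gt, alt_index, other="."):
    """Recode one GT string for the row that keeps only ALT number alt_index.

    other: replacement for alleles that point at a different ALT
           ("0" mimics the REF substitution, "." marks them missing).
    """
    sep = "|" if "|" in gt else "/"
    recoded = []
    for allele in gt.split(sep):
        if allele in (".", "0"):
            recoded.append(allele)   # missing and REF alleles stay unchanged
        elif int(allele) == alt_index:
            recoded.append("1")      # the kept ALT becomes allele 1
        else:
            recoded.append(other)    # allele belongs to another ALT
    return sep.join(recoded)


def split_site(alts, genotypes, other="."):
    """Return one (ALT, recoded genotypes) row per ALT allele."""
    return [
        (alt, [split_genotype(gt, i, other) for gt in genotypes])
        for i, alt in enumerate(alts, start=1)
    ]


# The example site from above: REF=A, ALT=C,G
for alt, gts in split_site(["C", "G"], ["1/2", "0/2", "0/1"]):
    print(alt, *gts, sep="\t")
```

Setting a whole genotype to `./.` whenever any allele points at another ALT (the second variant in my table) would be a small change to `split_genotype`.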

Answer from ekerde:

Have you tried bcftools norm -a (--atomize)?

You can also check the --atom-overlaps option. From the documentation: 'Alleles missing because of an overlapping variant can be set either to missing (.) or to the star allele (*), as recommended by the VCF specification.'