Modify VCF FORMAT field from GT to DS manually?


I have a VCF file whose FORMAT field is GT, but the sample columns actually contain dosage (DS) data. Dosages are what I intend to have, but the software that takes this file as its input gets confused and raises this error:

Exception in thread "main" java.lang.IllegalArgumentException: VCF record format error: 1       131079320     BHD0100437271      G       A       .       PASS    .       GT
at vcf.VcfRecGTParser.ninthTabPos(VcfRecGTParser.java:87)
at vcf.VcfHeader.isDiploid(VcfHeader.java:73)
at vcf.RefIt.<init>(RefIt.java:130)
at vcf.RefIt.create(RefIt.java:97)
at vcf.RefTargSlidingWindow.refIt(RefTargSlidingWindow.java:122)
at vcf.RefTargSlidingWindow.<init>(RefTargSlidingWindow.java:81)
at vcf.RefTargSlidingWindow.instance(RefTargSlidingWindow.java:70)
at main.Main.slidingWindow(Main.java:129)
at main.Main.main(Main.java:107)

Is it technically correct to make this change manually and set the FORMAT column to GT:DS or DS?



Answer by Daniel King:

If you just have dosages, you could try changing the FORMAT field to GP. You can’t use GT:GP unless each sample has two colon-separated values (a GT and a GP).
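To make the "one value per FORMAT key" rule concrete, here is a minimal Python sketch that checks a single VCF data line. The function name check_vcf_record and its messages are made up for illustration. It is deliberately stricter than the spec, which allows trailing sample fields to be dropped, and it only sanity-checks GT and DS values. It also flags lines with fewer than 10 tab-separated columns: the record quoted in the exception appears to stop at the FORMAT column, and a file that uses spaces instead of tabs would look the same to a parser.

```python
import re

# Accepts GT values such as 0/1, 1|0, ./. or 0 (haploid); rejects dosages like 0.98.
GT_PATTERN = re.compile(r"^(\.|\d+)([/|](\.|\d+))*$")


def check_vcf_record(line: str) -> list[str]:
    """Return human-readable problems found in one VCF data line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    # CHROM..INFO are 8 columns, FORMAT is the 9th, sample columns start at the 10th.
    if len(fields) < 10:
        return [f"expected at least 10 tab-separated columns, found {len(fields)}"]
    problems = []
    keys = fields[8].split(":")
    for n, sample in enumerate(fields[9:], start=1):
        values = sample.split(":")
        # Strict check: one colon-separated value per FORMAT key.
        if len(values) != len(keys):
            problems.append(
                f"sample {n}: FORMAT has {len(keys)} key(s) but {len(values)} value(s)"
            )
        # Pair each key with its value and sanity-check GT and DS.
        for key, value in zip(keys, values):
            if key == "GT" and not GT_PATTERN.match(value):
                problems.append(f"sample {n}: GT value {value!r} is not a genotype")
            elif key == "DS" and value != ".":
                try:
                    float(value)
                except ValueError:
                    problems.append(f"sample {n}: DS value {value!r} is not a number")
    return problems
```

For example, a dosage stored under GT (FORMAT "GT", sample "0.98") is reported as not a genotype, while FORMAT "GT:DS" with sample "0/1:0.98" passes.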

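Be aware, though, that GP holds genotype probabilities, one value per possible genotype (three for a biallelic diploid site), whereas a single expected dosage per sample is conventionally written as DS, which is what imputation tools commonly output. Take a quick look at the VCF 4.3 spec to make sure you’ve got the right syntax: https://samtools.github.io/hts-specs/VCFv4.3.pdf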
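If the sample values really are single dosages, the manual change asked about in the question amounts to renaming the FORMAT key and declaring it in the header. The sketch below does that line by line; relabel_format and DS_HEADER are hypothetical names. Assumptions: the FORMAT column of the affected lines is exactly GT with one value per sample; the DS declaration uses Number=1 and Type=Float, which fits biallelic sites (imputation tools often declare Number=A instead); the Description text is arbitrary. Sample values are not modified and the existing GT header line is kept, which is harmless if GT is no longer used.

```python
from typing import Iterable, Iterator

# Assumed header declaration; adjust Number/Description to match your data.
DS_HEADER = '##FORMAT=<ID=DS,Number=1,Type=Float,Description="Estimated alternate allele dosage">'


def relabel_format(lines: Iterable[str], old_key: str = "GT",
                   new_key: str = "DS", new_header: str = DS_HEADER) -> Iterator[str]:
    """Yield VCF lines (without newlines) with a single-key FORMAT renamed.

    Only the FORMAT column changes; sample values are left untouched, so this
    is only appropriate when they already are in new_key's format.
    """
    header_present = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Remember whether the new key is already declared.
            if line.startswith(f"##FORMAT=<ID={new_key},"):
                header_present = True
            yield line
        elif line.startswith("#CHROM"):
            # Meta lines must precede the column header line.
            if not header_present:
                yield new_header
            yield line
        else:
            fields = line.split("\t")
            # Touch only lines whose FORMAT is exactly the old single key.
            if len(fields) > 8 and fields[8] == old_key:
                fields[8] = new_key
            yield "\t".join(fields)
```

To rewrite a whole file, you could iterate over an open input file and write each yielded line plus a newline to the output; for a bgzipped VCF, gzip.open(path, "rt") yields text lines in the same way. This only fixes the labelling; whether the downstream program accepts DS at all is a separate question.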

The tool you’re using might not support dosages at all, though. VCF can represent a very broad range of data, but most tools are designed to read only a few kinds of it, so check which FORMAT fields your tool accepts as input.