How to modify individual codes in a VCF file

Asked by Khaleesi95 on 05 May 2023 at 18:29

I have genotypes of over 20k individuals in a VCF file obtained after imputation. Here is an example of what this VCF file looks like, with only 7 samples:

#CHROM   POS       ID            REF   ALT    QUAL    FILTER     INFO        FORMAT    0_0_473294.CEL      0_0_347293_v2.CEL       0_0_9588393_RS.CEL        0_0_999444_rp.CEL       0_0_26:9494949.CEL     0_0_237485_RS_rp.CEL    0_0_27:484848.CEL
16       11781     rs549521730    G     C       .       PASS    IMPUTED       GP

So, the individuals' genotypes start at column 10. Now, I need to modify the individual codes in this VCF file so that it looks like this:

#CHROM   POS       ID            REF   ALT    QUAL    FILTER     INFO        FORMAT    473294     347293       9588393        999444       9494949     237485     484848
16       11781     rs549521730    G     C       .       PASS    IMPUTED       GP

Therefore, I need only the serial numbers, without the flanking stuff like 0_0_, .CEL, _RS, 26:, and so on.

Do you know a tool, like bcftools, that can rename the sample codes of a VCF file? Or is it possible to do it in bash? Thank you!

There are 2 best solutions below

rndy On 03 June 2023 at 20:07

If I'm reading your question correctly, it looks like you just want to change the column names?

It looks like the sample column names come in a lot of different formats. How you go about converting those to just the number you want will depend on the specifics, but it will probably involve regex. I'm not sure your example has enough info to answer that part.
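
For the seven names shown in the question, one possible regex looks like the sketch below. It assumes every name is the prefix 0_0_, then an optional "digits:" tag, then the serial number, then any suffix; names that follow another scheme would need a different pattern. The helper name clean_sample_name is made up for this example.

```shell
# Hypothetical helper: reduce a sample name such as 0_0_26:9494949.CEL
# to its serial number. Reads names on stdin, one per line.
# Assumed layout: 0_0_ + optional "<digits>:" + serial number + any suffix.
clean_sample_name() {
  sed -E 's/^0_0_([0-9]+:)?([0-9]+).*$/\2/'
}

# Try it on the seven sample names from the question.
cleaned=$(printf '%s\n' \
  '0_0_473294.CEL' \
  '0_0_347293_v2.CEL' \
  '0_0_9588393_RS.CEL' \
  '0_0_999444_rp.CEL' \
  '0_0_26:9494949.CEL' \
  '0_0_237485_RS_rp.CEL' \
  '0_0_27:484848.CEL' | clean_sample_name)
printf '%s\n' "$cleaned"
```

This prints the serial numbers 473294, 347293, 9588393, 999444, 9494949, 237485 and 484848, one per line. With 20k samples it is worth checking that the cleaned names are still unique before using them.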

I'd recommend something like making a single-line header text file (header.txt), making a new vcf file from it (output.vcf), and appending all but the header line of the input vcf file (input.vcf) to the new file.

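# Start the new file from the edited header line
cp header.txt output.vcf
# Append everything after line 1 of the input (assumes a single header line, as in the example)
tail -n +2 input.vcf >> output.vcf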
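
As a sketch of the same idea in a single pass: real VCF files usually carry several ## meta-information lines above the #CHROM line, so rather than counting header lines, the awk snippet below edits only the line starting with #CHROM and prints everything else untouched. The function name rename_vcf_samples, the tab-delimited layout, and the genotype values in the demo are assumptions for illustration, and the sub() patterns assume the naming scheme shown in the question.

```shell
# Hypothetical helper: print a VCF with cleaned sample names in the #CHROM line.
# Assumes an uncompressed, tab-delimited VCF with samples from column 10 onward.
rename_vcf_samples() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#CHROM/ {
         for (i = 10; i <= NF; i++) {
           sub(/^0_0_/, "", $i)       # drop the 0_0_ prefix
           sub(/^[0-9]+:/, "", $i)    # drop an optional tag such as 26:
           sub(/[^0-9].*$/, "", $i)   # drop everything after the serial number
         }
       }
       { print }' "$1"
}

# Tiny demo file; the genotype probabilities are made-up values.
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t0_0_473294.CEL\t0_0_26:9494949.CEL\n' >> "$demo_vcf"
printf '16\t11781\trs549521730\tG\tC\t.\tPASS\tIMPUTED\tGP\t0.9,0.1,0\t0,0.2,0.8\n' >> "$demo_vcf"

renamed=$(rename_vcf_samples "$demo_vcf")
printf '%s\n' "$renamed"
rm -f "$demo_vcf"
```

On a real file this would be used as rename_vcf_samples input.vcf > output.vcf. A compressed .vcf.gz would have to be decompressed first.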

ekerde On 28 June 2023 at 21:37

If you are not comfortable with Unix commands, I'd recommend bcftools reheader, which modifies the header of a VCF. To change sample names, the command line is:

bcftools reheader --samples <new names file> -o <output> <input>
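
The names file can hold either one new name per line, in the same order as the samples in the VCF, or "old_name new_name" pairs. A rough sketch of building the pairs form follows. The helper name make_rename_pairs and the file names rename.txt, input.vcf and output.vcf are placeholders, and the sed pattern assumes the naming scheme from the question (0_0_, optional "digits:" tag, serial number, suffix).

```shell
# Hypothetical helper: turn old sample names (stdin, one per line) into
# "old new" pairs. Names that do not match the assumed pattern pass through
# unchanged, so inspect the result before using it.
make_rename_pairs() {
  sed -E 's/^(0_0_([0-9]+:)?([0-9]+).*)$/\1 \3/'
}

# With bcftools installed, the full workflow would be roughly:
#   bcftools query -l input.vcf | make_rename_pairs > rename.txt
#   bcftools reheader --samples rename.txt -o output.vcf input.vcf

# Demo on two names from the question, without needing bcftools.
pairs=$(printf '%s\n' '0_0_473294.CEL' '0_0_237485_RS_rp.CEL' | make_rename_pairs)
printf '%s\n' "$pairs"

# New names must stay unique; list any that occur more than once.
dupes=$(printf '%s\n' "$pairs" | awk '{ print $2 }' | sort | uniq -d)
[ -z "$dupes" ] || printf 'duplicate new name: %s\n' "$dupes"
```

The duplicate check matters with 20k samples: if two old names collapse to the same serial number, it is better to find out before rewriting the header.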
