How can I perform ADMIXTURE analysis from a genlight object in R?


I have a sequencing dataset generated by Diversity Arrays that I’ve been analysing with the dartR package in R. My data is in a genlight object and after filtering has 1920 SNPs, 23 individuals and 4 populations. I want to investigate the admixture between these populations and plot those in a bar graph. I'm new to analyzing genetic data and unfortunately I'm a bit stuck with the analysis.

I have used the functions within the dartR package for almost everything: calculating Fst values, network analysis, etc. To my knowledge, however, the package doesn’t have a function for admixture analysis. I therefore converted my genlight object into the geno format so I could use the snmf function (sparse Non-Negative Matrix Factorization; Frichot et al., 2014) in the R Bioconductor package LEA (Frichot, 2015) to estimate the number of genetic clusters in the dataset. However, the results differ considerably depending on whether the regularization parameter (alpha) is set to 100 or 1000. The sNMF vignette describes alpha as the value of the regularization parameter (by default: 10), which penalizes intermediate ancestry proportions. Because there’s no specific rule for choosing alpha, I want to compare the admixture coefficients with those from a likelihood-based method like ADMIXTURE.

However, I’m having some trouble working out which tool to use. From what I’ve read online, there are different packages and programs, such as Plink and ADMIXTOOLS. Would anyone have a recommendation for which method to use?

I have been trying with both and haven’t been able to make it work yet. For the Plink function I have downloaded the Plink.exe files and placed those in my working directory. However, when I try to convert my genlight object into a vcf format using

gl2vcf(gl5, plink_path = getwd(), outfile = "gl_vcf", outpath=getwd()) 

and then run

% plink --file hapmap --recode12 --out hapmap
% admixture hapmap3.ped 3

the gl2vcf() function gives me the error: Error in system(..., intern = T) : “…” not found

Despite this error, two new files do appear in my working directory: gl_plink_temp.map and gl_plink_temp.ped. For the admixture analysis, however, I believe I need the *.bed, *.bim and *.fam files.
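For reference, the step from the .ped/.map pair to the binary .bed/.bim/.fam set, followed by ADMIXTURE runs over several values of K, might look roughly like the sketch below. It assumes that PLINK 1.9 and ADMIXTURE are installed and on the PATH, which is not confirmed for this setup. The function name admixture_cmds and the output prefix gl_admix are made up for illustration. The function only prints the commands so they can be checked before anything is run.

```shell
# Sketch only. Assumptions: plink (1.9) and admixture are on PATH, and
# gl_plink_temp.ped / gl_plink_temp.map are in the current directory.
# The function name and the "gl_admix" prefix are hypothetical.

# Print (do not run) the conversion command and one ADMIXTURE command per K.
admixture_cmds() {
  prefix="$1"   # basename of the .ped/.map pair
  out="$2"      # basename for the new .bed/.bim/.fam files
  max_k="$3"    # largest number of clusters (K) to try

  # Convert the text PED/MAP pair into binary BED/BIM/FAM files.
  echo "plink --file $prefix --make-bed --out $out"

  # One ADMIXTURE run per K; --cv adds a cross-validation error to the log.
  k=1
  while [ "$k" -le "$max_k" ]; do
    echo "admixture --cv $out.bed $k > $out.K$k.log"
    k=$((k + 1))
  done
}

# Show the commands for K = 1..4 (4 populations in this dataset).
admixture_cmds gl_plink_temp gl_admix 4
```

Once the printed commands look right, they could be executed by piping the output to sh (admixture_cmds gl_plink_temp gl_admix 4 | sh). ADMIXTURE should then write one .Q file per K (for example gl_admix.3.Q) containing the ancestry proportions, which can be read back into R for a bar plot.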

I’m trying to follow https://gaworkshop.readthedocs.io/en/latest/contents/07_admixture/admixture.html and use the info on admixture from /projects1/tools/admixture_1.3.0/admixture-manual.pdf.

If anyone has suggestions for running a likelihood-based method (preferably ADMIXTURE) from R to estimate admixture coefficients from a genlight object and plot the outcome, that would be greatly appreciated!

Thanks in advance, Chiara
