I am working with 20 different genes that all have the same three domains: NC (non-cytoplasmic), C (cytoplasmic) and TM (transmembrane). I extracted the number of substitutions that occurred in each of these genes, separately for each domain.
What I want to understand is whether the TM domain is less likely to mutate than the other domains.
Because the genes have different numbers of substitutions and the domains have different lengths, I first divided the number of substitutions by the domain length to obtain the proportion of substitutions per domain (for each gene). So for each gene I have three values. Then, for each gene, I summed these three values to get a scaling factor (let's call it SF), and divided each per-domain proportion by this scaling factor. The resulting plot shows the normalized proportion of substitutions per domain on the Y-axis and the domains on the X-axis, with one color per gene.
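For reference, the normalization described above can be sketched in Python roughly as follows. This is only a minimal illustration: the function name `normalized_rates` and the counts and lengths are made up for the example (they are not my real data), and it assumes every gene has at least one substitution so that SF is never zero.

```python
import numpy as np

DOMAINS = ("NC", "C", "TM")  # column order assumed for this sketch


def normalized_rates(substitutions, lengths):
    """Per-domain substitution proportions, rescaled per gene.

    Both arguments are (n_genes, 3) arrays with columns ordered as DOMAINS.
    Assumes each gene has at least one substitution, so SF is never zero.
    """
    subs = np.asarray(substitutions, dtype=float)
    lens = np.asarray(lengths, dtype=float)
    rates = subs / lens                      # substitutions / domain length
    sf = rates.sum(axis=1, keepdims=True)    # scaling factor SF, one per gene
    return rates / sf                        # normalized proportions


# Hypothetical counts and lengths for two genes (illustration only).
subs = [[12, 8, 3],
        [30, 20, 5]]
lens = [[200, 150, 100],
        [400, 300, 120]]

norm = normalized_rates(subs, lens)
for gene_values in norm:
    print(dict(zip(DOMAINS, np.round(gene_values, 3))))
```

One consequence of dividing by SF is that the three normalized values of each gene sum to 1, so they are not independent of each other.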
But now I don't know which statistical test to use to check whether there are significantly fewer substitutions in the TM domain than in the other domains. A Wilcoxon test or a t-test doesn't seem appropriate because I am working with normalized proportions.
Thank you for any help or suggestions!
All the best,
Max
