get significantly different groups from dunn test in R

In R, I compare groups with dunn.test() from the dunn.test package. Here is some example data, where "type" is the grouping variable:

my_table <- data.frame ("type" = c (rep ("low", 5), rep ("mid", 5), rep ("high", 5)),
                        "var_A" = rnorm (15),
                        "var_B" = c (rnorm (5), rnorm (5, 4, 0.1), rnorm (5, 12, 2)) 
                        )

I want to compare the variables var_A and var_B among the three groups with dunn.test(), which gives the following results:

library (dunn.test)
dunn.test (my_table$var_A, my_table$type)
>  Kruskal-Wallis rank sum test
>
> data: x and group
> Kruskal-Wallis chi-squared = 6.08, df = 2, p-value = 0.05
>
>
> Comparison of x by group                            
> (No adjustment)                                
> Col Mean-|
> Row Mean |       high        low
> ---------+----------------------
>      low |   0.919238
>          |     0.1790
>          |
>      mid |   0.989949   0.070710
>          |     0.1611     0.4718
>
> alpha = 0.05
> Reject Ho if p <= alpha/2

and

dunn.test (my_table$var_B, my_table$type)
> Kruskal-Wallis rank sum test
>
> data: x and group
> Kruskal-Wallis chi-squared = 12.5, df = 2, p-value = 0
>
>
> Comparison of x by group                            
> (No adjustment)                                
> Col Mean-|
> Row Mean |       high        low
> ---------+----------------------
>      low |   3.535533
>          |    0.0002*
>          |
>      mid |   1.767766  -1.767766
>          |     0.0385     0.0385
>
> alpha = 0.05
> Reject Ho if p <= alpha/2

I understand that for var_A there are no significant differences between the three groups. For var_B, the groups "low" and "high" differ significantly (p = 0.0002 <= alpha/2). To present the results, I could use a table like

library (tidyverse)
data.frame ("low" = my_table %>%
                filter (type == "low") %>%
                select (c ("var_A", "var_B")) %>%
                sapply (mean) %>%
                round (digits = 2),
            "mid" = my_table %>%
                filter (type == "mid") %>%
                select (c ("var_A", "var_B")) %>%
                sapply (mean) %>%
                round (digits = 2),
            "high" = my_table %>%
                filter (type == "high") %>%
                select (c ("var_A", "var_B")) %>%
                sapply (mean) %>%
                round (digits = 2 )
                )


>             low    mid   high
> var_A      0.14  -0.10   0.74
> var_B     -0.41   3.97  11.44
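The same means table can also be built in base R without filtering each group separately. A minimal sketch (the data are regenerated with rnorm(), so the numbers will differ from those shown above):

```r
# Regenerate example data in the same shape as the question
my_table <- data.frame("type"  = c(rep("low", 5), rep("mid", 5), rep("high", 5)),
                       "var_A" = rnorm(15),
                       "var_B" = c(rnorm(5), rnorm(5, 4, 0.1), rnorm(5, 12, 2)))

# Split the two variables by group and take the column means of each piece:
# the result is a 2 x 3 matrix (rows: variables, columns: groups, alphabetical)
means <- sapply(split(my_table[c("var_A", "var_B")], my_table$type), colMeans)

# Put the groups in the desired order and round to 2 digits
means <- round(means[, c("low", "mid", "high")], digits = 2)
means
```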

What I'd like to achieve is to add letters that indicate the results of the Dunn test. This could look something like

>               low         mid         high 
> var_A     0.14  a    -0.10  a      0.74  a
> var_B    -0.41  a     3.97 ab     11.44  b

So, my short question is: how can I get dunn.test() to output the grouping letters (e.g. "a", "ab" or "b")? Or is there a workaround to get the desired letters?

Maybe the kruskal() function in the agricolae package gets you what you're looking for. Its output includes 'groups', which contains the letters for each group. Note, though, that according to the package details the post-hoc comparisons use Fisher's LSD, not the Dunn test. You can pass the p.adj argument to adjust for multiple comparisons.

library(tidyverse)
library(agricolae)
library(reshape2)

my_table <- data.frame ("type" = c (rep ("low", 5), rep ("mid", 5), rep ("high", 5)),
                        "var_A" = rnorm (15),
                        "var_B" = c (rnorm (5), rnorm (5, 4, 0.1), rnorm (5, 12, 2)) 
)

# melt to long format (columns: type, variable, value) in order to use lapply
my_MeltedTable <- melt(my_table, id.vars = 'type')

# apply kruskal(value, type) to each level of variable (var_A and var_B),
# with Bonferroni adjustment of the pairwise comparisons
results <- lapply(split(my_MeltedTable[, c("type", "value")], my_MeltedTable$variable),
                  function(x) kruskal(x$value, x$type, p.adj = "bonferroni"))

# the grouping letters you'd like are in the 'groups' element
results$var_A$groups
results$var_B$groups

There is probably a way to pull out what you need inside the lapply() call, but I don't know how, so here is how I built the required table:

# create empty df for results (2 variables x 6 columns: 3 means + 3 letters)
resTable <- data.frame(matrix(ncol = 6, nrow = 2))

# results$<var>$means has one row per group (alphabetical: high, low, mid);
# its first column holds the group means
grp <- row.names(results$var_A$means)

# pull out means for var_A and var_B, rounded to 2 digits
resTable[1, 1:3] <- round(results$var_A$means[grp, 1], digits = 2)
resTable[2, 1:3] <- round(results$var_B$means[grp, 1], digits = 2)

# results$<var>$groups holds the letters, but its rows are sorted by mean rank,
# not alphabetically, so index it by group name to keep letters aligned with means
resTable[1, 4:6] <- results$var_A$groups[grp, 2]
resTable[2, 4:6] <- results$var_B$groups[grp, 2]

resTable <- resTable[, c(2, 5, 3, 6, 1, 4)]  # re-order cols: low, mid, high
rownames(resTable) <- c("var_A", "var_B")    # name rows
colnames(resTable) <- c("low", " ", "mid", " ", "high", "")  # name cols

And after all that long-windedness, the result (agricolae assigns "a" to the group with the highest mean rank):

        low    mid    high  
var_A  0.12 a 0.40 a -0.76 a
var_B -0.45 c 3.99 b 11.46 a
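If you want to stay with the Dunn test, a possible workaround is to turn the pairwise p-values that dunn.test() returns invisibly (a list that includes 'comparisons' and 'P.adjusted') into letters with multcompLetters() from the multcompView package. A minimal sketch, assuming both packages are installed; dunn_letters() is a hypothetical helper, and set.seed() is only there to make the example reproducible:

```r
# Sketch: Dunn's test (dunn.test) + compact letter display (multcompView).
# Assumes both packages are installed; dunn_letters() is a hypothetical helper.
library(dunn.test)
library(multcompView)

set.seed(42)  # only for reproducibility of this example
my_table <- data.frame("type"  = c(rep("low", 5), rep("mid", 5), rep("high", 5)),
                       "var_A" = rnorm(15),
                       "var_B" = c(rnorm(5), rnorm(5, 4, 0.1), rnorm(5, 12, 2)))

# Run Dunn's test on one variable and convert its pairwise p-values to letters.
# dunn.test() reports one-sided p-values ("Reject Ho if p <= alpha/2"),
# so the threshold given to multcompLetters() is alpha / 2.
dunn_letters <- function(x, g, alpha = 0.05, method = "none") {
  res <- dunn.test(x, g, method = method)       # also prints the usual table
  p <- res$P.adjusted
  names(p) <- gsub(" ", "", res$comparisons)    # "high - low" -> "high-low"
  multcompLetters(p, compare = "<=", threshold = alpha / 2)$Letters
}

grp  <- c("low", "mid", "high")
vars <- c("var_A", "var_B")

# One row per variable, one "mean letters" cell per group
out <- t(sapply(vars, function(v) {
  means <- tapply(my_table[[v]], my_table$type, mean)[grp]
  lets  <- dunn_letters(my_table[[v]], my_table$type)[grp]
  paste(format(round(means, 2), nsmall = 2), lets)
}))
colnames(out) <- grp
print(noquote(out))
```

With this data, "low" and "high" in var_B should never share a letter while "mid" shares one with each, matching the Dunn results in the question. Which letter goes to which group is decided by multcompLetters(), so the labelling may not be exactly a/ab/b.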