Am I following the correct procedures with the dunn.test function?


I tested for differences in abundance among sampling sites using kruskal.test. Now I want to find out which pairs of sites differ, that is, run the multiple pairwise comparisons.

The dunn.test function can take a data vector together with a categorical (grouping) vector, or a formula expression as in lm.

I called the function directly on a data frame with several columns, but I have not found an example that confirms this is the correct procedure.

library(dunn.test)

df<-data.frame(a=runif(5,1,20),b=runif(5,1,20), c=runif(5,1,20))

kruskal.test(df)

dunn.test(df)

My results were:

Kruskal-Wallis chi-squared = 6.02, df = 2, p-value = 0.04929  

Kruskal-Wallis chi-squared = 6.02, df = 2, p-value = 0.05  

      Comparison of df by group                           

      Between 1 and 2   2.050609, 0.0202
      Between 1 and 3  -0.141421, 0.4438
      Between 2 and 3  -2.192031, 0.0142

On BEST ANSWER

I took a look at your code, and you are close. The main issue is that you should specify a correction for multiple comparisons via the method argument; the default, "none", leaves the p-values unadjusted.

Correcting for Multiple Comparisons

For your example data, I'll use the Benjamini-Yekutieli variant of the False Discovery Rate (FDR) procedure. Why I think it suits your data is beyond the scope of Stack Overflow, but you can read more about it and the other correction methods here. I also suggest reading the associated papers; most of them are open access.

library(dunn.test)

set.seed(711) # set pseudorandom seed

df <- data.frame(a = runif(5,1,20),
                 b = runif(5,1,20), 
                 c = runif(5,1,20))

dunn.test(df, method = "by") # correct for multiple comparisons using "B-Y" procedure

# Output
data: df and group
Kruskal-Wallis chi-squared = 3.62, df = 2, p-value = 0.16


                           Comparison of df by group                           
                             (Benjamini-Yekuteili)                             
Col Mean-|
Row Mean |          1          2
---------+----------------------
       2 |   0.494974
         |     0.5689
         |
       3 |  -1.343502  -1.838477
         |     0.2463     0.1815

alpha = 0.05
Reject Ho if p <= alpha/2
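
To get a feel for what the B-Y correction does, here is a minimal sketch that applies base R's p.adjust to the three uncorrected p-values reported in the question. The names raw_p and adj_p are only illustrative, and dunn.test's internal implementation may differ in detail, so treat this as an approximation of the idea rather than a reproduction of the package's output.

```r
# Uncorrected p-values copied from the dunn.test output in the question
raw_p <- c("1 vs 2" = 0.0202, "1 vs 3" = 0.4438, "2 vs 3" = 0.0142)

# Benjamini-Yekutieli adjustment using base R (stats::p.adjust)
adj_p <- p.adjust(raw_p, method = "BY")

# Show the adjusted values next to the originals
print(round(cbind(raw = raw_p, adjusted = adj_p), 4))
```

With these inputs the two smallest p-values both rise to roughly 0.056, so neither comparison stays below 0.05 once the correction is applied.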

Interpreting the Results

The first row in each cell gives Dunn's pairwise z test statistic for that comparison, and the second row gives the corrected p-value.
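
If you would rather work with these numbers programmatically, the following is a minimal sketch of the explicit "data vector plus grouping vector" form of the call. It assumes your installed version of dunn.test accepts the g argument and invisibly returns a list with comparisons, Z and P.adjusted components, as the package documentation describes; the names long, res and results are only illustrative.

```r
library(dunn.test)

set.seed(711) # same seed as the example above

df <- data.frame(a = runif(5, 1, 20),
                 b = runif(5, 1, 20),
                 c = runif(5, 1, 20))

# Reshape to long format: stack() returns the columns `values` (the data)
# and `ind` (a factor naming the source column, i.e. the site)
long <- stack(df)

# Explicit form: one data vector plus one grouping vector
res <- dunn.test(x = long$values, g = long$ind, method = "by")

# Collect the returned components into one table
results <- data.frame(comparison = res$comparisons,
                      z          = res$Z,
                      p_adjusted = res$P.adjusted)
print(results)
```

Passing the grouping vector explicitly also labels the comparisons with your column names instead of 1, 2 and 3, which is easier to read when the data frame has many sites.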

Notice that, once corrected for multiple comparisons, none of your pairwise tests is significant at an alpha of 0.05 (the smallest adjusted p-value is 0.1815). That is not surprising, given that each of your example "sites" was drawn from exactly the same distribution. I hope this has been helpful. Happy analyzing!

P.S. In the future, call set.seed() before constructing example data frames with runif (or any other kind of pseudorandom number generation), so that others can reproduce your numbers. Also, if you have other questions about statistical analysis, it's better to ask at: https://stats.stackexchange.com/