Calculate 95 % credible interval for multiple columns in data frame R

2.8k Views Asked by At

I need to get/calculate the 95 % credible interval for my data. My data consists of ten columns and over 5000 rows. Here is some example data.

data <- data.frame(A = c(-7.595932, -6.451768, -4.682111, -8.781488, -4.251690), 
                   B = c(0.8324450, 0.9451657, 0.8773759, 0.6044753, 0.6553995),
                   C = c(22.747480, 15.477470, 18.745407, 9.622865, 21.137619), 
                   D = c(-11.684762, -13.474299, -9.783277, -7.747501, -12.352081))

I am just not sure which function to use since I get different results each time and it only works with one column at a time. I have tried the following functions:

ci(data$`A`, confidence = 0.95)  ## R package gmodels

and

CI(data$`A`, confidence = 0.95) ##R package Rmisc

Have anyone else experienced the same problem?

3

There are 3 best solutions below

2
user2974951 On BEST ANSWER

If you want a credible interval (from Bayesian statistics) this requires some additional tuning, choice of prior and likelihood. There are some defaults already in some functions, so you may get away with it, but you should really know what you are doing, before blindly applying such concepts. Here is an example for demonstration purposes.

library(bayestestR)

data <- data.frame(A = c(-7.595932, -6.451768, -4.682111, -8.781488, -4.251690), 
                   B = c(0.8324450, 0.9451657, 0.8773759, 0.6044753, 0.6553995),
                   C = c(22.747480, 15.477470, 18.745407, 9.622865, 21.137619), 
                   D = c(-11.684762, -13.474299, -9.783277, -7.747501, -12.352081))

sapply(data,ci,ci=0.95)

        A         B         C        D        
CI      95        95        95       95       
CI_low  -8.662932 0.6095677 10.20833 -13.36208
CI_high -4.294732 0.9383867 22.58649 -7.951079
0
Allan Cameron On

It's not clear if this is what you are looking for, but you can get a print-out of the mean of each variable with the 95% confidence interval for the mean like this:

lapply(data, function(x) {
   paste0(round(mean(x), 2), " (95% CI: ",
   paste(round(sort(mean(x) + c(1.96, -1.96) * sd(x)/sqrt(length(x))), 2),
         collapse = " to "), ")")
 } )

#> $A
#> [1] "-6.35 (95% CI: -8.03 to -4.67)"
#>
#> $B
#> [1] "0.78 (95% CI: 0.65 to 0.91)"
#>
#> $C
#> [1] "17.55 (95% CI: 12.98 to 22.11)"
#>
#> $D
#> [1] "-11.01 (95% CI: -12.99 to -9.03)"
1
denis On

The two functions give you actually the same thing:

library(gmodels)
library(Rmisc)
 CI(data$A)
    upper      mean     lower 
-3.975568 -6.352598 -8.729627 
 ci(data$A, confidence = 0.95)
  Estimate   CI lower   CI upper Std. Error 
-6.3525978 -8.7296274 -3.9755682  0.8561414 

To apply it on all columns, use lapply or sapply:

> sapply(data,CI)
              A         B        C          D
upper -3.975568 0.9648266 24.01143  -8.198957
mean  -6.352598 0.7829723 17.54617 -11.008384
lower -8.729627 0.6011180 11.08091 -13.817811