Calculate 95 % credible interval for multiple columns in data frame R

Asked by paula456 on 17 December 2020 at 10:21 (2.8k views)

I need to calculate the 95 % credible interval for my data, which consists of ten columns and over 5000 rows. Here is some example data:

data <- data.frame(A = c(-7.595932, -6.451768, -4.682111, -8.781488, -4.251690), 
                   B = c(0.8324450, 0.9451657, 0.8773759, 0.6044753, 0.6553995),
                   C = c(22.747480, 15.477470, 18.745407, 9.622865, 21.137619), 
                   D = c(-11.684762, -13.474299, -9.783277, -7.747501, -12.352081))

I am not sure which function to use, since each one seems to give different results and each only works on one column at a time. I have tried the following functions:

ci(data$A, confidence = 0.95)  ## R package gmodels

and

CI(data$A, ci = 0.95)  ## R package Rmisc (its argument is `ci`, not `confidence`)

Has anyone else experienced the same problem?


There are 3 best solutions below

user2974951 On 17 December 2020 at 10:58 BEST ANSWER

If you want a credible interval (from Bayesian statistics), this requires some additional work: a choice of prior and likelihood. Some functions already provide defaults, so you may get away with those, but you should really know what you are doing before blindly applying such concepts. Here is an example for demonstration purposes.

library(bayestestR)

data <- data.frame(A = c(-7.595932, -6.451768, -4.682111, -8.781488, -4.251690), 
                   B = c(0.8324450, 0.9451657, 0.8773759, 0.6044753, 0.6553995),
                   C = c(22.747480, 15.477470, 18.745407, 9.622865, 21.137619), 
                   D = c(-11.684762, -13.474299, -9.783277, -7.747501, -12.352081))

sapply(data, ci, ci = 0.95)

        A         B         C        D        
CI      95        95        95       95       
CI_low  -8.662932 0.6095677 10.20833 -13.36208
CI_high -4.294732 0.9383867 22.58649 -7.951079
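The numbers above are plain equal-tailed quantiles: bayestestR's `ci()` defaults to the equal-tailed interval (ETI) and treats the values in each column as draws from a posterior distribution. A minimal base-R sketch of the same idea, assuming (the question does not say so) that the 5000 rows of the real data are posterior draws, for example MCMC output. With the example data it should reproduce the `CI_low` and `CI_high` rows above.

```r
# Example data from the question
data <- data.frame(A = c(-7.595932, -6.451768, -4.682111, -8.781488, -4.251690),
                   B = c(0.8324450, 0.9451657, 0.8773759, 0.6044753, 0.6553995),
                   C = c(22.747480, 15.477470, 18.745407, 9.622865, 21.137619),
                   D = c(-11.684762, -13.474299, -9.783277, -7.747501, -12.352081))

# Equal-tailed 95 % interval: the 2.5 % and 97.5 % quantiles of each column.
# Assumes each column is a sample of posterior draws.
eti <- sapply(data, quantile, probs = c(0.025, 0.975))
rownames(eti) <- c("CI_low", "CI_high")
eti
```

This only summarises draws that already exist; it does not fit a model, so the prior and likelihood still have to come from whatever produced the draws.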

Allan Cameron On 17 December 2020 at 10:55

It's not clear if this is what you are looking for, but you can get a print-out of the mean of each variable with the 95% confidence interval for the mean like this:

lapply(data, function(x) {
  paste0(round(mean(x), 2), " (95% CI: ",
         paste(round(sort(mean(x) + c(1.96, -1.96) * sd(x) / sqrt(length(x))), 2),
               collapse = " to "), ")")
})

#> $A
#> [1] "-6.35 (95% CI: -8.03 to -4.67)"
#>
#> $B
#> [1] "0.78 (95% CI: 0.65 to 0.91)"
#>
#> $C
#> [1] "17.55 (95% CI: 12.98 to 22.11)"
#>
#> $D
#> [1] "-11.01 (95% CI: -12.99 to -9.03)"
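The 1.96 above is the normal (z) quantile, whereas gmodels' `ci()` and Rmisc's `CI()` use the t quantile with n - 1 degrees of freedom. With only five rows that makes a visible difference. A small sketch comparing both half-widths per column (the `z_*`/`t_*` names are just illustrative labels):

```r
# Example data from the question
data <- data.frame(A = c(-7.595932, -6.451768, -4.682111, -8.781488, -4.251690),
                   B = c(0.8324450, 0.9451657, 0.8773759, 0.6044753, 0.6553995),
                   C = c(22.747480, 15.477470, 18.745407, 9.622865, 21.137619),
                   D = c(-11.684762, -13.474299, -9.783277, -7.747501, -12.352081))

compare_ci <- sapply(data, function(x) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)                 # standard error of the mean
  q  <- qt(0.975, df = n - 1)           # t quantile, about 2.78 for n = 5
  c(z_lower = mean(x) - 1.96 * se,      # normal approximation
    z_upper = mean(x) + 1.96 * se,
    t_lower = mean(x) - q * se,         # t-based interval
    t_upper = mean(x) + q * se)
})
round(compare_ci, 2)
```

For large samples (such as 5000 rows) the two quantiles are nearly equal, so the choice matters mainly for small n.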

denis On 17 December 2020 at 10:57

The two functions actually give you the same thing; they just order their output differently:

library(gmodels)
library(Rmisc)
CI(data$A)
    upper      mean     lower 
-3.975568 -6.352598 -8.729627 
ci(data$A, confidence = 0.95)
  Estimate   CI lower   CI upper Std. Error 
-6.3525978 -8.7296274 -3.9755682  0.8561414

To apply it to all columns, use lapply or sapply:

sapply(data, CI)
              A         B        C          D
upper -3.975568 0.9648266 24.01143  -8.198957
mean  -6.352598 0.7829723 17.54617 -11.008384
lower -8.729627 0.6011180 11.08091 -13.817811
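With ten columns, one row per variable can be easier to read than the wide matrix. A sketch using base R's `t.test()`, which computes the same t-based interval as `Rmisc::CI()` without loading extra packages; the `ci_table` layout and column names are only a suggestion:

```r
# Example data from the question
data <- data.frame(A = c(-7.595932, -6.451768, -4.682111, -8.781488, -4.251690),
                   B = c(0.8324450, 0.9451657, 0.8773759, 0.6044753, 0.6553995),
                   C = c(22.747480, 15.477470, 18.745407, 9.622865, 21.137619),
                   D = c(-11.684762, -13.474299, -9.783277, -7.747501, -12.352081))

# One row per column: mean plus t-based 95 % confidence limits
ci_table <- do.call(rbind, lapply(names(data), function(col) {
  tt <- t.test(data[[col]], conf.level = 0.95)
  data.frame(variable = col,
             mean     = unname(tt$estimate),
             lower    = tt$conf.int[1],
             upper    = tt$conf.int[2])
}))
ci_table
```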
