I am looping over a few frequency tables with the freq() function in summarytools and printing the results. In doing so, I noticed that when I save the freq() object without missing values and convert it to a data frame, the total number of observations still includes the missing values.
# Create a vector with 10 observations of "smoker"
smoker <- c("yes", "no", "yes", NA, NA, NA, "yes", "no", "yes", "no")
# Create a DataFrame using the vector
df <- data.frame(smoker)
library(summarytools)
library(dplyr)
# Create a frequency table without missing values
freq(df$smoker, report.nas = FALSE)
# Try to save this table into a data frame
table <- as.data.frame(freq(df$smoker, report.nas = FALSE)) # OR
table <- df %>% freq(smoker, report.nas = FALSE) %>% as.data.frame()
table
The results should look like this (missing values excluded, n=7):
      Freq      %   % Cum.
no       3  42.86    42.86
yes      4  57.14   100.00
Total    7 100.00   100.00
But after saving it to a data.frame, it looks like this (missing values added back on, with total n=10):
      Freq   % Valid % Valid Cum. % Total % Total Cum.
no       3  42.85714     42.85714      30           30
yes      4  57.14286    100.00000      40           70
<NA>     3        NA           NA      30          100
Total   10 100.00000    100.00000     100          100
This seems like a bug, but I'm not sure whether this is the expected outcome. Any thoughts on how to save this output as a data.frame? I'm hoping to loop over the data frame and add kable styling.
Using report.nas only affects the printing of the NA values, not the storage of them. If we store the freq object and print it, you can see it prints the values as desired. But it stores them with the NA values, so you will still need to subset to get what you want. This approach simply uses !is.na() on the percent valid column.
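The code that accompanied this answer did not survive, so here is a minimal sketch of the subsetting idea using the question's data. The object names ft, ft_df and ft_valid are placeholders chosen for illustration, and the column and row names are taken from the output shown in the question. Recomputing the Total row at the end is an extra step that is not part of the answer above; it is included only because the stored total still counts the NA observations.

```r
library(summarytools)

# Data from the question
smoker <- c("yes", "no", "yes", NA, NA, NA, "yes", "no", "yes", "no")
df <- data.frame(smoker)

# Store the freq object (name "ft" is a placeholder).
# report.nas = FALSE changes how it prints, not what it stores.
ft <- freq(df$smoker, report.nas = FALSE)

# Converting to a data frame exposes the stored values, including the <NA> row
ft_df <- as.data.frame(ft)

# Keep only rows whose "% Valid" entry is not missing; this drops the <NA> row
ft_valid <- ft_df[!is.na(ft_df[["% Valid"]]), ]

# Optional extra step (an assumption, not from the answer): the "Total" row
# still counts all 10 observations, so recompute it from the remaining rows
keep <- rownames(ft_valid) != "Total"
ft_valid["Total", "Freq"] <- sum(ft_valid$Freq[keep])

ft_valid
```

If only the valid percentages are wanted, the "% Total" and "% Total Cum." columns can be dropped from ft_valid in the same way before passing it to kable.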