I'm trying to write a for loop to create tables using sjt.xtab() so it iterates through every variation in a dataframe. Ideally this would be generalizable to all other dataframes too so a function would probably be better.
I have a dataframe called df_mod:
'data.frame': 849 obs. of 17 variables:
$ amazon : Factor w/ 4 levels "0","1","2","3": 2 2 2 1 3 2 3 3 2 2 ...
$ manhattan : Factor w/ 2 levels "Manhattan","Other": 2 2 2 1 2 2 2 2 2 2 ...
$ income : Factor w/ 5 levels "$25-49k","$50 - 74k",..: 2 4 4 1 3 2 3 5 5 2 ...
$ phone : Factor w/ 2 levels "0","1": 2 2 2 2 2 1 2 2 2 2 ...
$ gender : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 2 2 1 1 ...
$ age : Factor w/ 6 levels "18-24","25-34",..: 2 2 3 6 5 6 3 2 3 3 ...
$ education : Factor w/ 3 levels "College","Graduate",..: 2 3 3 1 3 1 1 2 2 2 ..
and tried:
xtab_list <- list()
# Iterate through the columns in the dataframe
for (i in 1:ncol(df_mod)) {
  for (j in i+1:ncol(df_mod)) {
    # Calculate the contingency table for each pair of columns
    xtab <- sjt.xtab(df_mod[,names(df_mod)[i]], df_mod[,names(df_mod)[j]])
    # Append the contingency table to the list
    xtab_list[[paste0(names(df_mod)[i], names(df_mod)[j])]] <- xtab
  }
}
and receive this error:
Error in `[.data.frame`(df_mod, , names(df_mod)[j]) :
undefined columns selected
Also tried it without the names(df_mod) but received the same error.
It works when I write out individual columns, so it's not the column type (all factors):
sjt.xtab(df_mod$gender, df_mod$education)
so I'm not sure what I'm doing wrong, especially since it's bad coding to do each one by one and I'd much rather do it once properly. Thank you!
The code in the original post fails to produce the expected results because of a subtle error in this line:
for (j in i+1:ncol(df_mod)) {
It executes as "j in i plus (1:ncol(df_mod))". That is, R evaluates the : operator before the binary + operator. This is documented under "Operator Syntax and Precedence" in the R documentation.
What was originally intended would be written as:
for (j in (i+1):ncol(df_mod)) {
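The precedence difference can be checked directly at the console. This is a minimal sketch; `n` is a made-up stand-in for `ncol(df_mod)`, and `wrong` and `right` are illustrative names only.

```r
i <- 1
n <- 3  # hypothetical stand-in for ncol(df_mod)

# `:` binds more tightly than binary `+`, so this is i + (1:n)
wrong <- i + 1:n

# Parentheses force the addition to happen before the sequence is built
right <- (i + 1):n

print(wrong)  # runs one past n
print(right)  # stops at n
```

With `i` at 1 and `n` at 3, the unparenthesized form yields three values ending at 4, while the parenthesized form yields 2 and 3.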
For example, when i is 1, the original for() loop for j iterates from 2 to ncol(df_mod) + 1, which points to a nonexistent column.
Example: pairs of columns in mtcars
We can loop through the mtcars data frame to generate the pairs of columns needed for a set of cross tabs. For now we'll ignore the underlying data types to illustrate how to generate the combinations of columns via a nest of 2 for() loops.
Since a crosstab of a variable with itself is not particularly helpful, we'll end the for(i in ...) loop at ncol(mtcars) - 1.
We'll print the last 6 rows of messages to show how the sequence ends.
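A nested loop along these lines might look like the sketch below. The `msgs` vector and the message format are assumptions made for illustration, not the answer's original code.

```r
# Collect one message per unordered pair of distinct columns
msgs <- character(0)

for (i in 1:(ncol(mtcars) - 1)) {
  # Parenthesize (i + 1) so j starts just after i and stops at the last column
  for (j in (i + 1):ncol(mtcars)) {
    msgs <- c(msgs, paste("i =", i, "j =", j, ":",
                          names(mtcars)[i], "x", names(mtcars)[j]))
  }
}

# Show the last 6 messages to see how the sequence ends
tail(msgs)
```

Because the outer loop stops at the second-to-last column, the inner loop never starts past the end of the data frame and no column is paired with itself.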
An Alternate Approach
Another way to solve this problem is to generate all the combinations of the desired variables, and process them in an apply() function.
We'll create a reproducible example with the mtcars data frame.
At this point we have a 2 row 10 column matrix that represents the unique combinations of the 5 categorical variables in the mtcars data frame.
Next, we'll use lapply() to loop through the matrix and generate the cross tabs, saving them to a list called theTabs.
The anonymous function within lapply() takes three arguments. The first, x, is the sequence from 1 to ncol() of the matrix containing the pairs of variables for which we will generate 2-way contingency tables. The lapply() function will call the anonymous function ncol() times.
The second argument, y, represents the matrix of column combinations.
The third argument, z, represents the data frame where the columns to be tabulated are stored.
Finally, we print the first item in the list.
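The steps described above can be sketched as follows. Two things here are assumptions rather than the answer's original code: the five columns chosen for `theColumns`, and the use of base R `table()` as a stand-in for `sjt.xtab()` so the example runs without the sjPlot package. The `theCombinations` name is also illustrative.

```r
# Assumed choice of five categorical columns in mtcars
theColumns <- c("cyl", "vs", "am", "gear", "carb")

# All unordered pairs: a 2-row, 10-column character matrix
theCombinations <- combn(theColumns, 2)

# x = column index into the matrix, y = matrix of pairs, z = data frame
theTabs <- lapply(seq_len(ncol(theCombinations)), function(x, y, z) {
  # table() stands in for sjt.xtab(); dnn labels the two dimensions
  table(z[[y[1, x]]], z[[y[2, x]]], dnn = c(y[1, x], y[2, x]))
}, theCombinations, mtcars)

# Name each list element after its pair of variables
names(theTabs) <- paste(theCombinations[1, ], theCombinations[2, ], sep = "_")

# Print the first item in the list
theTabs[[1]]
```

To get the formatted HTML tables instead, the `table()` call inside the anonymous function would be swapped for the sjPlot cross tab call, passing the same two columns.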
...and the output:
Hmm... those variable labels look a bit funky, so we'll add a var.labels argument to make the output easier to read.
...and the output:
Extracting the names of the factor columns in a data frame to use as theColumns is left as an exercise for the reader.
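For readers who want a starting point for that exercise, one possible sketch is shown below. The helper name `factor_columns` and the small `demo` data frame are hypothetical, made up purely for illustration.

```r
# Hypothetical helper: return the names of all factor columns in a data frame
factor_columns <- function(df) {
  names(df)[vapply(df, is.factor, logical(1))]
}

# Made-up data frame with two factor columns and one numeric column
demo <- data.frame(
  gender = factor(c("0", "1", "0", "1")),
  phone  = factor(c("1", "1", "0", "1")),
  score  = c(3.2, 4.1, 2.8, 3.9)
)

theColumns <- factor_columns(demo)
print(theColumns)
```

Applied to a data frame in which every column is a factor, such as df_mod in the question, this would simply return all of the column names.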