How to write for loop for sjt.xtab in R, a df of factors?

Question

69 Views Asked by user1366487 At 22 April 2023 at 18:12

I'm trying to write a for loop that creates tables with sjt.xtab() for every pair of columns in a dataframe. Ideally this would generalize to other dataframes too, so a function would probably be better.

I have a dataframe called df_mod:

'data.frame':   849 obs. of  17 variables:
 $ amazon       : Factor w/ 4 levels "0","1","2","3": 2 2 2 1 3 2 3 3 2 2 ...
 $ manhattan    : Factor w/ 2 levels "Manhattan","Other": 2 2 2 1 2 2 2 2 2 2 ...
 $ income       : Factor w/ 5 levels "$25-49k","$50 - 74k",..: 2 4 4 1 3 2 3 5 5 2 ...
 $ phone        : Factor w/ 2 levels "0","1": 2 2 2 2 2 1 2 2 2 2 ...
 $ gender       : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 2 2 1 1 ...
 $ age          : Factor w/ 6 levels "18-24","25-34",..: 2 2 3 6 5 6 3 2 3 3 ...
 $ education    : Factor w/ 3 levels "College","Graduate",..: 2 3 3 1 3 1 1 2 2 2 ..

and tried:

xtab_list <- list()

# Iterate through the columns in the dataframe
for (i in 1:ncol(df_mod)) {
  for (j in i+1:ncol(df_mod)) {
    # Calculate the contingency table for each pair of columns
    xtab <- sjt.xtab(df_mod[,names(df_mod)[i]], df_mod[,names(df_mod)[j]])
    # Append the contingency table to the list
    xtab_list[[paste0(names(df_mod)[i], names(df_mod)[j])]] <- xtab
  }
}

and receive this error:

Error in `[.data.frame`(df_mod, , names(df_mod)[j]) :
undefined columns selected

Also tried it without the names(df_mod) but received the same error.

It works when I write out individual columns, so the column type (all factors) is not the problem:

sjt.xtab(df_mod$gender, df_mod$education)

so I'm not sure what I'm doing wrong, especially since it's bad coding to do each one by one and I'd much rather do it once properly. Thank you!

**Len Greski** · Accepted Answer · 2023-04-22T19:00:10.517000

The code in the original post fails to produce expected results due to a subtle error in the line:

for (j in i+1:ncol(df_mod)){ ... }

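It executes as "j in i plus (1:ncol(df_mod))." That is, the : operator has higher precedence than the binary + operator, so R builds the sequence 1:ncol(df_mod) first and then adds i to every element. This is documented under Operator Syntax and Precedence in the R documentation (see ?Syntax).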

What was originally intended would be written as:

for (j in (i+1):ncol(df_mod)){ ... }

For example, when i is 1, the original for() loop for j iterates from 2 to ncol(df_mod) + 1, which points to a nonexistent column.
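The difference is easy to check at the console. The following is a minimal sketch; n is a stand-in for ncol(df_mod) and is not part of the original post.

```r
# Operator precedence: ":" binds more tightly than binary "+".
n <- 5  # stand-in for ncol(df_mod)
i <- 1

unparenthesized <- i + 1:n    # parsed as i + (1:n)
parenthesized   <- (i + 1):n  # the intended sequence

print(unparenthesized)  # 2 3 4 5 6 : the last value exceeds n
print(parenthesized)    # 2 3 4 5
```

Indexing a data frame with the last value of the unparenthesized sequence is what triggers "undefined columns selected".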

Example: pairs of columns in mtcars

We can loop through the mtcars data frame to generate the pairs of columns needed for a set of cross tabs. For now we'll ignore the underlying data types and simply illustrate how to generate the combinations of columns with two nested for() loops.

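Since a crosstab of a variable with itself is not particularly helpful, the inner loop starts at i + 1, and we end the for(i in ...) loop at ncol(mtcars) - 1. Stopping one column early matters: if i reached ncol(mtcars), the sequence (i+1):ncol(mtcars) would count down from ncol(mtcars) + 1 and again point to a nonexistent column.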

for(i in 1:(ncol(mtcars) - 1)){
     for(j in (i+1):ncol(mtcars)){
          message(paste("i is:",names(mtcars)[i],"j is:",names(mtcars)[j]))
     }
}

We'll print the last 6 rows of messages to show how the sequence ends.

i is: vs j is: am
i is: vs j is: gear
i is: vs j is: carb
i is: am j is: gear
i is: am j is: carb
i is: gear j is: carb
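The reason the outer loop stops one column early can be checked directly. This is a small sketch; inner_indices() is a hypothetical helper introduced here for illustration, not something from the original answer. It uses seq_len(), which returns an empty sequence where : would count backwards.

```r
n <- ncol(mtcars)  # 11 columns

# When i equals n, (i + 1):n counts DOWN from n + 1 to n
i <- n
print((i + 1):n)  # 12 11

# Hypothetical helper: indices of the columns to the right of column i.
# seq_len(0) is empty, so the last column yields no inner iterations.
inner_indices <- function(i, n) i + seq_len(n - i)

print(length(inner_indices(n, n)))  # 0
print(inner_indices(9, n))          # 10 11

# Count the pairs the nested loops would visit
pair_count <- 0
for (i in seq_len(n)) {
  for (j in inner_indices(i, n)) {
    pair_count <- pair_count + 1
  }
}
print(pair_count == choose(n, 2))  # TRUE
```

With this form the outer loop can run over every column without a special case for the last one.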

An Alternate Approach

Another way to solve this problem is to generate all the combinations of the desired variables with combn(), and process them with lapply().

We'll create a reproducible example with the mtcars data frame.

# categorical variables in mtcars are vs, am, cyl, gear, carb
theColumns <- c("vs","am","cyl","gear","carb")

library(sjPlot)

# generate combinations of the categorical variables for xtabs 
theCombinations <- combn(theColumns,2)

At this point we have a 2 row 10 column matrix that represents the unique combinations of the 5 categorical variables in the mtcars data frame.

> theCombinations
     [,1] [,2]  [,3]   [,4]   [,5]  [,6]   [,7]   [,8]   [,9]   [,10]
[1,] "vs" "vs"  "vs"   "vs"   "am"  "am"   "am"   "cyl"  "cyl"  "gear"
[2,] "am" "cyl" "gear" "carb" "cyl" "gear" "carb" "gear" "carb" "carb"

Next, we'll use lapply() to loop through the matrix and generate the cross tabs, saving them to a list called theTabs.

# for each column in theCombinations, run the xtab 
theTabs <- lapply(1:ncol(theCombinations),function(x,y,z){
     sjt.xtab(z[[y[1,x]]],z[[y[2,x]]])
},theCombinations,mtcars)

The anonymous function within lapply() takes three arguments. The first, x, receives one element at a time from the sequence 1:ncol(theCombinations), that is, the index of one column in the matrix of variable pairs for which we will generate 2-way contingency tables. lapply() calls the anonymous function once per column, so ncol(theCombinations) times in total.

The second argument, y, represents the matrix of column combinations.

The third argument, z, represents the data frame where the columns to be tabulated are stored.
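The same pairing logic can also produce a named list, which is closer to what the question's paste0() call was aiming for. The sketch below makes two assumptions: base R's table() stands in for sjt.xtab() so that it runs without sjPlot, and the "vs_am" naming scheme is only one possible choice.

```r
# Sketch: pair the columns as above, but return a NAMED list of tables.
# table() from base R stands in for sjt.xtab() here.
theColumns <- c("vs", "am", "cyl", "gear", "carb")

# simplify = FALSE returns a list of length-2 character vectors
thePairs <- combn(theColumns, 2, simplify = FALSE)

# Name each pair, e.g. "vs_am" (the naming scheme is an assumption)
names(thePairs) <- vapply(thePairs, paste, character(1), collapse = "_")

# lapply() keeps the names, so each table can be looked up by its pair
theTables <- lapply(thePairs, function(p) {
  table(mtcars[[p[1]]], mtcars[[p[2]]], dnn = p)
})

theTables[["vs_am"]]
```

Swapping table() for sjt.xtab() inside the anonymous function should give the same list structure, with each element holding the object that sjt.xtab() returns.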

Finally, we print the first item in the list.

# print the first table 
theTabs[[1]]

...and the output is a rendered cross tab of the first pair, vs by am (image not reproduced here).

Hmm... those variable labels look a bit funky, so we'll add a var.labels argument to make the output easier to read.

# add some variable labels 
theTabs <- lapply(1:ncol(theCombinations),function(x,y,z){
     sjt.xtab(z[[y[1,x]]],z[[y[2,x]]], var.labels = c(y[1,x],y[2,x]))
},theCombinations,mtcars)

# print the first table 
theTabs[[1]]

...and the output is the same cross tab, now labeled with vs and am (image not reproduced here).

Extracting the names of the factor columns in a data frame to use as theColumns is left as an exercise for the reader.
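For readers who want a starting point, one possible approach is sketched below. xtab_all_pairs() and its tab_fun argument are hypothetical names introduced here; the default is base R's table() so the example runs without sjPlot.

```r
# Hypothetical helper: cross-tabulate every pair of factor columns in df.
# tab_fun is any function of two vectors; it defaults to base table().
xtab_all_pairs <- function(df, tab_fun = table) {
  # Keep only the names of columns that are factors
  factor_cols <- names(df)[vapply(df, is.factor, logical(1))]
  if (length(factor_cols) < 2) {
    stop("need at least two factor columns")
  }
  pairs <- combn(factor_cols, 2, simplify = FALSE)
  names(pairs) <- vapply(pairs, paste, character(1), collapse = "_")
  lapply(pairs, function(p) tab_fun(df[[p[1]]], df[[p[2]]]))
}

# Demo data: convert the categorical mtcars columns to factors
cars <- mtcars
cat_cols <- c("vs", "am", "cyl", "gear", "carb")
cars[cat_cols] <- lapply(cars[cat_cols], factor)

tabs <- xtab_all_pairs(cars)
names(tabs)
```

Assuming sjPlot is loaded, a call such as xtab_all_pairs(df_mod, tab_fun = function(a, b) sjt.xtab(a, b)) would apply the same pairing to the data frame from the question; passing var.labels would require extending the helper to hand the column names to tab_fun.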
