How to perform calculations on rows based on selected data frame columns in R


I have a data frame with mixed character and numeric values in different columns. I want to compute row-wise statistics (similar to what summary() reports) using only a selection of columns, and store the results in new columns. The names of the columns to use are in a vector, columns <- colnames(df)[from_column:to_column]. Counting the non-missing values in each row is easy with rowSums():

df$n <- rowSums(!is.na(df[ , columns]))

and similarly for the row mean with rowMeans() (the data contain NAs):

df$mean <- rowMeans(df[ , columns], na.rm = TRUE)

However, I also want min(), median(), and max(), which I tried with:

df$min <- min(df[ , columns], na.rm = TRUE)
df$median <- median(df[ , columns], na.rm = TRUE)
df$max <- max(df[ , columns], na.rm = TRUE)

But min() and max() put the same value in every row, and median() fails with Error in median.default(df[, columns], na.rm = TRUE) : need numeric data. That puzzles me even more: it is the same subset df[ , columns] that the other functions handle without complaint, yet for median() the values supposedly aren't numeric?!
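
The behaviour can be reproduced with a small sketch. The toy data frame and its column names (id, x1, x2, x3) are made up for illustration and are not the real data:

```r
# Hypothetical toy data: one character column, three numeric columns with NAs
df <- data.frame(
  id = c("a", "b", "c"),
  x1 = c(1, 4, NA),
  x2 = c(2, NA, 6),
  x3 = c(3, 5, 9),
  stringsAsFactors = FALSE
)
columns <- colnames(df)[2:4]

# min() reduces the whole selected data frame to a single number;
# assigning that to df$min recycles it into every row
overall_min <- min(df[ , columns], na.rm = TRUE)

# median() has no data frame method, so median.default() receives the
# data frame itself and stops; capture the message instead of failing
median_result <- tryCatch(
  median(df[ , columns], na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
```

Here overall_min is one value for the entire subset rather than one per row, and median_result holds the error message, which suggests that neither function works row by row the way rowSums() and rowMeans() do.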

Can anyone help me with the calculations and give a hint of what is wrong with median()?

Best regards, Marcin
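
One possible way to get row-wise values is to apply each function over the rows of the selected columns with apply(). This is a minimal sketch, not the only approach. It assumes that every selected column is numeric, and it reuses a made-up toy data frame (id, x1, x2, x3 are hypothetical names):

```r
# Hypothetical toy data: one character column, three numeric columns with NAs
df <- data.frame(
  id = c("a", "b", "c"),
  x1 = c(1, 4, NA),
  x2 = c(2, NA, 6),
  x3 = c(3, 5, 9),
  stringsAsFactors = FALSE
)
columns <- colnames(df)[2:4]

# Convert the selection to a numeric matrix once; this assumes that
# all selected columns are numeric (a character column would turn
# the whole matrix into character)
num <- as.matrix(df[ , columns])

# MARGIN = 1 calls the function once per row; na.rm is passed through
df$min    <- apply(num, 1, min,    na.rm = TRUE)
df$median <- apply(num, 1, median, na.rm = TRUE)
df$max    <- apply(num, 1, max,    na.rm = TRUE)
```

Note that for a row where every selected value is NA, min() and max() with na.rm = TRUE return Inf and -Inf with a warning, so such rows may need separate handling.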
