error in assigning classes to values from a data.frame in R


Sample of the data being entered as CSV sheet


coupe=data.frame(read.csv(file.choose()))

#UNITS AS PER GIRTH - I want to assign girth classes to each tree for further computation of value


coupe$g.class = NA

for (i in 1:(nrow(coupe))) {
  if (coupe$girth[i] < 60) {
    coupe$g.class[i] = 0.25
  } else if (coupe$girth[i] < 89) {
    coupe$g.class[i] = 0.5
  } else if (coupe$girth[i] < 119) {
    coupe$g.class[i] = 1
  } else if (coupe$girth[i] < 149) {
    coupe$g.class[i] = 2
  } else if (coupe$girth[i] < 179) {
    coupe$g.class[i] = 4
  } else if (coupe$girth[i] < Inf) {
    coupe$g.class[i] = 6
  }
}

Running this code gives me the error: Error in if (coupe$girth[i] < 60) { : missing value where TRUE/FALSE needed

Any leads on what I am doing wrong? This same block of code ran perfectly on another dataset (a different coupe) but fails on the one I am working with at present.

Answer by jpsmith

Since your desired g.class values aren't in a specific pattern, you could use findInterval to make your cuts, then recode them to your desired values.
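As for the error itself: if needs a single TRUE or FALSE, and that message typically means the comparison returned NA, which happens when a girth is missing (for example a blank cell or a trailing empty row in the CSV). Here is a minimal sketch on a made-up vector (the girth values below are hypothetical, not your data) showing how to locate such entries, and that findInterval passes the NA through instead of stopping:

```r
# hypothetical girths with one missing value, standing in for coupe$girth
girth <- c(159, NA, 14, 60)

# the comparison that the loop hands to if(): NA for the missing girth
cmp <- girth < 60

# positions of the missing girths, i.e. the rows that break the loop
bad_rows <- which(is.na(girth))

# findInterval() propagates NA rather than raising an error
codes <- findInterval(girth, c(60, 89, 119, 149, 179))

print(cmp)
print(bad_rows)
print(codes)
```

On your own data, which(is.na(coupe$girth)) should list the offending rows. The dataset where the loop worked presumably had no missing girths.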

If you don't mind the variable being a factor, you could simply do:

df$g.class_fact <- factor(findInterval(df$girth, c(60, 89, 119, 149, 179)),
                          levels = 0:5,  # keeps the labels aligned even if a class is absent from the data
                          labels = c(0.25, 0.5, 1, 2, 4, 6))

#    species girth quality g.class_fact
# 1      sal   159       N            4
# 2      sal   179       N            6
# 3      sal    14       N         0.25
# 4      sal   195       N            6
# 5      sal   170       N            4
# 6      sal    50       N         0.25

If you want them as numeric values, just wrap that in as.numeric(as.character(...)):

# stored under a separate name so the factor version above is not overwritten
df$g.class_num <- as.numeric(as.character(factor(findInterval(df$girth, c(60, 89, 119, 149, 179)),
                          levels = 0:5,
                          labels = c(0.25, 0.5, 1, 2, 4, 6))))
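A shorter route to the numeric version, offered as an alternative sketch rather than a correction: since findInterval returns codes 0 to 5, adding 1 turns them into positions in a vector of class values. The names breaks and class_values are my own, and the comparison against the if/else chain from the question is only a sanity check on a few boundary values.

```r
breaks <- c(60, 89, 119, 149, 179)
class_values <- c(0.25, 0.5, 1, 2, 4, 6)

# boundary values chosen for illustration, plus one NA
girth <- c(59, 60, 88, 89, 118, 119, 148, 149, 178, 179, 500, NA)

# codes 0..5 become positions 1..6 in class_values; NA stays NA
g_class <- class_values[findInterval(girth, breaks) + 1]

# the same rule written as the question's if/else chain, skipping NA
loop_class <- rep(NA_real_, length(girth))
for (i in seq_along(girth)) {
  g <- girth[i]
  if (is.na(g)) next
  loop_class[i] <- if (g < 60) 0.25 else if (g < 89) 0.5 else if (g < 119) 1 else
    if (g < 149) 2 else if (g < 179) 4 else 6
}

print(g_class)
print(identical(g_class, loop_class))
```

Because the breaks are left-closed, a girth of exactly 60 lands in the 0.5 class, which matches the strict < comparisons in the original loop.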

You could also do it by matching against a reference data frame. Here I create two variables, g.class and g.class_ref, just for comparison, but you could overwrite g.class instead.

# reference values 
ref_values <- data.frame(interval = 0:5, newvals = c(0.25, 0.5, 1, 2, 4, 6))

df$g.class <- findInterval(df$girth, c(60, 89, 119, 149, 179))
# > df$g.class
#  [1] 4 5 0 5 4 0 2 0 5 5 4 2 2 5 5 2 3 2 1 0

df$g.class_ref <- ref_values$newvals[match(df$g.class, ref_values$interval)]

Output:

#    species girth quality g.class    g.class_ref
# 1      sal   159       N       4           4.00
# 2      sal   179       N       5           6.00
# 3      sal    14       N       0           0.25
# 4      sal   195       N       5           6.00
# 5      sal   170       N       4           4.00
# 6      sal    50       N       0           0.25
# 7      sal   118       N       2           1.00
# ...

And for a tidyverse approach, you could also use dplyr::mutate and dplyr::case_when to recode the values if you didn't want to create a reference data frame/use match:

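# assumes df$g.class already holds the findInterval() codes (0 to 5) from above
dplyr::mutate(df, g.class_dplyr = dplyr::case_when(
  g.class == 0 ~ 0.25,
  g.class == 1 ~ 0.5,
  g.class %in% 2:3 ~ g.class - 1,  # codes 2 and 3 map to 1 and 2
  g.class == 4 ~ 4,                # literal double keeps every branch the same type
  g.class == 5 ~ 6
))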

#    species girth quality g.class g.class_ref g.class_dplyr
# 1      sal   159       N       4        4.00          4.00
# 2      sal   179       N       5        6.00          6.00
# 3      sal    14       N       0        0.25          0.25
# 4      sal   195       N       5        6.00          6.00
# 5      sal   170       N       4        4.00          4.00
# ...
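If the intermediate g.class codes aren't needed, the conditions can also be written on girth directly, which reads much like the original loop. This is a sketch that assumes dplyr is installed; the toy data frame and the column name g.class_direct are made up for illustration. Spelling out the last condition, rather than using a catch-all, keeps a missing girth as NA.

```r
library(dplyr)

# toy data for illustration, including one missing girth
toy <- data.frame(girth = c(14, 60, 118, 149, 179, NA))

toy <- mutate(toy, g.class_direct = case_when(
  girth < 60   ~ 0.25,
  girth < 89   ~ 0.5,
  girth < 119  ~ 1,
  girth < 149  ~ 2,
  girth < 179  ~ 4,
  girth >= 179 ~ 6   # explicit condition: an NA girth matches nothing and stays NA
))

print(toy)
```

case_when evaluates the conditions in order and uses the first one that is TRUE, so each line only needs the upper bound of its class.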

Data

set.seed(123)
df <- data.frame(species = "sal",
                 girth = sample(1:200, 20),
                 quality = "N")