How to str_detect a pattern in the pipe to relocate columns

I am struggling to relocate the descriptive statistics columns so that each one sits next to its original column:

library(dplyr)

# keep columns 2 to 4 of iris, then add one summary column per statistic
iris2 <- iris %>%
  dplyr::select(c(2:4)) %>%
  dplyr::mutate(across(.cols = where(is.numeric), .fns = list(
    n_miss = ~ sum(is.na(.x)),
    mean   = ~ mean(.x, na.rm = TRUE),
    median = ~ median(.x, na.rm = TRUE),
    min    = ~ min(.x, na.rm = TRUE),
    max    = ~ max(.x, na.rm = TRUE)
  )))

As you can see, the descriptive statistics columns generated by this code are placed at the end of the data frame, and I would like each of them to follow its original column.

So I have two related questions. Below are some of my attempts (none of them works):

#First: how to pass the pattern using str_detect or grep to relocate descriptive columns

iris2 <- iris2 %>% dplyr::relocate(str_detect(colnames, "Sepal.Width"), .after =   Sepal.Width)

# Second: would it be possible to do it for all columns, passing the name in a vector 

iris2 <- iris2 %>% dplyr::relocate(str_detect(colnames, c("Sepal.Width", "Petal.Width"), .after = list(Sepal.Width,  Petal.Width)))

The expected output would be something like this for every column (shown here for Sepal.Width only):

expected_output <- structure(list(Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.9), Sepal.Width_mean = c(3.05733333333333, 
3.05733333333333, 3.05733333333333, 3.05733333333333, 3.05733333333333, 
3.05733333333333), Sepal.Width_median = c(3, 3, 3, 3, 3, 3), 
    Sepal.Width_min = c(2, 2, 2, 2, 2, 2), Sepal.Width_max = c(4.4, 
    4.4, 4.4, 4.4, 4.4, 4.4), Petal.Length = c(1.4, 1.4, 1.3, 
    1.5, 1.4, 1.7)), row.names = c(NA, 6L), class = "data.frame")

I haven't tried the arrange function because I hoped to do all of this within the pipe, but if that is not possible I will do it in two steps.

Thank you!

Answer by Onyambu (best answer)

In base R, order the columns by the part of each name that precedes the underscore:

nms <- sub("_.*", "", names(iris2))
iris2[, order(ordered(nms, unique(nms)))]
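
To see why this groups the columns, here is a minimal sketch on a toy vector of names. The names (a, b, a_mean, ...) are made up for illustration, and the sketch assumes, as the snippet above does, that the original column names contain no underscore.

```r
# Hypothetical column names, in the order mutate(across()) would leave them
nms_full <- c("a", "b", "a_mean", "a_max", "b_mean", "b_max")

# Drop everything from the first underscore on, leaving the base name
base <- sub("_.*", "", nms_full)

# Levels follow first appearance, so groups keep the original column order
key <- ordered(base, levels = unique(base))

# order() leaves ties in their original relative position,
# so each base column stays ahead of its own statistics
idx <- order(key)
nms_full[idx]
# "a" "a_mean" "a_max" "b" "b_mean" "b_max"
```

Because ties keep their original position, the statistics appear in the order they were listed in the .fns list.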

If you want to do this in a pipe (str_remove() comes from stringr), use:

iris2 %>%
   select(order(str_remove(names(.), "_.*") %>%
          ordered(unique(.))))

And the structure looks as shown below:

'data.frame':   150 obs. of  18 variables:
 $ Sepal.Width        : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Sepal.Width_n_miss : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Sepal.Width_mean   : num  3.06 3.06 3.06 3.06 3.06 ...
 $ Sepal.Width_median : num  3 3 3 3 3 3 3 3 3 3 ...
 $ Sepal.Width_min    : num  2 2 2 2 2 2 2 2 2 2 ...
 $ Sepal.Width_max    : num  4.4 4.4 4.4 4.4 4.4 4.4 4.4 4.4 4.4 4.4 ...
 $ Petal.Length       : num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Length_n_miss: int  0 0 0 0 0 0 0 0 0 0 ...
 $ Petal.Length_mean  : num  3.76 3.76 3.76 3.76 3.76 ...
 $ Petal.Length_median: num  4.35 4.35 4.35 4.35 4.35 4.35 4.35 4.35 4.35 4.35 ...
 $ Petal.Length_min   : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Petal.Length_max   : num  6.9 6.9 6.9 6.9 6.9 6.9 6.9 6.9 6.9 6.9 ...
 $ Petal.Width        : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Petal.Width_n_miss : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Petal.Width_mean   : num  1.2 1.2 1.2 1.2 1.2 ...
 $ Petal.Width_median : num  1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3 ...
 $ Petal.Width_min    : num  0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 ...
 $ Petal.Width_max    : num  2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 ...
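
For readers who would rather stay with relocate(), as in the question, the following is a minimal sketch and not part of the original answers. It relies on relocate() accepting tidyselect helpers such as starts_with(), which can stand in for str_detect(). Since .after takes a single destination, the sketch assumes one relocate() call per base column; the names base_cols, one_col and reordered are illustrative, and only two statistics are computed to keep it short.

```r
library(dplyr)

# Smaller version of the question's data: two statistics per column
iris2 <- iris %>%
  select(2:4) %>%
  mutate(across(where(is.numeric), list(
    mean = ~ mean(.x, na.rm = TRUE),
    max  = ~ max(.x, na.rm = TRUE)
  )))

# First question: move every column with a given prefix after its base column
one_col <- iris2 %>%
  relocate(starts_with("Sepal.Width_"), .after = Sepal.Width)

# Second question: repeat the same step for each base name held in a vector
base_cols <- c("Sepal.Width", "Petal.Length", "Petal.Width")
reordered <- iris2
for (col in base_cols) {
  reordered <- reordered %>%
    relocate(starts_with(paste0(col, "_")), .after = all_of(col))
}

names(reordered)
```

starts_with() matches the prefix literally, so the dot in the column names needs no escaping; matches() would be the helper to reach for if a regular expression were needed.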

Answer by r2evans

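One approach is to order on two keys: first on the base name itself (the part before the _), then on the suffix after it, ranked by its position in the list of statistics. Base names are sorted alphabetically, so the Petal columns come before Sepal.Width in the output.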

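select(iris2, order(
  sub("_.*", "", colnames(iris2)),
  # strip through the FIRST "_" so that the suffix "n_miss" stays whole;
  # original columns give "" and fall back to nomatch=0, so they sort first
  match(sub("^[^_]*_?", "", colnames(iris2)), c("n_miss", "mean", "median", "min", "max"), nomatch=0))
) |>
  str()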
# 'data.frame': 150 obs. of  18 variables:
#  $ Petal.Length       : num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#  $ Petal.Length_n_miss: int  0 0 0 0 0 0 0 0 0 0 ...
#  $ Petal.Length_mean  : num  3.76 3.76 3.76 3.76 3.76 ...
#  $ Petal.Length_median: num  4.35 4.35 4.35 4.35 4.35 4.35 4.35 4.35 4.35 4.35 ...
#  $ Petal.Length_min   : num  1 1 1 1 1 1 1 1 1 1 ...
#  $ Petal.Length_max   : num  6.9 6.9 6.9 6.9 6.9 6.9 6.9 6.9 6.9 6.9 ...
#  $ Petal.Width        : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ Petal.Width_n_miss : int  0 0 0 0 0 0 0 0 0 0 ...
#  $ Petal.Width_mean   : num  1.2 1.2 1.2 1.2 1.2 ...
#  $ Petal.Width_median : num  1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3 ...
#  $ Petal.Width_min    : num  0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 ...
#  $ Petal.Width_max    : num  2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 ...
#  $ Sepal.Width        : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#  $ Sepal.Width_n_miss : int  0 0 0 0 0 0 0 0 0 0 ...
#  $ Sepal.Width_mean   : num  3.06 3.06 3.06 3.06 3.06 ...
#  $ Sepal.Width_median : num  3 3 3 3 3 3 3 3 3 3 ...
#  $ Sepal.Width_min    : num  2 2 2 2 2 2 2 2 2 2 ...
#  $ Sepal.Width_max    : num  4.4 4.4 4.4 4.4 4.4 4.4 4.4 4.4 4.4 4.4 ...
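
The two-key ordering can be checked on a toy vector. This is a minimal sketch with made-up names; the suffix regex is an assumption of the sketch, chosen so that a suffix that itself contains an underscore (such as n_miss) is kept whole.

```r
# Hypothetical names, deliberately shuffled
nms_full <- c("b", "a", "b_max", "b_n_miss", "a_max", "a_n_miss")
stat_order <- c("n_miss", "max")

# Primary key: base name (everything before the first underscore)
base <- sub("_.*", "", nms_full)

# Secondary key: suffix after the first underscore, "" for original columns
stat <- sub("^[^_]*_?", "", nms_full)

# Rank each suffix by its position in stat_order; originals get 0 and sort first
stat_rank <- match(stat, stat_order, nomatch = 0)

nms_full[order(base, stat_rank)]
# "a" "a_n_miss" "a_max" "b" "b_n_miss" "b_max"
```

Unlike ordering on first appearance, this sorts the base names alphabetically and puts the statistics in the order given by stat_order, whatever order the columns started in.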