How can I extract some information from variable labels using map and str_remove_all in R

I have a data frame of labelled variables imported using the haven package in R. For a subset of variables I want to make use of a part of the variable label. I have a good regex that will work, but I don't understand why the combination of map and str_remove_all is not working here.

# Packages used below
library(dplyr)     # provides %>%
library(purrr)     # map()
library(stringr)   # str_remove_all()
library(labelled)  # var_label()
# Random variables
var1 <- sample(seq(1, 10, 1), size = 10, replace = TRUE)
var2 <- sample(seq(1, 10, 1), size = 10, replace = TRUE)
# Assign variable labels
var_label(var1) <- "A long variable label - Some Info"
var_label(var2) <- "Another long variable label - Some Other Info"
# Make data frame
df <- data.frame(var1, var2)
# Confirm variable labels
var_label(df)
# Try to remove the relevant string from each label
df %>% 
  var_label() %>% 
  # Remove everything but what is desired
  map(., str_remove_all(., ".+ - "))

The output is just NULL.

What is wrong with using map here? The input is a list and then I provide a function. So what is going on?

Answer by Stibu (best answer):

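The second argument of map(), .f, should be a function or a formula. In your pipe, str_remove_all(., ".+ - ") is evaluated first, so map() receives the resulting character vector as .f. purrr treats a character vector as shorthand for pluck(): it tries to extract elements with those names from each label, finds none, and returns NULL for every element. Either of these two versions works: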

df %>% 
  var_label() %>% 
  map(., \(x) str_remove_all(x, ".+ - "))

df %>% 
  var_label() %>% 
  map(., ~str_remove_all(.x, ".+ - "))

The documentation of map() prefers the first version (\(x) is shorthand for function(x), available since R 4.1.0). About formulas it says:

"A formula, e.g. ~ .x + 1. You must use .x to refer to the first argument. Only recommended if you require backward compatibility with older versions of R."
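To see where the NULL comes from, here is a minimal sketch. It assumes a hand-built named list standing in for var_label(df), so it runs without labelled; the names labels, not_a_function, out_bad and out_good are illustrative only. It relies on the purrr behaviour described above, where a character .f is treated as pluck() shorthand.

```r
library(purrr)
library(stringr)

# A named list shaped like what var_label(df) returns in the question
labels <- list(
  var1 = "A long variable label - Some Info",
  var2 = "Another long variable label - Some Other Info"
)

# In the question, str_remove_all(., ".+ - ") is evaluated before map() runs,
# so map() receives a character vector rather than a function as .f.
# (unlist() is used here so the sketch does not depend on how a list is coerced.)
not_a_function <- str_remove_all(unlist(labels), ".+ - ")

# purrr turns a character .f into an extractor (like pluck()): it looks up
# elements with those names inside each label string, finds none, returns NULL
out_bad <- map(labels, not_a_function)
str(out_bad)

# Passing a function instead applies str_remove_all() to each label
out_good <- map(labels, \(x) str_remove_all(x, ".+ - "))
str(out_good)
```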

Answer by Vraj Pithwa:

map() is generally used to apply a function to each element of a list, such as the named list that var_label(df) returns.

Instead of mapping over that list, you can update each column's label in place with across() inside mutate() from the dplyr package, applying str_remove_all() to the label of each column while leaving the column values unchanged.

library(dplyr)
library(stringr)
# random variables
var1 <- sample(seq(1, 10, 1), size = 10, replace = TRUE)
var2 <- sample(seq(1, 10, 1), size = 10, replace = TRUE)
# Assign variable labels
library(labelled)
var_label(var1) <- "A long variable label - Some Info"
var_label(var2) <- "Another long variable label - Some Other Info"  
# Make dataframe
df <- data.frame(var1, var2)   
# Confirm variable labels
var_label(df)   
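# Remove everything but what is desired from each column's label;
# the column values themselves are returned unchanged
df <- df %>%
  mutate(across(everything(), \(x) {
    var_label(x) <- str_remove_all(var_label(x), ".+ - ")
    x
  }))
# Confirm the updated variable labels
var_label(df)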
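For comparison, a minimal sketch that stays closer to the question's map() idea and writes the result back with var_label<-, which for a data frame accepts a named list of labels. It also shows var_label(df, unlist = TRUE), which returns a named character vector, so str_remove_all() can be applied directly without map(). The fixed data values (instead of sample()) and the name short_labels are assumptions to keep the sketch deterministic.

```r
library(labelled)
library(purrr)
library(stringr)

# Toy data (fixed values instead of sample(), so the sketch is deterministic)
df <- data.frame(var1 = 1:3, var2 = 4:6)
var_label(df) <- list(
  var1 = "A long variable label - Some Info",
  var2 = "Another long variable label - Some Other Info"
)

# Option 1: unlist = TRUE returns a named character vector;
# str_remove_all() is vectorised over it, so map() is not needed
short_labels <- str_remove_all(var_label(df, unlist = TRUE), ".+ - ")
print(short_labels)

# Option 2: map over the named list and assign the result back;
# var_label<- accepts a named list of labels for a data frame
var_label(df) <- map(var_label(df), \(x) str_remove_all(x, ".+ - "))
print(var_label(df))
```

Option 1 is handy when you only need the extracted text; option 2 replaces the stored labels.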