I have a list of dfs:
my_list <- list(structure(list(col1 = c("v1", "v2", "v3", "V2", "V1"), col2 = c("wood", NA, "water", NA, "water"), col3 = c("cup", NA, "fork", NA, NA), col4 = c(NA, "pear", "banana", NA, "apple")), class = "data.frame", row.names = c(NA, -5L)), structure(list(col1 = c("v1", "v2"), col2 = c("wood", NA), col4 = c(NA, "pear")), class = "data.frame", row.names = c(NA, -2L)), structure(list(col1 = c("v1", "v2", "v3", "V3"), col3 = c("cup", NA, NA, NA), col4 = c(NA, "pear", "banana", NA)), class = "data.frame", row.names = c(NA, -4L)))
my_list
[[1]]
  col1  col2 col3   col4
1   v1  wood  cup   <NA>
2   v2  <NA> <NA>   pear
3   v3 water fork banana
4   v2  <NA> <NA>   <NA>
5   v1 water <NA>  apple

[[2]]
  col1 col2 col4
1   v1 wood <NA>
2   v2 <NA> pear

[[3]]
  col1 col3   col4
1   v1  cup   <NA>
2   v2 <NA>   pear
3   v3 <NA> banana
4   v3 <NA>   <NA>
I want to replace NA with "VAL" in col3 only, and only if col1 is v2 or v3.
I found solutions for replacing NA in specific columns, but none that combine this with a condition on another column (or they only work on a single df, not on a list of dfs).
Note that col2 and col3 do not necessarily exist in every df.
Ideally, I need a solution using lapply(list, function).
Desired output:
[[1]]
  col1  col2 col3   col4
1   v1  wood  cup   <NA>
2   v2  <NA>  VAL   pear
3   v3 water fork banana
4   v2  <NA>  VAL   <NA>
5   v1 water <NA>  apple

[[2]]
  col1 col2 col4
1   v1 wood <NA>
2   v2 <NA> pear

[[3]]
  col1 col3   col4
1   v1  cup   <NA>
2   v2  VAL   pear
3   v3  VAL banana
4   v3  VAL   <NA>
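A minimal lapply() sketch of the requested replacement. Assumptions: `fill_col3` is a hypothetical helper name, dfs without a col3 are returned unchanged, and col1 is compared via tolower() because the dput above contains "V2", "V1" and "V3" while the printed output shows lowercase values.

```r
# my_list as given in the question
my_list <- list(
  structure(list(col1 = c("v1", "v2", "v3", "V2", "V1"),
                 col2 = c("wood", NA, "water", NA, "water"),
                 col3 = c("cup", NA, "fork", NA, NA),
                 col4 = c(NA, "pear", "banana", NA, "apple")),
            class = "data.frame", row.names = c(NA, -5L)),
  structure(list(col1 = c("v1", "v2"), col2 = c("wood", NA), col4 = c(NA, "pear")),
            class = "data.frame", row.names = c(NA, -2L)),
  structure(list(col1 = c("v1", "v2", "v3", "V3"),
                 col3 = c("cup", NA, NA, NA),
                 col4 = c(NA, "pear", "banana", NA)),
            class = "data.frame", row.names = c(NA, -4L))
)

# Hypothetical helper: set NA in col3 to "VAL" where col1 is v2 or v3
fill_col3 <- function(df) {
  # dfs without col3 (like the second one) are returned unchanged
  if (!("col3" %in% names(df))) return(df)
  # tolower() so that "V2"/"V3" in the dput also match
  idx <- is.na(df$col3) & tolower(df$col1) %in% c("v2", "v3")
  df$col3[idx] <- "VAL"
  df
}

res <- lapply(my_list, fill_col3)
res
```

Because the condition is a single vectorized logical index, no ifelse() is needed, and the column check keeps the function safe for dfs that lack col3.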
In such cases, for loops can be much faster.

Benchmark

The for loop runs 80% faster, which is quite significant. This was demonstrated on a list with just 1,000 elements.
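A minimal sketch of such a for loop, assuming the same rule as an lapply() version (col3 only, NA values only, col1 equal to v2 or v3, compared case-insensitively because the dput mixes "v2" and "V2"). It overwrites my_list element by element; it is an illustration, not the benchmarked code.

```r
# my_list as given in the question
my_list <- list(
  structure(list(col1 = c("v1", "v2", "v3", "V2", "V1"),
                 col2 = c("wood", NA, "water", NA, "water"),
                 col3 = c("cup", NA, "fork", NA, NA),
                 col4 = c(NA, "pear", "banana", NA, "apple")),
            class = "data.frame", row.names = c(NA, -5L)),
  structure(list(col1 = c("v1", "v2"), col2 = c("wood", NA), col4 = c(NA, "pear")),
            class = "data.frame", row.names = c(NA, -2L)),
  structure(list(col1 = c("v1", "v2", "v3", "V3"),
                 col3 = c("cup", NA, NA, NA),
                 col4 = c(NA, "pear", "banana", NA)),
            class = "data.frame", row.names = c(NA, -4L))
)

# Modify each df in place instead of building a new list with lapply()
for (i in seq_along(my_list)) {
  # Only touch dfs that actually have a col3
  if ("col3" %in% names(my_list[[i]])) {
    idx <- is.na(my_list[[i]]$col3) &
      tolower(my_list[[i]]$col1) %in% c("v2", "v3")
    my_list[[i]]$col3[idx] <- "VAL"
  }
}
my_list
```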
Benchmark code
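A sketch of how such a comparison could be set up with base R's system.time(), assuming a 1,000-element list built by recycling the three example dfs with rep(). The helper names `fill_col3`, `f_lapply` and `f_loop` are hypothetical. Timings depend on machine and R version, so this block does not reproduce the 80% figure; it checks that both versions return identical results and prints the elapsed times.

```r
# my_list as given in the question
my_list <- list(
  structure(list(col1 = c("v1", "v2", "v3", "V2", "V1"),
                 col2 = c("wood", NA, "water", NA, "water"),
                 col3 = c("cup", NA, "fork", NA, NA),
                 col4 = c(NA, "pear", "banana", NA, "apple")),
            class = "data.frame", row.names = c(NA, -5L)),
  structure(list(col1 = c("v1", "v2"), col2 = c("wood", NA), col4 = c(NA, "pear")),
            class = "data.frame", row.names = c(NA, -2L)),
  structure(list(col1 = c("v1", "v2", "v3", "V3"),
                 col3 = c("cup", NA, NA, NA),
                 col4 = c(NA, "pear", "banana", NA)),
            class = "data.frame", row.names = c(NA, -4L))
)

# lapply() version: returns a new list
fill_col3 <- function(df) {
  if (!("col3" %in% names(df))) return(df)
  idx <- is.na(df$col3) & tolower(df$col1) %in% c("v2", "v3")
  df$col3[idx] <- "VAL"
  df
}
f_lapply <- function(l) lapply(l, fill_col3)

# for-loop version: modifies a local copy of the list element by element
f_loop <- function(l) {
  for (i in seq_along(l)) {
    if ("col3" %in% names(l[[i]])) {
      idx <- is.na(l[[i]]$col3) & tolower(l[[i]]$col1) %in% c("v2", "v3")
      l[[i]]$col3[idx] <- "VAL"
    }
  }
  l
}

# 1,000-element list built by recycling the three example dfs
big_list <- rep(my_list, length.out = 1000)

# Both versions must agree before timing them
stopifnot(identical(f_lapply(big_list), f_loop(big_list)))

# Elapsed seconds for 20 repetitions of each version
t_lapply <- system.time(for (k in 1:20) f_lapply(big_list))[["elapsed"]]
t_loop   <- system.time(for (k in 1:20) f_loop(big_list))[["elapsed"]]
c(lapply = t_lapply, loop = t_loop)
```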
Edit

To add a "col3" that is filled with "VAL" if none exists yet:

Data:
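A minimal sketch of the Edit, using my_list from the question as data (repeated so the block runs on its own). Assumptions: "filled with "VAL"" is read as every row of the new col3 getting "VAL", and dfs that already have col3 still get the v2/v3 rule; `fill_or_add_col3` is a hypothetical name.

```r
# Data: my_list as given in the question
my_list <- list(
  structure(list(col1 = c("v1", "v2", "v3", "V2", "V1"),
                 col2 = c("wood", NA, "water", NA, "water"),
                 col3 = c("cup", NA, "fork", NA, NA),
                 col4 = c(NA, "pear", "banana", NA, "apple")),
            class = "data.frame", row.names = c(NA, -5L)),
  structure(list(col1 = c("v1", "v2"), col2 = c("wood", NA), col4 = c(NA, "pear")),
            class = "data.frame", row.names = c(NA, -2L)),
  structure(list(col1 = c("v1", "v2", "v3", "V3"),
                 col3 = c("cup", NA, NA, NA),
                 col4 = c(NA, "pear", "banana", NA)),
            class = "data.frame", row.names = c(NA, -4L))
)

fill_or_add_col3 <- function(df) {
  if (!("col3" %in% names(df))) {
    # Assumption: a missing col3 is created with "VAL" in every row
    # (the scalar is recycled to the number of rows, appended as the last column)
    df$col3 <- "VAL"
  } else {
    # Existing col3: replace NA only where col1 is v2/v3 (case-insensitive)
    idx <- is.na(df$col3) & tolower(df$col1) %in% c("v2", "v3")
    df$col3[idx] <- "VAL"
  }
  df
}

res <- lapply(my_list, fill_or_add_col3)
res
```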