I have a data frame that has been converted to z-scores. I want to replace with NA only those values that are greater than or equal to 4, without dropping any rows or columns. I would appreciate an answer.
Best
You can use the following code:
df <- data.frame(v1 = c(1,3,6,7,3),
                 v2 = c(2,1,4,6,7),
                 v3 = c(1,2,3,4,5))
df
#>   v1 v2 v3
#> 1  1  2  1
#> 2  3  1  2
#> 3  6  4  3
#> 4  7  6  4
#> 5  3  7  5
is.na(df) <- df >= 4
df
#>   v1 v2 v3
#> 1  1  2  1
#> 2  3  1  2
#> 3 NA NA  3
#> 4 NA NA NA
#> 5  3 NA NA
Created on 2022-07-10 by the reprex package (v2.0.1)
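To see why this works, it may help to look at the intermediate step. The sketch below reuses the same toy data and stores the comparison in a variable (the name mask is only for illustration): comparing a data frame with a number gives a logical matrix of the same shape, and is.na<- sets exactly the TRUE cells to NA.

```r
# Same toy data as in the answer above
df <- data.frame(v1 = c(1, 3, 6, 7, 3),
                 v2 = c(2, 1, 4, 6, 7),
                 v3 = c(1, 2, 3, 4, 5))

# Comparing a data frame with a number returns a logical matrix
# with one TRUE/FALSE per cell
mask <- df >= 4
dim(mask)    # same shape as df: 5 rows, 3 columns
sum(mask)    # number of cells that will become NA

# Assigning through is.na<- turns the TRUE cells into NA
is.na(df) <- mask
dim(df)      # unchanged, no rows or columns are dropped
```

Since only individual cells are overwritten, the dimensions of the data frame stay the same.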
Though the solution by @Quinten is very concise, here is an additional approach using the tidyverse:
library(dplyr)
set.seed(123)
df <- data.frame(
  x = sample(1:10, 7),
  y = sample(1:10, 7)
)
df %>%
  mutate(
    # apply to every column; values >= 4 become NA
    across(everything(), ~ if_else(.x >= 4, NA_integer_, .x))
  )
#>    x  y
#> 1  3 NA
#> 2 NA NA
#> 3  2  1
#> 4 NA  2
#> 5 NA  3
#> 6 NA NA
#> 7  1 NA
Created on 2022-07-10 by the reprex package (v2.0.1)
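One caveat for real z-scores: the columns in the example above are integers, which is why NA_integer_ is used. Z-scores are doubles, and as far as I know older dplyr versions require both branches of if_else() to have the same type, so NA_real_ is the safer choice there. A minimal sketch, assuming made-up data (dat, id, z1 and z2 are hypothetical names) and restricting the replacement to numeric columns with where(is.numeric):

```r
library(dplyr)

# Hypothetical z-score-like data: two double columns plus a character ID
dat <- data.frame(
  id = c("a", "b", "c", "d"),
  z1 = c(-0.5, 4.0, 1.2, 5.3),
  z2 = c(4.1, 0.3, -4.2, 3.9)
)

cleaned <- dat %>%
  # only touch numeric columns; NA_real_ matches the double type
  mutate(across(where(is.numeric), ~ if_else(.x >= 4, NA_real_, .x)))

cleaned
```

The id column is left alone, and negative values such as -4.2 are kept because the condition only looks at the upper tail.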
Here is one more, using replace_with_na_all() from the naniar package. From the package vignette:
Use replace_with_na_all() when you want to replace ALL values that meet a condition across an entire dataset. The syntax here is a little different, and follows the rules for rlang’s expression of simple functions. This means that the function starts with ~, and when referencing a variable, you use .x.
https://cran.r-project.org/web/packages/naniar/vignettes/replace-with-na.html
library(naniar)
library(dplyr)
df %>%
  replace_with_na_all(condition = ~.x >= 4)
     v1    v2    v3
  <dbl> <dbl> <dbl>
1     1     2     1
2     3     1     2
3    NA    NA     3
4    NA    NA    NA
5     3    NA    NA
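If the ~ and .x syntax quoted above is unfamiliar, the sketch below shows what such a formula stands for, using rlang::as_function() to turn it into an ordinary function. This only illustrates the notation; it is not a claim about how naniar evaluates the condition internally, and the names is_extreme and is_extreme_long are made up for the example.

```r
library(rlang)

# as_function() converts a one-sided formula into a regular function;
# .x refers to the first argument
is_extreme <- as_function(~ .x >= 4)

is_extreme(c(1, 4, 7))
#> [1] FALSE  TRUE  TRUE

# The formula is shorthand for this explicit function
is_extreme_long <- function(x) x >= 4
identical(is_extreme(c(1, 4, 7)), is_extreme_long(c(1, 4, 7)))
#> [1] TRUE
```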
In base R, we can use replace():
df <- replace(df, df >= 4, NA_real_)
Output
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1  NA NA  3 NA  1  3  1  1 NA  NA
2   1 NA  2 NA NA  3 NA NA  2   0
3  NA  1 NA  2  2  1 NA NA NA   1
4  NA NA  0 NA NA NA  0  2 NA  NA
5  NA  1 NA  3  0 NA NA NA  2   3
6   0  3 NA  0 NA NA  1  1 NA   2
7   3 NA NA NA  2  2 NA  2 NA  NA
8  NA  1  0  2 NA NA  2 NA NA  NA
9  NA  3 NA  2 NA NA NA  0  1   3
10  1  3 NA  3 NA NA  3 NA NA  NA
Or use replace() in dplyr:
library(dplyr)
df %>%
  mutate(across(everything(), ~ replace(.x, .x >= 4, NA_real_)))
Data
set.seed(321)
df <- data.frame(replicate(10, sample(0:10, 10, replace = TRUE)))
You can simply use
df[df >= 4] <- NA
to achieve what you want.
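For what it is worth, the three base R variants in this thread (logical indexing, is.na<- and replace()) should give the same result, since they all end up assigning NA through a logical matrix. A small sketch to check this on made-up data (make_df, d1, d2 and d3 are hypothetical names):

```r
# Helper that returns a fresh copy of a small made-up data frame
make_df <- function() {
  data.frame(a = c(0.2, 4, 5.1),
             b = c(-4.5, 3.9, 4))
}

# Variant 1: logical indexing
d1 <- make_df()
d1[d1 >= 4] <- NA

# Variant 2: is.na<-
d2 <- make_df()
is.na(d2) <- d2 >= 4

# Variant 3: replace()
d3 <- replace(make_df(), make_df() >= 4, NA)

# Compare the results
all.equal(d1, d2)
all.equal(d1, d3)
d1
```

Which one to use is mostly a matter of taste; the indexing form is the shortest, while replace() returns a modified copy and so fits more naturally into a pipeline.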