How do I separate values of two variables stored in one column in R?

47 Views Asked by At

I have data that looks like this:

Column A Column B Column C
1 2 3 NA
4 5 6
7 8 9
10 11 12

I am trying to clean in it in R. Each column has to contain one number. In the case of row 1, number 3 belongs to column C but is stored in column B. Likewise, I have column B, the last row, where 12 belongs to column C but is stored in B.

How do I go about cleaning this in R?

I am new to R and haven't encountered a problem like this before.

2

There are 2 best solutions below

1
r2evans On

It looks like it was imported incorrectly, but if you don't have the ability to fix it there (preferred), here's a hack to fix that frame:

quux[] <- t(apply(quux, 1, \(z) setdiff(strsplit(paste(setdiff(z, c(NA, "")), collapse = " "), " ")[[1]], ""))) |>
  as.data.frame() |>
  lapply(type.convert, as.is = TRUE)
quux
#   Column A Column B Column C
# 1        1        2        3
# 2        4        5        6
# 3        7        8        9
# 4       10       11       12
str(quux)
# 'data.frame': 4 obs. of  3 variables:
#  $ Column A: int  1 4 7 10
#  $ Column B: int  2 5 8 11
#  $ Column C: int  3 6 9 12

Data

quux <- structure(list("Column A" = c(1L, 4L, 7L, 10L), "Column B" = c("2 3", "5", "8", "11 12"), "Column C" = c(NA, 6L, 9L, NA)), class = "data.frame", row.names = c(NA, -4L))
0
asd-tm On

Here is a tidyverse solution:

library(tidyverse)
mydf %>% 
  unite(col, everything(), sep = " ")  %>% 
  separate(col, into = paste0("Column", LETTERS[1:length(mydf)]),  sep = " ")

Result:

  ColumnA ColumnB ColumnC
1       1       2       3
2       4       5       6
3       7       8       9
4      10      11      12

For the purpose to avoid warnings you may enhance the script in the following way:

mydf[mydf == ""] <- NA

mydf %>% 
  unite(col, everything(), sep = " ", na.rm = T)  %>% 
  separate(col, into = paste0("Column", LETTERS[1:length(mydf)]),  sep = " ")

If you wish to cast types (and sure that there is no data except integers in your frame) you may add the following line:

... %>% 
  mutate(across(everything(), as.integer))

Used data:

mydf <- 
data.frame(ColumnA = c("1", "4", "7", "10"),
           ColumnB = c("2 3", "5", "8", "11 12"),
           ColumnC = c(NA, "6", "9", "")
           )