Splitting out my data when the length of each varies

Question

Splitting out my data when the length of each varies

42 Views Asked by Joe At 13 March 2024 at 10:43

I have a string which I want to split into a dataframe. As you can see from my example and data below I want to split my current data out from a single cell into a position for each number, just using NAs where we have no value. I've tried splitting this out using a gsub but as the length varies from one to three '-' in each cell I don't get the result I am looking for. I thought using unlist(strsplit()) would be good for this but wasnt able to split the original data up correctly, again due to the differing lengths.

mydat
c("1-2 1-1", "1-2 1-1 3-3", "1-1 2-1 4-1", "1-1")

newdat
one   two   three     four    five    six
1     2      1        1       NA       NA
1     2      1        1       3        3
1     1      2        1       4        1
1     1      NA       NA      NA       NA

Original Q&A

There are 3 best solutions below

Maël On 13 March 2024 at 10:57

With tidyr::separate_wider_delim, you can set too_few = "align_start" to deal with varying lengths:

library(tidyr)
library(stringr)

mydat <- data.frame(col1 = c("1-2 1-1", "1-2 1-1 3-3", "1-1 2-1 4-1", "1-1"))
mydat |> 
  separate_wider_delim(col1, delim = regex(" |-"), names = as.character(english::english(1:6)), 
                       too_few = "align_start")

#   one two three four five  six
# 1   1   2     1    1 <NA> <NA>
# 2   1   2     1    1    3    3
# 3   1   1     2    1    4    1
# 4   1   1  <NA> <NA> <NA> <NA>

Muhammad Uzair Kabeer On 13 March 2024 at 10:59

So, In order to do this you have to do the following steps:

Split each each string by space and then by '-'
then you have to find maximum length to the size of array
Prepare a matrix of the required size
Fill the matrix
Display the matrix by making it a dataframe.

If you follow all of the above steps your code should look something like below:

mydat <- c("1-2 1-1", "1-2 1-1 3-3", "1-1 2-1 4-1", "1-1")

# step 1
split_data <- lapply(strsplit(mydat, " "), function(x) {
  sapply(strsplit(x, "-"), function(y) y)
})

# step 2
max_length <- max(sapply(split_data, function(x) length(unlist(x))))

# step 3
aligned_data <- matrix(NA, nrow = length(mydat), ncol = max_length)

# step 4
for (i in seq_along(split_data)) {
  row_data <- unlist(split_data[[i]])
  aligned_data[i, seq_along(row_data)] <- row_data
}

# step 5
data <- as.data.frame(aligned_data)
colnames(data) <- c("one", "two", "three", "four", "five", "six")

print(data)

**jay.sf** · Accepted Answer · 2024-03-13T10:50:32.310000

Using strsplit and adapting the `length<-`s.

> fn <- \(x) {
+   s <- strsplit(x, '-|\\s')
+   t(sapply(s, `length<-`, max(lengths(s))) )|> 
+     as.data.frame() |> type.convert(as.is=TRUE)
+ }
> fn(x) |> setNames(c('one', 'two', 'three', 'four', 'five', 'six'))
  one two three four five six
1   1   2     1    1   NA  NA
2   1   2     1    1    3   3
3   1   1     2    1    4   1
4   1   1    NA   NA   NA  NA

Data:

> dput(x)
c("1-2 1-1", "1-2 1-1 3-3", "1-1 2-1 4-1", "1-1")

Splitting out my data when the length of each varies

There are 3 best solutions below

Related Questions in R

Related Questions in DATAFRAME

Related Questions in VECTOR

Related Questions in STRSPLIT

Trending Questions

Popular # Hahtags

Popular Questions