R: List with multiple headings - How to split by heading (unequal lines per heading)


I've a large file that looks like this:

Heading1
1 ABC
2 DEF
Heading2
1 GHI
2 JKL
3 MNO
Heading3
1 PQR
2 STU

The headings always follow the same pattern, but the entries below each heading vary: different numbers of entries, no common pattern, different numbers of letters and/or words.

I want to split the one list into multiple lists, i.e. a new list for each heading. How can I solve this?


8
Nate On BEST ANSWER

This is what I would do:

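# sample input from the question; in practice test holds the lines of your file (e.g. from readLines())
test <- c("Heading1", "1 ABC", "2 DEF", "Heading2", "1 GHI", "2 JKL", "3 MNO", "Heading3", "1 PQR", "2 STU")

# TRUE for every heading line
header_positions <- grepl("^Heading", test)
header_positions

# running count of headings: each line gets the number of the heading it belongs to
grouping_index <- cumsum(header_positions)
grouping_index

# drop the heading lines and split the remaining lines by group
li <- split(test[!header_positions], grouping_index[!header_positions])
li

li <- setNames(li, test[header_positions]) # if you want to have fancy names :)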

I think the cumsum(grepl(...)) pattern is very useful for this kind of list-splitting task.
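
As a rough sketch of that pattern packaged for reuse, the helper below wraps the three steps into one function. The name split_by_heading and its pattern argument are hypothetical, not something from the question; it assumes the input is a character vector whose first element is a heading.

```r
# Hypothetical helper: split a character vector into groups at heading lines.
split_by_heading <- function(x, pattern = "^Heading") {
  is_header <- grepl(pattern, x)   # TRUE on heading lines
  group <- cumsum(is_header)       # running count of headings = group id
  out <- split(x[!is_header], group[!is_header])
  # Name each group after its heading, looked up by group id
  setNames(out, x[is_header][as.integer(names(out))])
}

# Sample data from the question
test <- c("Heading1", "1 ABC", "2 DEF",
          "Heading2", "1 GHI", "2 JKL", "3 MNO",
          "Heading3", "1 PQR", "2 STU")
res <- split_by_heading(test)
res
```

Looking the names up by group id, rather than assuming one name per heading, keeps names and groups aligned even if some heading has no entries below it.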

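If you want to write each group out via writeLines(), the list elements need to be character vectors. unlist() does that conversion when the elements are themselves lists, and it is harmless when they are already character vectors: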

for(n in names(li)) {
  writeLines(unlist(li[[n]]), paste0(n, ".txt"))
} 

Iterating over the names of a list is another helpful pattern: you can use each name directly (for the filename) and also to index the list (for the file contents).
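
A minimal, self-contained sketch of that names-based loop, which writes into a temporary directory and reads one file back to check the result. The small list li and the use of tempdir() are assumptions for illustration only.

```r
# Small named list standing in for the split result (values from the question)
li <- list(Heading1 = c("1 ABC", "2 DEF"),
           Heading3 = c("1 PQR", "2 STU"))

out_dir <- tempdir()  # assumption: write to a temporary directory
for (n in names(li)) {
  # n serves both as the file name and as the list index
  writeLines(li[[n]], file.path(out_dir, paste0(n, ".txt")))
}

# Read one file back to check the round trip
back <- readLines(file.path(out_dir, "Heading3.txt"))
back
```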

0
Haci Duru On

Can you try this?

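mylist <- list("Heading1", "1 ABC", "2 DEF", "Heading2", "1 GHI", "2 JKL", "3 MNO", "Heading3", "1 PQR", "2 STU")
# TRUE where an element starts with "heading" (case-insensitive); grepl() avoids relying on the match position that regexpr() returns
is_heading <- grepl("^heading", unlist(mylist), ignore.case = TRUE)
# running count of headings = group number of each element (the first element must be a heading)
idx <- cumsum(is_heading)
myotherlist <- vector("list", max(idx))
# append each element to its group; note that each group keeps its heading as the first element
for (i in seq_along(mylist)) myotherlist[[idx[i]]] <- append(myotherlist[[idx[i]]], mylist[i])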
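
For comparison, the same grouping can probably be had without the explicit loop by handing the cumulative index to split(). This is a minimal sketch, assuming every heading starts with "Heading" and the list begins with one; the names is_heading and groups are only for illustration. As in the loop above, each group keeps its heading as the first element.

```r
mylist <- list("Heading1", "1 ABC", "2 DEF",
               "Heading2", "1 GHI", "2 JKL", "3 MNO",
               "Heading3", "1 PQR", "2 STU")

# TRUE where an element starts with "heading" (case-insensitive)
is_heading <- grepl("^heading", unlist(mylist), ignore.case = TRUE)

# split() accepts a list as well; unname() drops the "1", "2", "3" labels
groups <- unname(split(mylist, cumsum(is_heading)))
groups
```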