Following my previous question, I now have extra information in my data: I have included the gene for each row. Because the same gene was predicted as different enzymes, the results were combined with a "+" sign, but now I would like to split them as shown below. My data frame looks like this:
df <- data.frame(Gene = c("A", "B", "C", "D", "E", "F"),
                 G1 = c("GH13_22+CBM4", "GH109+PL7+GH9", "GT57", "AA3", "", ""),
                 G2 = c("GH13_22", "", "GT57+GH15", "AA3", "GT41", "PL+PL2"),
                 G3 = c("GH13", "GH1O9", "", "CBM34+GH13+CBM48", "GT41", "GH16+CBM4+CBM54+CBM32"))
and the output I want looks like this:
df2 <- data.frame(Gene = c("A", "A", "B", "B", "B", "C", "C", "D", "D", "D", "E", "F", "F", "F", "F"),
                  G1 = c("GH13_22", "CBM4", "GH109", "PL7", "GH9", "GT57", "GT57", "AA3", "AA3", "AA3", "", "", "", "", ""),
                  G2 = c("GH13_22", "GH13_22", "", "", "", "GT57", "GH15", "AA3", "AA3", "AA3", "GT41", "PL", "PL2", "", ""),
                  G3 = c("GH13", "", "GH1O9", "GH1O9", "GH1O9", "", "", "CBM34", "GH13", "CBM48", "GT41", "GH16", "CBM4", "CBM54", "CBM32"))
Kindly help
It was harder than I thought, but here's one way.
The main idea is to use stringr's str_split_fixed to split each string and return a fixed number of pieces, padded with "" when the input has fewer pieces. Note: I chose 4 here, but you can pick a much larger upper bound to accommodate longer strings. This gives a data.frame in which G1:G3 are matrix columns, i.e. each element is a 1 x 4 matrix. The remaining code then unnests the matrices into multiple rows in long format, replaces empty strings with NAs, removes rows that contain only NAs, and finally fills the remaining values within each gene.
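A minimal sketch of those steps, assuming dplyr (1.0.4 or later, for if_any), tidyr and stringr are installed. It is not necessarily the exact code of this answer: instead of unnesting matrix columns, it flattens each split matrix row by row with t(), which gives the same long layout. The upper bound n_max = 4 is an assumption that matches the longest cell in this example.

```r
library(dplyr)
library(tidyr)
library(stringr)

df <- data.frame(Gene = c("A", "B", "C", "D", "E", "F"),
                 G1 = c("GH13_22+CBM4", "GH109+PL7+GH9", "GT57", "AA3", "", ""),
                 G2 = c("GH13_22", "", "GT57+GH15", "AA3", "GT41", "PL+PL2"),
                 G3 = c("GH13", "GH1O9", "", "CBM34+GH13+CBM48", "GT41",
                        "GH16+CBM4+CBM54+CBM32"))

n_max <- 4  # assumed upper bound on the number of "+"-separated pieces per cell

# Split every G column into an nrow(df) x n_max matrix, padded with ""
parts <- lapply(df[c("G1", "G2", "G3")], str_split_fixed, pattern = "\\+", n = n_max)

# Long format: each gene gets n_max consecutive rows; t() makes as.vector()
# read each matrix row by row instead of column by column
long <- data.frame(Gene = rep(df$Gene, each = n_max))
for (col in names(parts)) long[[col]] <- as.vector(t(parts[[col]]))

result <- long %>%
  mutate(across(G1:G3, ~ na_if(.x, ""))) %>%   # empty strings -> NA
  filter(if_any(G1:G3, ~ !is.na(.x))) %>%      # drop rows that are NA in every G column
  group_by(Gene) %>%
  fill(G1:G3, .direction = "down") %>%         # repeat the last value within each gene
  ungroup() %>%
  mutate(across(G1:G3, ~ replace_na(.x, "")))  # genes with no value at all -> ""

result
```

This yields the same 15 rows and Gene column as df2. Because fill() repeats the last non-missing value of each column within a gene, a few cells come out filled where the hand-written df2 leaves them blank (for example G3 for gene A and G2 for gene F); drop or restrict the fill() step for columns that should stay blank.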