I have some data that gets imported into R like this (as a character vector):
> dput(my_data)
c("S Leistung Sub Text Ergebnis Einheit Normal Auffällig Katalog Datum Zeit Kommentar ",
" APOA_S Apo A1 1.11 g/l 1.04 - 2.02 01 30.03.2023 06:56 ",
" ", "", " APOB_S Apo B 1.09 g/l 0.66 - 1.33 01 30.03.2023 06:56 ",
" ", "", " B-BA_E Basophile Granulozyten absolut 0.04 exp 9/l 0 - 0.1 01 27.03.2023 11:56 ",
" ", "", " B-DBB_E Differentialblutbild · 01 27.03.2023 11:45 ",
" ")
There are more lines than displayed here. I need to bring it into the form of a table, as follows:
S Leistung Sub Text Ergebnis Einheit ... ...
APOA_S Apo A1 1.11 g/l
APOB_S Apo B 1.09 g/l
...
...
I only really need the columns named "Leistung" and "Ergebnis", but an output with all of them is good, too!
The problem with my data:
- It doesn't use a regular separator.
- The only "separator" I could identify is a run of multiple blank spaces (2-15 spaces).
- However, a single blank space can occur within a value (like "Apo A1") and should not be treated as a separator.
- Within the header, a single space serves as the separator.
- Fixed width doesn't work.
- the column named "Sub" is always empty.
Is there a way to separate it by (multiple) blank spaces? How to deal with the empty "Sub" column?
Update
With missing data and no clear separator, it is hard to tell which value is missing. According to the OP, not all columns are needed, so here is a solution that extracts a limited set of columns.
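A minimal sketch of that idea, not necessarily the code originally posted: since only the leading columns are wanted, it is enough to split each line on runs of two or more spaces and keep the first three fields. The sample vector is an assumption (the real spacing between columns varies), and `result` is just an illustrative name.

```r
# Sample input (assumed spacing: columns are separated by two or more spaces).
# "\u00b7" is the middle dot that marks a missing result.
my_data <- c(
  "S Leistung Sub Text Ergebnis Einheit Normal Auff\u00e4llig Katalog Datum Zeit Kommentar ",
  " APOA_S    Apo A1                            1.11   g/l       1.04 - 2.02   01   30.03.2023   06:56 ",
  " ", "",
  " B-BA_E    Basophile Granulozyten absolut    0.04   exp 9/l   0 - 0.1       01   27.03.2023   11:56 ",
  " ", "",
  " B-DBB_E   Differentialblutbild              \u00b7                              01   27.03.2023   11:45 ",
  " "
)

rows   <- trimws(my_data[-1])        # drop the header line, trim both ends
rows   <- rows[nzchar(rows)]         # drop lines that are now empty
fields <- strsplit(rows, " {2,}")    # split on runs of two or more spaces

# Keep only the first three fields of every row.
result <- data.frame(
  Leistung = vapply(fields, `[`, character(1), 1),
  Text     = vapply(fields, `[`, character(1), 2),
  Ergebnis = vapply(fields, `[`, character(1), 3),
  stringsAsFactors = FALSE
)

# Treat the middle dot as missing, then convert to numeric.
result$Ergebnis[result$Ergebnis == "\u00b7"] <- NA
result$Ergebnis <- as.numeric(result$Ergebnis)
result
```

This relies on the third field really being the result. If a result is absent altogether rather than written as a dot, the third field would be whatever column comes next, so such rows need a separate check.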
Update 2
Looking at the data: if you always have the first two columns and the last three columns, and missing data is either a dot or absent altogether, we could do something like this to get the full data.
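One way this could look, as a sketch under exactly those assumptions: the first two and last three fields are pinned to their columns, and whatever is left in between is padded with `NA` up to three values. The helper `fill_row()`, the column names in `cols`, and the sample spacing are illustrative assumptions, not taken from the original post.

```r
# Sample input (assumed spacing); "\u00b7" is the middle dot for a missing value.
my_data <- c(
  "S Leistung Sub Text Ergebnis Einheit Normal Auff\u00e4llig Katalog Datum Zeit Kommentar ",
  " APOA_S    Apo A1                            1.11   g/l       1.04 - 2.02   01   30.03.2023   06:56 ",
  " ", "",
  " B-BA_E    Basophile Granulozyten absolut    0.04   exp 9/l   0 - 0.1       01   27.03.2023   11:56 ",
  " ", "",
  " B-DBB_E   Differentialblutbild              \u00b7                              01   27.03.2023   11:45 ",
  " "
)

# Assumed names for the eight columns that actually carry values.
cols <- c("Leistung", "Text", "Ergebnis", "Einheit", "Normal",
          "Katalog", "Datum", "Zeit")

rows   <- trimws(my_data[-1])
rows   <- rows[nzchar(rows)]
fields <- strsplit(rows, " {2,}")

# Pin the first two and last three fields, pad the middle to three values.
fill_row <- function(x) {
  n <- length(x)
  stopifnot(n >= 5)                                  # the five fixed fields must exist
  middle <- x[-c(1, 2, n - 2, n - 1, n)]             # fields between the fixed ends
  middle <- middle[middle != "\u00b7"]               # a dot counts as missing
  middle <- c(middle, rep(NA_character_, 3))[1:3]    # pad on the right with NA
  c(x[1:2], middle, x[(n - 2):n])
}

full <- as.data.frame(t(vapply(fields, fill_row, character(8))),
                      stringsAsFactors = FALSE)
names(full) <- cols
full
```

Padding on the right means that if only some of the middle values are missing, the remaining ones are assigned to the leftmost middle columns. Without a reliable separator that ambiguity cannot be resolved from the text alone.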
data
New test.txt
original answer
I assume you read in a text file somehow, so let's simulate that with a file named test.txt.
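A possible way to simulate this, assuming the file is UTF-8 encoded: write a few sample lines to a temporary `test.txt` and read them back with `readLines()`. The spacing in `raw_lines` is made up for illustration.

```r
# Sample lines with assumed spacing; "\u00e4" is the umlaut in the header.
raw_lines <- c(
  "S Leistung Sub Text Ergebnis Einheit Normal Auff\u00e4llig Katalog Datum Zeit Kommentar ",
  " APOA_S    Apo A1    1.11   g/l   1.04 - 2.02   01   30.03.2023   06:56 ",
  " ", "",
  " APOB_S    Apo B     1.09   g/l   0.66 - 1.33   01   30.03.2023   06:56 "
)

# Write the lines as UTF-8 bytes to a temporary file named test.txt.
path <- file.path(tempdir(), "test.txt")
writeLines(raw_lines, path, useBytes = TRUE)

# Read the file back, one element per line, declaring the encoding.
my_data <- readLines(path, encoding = "UTF-8")
my_data
```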
When we read your data, we get more or less what you show as my_data.
From there we take two steps. I made some guesses here about the naming and the number of data fields I found in your rows: I removed S, Sub and Kommentar, and I guessed that Normal and Auffällig could be merged into one column holding the two values. You might need to adjust that if I guessed wrong.
I assume these are the ones to "keep"
Then we grab your data, skipping the first (header) line, trim each line on the left and right, and then split on runs of more than one space.
Now add your headers
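The steps above can be sketched as follows. This is a reconstruction under assumptions, not the exact code from the original post: the sample spacing is made up, `keep` holds guessed column names (with `Normal_Auffaellig` as a hypothetical name for the merged column), and every row is assumed to contain all eight values.

```r
# Sample input with complete rows only (assumed spacing).
my_data <- c(
  "S Leistung Sub Text Ergebnis Einheit Normal Auff\u00e4llig Katalog Datum Zeit Kommentar ",
  " APOA_S   Apo A1                           1.11   g/l       1.04 - 2.02   01   30.03.2023   06:56 ",
  " ", "",
  " APOB_S   Apo B                            1.09   g/l       0.66 - 1.33   01   30.03.2023   06:56 ",
  " ", "",
  " B-BA_E   Basophile Granulozyten absolut   0.04   exp 9/l   0 - 0.1       01   27.03.2023   11:56 ",
  " "
)

# Guessed headers to keep; S, Sub and Kommentar are dropped.
keep <- c("Leistung", "Text", "Ergebnis", "Einheit", "Normal_Auffaellig",
          "Katalog", "Datum", "Zeit")

rows  <- trimws(my_data[-1])        # skip the header line, trim both ends
rows  <- rows[nzchar(rows)]         # drop empty lines
parts <- strsplit(rows, " {2,}")    # single spaces inside values are kept

# This simple version only works when no value is missing.
stopifnot(all(lengths(parts) == length(keep)))

df <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(df) <- keep                   # add the headers
df[, c("Leistung", "Ergebnis")]
```

Splitting on two or more spaces is what keeps values such as "Apo A1", "exp 9/l" and "1.04 - 2.02" intact, since they only contain single spaces.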
The final result