i want to see if there is an efficient way to reshape dataframe that is read from a tab separated csv file. the data consist of event codes and messages that are stacked up vertically with their timestamps. the data include a state column that specify whether an event occured (TRUE) or cleared (FALSE). i tried iterating through each row and update accordingly, but it is taking forever to complete.
every event code can occur and clear multiple times. every occur event (TRUE) is followed by clear event (FALSE) for every EventCode.
below example shows the format of the input file:
Timestamp State Data EventCode EventMsg Class
19-May-2023 16:10:09.301 FALSE 1 EventCode 1 EventCode 1 Message class 1
19-May-2023 16:10:09.300 FALSE 2 EventCode 2 EventCode 2 Message class 1
19-May-2023 16:10:09.299 TRUE 3 EventCode 1 EventCode 1 Message class 2
19-May-2023 16:10:09.298 FALSE 4 EventCode 4 EventCode 4 Message class 2
19-May-2023 16:10:09.297 FALSE 5 EventCode 3 EventCode 3 Message class 2
19-May-2023 16:10:09.296 TRUE 6 EventCode 2 EventCode 2 Message class 1
19-May-2023 16:10:09.295 TRUE 7 EventCode 4 EventCode 4 Message class 2
19-May-2023 16:10:09.294 TRUE 8 EventCode 3 EventCode 3 Message class 2
19-May-2023 16:10:09.293 FALSE 0 EventCode 1 EventCode 1 Message class 2
19-May-2023 16:10:09.292 TRUE 9 EventCode 1 EventCode 1 Message class 2
below shows desired final format:
OccurTimestamp clearTimestamp Data EventCode EventMsg Class
19-05-2023 16:10:09.299 19-05-2023 16:10:09.301 3 EventCode 1 EventCode 1 Message class 1
19-05-2023 16:10:09.296 19-05-2023 16:10:09.300 6 EventCode 2 EventCode 2 Message class 1
19-05-2023 16:10:09.295 19-05-2023 16:10:09.298 7 EventCode 4 EventCode 3 Message class 2
19-05-2023 16:10:09.294 19-05-2023 16:10:09.297 8 EventCode 3 EventCode 4 Message class 2
19-05-2023 16:10:09.292 19-05-2023 16:10:09.293 9 EventCode 1 EventCode 1 Message class 1
Based on your example, I assume that each EventCode only exists exactly two times (once with
State = Trueand once withstate = False). Then this should work:Edit: Based on your additional info, I think you still need to iterate through the events as your problem inherently requires to check following events for a clear event.
You can still use
as each group then only consists of the individual EventCodes preserving the order of your original dataframe. This avoids iterating over all other events when trying to find a corresponding clear event.
You can then iterate over each group via