I have two tables A and B. I want to join them based on their validity time intervals.
A has product quality (irregular times) and B has hourly settings during the production period. I need to create a table like C that includes the parameters p1 and p2 for all A's RefDates that fall in the time range of B's ValidFrom ValidTo.
A
RefDate result
'11-Oct-2017 00:14:00' 17
'11-Oct-2017 00:14:00' 19
'11-Oct-2017 00:20:00' 5
'11-Oct-2017 01:30:00' 25
'11-Oct-2017 01:30:00' 18
'11-Oct-2017 03:03:00' 28
B
ValidFrom ValidTo p1 p2
'11-Oct-2017 00:13:00' '11-Oct-2017 01:12:59' 2 1
'11-Oct-2017 01:13:00' '11-Oct-2017 02:12:59' 3 1
'11-Oct-2017 02:13:00' '11-Oct-2017 03:12:59' 4 5
'11-Oct-2017 03:13:00' '11-Oct-2017 04:12:59' 6 1
'11-Oct-2017 04:13:00' '11-Oct-2017 05:12:59' 7 9
I need to get something like this.
C
RefDate res p1 p2
'11-Oct-2017 00:14:00' 17 2 1
'11-Oct-2017 00:14:00' 19 2 1
'11-Oct-2017 00:20:00' 5 2 1
'11-Oct-2017 01:30:00' 25 3 1
'11-Oct-2017 01:30:00' 18 3 1
'11-Oct-2017 03:03:00' 28 4 5
I know how to do this in SQL and I think I have figured out how to do this row by row in MatLab but this is horribly slow. The data set is rather large. I just assume there must be a more elegant way that I just couldn't find.
Something that caused many of my approaches to fail is that the RefDate column is not unique.
edit: the real tables have thousands of rows and hundreds of variables.
C (in reality)
RefDate res res2 ... res200 p1 p2 ... p1000
11-Oct-2017 00:14:00 17 2 1
11-Oct-2017 00:14:00 19 2 1
11-Oct-2017 00:20:00 5 2 1
11-Oct-2017 01:30:00 25 3 1
11-Oct-2017 01:30:00 18 3 1
11-Oct-2017 03:03:00 28 4 5
This can actually be done in a single line of code. Assuming your
ValidTovalue always ends immediately before theValidFromin the next row (which it does in your example), you only need to use yourValidFromvalues. First, convert those and yourRefDatevalues to serial date numbers usingdatenum. Then use thediscretizefunction to bin theRefDatevalues using theValidFromvalues as the edges, which will give you the row index inBthat contains each time inA. Then use that index to extract thep1andp2values and append them toA:The above solution should work for any number of columns
pNinB.If there are any times in
Athat don't fall in any of the ranges inB, you will have to break the solution into multiple lines so you can check whether or not the index returned fromdiscretizecontainsNaNvalues. Assuming you want to exclude those rows fromC, this would be the new solution: