kdb/q Creating new columns based on the beginning of a grouping of rows

44 Views Asked by At

The table below has some information about running races. There are multiple rows that are attributed to the same race, and you can tell by looking at the FirstID and SecondID columns. The SourceID column connects the multiple rows together.

For example, row 2 is a continuation of row 1 because the SecondID of row 1 is the FirstID of row 2. An easy way to see this is by grouping by SourceID, which keeps the first FirstID value of the chain.

FirstID SecondID SourceID Race Style ...
123ABC 456DEF 123ABC Atlantic Timed
456DEF 789GHI 123ABC SubAtlantic
349UYH 286IWD 349UYH Pacific Untimed
286IWD 585YYY 349UYH SubPacific
input:([] FirstID:`123ABC`456DEF`349UYH`286IWD; SecondID:`456DEF`789GHI`286IWD`585YYY; SourceID: `123ABC`123ABC`349UYH`349UYH; Race: `Atlantic`SubAtlantic`Pacific`SubPacific; Style: `Timed``Untimed`)

I want to add a few more columns based on this information, namely ParentRace and IsTimed, so that the output looks like this:

FirstID SecondID SourceID Race Style ParentRace IsTimed
123ABC 456DEF 123ABC Atlantic Timed Atlantic Yes
456DEF 789GHI 123ABC SubAtlantic Atlantic Yes
349UYH 286IWD 349UYH Pacific Untimed Pacific No
286IWD 585YYY 349UYH SubPacific Pacific No

ParentRace is the value of the Race at the beginning of the chain (i.e. first row that begins a chain) and is applied to the rest of the chain, grouped by SourceID

IsTimed shows whether or not the beginning of the chain is `Timed and applies to the rest of the chain, grouped by SourceID

Does anyone know how to add these two columns based on the initial row of the chain? Let me know if you need me to clarify.

2

There are 2 best solutions below

2
James Little On
update ParentRace:first Race, isTimed:`No`Yes @ `Timed=first Style by SourceID from input
0
terrylynch On

Another approach that could work is to do dictionary lookups:

q)update ParentRace:(FirstID!Race)SourceID,isTimed:`Timed=(FirstID!Style)SourceID from input
FirstID SecondID SourceID Race        Style   ParentRace isTimed
----------------------------------------------------------------
123ABC  456DEF   123ABC   Atlantic    Timed   Atlantic   1
456DEF  789GHI   123ABC   SubAtlantic         Atlantic   1
349UYH  286IWD   349UYH   Pacific     Untimed Pacific    0
286IWD  585YYY   349UYH   SubPacific          Pacific    0

This assumes your FirstID values are unique