Using GraphX to create parent-child linkage in a PySpark DataFrame


I have the following table structure.

group_id  id   parent_id  hierarchy_level
1         1    NULL       1
1         25   1          2
1         112  25         3
1         34   1          2
1         543  34         3
1         16   543        4
2         88   NULL       1
2         235  88         2
2         921  235        3
2         8    921        4

Ultimately, I would like a structure that lets me easily query all the child ids of any specific id I want to analyze.
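To make the goal concrete, here is a minimal plain-Python sketch (not Spark or GraphX code) of the graph view of this table: each row becomes a parent-to-child edge, and the child ids of a given id are found by walking those edges. It assumes ids are unique across groups, as they are in the sample, so `group_id` is ignored; the helper names `build_children` and `descendants` are hypothetical.

```python
from collections import defaultdict

# (id, parent_id) pairs copied from the example table; None stands for NULL (a root)
rows = [
    (1, None), (25, 1), (112, 25), (34, 1), (543, 34), (16, 543),
    (88, None), (235, 88), (921, 235), (8, 921),
]

def build_children(rows):
    """Map each parent id to the list of its direct child ids."""
    children = defaultdict(list)
    for node_id, parent_id in rows:
        if parent_id is not None:
            children[parent_id].append(node_id)
    return children

def descendants(children, root):
    """Return every id below `root` at any depth, via an iterative depth-first walk."""
    result = []
    stack = list(children.get(root, []))
    while stack:
        node = stack.pop()
        result.append(node)
        # .get() does not insert missing keys into the defaultdict
        stack.extend(children.get(node, []))
    return result

children = build_children(rows)
print(sorted(descendants(children, 1)))   # [16, 25, 34, 112, 543]
print(sorted(descendants(children, 34)))  # [16, 543]
```

The traversal depth is not fixed in advance, so the same code works no matter how deep the hierarchy goes.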

I don't have experience with graph libraries, so I need some guidance on what to look for. I saw that GraphX offers plenty of algorithms and tried to find one I could use, but I don't understand much of it. For now, I'm using a dynamic loop that performs a number of joins determined by the maximum hierarchy level. However, graphs seem to be designed for exactly this purpose, and they look more elegant. Could someone more experienced give some advice?
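For comparison, the loop of joins described above can be sketched in plain Python as a transitive closure: start from the direct (parent, child) edges and keep joining the newest pairs back onto the edge list, one hierarchy level per round, until a round adds nothing. This is an illustrative assumption about what the loop computes, not the asker's actual code; in PySpark each round would correspond to one DataFrame join. The function name `closure_by_joins` is hypothetical.

```python
# (parent_id, id) edges from the example table; roots (parent_id NULL) contribute no edge
edges = [(1, 25), (25, 112), (1, 34), (34, 543), (543, 16),
         (88, 235), (235, 921), (921, 8)]

def closure_by_joins(edges):
    """Build all (ancestor, descendant) pairs by repeated joins on the edge list."""
    # Index edges by parent so each "join" is a dictionary lookup
    by_parent = {}
    for parent, child in edges:
        by_parent.setdefault(parent, []).append(child)

    closure = set(edges)   # depth-1 pairs: direct parent -> child
    frontier = set(edges)  # pairs discovered in the most recent round
    while frontier:
        # Join (a, d) with edge (d, c) to get (a, c), one level deeper
        new_pairs = {(a, c) for a, d in frontier for c in by_parent.get(d, [])}
        # Stop when a round discovers no new pairs, instead of using a fixed max level
        frontier = new_pairs - closure
        closure |= frontier
    return closure

closure = closure_by_joins(edges)
# Querying the child ids of one id is then a simple filter
print(sorted(d for a, d in closure if a == 1))  # [16, 25, 34, 112, 543]
```

Materializing the (ancestor, descendant) table once turns every later lookup into a filter, at the cost of storing one row per ancestor-descendant pair.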
