Python Pandas getting hierarchy path till top management

63 Views Asked by At

I have the below pandas DF

enter image description here

steps to create the DF

data=[['1','0','0','0','0'],['2','1','1','0','0|0'],['3','1','1','1','0|1'],['4','2','2','0','0|0|0'],['5','2','2','1','0|0|1'],['6','2','2','2','0|0|2'],['7','3','2','0','0|1|0'],['8','3','2','1','0|1|1'],['9','3','2','2','0|1|2'],['10','3','2','3','0|1|3'],['11','4','3','0','0|0|0|0'],['12','4','3','1','0|0|0|1'],['13','10','3','0','0|1|3|0']]
df = pd.DataFrame(data, columns=['eid','m_eid','level','path_variable','complete_path'])
df=df.drop('complete_path',axis=1)

Here:

eid = employee id

m_eid = manager id

level = level in org(0 being top boss)

path_variable = incremental number assigned to an employee based on there level this number resets for each manager(for example: eid[4,5,6,7,8,9,10] belong to same level 2 but eid[4,5,6] has same manager(m_eid=2) so there path_variable is 0,1,2 whereas eid[7,8,9,10] has a different manager(m_eid=3) so the path_variable restarts from 0)

i want to create a new column which shows the complete path till level 0 for each eid. Like shown below:

output

Complete path is concatenations of path_variable till level 0(top boss).

Its like path from root node to the edge node. For ex. lets take eid 10

enter image description here

there can be level skips between immediate managers. I m trying to avoid iterrows() due to performance constraints.

1

There are 1 best solutions below

4
mozway On BEST ANSWER

IIUC, you can build a directed graph with networkx, then find the shortest_path between each node and '0', then use that to map the path_variable:

import networkx as nx

G = nx.from_pandas_edgelist(df, source='m_eid', target='eid',
                            create_using=nx.DiGraph)

s = df.set_index('eid')['path_variable']

mapper = {n: '|'.join(s.get(x, '') for x in 
                      nx.shortest_path(G, source='0',
                                       target=n)[1:])
          for n in df['eid'].unique()
         }
df['complete_path'] = df['eid'].map(mapper)

Output:

   eid m_eid level path_variable complete_path
0    1     0     0             0             0
1    2     1     1             0           0|0
2    3     1     1             1           0|1
3    4     2     2             0         0|0|0
4    5     2     2             1         0|0|1
5    6     2     2             2         0|0|2
6    7     3     2             0         0|1|0
7    8     3     2             1         0|1|1
8    9     3     2             2         0|1|2
9   10     3     2             3         0|1|3
10  11     4     3             0       0|0|0|0
11  12     4     3             1       0|0|0|1
12  13    10     3             0       0|1|3|0

Graph:

organization graph, networkx graphviz

If you already have unique values in eid you could avoid the mapper and use:

df['complete_path'] = ['|'.join(s.get(x, '') for x in 
                       nx.shortest_path(G, source=n,
                                        target='0')[-2::-1])
                       for n in df['eid']]

To make it easier to understand, here is a more classical path with the nodes ids (not the path_variables):

mapper = {n: '|'.join(nx.shortest_path(G, source='0',
                                       target=n)[1:])
          for n in df['eid'].unique()
         }
df['complete_path'] = df['eid'].map(mapper)

Output:

   eid m_eid level path_variable complete_path
0    1     0     0             0             1
1    2     1     1             0           1|2
2    3     1     1             1           1|3
3    4     2     2             0         1|2|4
4    5     2     2             1         1|2|5
5    6     2     2             2         1|2|6
6    7     3     2             0         1|3|7
7    8     3     2             1         1|3|8
8    9     3     2             2         1|3|9
9   10     3     2             3        1|3|10
10  11     4     3             0      1|2|4|11
11  12     4     3             1      1|2|4|12
12  13    10     3             0     1|3|10|13