Groupby at least one match between two columns with Pandas only(not Networkx library)

46 Views Asked by At

I have a table.

+------+---------------+
| id   |   co_id       |
+------+---------------+
| 123  | abc           |   
| 123  | bcd           |
| 234  |abc            |  
| 123  |def            |
| 678  |urt            |
| 678  |urt            |
| 124  |xyz            |
| 600  |urt            |
| 700  |def            |
+------+---------------+

I use Pandas to groupby id and co_id, because they are many to many or 1 to many or many to 1 relation(at least one match between id and co_id). Goal is to group together all the same id and same co_id, output would be:

+------+---------------+------+
| id   |   co_id       |cat   |
+------+---------------+------+
|123   |abc            |group1|
|123   |bcd            |group1|
|234   |abc            |group1|
|123   |def            |group1|
|700   |def            |group1|
|678   |urt            |group2|
|678   |urt            |group2|
|600   |urt            |group2|
|124   |xyz            |group3|
+------+---------------+------+

I was going to group by id and co_id separately and join them together, but failed. My initial code as below, how can I achieve ideal output? Please advise.

import pandas as pd

df = pd.read_csv(r'diretory\sample.csv')

df_id = df.groupby('id').agg({'id': lambda x: list(x)}).reset_index()
df_coid = df.groupby('co_id').agg({'co_id': lambda x: list(x)}).reset_index()

0

There are 0 best solutions below