How can I find the unique values of one column (comma-separated values) when grouping by another column?


I have a data frame like this:

| office   | salary |
| -------- | ------ |
| Montreal | a,b    |
| Toronto  | b,c,d  |
| Montreal | b,c,d  |
| Montreal | c,d,e  |
| Toronto  | n,c    |

Desired output:

     office     salary
0  Montreal  a,b,c,d,e
1   Toronto    b,c,d,n

How can I do this in pandas?

I tried:

import numpy as np
import pandas as pd

office = ['Montreal', 'Toronto', 'Montreal', 'Montreal', 'Toronto']
avg_salary = ["a,b", "b,c,d", "b,c,d", "c,d,e", "n,c"]
hr_lst = list(zip(office, avg_salary))

df = pd.DataFrame(hr_lst, columns=['office', 'salary'])

# Display original DataFrame
print("Created DataFrame:\n", df, "\n")

# Finding unique values (np.unique compares the whole comma-separated strings,
# so it does not split them into individual values)
res = df.groupby('office')['salary'].apply(lambda x: list(np.unique(x)))

# Display result
print("Unique Values:\n", res)

Output:

Created DataFrame:
      office salary
0  Montreal    a,b
1   Toronto  b,c,d
2  Montreal  b,c,d
3  Montreal  c,d,e
4   Toronto    n,c


Unique Values:
 office
Montreal    ['a,b', 'b,c,d', 'c,d,e']
Toronto            ['b,c,d', 'n,c']
Name: salary, dtype: object

There is 1 answer below.

Answer by jezrael:

Use a lambda function with a list comprehension and split; to remove duplicates while keeping the order, use the dict.fromkeys trick; finally, use join:

f = lambda x: ','.join(dict.fromkeys([z for y in x.str.split(',') for z in y]))
res = df.groupby('office')['salary'].apply(f).reset_index()

# Display Result
print("Unique Values:\n",res)
      office     salary
0  Montreal  a,b,c,d,e
1   Toronto    b,c,d,n
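
A quick aside (not part of the original answer): dict.fromkeys drops duplicate keys and, in Python 3.7+, dictionaries preserve insertion order, which is why the joined result lists the values in the order they first appear across the group. A minimal illustration:

values = ['a', 'b', 'b', 'c', 'a']
print(list(dict.fromkeys(values)))      # ['a', 'b', 'c']
print(','.join(dict.fromkeys(values)))  # a,b,c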

If ordering in the output is not important, use a set:

f = lambda x: ','.join(set([z for y in x.str.split(',') for z in y]))
res = df.groupby('office')['salary'].apply(f).reset_index()

# Display Result
print("Unique Values:\n",res)

      office     salary
0  Montreal  a,d,c,e,b
1   Toronto    n,d,c,b
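
For comparison (not from the original answer), a similar result can be sketched with str.split plus DataFrame.explode, assuming pandas 0.25+ and the same df built above:

# Alternative sketch: split each cell into a list, explode to one value per row,
# drop duplicate (office, value) pairs, then re-join the unique values per office.
res = (df.assign(salary=df['salary'].str.split(','))
         .explode('salary')
         .drop_duplicates()
         .groupby('office')['salary']
         .agg(','.join)
         .reset_index())

print(res)
#      office     salary
# 0  Montreal  a,b,c,d,e
# 1   Toronto    b,c,d,n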