How to Select the First N Key-Ordered Values of a Column Within a Grouping Variable in a Pandas DataFrame


I have a dataset:

import pandas as pd

data = [
    ('A', 'X'),
    ('A', 'X'),
    ('A', 'Y'),
    ('A', 'Z'),
    ('B', 1),
    ('B', 1),
    ('B', 2),
    ('B', 2),
    ('B', 3),
    ('B', 3),
    ('C', 'L-7'),
    ('C', 'L-9'),
    ('C', 'L-9'),
    ('T', 2020),
    ('T', 2020),
    ('T', 2025)
]

df = pd.DataFrame(data, columns=['ID', 'SEQ'])
print(df)

I want to create a key from ID and SEQ so that, within each ID group, I keep only the first 2 distinct SEQ values, and at most the first 2 rows of each.

For instance, ID A has 3 distinct keys: "A X", "A Y" and "A Z". In dataset order, the first two keys are "A X" and "A Y", so I must select the first two rows (if available) of each of those keys, giving

"A X", "A X", "A Y". Why not more? Because "A Z" is a third key and must be excluded.

I've tried using the groupby and head functions, but I couldn't find a way to achieve this specific result. What can I try next?

(df
.groupby(['ID','SEQ'])
.head(2)
)

This code returns the original dataset unchanged, and I wonder if I can solve this problem using method chaining, as it is my preferred style in Pandas.
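A quick look at the group sizes shows why that chain is a no-op on this particular dataset: every (ID, SEQ) group already has at most two rows, so head(2) keeps everything. A minimal check, rebuilding the sample frame from above:

```python
import pandas as pd

# Sample data from the question
data = [
    ('A', 'X'), ('A', 'X'), ('A', 'Y'), ('A', 'Z'),
    ('B', 1), ('B', 1), ('B', 2), ('B', 2), ('B', 3), ('B', 3),
    ('C', 'L-7'), ('C', 'L-9'), ('C', 'L-9'),
    ('T', 2020), ('T', 2020), ('T', 2025),
]
df = pd.DataFrame(data, columns=['ID', 'SEQ'])

# Every (ID, SEQ) group has at most 2 rows...
sizes = df.groupby(['ID', 'SEQ']).size()
print(sizes.max())  # 2

# ...so head(2) per (ID, SEQ) group keeps all 16 rows
print(len(df.groupby(['ID', 'SEQ']).head(2)))  # 16
```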

The final correct output is:

   ID   SEQ
0   A     X
1   A     X
2   A     Y
4   B     1
5   B     1
6   B     2
7   B     2
10  C   L-7
11  C   L-9
12  C   L-9
13  T  2020
14  T  2020
15  T  2025

There are 5 answers below.

Ryder (accepted answer, 2 votes):

Your approach of using groupby and then head(2) is on the right track for getting the first 2 rows of each different SEQ within each ID group.

However, the additional requirement is to keep only the first 2 unique SEQ groups within each ID. To achieve this, you can:

1. Create a new column holding the dense rank of SEQ within each ID group.
2. Use this rank to filter the data down to the first 2 distinct SEQ values.
3. Finally, use your original approach to get the first 2 rows of each SEQ within each ID group.

Here's a solution using method chaining:

result = (df
          .assign(rank=df.groupby('ID')['SEQ'].transform(lambda x: x.rank(method='dense')))
          .query('rank <= 2')
          .groupby(['ID', 'SEQ'])
          .head(2)
          .drop(columns=['rank'])
         )

print(result)

This should give you the desired output. Note that method='dense' ranks SEQ by value rather than by first appearance; the two coincide here because each ID's SEQ values already appear in sorted order.

not_speshal (0 votes):

Use drop_duplicates and then groupby to get the head(2) of each "ID". Then merge with the original DataFrame to bring back the duplicate rows.

>>> df.drop_duplicates().groupby("ID").head(2).merge(df)

   ID   SEQ
0   A     X
1   A     X
2   A     Y
3   B     1
4   B     1
5   B     2
6   B     2
7   C   L-7
8   C   L-9
9   C   L-9
10  T  2020
11  T  2020
12  T  2025
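The trailing merge works because DataFrame.merge defaults to an inner join on the shared columns (here ID and SEQ), so each kept key pulls all of its duplicate rows back from the original frame. A minimal sketch with small, hypothetical frames:

```python
import pandas as pd

# One deduplicated key...
kept = pd.DataFrame({'ID': ['A'], 'SEQ': ['X']})
# ...merged against a frame where that key appears twice
full = pd.DataFrame({'ID': ['A', 'A', 'A'], 'SEQ': ['X', 'X', 'Y']})

restored = kept.merge(full)  # inner join on ['ID', 'SEQ'] by default
print(restored)  # both 'A X' rows come back; 'A Y' is dropped
```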
Lfppfs (0 votes):

IIUC, you have to drop the duplicate rows first (e.g. with drop_duplicates), group by ID only and take the head(2), then merge back with the original frame to retrieve the full rows:

df = df.merge(
    df.drop_duplicates()
      .groupby(["ID"])
      .head(2),
    on=["ID", "SEQ"],
    how="right",
)

df
Out[16]: 
   ID   SEQ
0   A     X
1   A     X
2   A     Y
3   B     1
4   B     1
5   B     2
6   B     2
7   C   L-7
8   C   L-9
9   C   L-9
10  T  2020
11  T  2020
12  T  2025

rhug123 (4 votes):

Here is an option using pd.factorize() with groupby()

df.loc[df.groupby('ID')['SEQ'].transform(lambda x: pd.factorize(x)[0] <= 1)]

Output:

   ID   SEQ
0   A     X
1   A     X
2   A     Y
4   B     1
5   B     1
6   B     2
7   B     2
10  C   L-7
11  C   L-9
12  C   L-9
13  T  2020
14  T  2020
15  T  2025
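pd.factorize assigns integer codes in order of first appearance, so the <= 1 test keeps exactly the first two distinct SEQ values as they occur within each group. A small illustration:

```python
import pandas as pd

# Codes follow first-appearance order, not sort order
codes, uniques = pd.factorize(['X', 'X', 'Y', 'Z'])
print(codes)    # [0 0 1 2]
print(uniques)  # ['X' 'Y' 'Z']
```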
Andrej Kesely (2 votes):

Try:

out = df.groupby("ID", group_keys=False).apply(
    lambda x: x[x["SEQ"].isin(x["SEQ"].unique()[:2])]
)
print(out)

Prints:

   ID   SEQ
0   A     X
1   A     X
2   A     Y
4   B     1
5   B     1
6   B     2
7   B     2
10  C   L-7
11  C   L-9
12  C   L-9
13  T  2020
14  T  2020
15  T  2025
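This works because Series.unique returns values in order of first appearance, so unique()[:2] really is the first two keys as they occur in the data. A small illustration:

```python
import pandas as pd

# Series.unique preserves appearance order, not sorted order
s = pd.Series(['X', 'X', 'Y', 'Z'])
first_two = s.unique()[:2]
print(first_two)  # ['X' 'Y']
```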