Can you group multiple rows all into one row by column value with Python using pandas?

652 Views Asked by At

How do I change this:

Date URL Description Category
2022-06-17 14:24:52 /XYBkLO public A
2022-06-17 14:24:52 /XYBkLO public B
2022-06-17 14:24:52 /XYBkLO public C
2022-06-17 14:25:05 /ZWrTVu public A
2022-06-17 14:25:05 /ZWrTVu public B
2022-06-17 14:25:05 /ZWrTVu public C

To this:

Date URL Description Category Date URL Description Category Date URL Description Category
2022-06-17 14:24:52 /XYBkLO public A 2022-06-17 14:24:52 /XYBkLO public B 2022-06-17 14:24:52 /XYBkLO public C
2022-06-17 14:25:05 /ZWrTVu public A 2022-06-17 14:25:05 /ZWrTVu public B 2022-06-17 14:25:05 /ZWrTVu public C

I would like to keep everything with the same URL on one row, but I don't know how I could implement this using pandas. Is there perhaps another way or another library I should use? Could really use some help

3

There are 3 best solutions below

2
Andrej Kesely On BEST ANSWER

You can try:

from itertools import cycle, count, islice
from collections import defaultdict


def fn(x):
    d = defaultdict(lambda: count(1))
    names = cycle(x.columns)
    vals = x.values.ravel()

    return pd.DataFrame(
        [vals],
        columns=[f"{n}.{next(d[n])}" for n in islice(names, len(vals))],
    )


x = df.groupby("URL").apply(fn).reset_index(drop=True)
print(x)

Prints:

                Date.1    URL.1 Description.1 Category.1               Date.2    URL.2 Description.2 Category.2               Date.3    URL.3 Description.3 Category.3
0  2022-06-17 14:24:52  /XYBkLO        public          A  2022-06-17 14:24:52  /XYBkLO        public          B  2022-06-17 14:24:52  /XYBkLO        public          C
1  2022-06-17 14:25:05  /ZWrTVu        public          A  2022-06-17 14:25:05  /ZWrTVu        public          B  2022-06-17 14:25:05  /ZWrTVu        public          C
0
ggsmith On

You can look into pandas transpose function. transpose

Not 100% sure it will fit your use case but it seems like a good place to start.

0
rhug123 On

Here is another way:

(pd.concat(
    [df.set_index('URL',drop=False) for _,df in df.groupby('Category')]
    ,axis=1)
.reset_index(drop=True))

Output:

    Date             URL    Description Category    Date             URL    Description Category  Date               URL    Description Category
0   6/17/2022 14:24 /XYBkLO public       A          6/17/2022 14:24 /XYBkLO public       B        6/17/2022 14:24   /XYBkLO public      C
1   6/17/2022 14:25 /ZWrTVu public       A          6/17/2022 14:25 /ZWrTVu public       B        6/17/2022 14:25   /ZWrTVu public      C