Data correction in Pandas / Python

68 Views Asked by Mike At 13 March 2024 at 15:20

I need to update Data of a subset of my df with values calculated out of a subset of my dataframe.

PERSNR	XYZ	DATE	VALUE
22222	a	Jan	0,8
22222	b	Jan	0,2
22222	a	Feb	0,8
22222	b	Feb	0,2

i have got a Dataframe with like 8000 PERSNR that looks like this, just with many more columns

For a subset, which i already know (i got a list of PERSNR to filter that) i need to add the Values together, ignoring XYZ

i Can do that with pivot_table and can get another df out of it.

The problem i have got is, that i need to update the values afterwards in my original DF.

My goal is to obtain the following without loosing any other data of my DF.

PERSNR	XYZ	DATE	VALUE
22222	a	Jan	1,0
22222	b	Jan	0,0

in Excel i would do a sumifs on my Value Column on my PERSNR, XYZ and DATE.

and then i would replace my values with vlookup in the Value column and overwrite Values with identifier XYZ "b" with zero. This will always be zero.

Any suggestions on how to do that?

thanks in advance

Mike

Original Q&A

There are 1 best solutions below

e-motta On 13 March 2024 at 15:38

Assuming a data frame like this:

data = {
    "PERSNR": [22222, 22222, 22222, 22222, 55555, 55555],
    "XYZ": ["a", "b", "a", "b", "a", "b"],
    "DATE": ["Jan", "Jan", "Feb", "Feb", "Jan", "Jan"],
    "VALUE": [0.8, 0.2, 0.8, 0.2, 0.8, 0.2],
}

   PERSNR XYZ DATE  VALUE
0   22222   a  Jan    0.8
1   22222   b  Jan    0.2
2   22222   a  Feb    0.8
3   22222   b  Feb    0.2
4   55555   a  Jan    0.8
5   55555   b  Jan    0.2

Merge two data frames: (i) the original minus column VALUE and (ii) a data frame where you group on PERSNR (assuming there will be more than one value for this; otherwise, this column isn't necessary) and DATE, then sum the VALUE in each group and reset the index.
Set column VALUE with 0 where df["XYZ"] == "b".
Use .loc to do the previous steps only for the selected PERSNR values.

selected_persnr = [22222]  # add all selected values here

df.loc[df["PERSNR"].isin(selected_persnr)] = pd.merge(
    df.drop(columns="VALUE"),
    df.groupby(["PERSNR", "DATE"])["VALUE"].sum().reset_index(),
)
df.loc[(df["PERSNR"].isin(selected_persnr)) & (df["XYZ"] == "b"), "VALUE"] = 0

   PERSNR XYZ DATE  VALUE
0   22222   a  Jan    1.0
1   22222   b  Jan    0.0
2   22222   a  Feb    1.0
3   22222   b  Feb    0.0
4   55555   a  Jan    0.8
5   55555   b  Jan    0.2

Data correction in Pandas / Python

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATA-CLEANING

Trending Questions

Popular # Hahtags

Popular Questions