Pandas dataframe.explode with columns of very large entries

46 Views Asked by MJK At 19 January 2024 at 21:52

Right now i am working with a excel sheet, which contains some columns which presents links to previous tasks and following tasks of an project shedule.

My goal is to analyze these links with the Panda library in Python, unfortunately these examples tends to fill columns with a large entry of values.

For a better Overview Jupyter is also included. As an IDE i use Visual Studio Code.

Here is an example how these columns looks, these are the first two entries of the table:

Vorgänger	Nachfolger
221;215;715EA+2 Tage;811EA+3 Tage;816;254;261;268;275;282;289;296;311;437;362EA+6 Tage;524;450;454;352EA+2 Tage;372EA+18 Tage;381EA+12 Tage;663;652;469EA+1 Tag;687;402EA+8 Tage;499;540;637EA+1 Tag;666;669;672;674;685EA+3 Tage;737EA+3 Tage;786;927EA+6 Ta...	54;20
26;944

I got several entries in both columns as far as i can see the first has the largest quantity in that table.

With the following code i read in the sheet:

import pandas as pd  
import numpy as np
from datetime import datetime

from pandas import * 

# Einlesen der Excel, 04.01.: 100 Einträge der Spalte Nachfolger von Analyse-DCMA-RKE Neu -07-2017.xlsx Sheet Tabelle2
excel_file = 'F:\\Google Drive\\Matthias\\Arbeit\\ISQM\\08 - Tool Menü - Skripte DD 19.12.2023\\Excel\\Gesamtauszug Primavera - dd 27.10.2023ohne3.xlsx'
xlsx = ExcelFile(excel_file)                
excel_Tabelle = xlsx.sheet_names[0]         
df  = xlsx.parse(excel_Tabelle)

For the following section i separate the columns i work on:

# Speichern Spalte Nachfolger und Vorgänger, Komplette/Inkomplette Tasks
df_1   = df['Nachfolger'].to_frame()
df_2   = df['Vorgänger'].to_frame()
df_3   = df['%'].to_frame()

After some cleaning and concatenations i use the the following code to split these entries into columns and save it into another frame

# Aufsplitten der Spalten
df_ms_1 = df_ms.assign(Vorgänger  =df_cpr['Vorgänger'].str.split(';'),  
                     Nachfolger =df_cpr['Nachfolger'].str.split(';')).explode('Vorgänger').explode('Nachfolger')
df_ms_1['Vorgänger'] = df_ms_1['Vorgänger'].fillna(0)
df_ms_1['Nachfolger'] = df_ms_1['Nachfolger'].fillna(0)

At the the first glance this operation worked well, the entries got separated into columns and are saved in the frame. But then i noticed not every entrie from the source table has been processed. For an example the first entries i showed in the table doesnt even appear in the frame df_ms_1 . The Frame starts practically at row 12 from the source and leaves the first 11 out.

The output Frame df_ms_1 in Excel looks like this:

| -- | ------ | ----------- | ---------- |

| 0 | 1000 | 0 | 0 |

| 1 | 1000 | 0 | 0 |

| 2 | 1000 | 0 | 0 |

| 3 | 1000 | 0 | 0 |

| 4 | 0 | 0 | 0 |

| 5 | 1000 | 0 | 0 |

| 6 | 1000 | 0 | 0 |

| 7 | 1000 | 0 | 0 |

| 8 | 1000 | 0 | 0 |

| 9 | 1000 | 0 | 0 |

| 10 | 1000 | 61 | 0 |

| 10 | 1000 | 67 | 0 |

| 10 | 1000 | 73 | 0 |

| 10 | 1000 | 78 | 0 |

| 10 | 1000 | 84 | 0 |

| 10 | 1000 | 134EA-1 Tag | 0 |

| 10 | 1000 | 97 | 0 |

| 10 | 1000 | 100 | 0 |

| 10 | 1000 | 103 | 0 |

| 10 | 1000 | 106 | 0 |

| 10 | 1000 | 107 | 0 |

| 10 | 1000 | 109 | 0 |

| 10 | 1000 | 110 | 0 |

The frame begins at 12th entry of the source table, it has been transformed from this column:

Vorgänger	Nachfolger
61;67;73;78;84;134EA-1 Tag;97;100;103;106;107;109;110

I took a look in the documentation of the Panda library, but dataframe.explode doesnt have a lot of options.

It appears to me this operation cant split columns with large quantity of values. So i have the following questions:

Did anyone experienced the same issues ?

Are there different approaches in Panda to solve this setup?

Original Q&A

Pandas dataframe.explode with columns of very large entries

There are 0 best solutions below

Related Questions in PYTHON

Related Questions in PYTHON-3.X

Related Questions in EXCEL

Related Questions in DATAFRAME

Related Questions in PANDAS-EXPLODE

Trending Questions

Popular # Hahtags

Popular Questions