Properly sorting data into Parents/Children/Grandchildren in excel by python

43 Views Asked by At

I'm working on a python script to order equipment items based on their parent/child/grandchild hierarchies. There are assembly level items, and those items that make up the assembly have sub-assemblies themselves and so on and so forth until you get to items like bolts or plates that do not have further children (and do not have a unique drawing number).

  1. The code takes many different excel sheets from a folder location and combines all data onto one excel sheet on a RawData tab (functional) View of Folder with Equipment Names
  2. The code adds a "Parent" column that writes where the piece of equipment came from (what it is a part of, this directly corresponds to title of the excel sheet it's taken from). View of Data & Description of Parent Column
  3. The code loops through and sorts the first level of children under their parent item but this is where I am struggling. I can't understand why the children are being pasted in by the script after a given familial level is complete. All of the series -100 items are read prior to the drawing numbers pulling the grandchildren of the children beneath them. I've tried my best to show this with colors. This is what the code currently pulls and what I'm trying to get it to do. There was one point where I had gotten it in the correct order, but it only functioned for the first child and was completely broken for any grandchildren. To note, the code should be able to step into a child every time a drawing number is called. When it runs out of children for a given group or if there is no drawing number listed, it should step back up to the previous parent of that specific group. See Descriptive Image

I would note that I also had this functioning in VBA but it seemed to crash when I tried to go past the first child level. I think it's preferable to stay in python at this point if at all possible.

import os
import pandas as pd

# Define the folder path containing the Excel files
folder_path = r'C:\users\name\E3543\Excel'

# Get a list of all Excel files in the folder
file_list = [f for f in os.listdir(folder_path) if f.endswith('.xlsx')]

# Create an empty DataFrame for the raw data
raw_data = pd.DataFrame(columns=["DESCRIPTION", "DRAWING No.", "Parent"])

# Loop through the list of Excel files
for file_name in file_list:
    file_path = os.path.join(folder_path, file_name)
    df = pd.read_excel(file_path)  # Read each Excel file
    df["Parent"] = file_name.split('.')[0]  # Add a Parent column with the file name (without extension)
    raw_data = pd.concat([raw_data, df], ignore_index=True)  # Concatenate DataFrames

# Create a copy of the raw data for the "Families" sheet
families_data = raw_data.copy()

# Create a new DataFrame for the families in correct order
sorted_families_data = pd.DataFrame(columns=families_data.columns)

# Loop through the families and copy the first layer of children
for index, row in families_data.iterrows():
    if pd.isnull(row["DRAWING No."]):
        # Copy rows with empty Parent (first layer)
        sorted_families_data = sorted_families_data._append(row, ignore_index=True)
    else:
        # Copy the parent row only if it hasn't been copied before
        if row["DRAWING No."] not in sorted_families_data["DRAWING No."].tolist():
            sorted_families_data = sorted_families_data._append(row, ignore_index=True)
            
        # Copy rows with matching Drawing No. from Parent column (second level)
        children = families_data[families_data["Parent"] == row["DRAWING No."]]
        sorted_families_data = sorted_families_data._append(children, ignore_index=True)
        
        # Loop through the second level children and copy their children (third level)
        for _, child_row in children.iterrows():
            grandchildren = families_data[families_data["Parent"] == child_row["DRAWING No."]]
            sorted_families_data = sorted_families_data._append(grandchildren, ignore_index=True)

# Remove duplicates to get unique rows
sorted_families_data = sorted_families_data.drop_duplicates()

# Save the sorted families data to the same Excel file
output_file_path = os.path.join(folder_path, "Merged_Data.xlsx")
with pd.ExcelWriter(output_file_path) as writer:
    raw_data.to_excel(writer, sheet_name="RawData", index=False)
    sorted_families_data.to_excel(writer, sheet_name="Families", index=False)
0

There are 0 best solutions below