Advise on how to work with multiple OrderedDict instances distributed across different files

42 Views Asked by At

I want to create a script that processes files in a folder based on their names:.

  • There are 2 types of files: (alpha & beta)
  • All of the alpha files require the same manipulation (concatenation, filling blank cells, addition of a new column...)
  • All of the beta files require the same manipulation (concatenation, filling blank cells, storage of certain variable...)
  • Eventually, I'll need to merge alpha files with their corresponding beta files using a left merge.

My questions

  1. What's the best way to proceed with this task?
  2. Can we handle all alpha files simultaneously?
  3. Do we need to merge alpha files into one and beta files into another for further processing?
# IMPORT MODULES

import openpyxl as op
import pandas as pd
import xlsxwriter
import os
from xlsxwriter.workbook import Workbook
from openpyxl import Workbook, load_workbook
import glob


# MY FOLDER PATH

my_folder_path = r'C:\Users\machukovich\Desktop\working_folder'
list_of_files = os.listdir(my_folder_path)

print('This folder contains {len_folder} file(s): \n'.format(len_folder=len(list_of_files)))
for file_name in list_of_files:
    print(file_name)
    
# DEFINING THE ALPHA FILES

alpha_files = glob.glob(r'C:\Users\Edu\Desktop\working_folder\*_alpha.xlsx')
for file_name in alpha_files:
    print(file_name)
    
# DEFINING THE BETA FILES

beta_files = glob.glob(r'C:\Users\Edu\Desktop\working_folder\*_beta.xlsx')
for file_name in beta_files:
    print(file_name)
    
# FROM HERE IDK HOW TO CONTINUE ( THIS OPTION THROWS ERROR )


dfs_alpha = pd.read_excel(alpha_files, sheet_name=None, skiprows=3)

For more clarity, the output for my folder is the following:

This folder contains 4 file(s): 

first_file_alpha.xlsx
first_file_beta.xlsx
second_file_alpha.xlsx
second_file_beta.xlsx

0

There are 0 best solutions below