This code iterates through an excel spreadsheet and finds a specific column where energy is zero, then calculates the duration of that zero-energy-value period by calculating difference between first & last appearance of the continuous zero values.

The problem I'm experiencing is: when there are multiple back-to-back rows of zero, the code stalls and never provides an output.

I'm finding it difficult to pinpoint where the issue is. Can I get some help with this? Here is the sample data in the excel file. The issue is getting an output when we have more than a couple rows of zeroes. Note: Energy is in column 11, Start date is in column 3, and end date is in column 5 in the actual excel file as depicted in the code.

Start Date End Date Energy
1/1/2023 10:54 1/1/2023 11:56 60
1/1/2023 13:28 1/1/2023 13:35 0
1/1/2023 19:02 1/1/2023 19:30 0
1/1/2023 21:03 1/1/2023 21:20 0
1/1/2023 21:35 1/1/2023 21:56 0
1/1/2023 22:23 1/1/2023 22:25 0
1/2/2023 08:34 1/2/2023 08:56 0
1/2/2023 09:04 1/1/2023 09:16 0
1/2/2023 09:14 1/2/2023 09:23 0
1/2/2023 10:05 1/2/2023 10:17 53

import datetime
import openpyxl
import collections
from itertools import islice

#import pandas
from openpyxl.workbook import Workbook

cpsd = ("Excel file")
cpsd_op = openpyxl.load_workbook(cpsd)
cpsd_s1 = cpsd_op['Session-2024']
cpsd_dcfc1 = openpyxl.Workbook()
sheet_dcfc1 = cpsd_dcfc1["Sheet"]

# ^ pulls excel file in, we want to use openpyxl over pandas for excel, since it takes less time
# cpsd = session data

max_col_og = cpsd_s1.max_column
max_row_og = cpsd_s1.max_row
max_col_nw = sheet_dcfc1.max_column
max_row_nw = sheet_dcfc1.max_row
print(max_row_og, max_col_og)

for i in range(1, max_col_og+1):
    c = cpsd_s1.cell(row = 1, column= i)
    sheet_dcfc1.cell(row=1, column=i).value = c.value


for i in range(1, max_col_og+1):
        cell_obj = sheet_dcfc1.cell(row=1, column=i)
        print(cell_obj.value)

def del_empt_row (sheet):
    index_row = []
    for i in range(1, sheet.max_row):
        # define emptiness of cell
        if sheet.cell(i, 1).value is None:
            # collect indexes of rows
            index_row.append(i)

    # loop each index value
    for row_del in range(len(index_row)):
        sheet.delete_rows(idx=index_row[row_del], amount=1)
        # exclude offset of rows through each iteration
        index_row = list(map(lambda k: k - 1, index_row))


for j in range(1, max_row_og +1):
    for i in range(1, max_col_og +1):
        c = cpsd_s1.cell(row=j, column=1)
        if (c.value == "PP/ Charger 2"):
            k = cpsd_s1.cell(row=j, column=i)
            sheet_dcfc1.cell(row=j, column=i).value = k.value
            #print(k.value)

del_empt_row(sheet_dcfc1)

def enddate (sheet, row):
    #returns the end date of the last row with energy = 0
    for row2 in range(row, max_row_og + 1):
        if (sheet.cell(row=row2, column=10).value != 0):
            return [sheet.cell(row=row2-1, column=5).value,row2-1]
        else:
            enddate(sheet,row+1)

def consume(iterator, n):
    #allows us to skip the energy = 0 rows that have already been counted, since python is weird about iteration skipping
    #"Advance the iterator n-steps ahead. If n is none, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)


zero_time = datetime.datetime(2023, 1, 1, 00, 00, 00, 00)
tot_time = datetime.datetime(2023, 1, 1, 00, 00, 00, 00)
#print(tot_time)

range_x = enumerate(sheet_dcfc1.iter_rows())
for row_num, row in range_x:
# calculates total time for t-outage provided that there are no empty rows.
    print(row_num)
    if (row[9].value == 0):
        strt_date = row[2].value
        print(strt_date)
        strt_row = row_num
        end_date_arr = enddate(sheet_dcfc1,row_num+1)
        end_date = end_date_arr[0]
        print(end_date)
        time = end_date-strt_date
        consume(range_x, end_date_arr[1]-strt_row)
#        print(row_num)
        #print(str(row) + "does row change?")
        tot_time += time

#        print(time)

print(tot_time-zero_time)
# prints total time for t-outage provided that there are no empty rows.

1

There are 1 best solutions below

0
moken On

It would also be helpful to include some comments in code to explain what each section does.

From what I see working through the code, you are first adding the Headers from your original Sheet 'Session-2024' to a new Workbook Sheet sheet_dcfc1 with name 'Sheet' (then reading back / printing these headers).

Next follows a loop looking for a cell with value "PP/ Charger 2"
There is no mention of this part in your post and the table provided does not include columns/rows with this value. Also the loop is wrong.

for j in range(1, max_row_og +1):
    for i in range(1, max_col_og +1):
        c = cpsd_s1.cell(row=j, column=1)
        if (c.value == "PP/ Charger 2"):
            k = cpsd_s1.cell(row=j, column=i)
            sheet_dcfc1.cell(row=j, column=i).value = k.value

You are setting j to count the rows in the origin worksheet and i to count the columns then the line c = cpsd_s1.cell(row=j, column=1) . So this means it always looks in the same column (A) due to 'column=1'. Presumably this should be 'column=i'
If this is changed, then you're now looking in every cell from A1 to K11 (using your example data table). Is this value, "PP/ Charger 2" expected to be randomly in any column including those you have provided data for? If it is expected to be in a specific column then you should iterate just that column and not the whole used range. Please provide details on what/where/when of this value, as its existence seems important to the next sections. Without it the new worksheet will just contain the headers.

If the previous section does find that "PP/ Charger 2" value it copies that value to the same cell in the new Sheet.
The code then runs a delete which seems aims to remove empty rows given that presumably "PP/ Charger 2" could be on any row. While this delete achieves that it also removes the Headers previously written.
If it is only necessary write that value once down the Sheet for each occurrence in the original Sheet then maybe use Openpyxl 'append' so each will be written to the next unused row. Either way this value can be added to the new sheet row by row without the need to then delete empty rows. The exact implementation would depend on where this text is expected to be found and how it should be written to the new Sheet.

Then the code loops a list of all cells in all rows looking specifically in column index 9 for a value of 0, row[9].value == 0):
Presumably this is looking for the 0 value in the Energy column. However you state Energy is in column 11 or Column 'K' whereas row[9] is Column 'J'. And again why select the whole range and not just loop through Column K only?
Then the code looking for a value of 0, but this is the new Sheet range_x = enumerate(sheet_dcfc1.iter_rows()) , 'sheet_dcfc1' and all you have written to it is the Headers and the value "PP/ Charger 2" when it occurs, and then deleted the headers. There are no cells with a value of 0 ... It seems may be when you find the value "PP/ Charger 2" it's the whole row you want to copy across, not just that cell?

You need to better explain what your code is supposed to do, what the expected result should be.