I am using python and multiple libaries like pandas and scipy to prepare data so I can start deeper analysis. For the preparation purpose I am for instance creating new columns with the difference of two dates.
My code is providing the expected results but is really slow so I cannot use it for a table with like 80K rows. The run time would take ca. 80 minutes for the table just for this simple operation.
The problem is definitely related with my writing operation:
tableContent[6]['p_test_Duration'].iloc[x] = difference
Moreover python is providing a Warning:
complete code example for date difference:
import time
from datetime import date, datetime
tableContent[6]['p_test_Duration'] = 0
#for x in range (0,len(tableContent[6]['p_test_Duration'])):
for x in range (0,1000):
p_test_ZEIT_ANFANG = datetime.strptime(tableContent[6]['p_test_ZEIT_ANFANG'].iloc[x], '%Y-%m-%d %H:%M:%S')
p_test_ZEIT_ENDE = datetime.strptime(tableContent[6]['p_test_ZEIT_ENDE'].iloc[x], '%Y-%m-%d %H:%M:%S')
difference = p_test_ZEIT_ENDE - p_test_ZEIT_ANFANG
tableContent[6]['p_test_Duration'].iloc[x] = difference
Take away the loop, and apply the functions to the whole series.