I have a DataFrame with a category column (`ciiu`) and a values column (`ics`), indexed by date:
| date (index) | ciiu | ics |
|---|---|---|
| 2023-01 | A2032 | 0.04563 |
| 2023-02 | A2032 | 0.05632 |
| ... | ... | ... |
| 2023-01 | B4030 | 0.1123 |
| 2023-02 | B4030 | 0.1198 |
| ... | ... | ... |
All my values are between 0 and 1.
I'm trying to forecast them with an LSTM model and then save the forecasted values in a new column of my original DataFrame.
I've tried multiple variations of my code using ChatGPT, Anthropic and pilot (because I'm pretty new to neural networks and to coding in Python), but none of their suggestions worked. Every variation raises an error about the length of the keys and values, so I guess everything works except the part where I try to store the predictions back in the DataFrame.
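A minimal sketch that reproduces this kind of length error without any LSTM, assuming (as the sample table suggests) that each date appears once per `ciiu` category. The toy values and the two-element `predictions` array are made up for illustration; `predictions` has shape `(2, 1)` like the output of a model ending in `Dense(units=1)`. Because the date index is not unique, `.loc` with the test dates of one category also matches the rows of every other category, so there are more target rows than predictions:

```python
import numpy as np
import pandas as pd

# Toy data shaped like the sample: 3 monthly dates repeated for two categories (made-up values)
dates = pd.date_range('2023-01-01', periods=3, freq='MS')
df = pd.DataFrame(
    {'ciiu': ['A2032'] * 3 + ['B4030'] * 3,
     'ics': [0.04563, 0.05632, 0.06100, 0.1123, 0.1198, 0.1250]},
    index=pd.DatetimeIndex(list(dates) * 2, name='date'),
)
df['LSTM_forecasted_values'] = np.nan

# Pretend the test set for 'A2032' is its last 2 rows, as in the loop
ciiu_data = df[df['ciiu'] == 'A2032']
indices_test = ciiu_data.iloc[-2:].index
predictions = np.array([[0.07], [0.08]])    # shape (2, 1), like model.predict output

print(df.index.is_unique)                   # False: each date appears twice
print(len(df.loc[indices_test]))            # 4: the two dates match rows of both categories

error_message = None
try:
    # 2 values for 4 matching rows -> pandas refuses the assignment
    df.loc[indices_test, 'LSTM_forecasted_values'] = predictions.ravel()
except ValueError as exc:
    error_message = str(exc)
    print('ValueError:', exc)
```

The exact wording of the `ValueError` may differ between pandas versions, but the cause is the same: the labels in `indices_test` are not unique to one category.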
Sorry if there are any obvious mistakes; I'd be grateful for any corrections or suggestions.
Here is my code, where I loop over each `ciiu` category to fit the model, get the predictions and save them back to my DataFrame `df_AB`:
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# AB DataFrame: every category except 'O842202'
# (.copy() so the new column is added to a real DataFrame, not a view of ics_T1)
df_AB = ics_T1[ics_T1['ciiu'] != 'O842202'].copy()

# New column 'LSTM_forecasted_values' in the DataFrame
df_AB['LSTM_forecasted_values'] = np.nan

# Function to create sequences for input data
def create_sequences(data, seq_length):
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i+seq_length])
        y.append(data[i+seq_length])
    return np.array(X), np.array(y)

# Length of the input sequences
seq_length = 20

# LSTM RNN
unique_ciiu = df_AB['ciiu'].unique()
for ciiu in unique_ciiu:
    # Select data for the current ciiu category
    ciiu_data = df_AB[df_AB['ciiu'] == ciiu]
    data = ciiu_data['ics']

    # Sequences for the entire data
    X, y = create_sequences(data.values.reshape(-1, 1), seq_length)

    # Number of train samples
    num_train_samples = int(0.9 * len(X))

    # Split the data
    X_train, X_test = X[:num_train_samples], X[num_train_samples:]
    y_train, y_test = y[:num_train_samples], y[num_train_samples:]

    # Get the indices of rows where 'ciiu' matches the current category in the test set
    indices_test = ciiu_data.iloc[-len(X_test):].index

    # LSTM model
    model = Sequential()
    model.add(LSTM(units=100, input_shape=(X_train.shape[1], X_train.shape[2]), return_sequences=True))
    model.add(LSTM(units=50))
    model.add(Dense(units=1))

    # Compile the model
    model.compile(optimizer='adam', loss='mse')

    # Train the model
    model.fit(X_train, y_train, epochs=150, batch_size=100, verbose=0)

    # Make predictions for the test data
    predictions = model.predict(X_test)

    # Use indices_test when assigning the predictions back to the DataFrame:
    df_AB.loc[indices_test, 'LSTM_forecasted_values'] = predictions.ravel()
```
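One possible fix, sketched under the assumption that the date index repeats once per `ciiu` category: move the dates into a regular column with `reset_index()` so every row gets a unique integer label, split per category with `groupby`, and write each category's predictions using that group's own labels. `fill_forecasts` and its `fit_predict` argument are hypothetical names; `fit_predict(X_train, y_train, X_test)` stands in for building, compiling and fitting the LSTM from the question and returning `model.predict(X_test)`. The naive "repeat the last value" forecaster in the usage lines is not an LSTM; it only lets the indexing be checked without TensorFlow. The sketch also skips categories with too few rows to form both a training and a test window (note that `ciiu_data.iloc[-0:]` would select the whole group rather than nothing).

```python
import numpy as np
import pandas as pd


def create_sequences(data, seq_length):
    """Sliding windows: X[i] = data[i:i+seq_length], y[i] = data[i+seq_length]."""
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i + seq_length])
        y.append(data[i + seq_length])
    return np.array(X), np.array(y)


def fill_forecasts(df, seq_length, fit_predict, train_frac=0.9):
    """Return a copy of df with per-category test predictions in 'LSTM_forecasted_values'.

    fit_predict(X_train, y_train, X_test) is a hypothetical hook: it should train a
    model for one category and return one prediction per row of X_test.
    """
    index_name = df.index.name
    out = df.reset_index()              # old (non-unique) index becomes the first column
    date_col = out.columns[0]
    out['LSTM_forecasted_values'] = np.nan

    for _, group in out.groupby('ciiu', sort=False):
        values = group['ics'].to_numpy().reshape(-1, 1)
        X, y = create_sequences(values, seq_length)
        num_train = int(train_frac * len(X))
        X_train, X_test = X[:num_train], X[num_train:]
        y_train = y[:num_train]
        if num_train == 0 or len(X_test) == 0:
            continue                    # not enough rows to train and test this category
        preds = np.asarray(fit_predict(X_train, y_train, X_test)).ravel()
        # y[j] is group row j + seq_length, so the test targets are the group's last
        # len(X_test) rows; group.index holds the unique labels created by reset_index()
        test_labels = group.index[-len(X_test):]
        out.loc[test_labels, 'LSTM_forecasted_values'] = preds

    return out.set_index(date_col).rename_axis(index_name)


# Usage with made-up toy data and a naive "repeat the last value" forecaster (not an LSTM)
dates = pd.date_range('2021-01-01', periods=30, freq='MS')
df = pd.DataFrame(
    {'ciiu': ['A2032'] * 30 + ['B4030'] * 30,
     'ics': np.r_[np.linspace(0.01, 0.30, 30), np.linspace(0.50, 0.79, 30)]},
    index=pd.DatetimeIndex(list(dates) * 2, name='date'),
)
naive = lambda X_train, y_train, X_test: X_test[:, -1, 0]
result = fill_forecasts(df, seq_length=5, fit_predict=naive)
print(result.tail(4))
```

With the question's data, `result = fill_forecasts(df_AB, seq_length=20, fit_predict=...)` would keep the date index and row order, and each category's forecasts would land only on that category's last rows.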