Unable to store predictions of an LSTM network back in my original dataframe

I have a dataframe that contains a category column and a values column, both indexed by date:

    date = {2023-01, 2023-02, ..., 2023-01, 2023-02, ...}
    ciiu = {'A2032', 'A2032', ..., 'B4030', 'B4030', ...}
    ics  = {0.04563, 0.05632, ..., 0.1123, 0.1198, ...}

All my values are between 0 and 1.

Knowing this, I'm trying to produce a forecast using an LSTM model and then save the forecasted values in a new column of my original dataframe.

I've tried multiple variations of my code, using ChatGPT, Anthropic and pilot (because I'm pretty new to neural networks and coding in Python), but none of their suggestions worked. Every variation raises an error related to the length of the keys and values, so I guess everything works except the part where I try to store the data back.
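For reference, an error of this kind can be reproduced in isolation. The sketch below is an assumption about the cause, not a confirmed diagnosis: it supposes the date index repeats once per 'ciiu' category, as the sample data suggests. The toy frame `df` and the column name `forecast` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame: the same dates appear once per category (assumed layout).
dates = pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01"])
df = pd.DataFrame(
    {
        "ciiu": ["A2032"] * 3 + ["B4030"] * 3,
        "ics": [0.04, 0.05, 0.06, 0.11, 0.12, 0.13],
    },
    index=dates.append(dates),
)
df["forecast"] = np.nan

# Index labels of the last two rows of a single category.
labels = df[df["ciiu"] == "A2032"].iloc[-2:].index

# .loc is label-based, so these two dates match rows in BOTH categories.
n_selected = len(df.loc[labels])

# Assigning two values to the selected rows therefore fails.
try:
    df.loc[labels, "forecast"] = np.array([0.5, 0.6])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(n_selected, error_message)
```

Here two labels select four rows, so two predicted values cannot be assigned to them. pandas typically reports this as a `ValueError` about keys and values needing equal length, which matches the error described above.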

Sorry if there are any obvious mistakes; I'd be grateful for any corrections or suggestions.

My code is below. I loop over each 'ciiu' category to fit the model, get the predictions, and then save them back to my dataframe 'df_AB'.

    import pandas as pd
    import numpy as np
    from sklearn.model_selection import train_test_split
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    # AB DataFrame
    df_AB = ics_T1
    df_AB = df_AB[(df_AB['ciiu'] != 'O842202')]

    # New column 'LSTM_forecasted_values' to the DataFrame
    df_AB['LSTM_forecasted_values'] = np.nan

    # Function to create sequences for input data
    def create_sequences(data, seq_length):
        X, y = [], []
        for i in range(len(data) - seq_length):
            X.append(data[i:i+seq_length])
            y.append(data[i+seq_length])
        return np.array(X), np.array(y)

    # Length of the input sequences
    seq_length = 20

    # LSTM RNN
    unique_ciiu = df_AB['ciiu'].unique()
    for ciiu in unique_ciiu:
        # Select data for the current ciiu category
        data = df_AB[df_AB['ciiu'] == ciiu]['ics']
        ciiu_data = df_AB[df_AB['ciiu'] == ciiu]

        # Sequences for the entire data
        X, y = create_sequences(data.values.reshape(-1, 1), seq_length)

        # Number of train samples
        num_train_samples = int(0.9 * len(X))

        # Split the data
        X_train, X_test = X[:num_train_samples], X[num_train_samples:]
        y_train, y_test = y[:num_train_samples], y[num_train_samples:]

        # Get the indices of rows where 'ciiu' matches the current category in the test set
        indices_test = ciiu_data.iloc[-len(X_test):].index

        # LSTM model
        model = Sequential()
        model.add(LSTM(units=100, input_shape=(X_train.shape[1], X_train.shape[2]), return_sequences=True))
        model.add(LSTM(units=50))
        model.add(Dense(units=1))

        # Compile the model
        model.compile(optimizer='adam', loss='mse')

        # Train the model
        model.fit(X_train, y_train, epochs=150, batch_size=100, verbose=0)

        # Make predictions for the test data
        predictions = model.predict(X_test)

        # Use indices_test when assigning the predictions back to the DataFrame:
        df_AB.loc[indices_test, 'LSTM_forecasted_values'] = predictions.ravel()
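One possible way to make the final write-back step robust is to address rows by position instead of by date label. This is a minimal sketch under the assumption that the date index is repeated across categories; `store_predictions`, `df_demo` and `fake_predictions` are hypothetical names, and the array stands in for the output of `model.predict(X_test)` so the example runs without TensorFlow.

```python
import numpy as np
import pandas as pd


def store_predictions(df, category, predictions, target="LSTM_forecasted_values"):
    """Write predictions into the last len(predictions) rows of one category.

    Hypothetical helper: rows are addressed by position, so a repeated
    date index cannot match rows that belong to other categories.
    """
    values = np.asarray(predictions, dtype=float).ravel()
    if values.size == 0:
        # Nothing to store; this also avoids iloc[-0:], which selects every row.
        return df
    positions = np.flatnonzero((df["ciiu"] == category).to_numpy())
    if values.size > positions.size:
        raise ValueError("more predictions than rows for this category")
    df.iloc[positions[-values.size:], df.columns.get_loc(target)] = values
    return df


# Toy data with the date index repeated per category (assumed layout).
dates = pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01"])
df_demo = pd.DataFrame(
    {
        "ciiu": ["A2032"] * 3 + ["B4030"] * 3,
        "ics": [0.04, 0.05, 0.06, 0.11, 0.12, 0.13],
    },
    index=dates.append(dates),
)
df_demo["LSTM_forecasted_values"] = np.nan

# Stand-in for model.predict(X_test), which returns shape (n_samples, 1).
fake_predictions = np.array([[0.5], [0.6]])
store_predictions(df_demo, "A2032", fake_predictions)
print(df_demo)
```

In the loop above, the equivalent call would be `store_predictions(df_AB, ciiu, predictions)`. An alternative with the same effect is to give the frame unique row labels first, for example with `df_AB = df_AB.reset_index()`, and keep the `.loc` assignment. Creating `df_AB` with `.copy()` after filtering should also avoid pandas' SettingWithCopyWarning when the new column is added.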