I'm using PyTorch's LSTM API for a dummy model and have run into a bit of an issue. The model's task is to output 1 if the previous number is less than the current one, and 0 otherwise.
So for an array like [0.7, 0.3, 0.9, 0.99], the expected outputs are [1.0, 0.0, 1.0, 1.0]. The first output should be 1.0 no matter what.
I designed the following network to try this problem:
# network.py
import torch

N_INPUT = 1
N_STACKS = 1
N_HIDDEN = 3
LR = 0.001

class Network(torch.nn.Module):
    # params: self
    def __init__(self):
        super(Network, self).__init__()
        self.lstm = torch.nn.LSTM(
            input_size=N_INPUT,
            hidden_size=N_HIDDEN,
            num_layers=N_STACKS,
        )
        self.linear = torch.nn.Linear(N_HIDDEN, 1)
        self.relu = torch.nn.ReLU()
        self.optim = torch.optim.Adam(self.parameters(), lr=LR)
        self.loss = torch.nn.MSELoss()

    # params: self, predicted, expecteds
    def backprop(self, xs, es):
        # perform backprop
        self.optim.zero_grad()
        l = self.loss(xs, torch.tensor(es))
        l.backward()
        self.optim.step()
        return l

    # params: self, data (as a python array)
    def forward(self, dat):
        out, _ = self.lstm(torch.tensor(dat))
        out = self.relu(out)
        out = self.linear(out)
        return out
And I am calling this from this file:
# main.py
import network
import numpy as np

# create a new network
n: network.Network = network.Network()

# create some data
def rand_array():
    # a bunch of random numbers
    a = [[np.random.uniform(0, 1)] for i in range(1000)]
    # now, our expected value is 0 if the previous number is greater, and 1 else
    expected = [0.0 if a[i - 1][0] > a[i][0] else 1.0 for i in range(len(a))]
    expected[0] = 1.0  # make the first element always just 1.0
    return [a, expected]

# a bunch of random arrays
data = [rand_array() for i in range(1000)]

# 100 epochs
for i in range(100):
    for i in data:
        pred = n(i[0])
        loss = n.backprop(pred, i[1])
        print("Loss: {:.5f}".format(loss))
Now, when I run this program, the loss just sits around 0.25 and stops improving once it gets there. I think the model is simply predicting the average of 0 and 1 (0.5) for every input; with roughly balanced 0/1 targets, a constant prediction of 0.5 gives an MSE of 0.5² = 0.25, which is exactly what I'm seeing.
This leads me to believe the model can't actually use the previous data: the inputs are just random numbers (though the expected output depends on the previous input), and the model doesn't seem to remember what came before.
What is my issue?
I don't see a hidden-state issue; the LSTM does see the previous timesteps. The unusual setup is what's probably hindering learning.
First, we want a proper Dataset and DataLoader, so training runs over batches of sequences rather than one Python list at a time.
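A minimal sketch of that step; the SequenceDataset name, the sequence length of 100, the dataset size, and the batch size of 32 are my own choices, not anything fixed:

# dataset.py
import torch
from torch.utils.data import Dataset, DataLoader

class SequenceDataset(Dataset):
    # random sequences paired with "did the value go up?" labels
    def __init__(self, n_sequences=1000, seq_len=100):
        # inputs: (n_sequences, seq_len, 1) uniform random values
        self.x = torch.rand(n_sequences, seq_len, 1)
        # label is 1.0 wherever the current value is >= the previous one
        up = (self.x[:, 1:, 0] >= self.x[:, :-1, 0]).float()
        # the first step has no predecessor, so its label is forced to 1.0
        first = torch.ones(n_sequences, 1)
        self.y = torch.cat([first, up], dim=1).unsqueeze(-1)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

train_loader = DataLoader(SequenceDataset(), batch_size=32, shuffle=True)

Batches then come out as (batch, seq, 1) input tensors with matching (batch, seq, 1) 0/1 targets.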
Now the model. Add an input projection layer before the LSTM so the scalar input is lifted into a richer feature space, and remove the unnecessary ReLU after the LSTM module; the final linear layer should see the raw hidden states and emit raw logits.
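A sketch of what I mean; the hidden size of 32 is arbitrary, and batch_first=True matches the (batch, seq, features) layout the loader above produces:

# model.py
import torch

class Network(torch.nn.Module):
    def __init__(self, n_hidden=32):
        super().__init__()
        # input projection: lift the scalar input into n_hidden features
        self.project = torch.nn.Linear(1, n_hidden)
        self.lstm = torch.nn.LSTM(
            input_size=n_hidden,
            hidden_size=n_hidden,
            num_layers=1,
            batch_first=True,
        )
        # map each hidden state to one logit per timestep
        self.linear = torch.nn.Linear(n_hidden, 1)

    def forward(self, x):
        out = self.project(x)
        out, _ = self.lstm(out)
        return self.linear(out)  # raw logits, no activation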
Now we set up training. We're predicting a binary output, so we want BCEWithLogitsLoss rather than MSELoss: using MSE for a categorical prediction doesn't make sense, and BCEWithLogitsLoss applies the sigmoid internally, which is why the model above returns raw logits.
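A minimal training loop under those choices (the epoch count and learning rate are arbitrary):

# train.py
model = Network()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.BCEWithLogitsLoss()

for epoch in range(10):
    total = 0.0
    for x, y in train_loader:
        optim.zero_grad()
        logits = model(x)          # (batch, seq, 1) raw logits
        loss = loss_fn(logits, y)  # expects logits and 0/1 targets
        loss.backward()
        optim.step()
        total += loss.item()
    print("epoch {}: mean loss {:.5f}".format(epoch, total / len(train_loader)))

After training, test performance on held-out data, for example by thresholding the sigmoid of the logits at 0.5 and measuring accuracy:

# evaluate on fresh sequences the model has never seen
test_loader = DataLoader(SequenceDataset(n_sequences=100), batch_size=32)
model.eval()
correct, total = 0, 0
with torch.no_grad():
    for x, y in test_loader:
        preds = (torch.sigmoid(model(x)) > 0.5).float()
        correct += (preds == y).sum().item()
        total += y.numel()
print("accuracy: {:.3f}".format(correct / total))

If the LSTM is actually learning the comparison, accuracy should climb well above the roughly 0.5 you'd get from always predicting a single class.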