I'm using PyTorch's LSTM API for a dummy model and have run into a bit of an issue. The model's task is to output 1 if the previous number is less than the current one, and 0 otherwise.
So for an array like [0.7, 0.3, 0.9, 0.99], the expected outputs are [1.0, 0.0, 1.0, 1.0]. The first output should be 1.0 no matter what.
I designed the following network to try this problem:
# network.py
import torch

N_INPUT = 1
N_STACKS = 1
N_HIDDEN = 3
LR = 0.001

class Network(torch.nn.Module):
    # params: self
    def __init__(self):
        super(Network, self).__init__()
        self.lstm = torch.nn.LSTM(
            input_size=N_INPUT,
            hidden_size=N_HIDDEN,
            num_layers=N_STACKS,
        )
        self.linear = torch.nn.Linear(N_HIDDEN, 1)
        self.relu = torch.nn.ReLU()
        self.optim = torch.optim.Adam(self.parameters(), lr=LR)
        self.loss = torch.nn.MSELoss()

    # params: self, predicted, expecteds
    def backprop(self, xs, es):
        # perform backprop
        self.optim.zero_grad()
        l = self.loss(xs, torch.tensor(es))
        l.backward()
        self.optim.step()
        return l

    # params: self, data (as a python array)
    def forward(self, dat):
        out, _ = self.lstm(torch.tensor(dat))
        out = self.relu(out)
        out = self.linear(out)
        return out
And I am calling this from this file:
# main.py
import network
import numpy as np

# create a new network
n: network.Network = network.Network()

# create some data
def rand_array():
    # a bunch of random numbers
    a = [[np.random.uniform(0, 1)] for i in range(1000)]
    # now, our expected value is 0 if the previous number is greater, and 1 else
    expected = [0.0 if a[i - 1][0] > a[i][0] else 1.0 for i in range(len(a))]
    expected[0] = 1.0  # make the first element always just 1.0
    return [a, expected]

# a bunch of random arrays
data = [rand_array() for i in range(1000)]

# 100 epochs
for i in range(100):
    for i in data:
        pred = n(i[0])
        loss = n.backprop(pred, i[1])
        print("Loss: {:.5f}".format(loss))
Now, when I run this program, the loss just sits around 0.25 and stops improving once it gets there. I think the model is simply predicting the average of 0 and 1 (0.5) for every input; with roughly balanced 0/1 targets, a constant prediction of 0.5 gives an MSE of 0.5² = 0.25, which is exactly what I'm seeing.
This leads me to believe the model can't actually use the previous data: the inputs are just random numbers (though the expected output depends on the previous input), and the model doesn't seem to remember what came before.
What is my issue?
I don't see a hidden-state issue; the LSTM does see the previous timesteps. The unusual setup is what's probably hindering learning.
First, we want a proper Dataset and DataLoader, so training runs over batches of sequences rather than one Python list at a time.
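A minimal sketch of that step; the SequenceDataset name, the sequence length of 100, the dataset size, and the batch size of 32 are my own choices, not anything fixed:

# dataset.py
import torch
from torch.utils.data import Dataset, DataLoader

class SequenceDataset(Dataset):
    # random sequences paired with "did the value go up?" labels
    def __init__(self, n_sequences=1000, seq_len=100):
        # inputs: (n_sequences, seq_len, 1) uniform random values
        self.x = torch.rand(n_sequences, seq_len, 1)
        # label is 1.0 wherever the current value is >= the previous one
        up = (self.x[:, 1:, 0] >= self.x[:, :-1, 0]).float()
        # the first step has no predecessor, so its label is forced to 1.0
        first = torch.ones(n_sequences, 1)
        self.y = torch.cat([first, up], dim=1).unsqueeze(-1)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

train_loader = DataLoader(SequenceDataset(), batch_size=32, shuffle=True)

Batches then come out as (batch, seq, 1) input tensors with matching (batch, seq, 1) 0/1 targets.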
Now the model. Add an input projection layer before the LSTM so the scalar input is lifted into a richer feature space, and remove the unnecessary ReLU after the LSTM module; the final linear layer should see the raw hidden states and emit raw logits.
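A sketch of what I mean; the hidden size of 32 is arbitrary, and batch_first=True matches the (batch, seq, features) layout the loader above produces:

# model.py
import torch

class Network(torch.nn.Module):
    def __init__(self, n_hidden=32):
        super().__init__()
        # input projection: lift the scalar input into n_hidden features
        self.project = torch.nn.Linear(1, n_hidden)
        self.lstm = torch.nn.LSTM(
            input_size=n_hidden,
            hidden_size=n_hidden,
            num_layers=1,
            batch_first=True,
        )
        # map each hidden state to one logit per timestep
        self.linear = torch.nn.Linear(n_hidden, 1)

    def forward(self, x):
        out = self.project(x)
        out, _ = self.lstm(out)
        return self.linear(out)  # raw logits, no activation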
Now we set up training. We're predicting a binary output, so we want BCEWithLogitsLoss rather than MSELoss: using MSE for a categorical prediction doesn't make sense, and BCEWithLogitsLoss applies the sigmoid internally, which is why the model above returns raw logits.
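A minimal training loop under those choices (the epoch count and learning rate are arbitrary):

# train.py
model = Network()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.BCEWithLogitsLoss()

for epoch in range(10):
    total = 0.0
    for x, y in train_loader:
        optim.zero_grad()
        logits = model(x)          # (batch, seq, 1) raw logits
        loss = loss_fn(logits, y)  # expects logits and 0/1 targets
        loss.backward()
        optim.step()
        total += loss.item()
    print("epoch {}: mean loss {:.5f}".format(epoch, total / len(train_loader)))

After training, test performance on held-out data, for example by thresholding the sigmoid of the logits at 0.5 and measuring accuracy:

# evaluate on fresh sequences the model has never seen
test_loader = DataLoader(SequenceDataset(n_sequences=100), batch_size=32)
model.eval()
correct, total = 0, 0
with torch.no_grad():
    for x, y in test_loader:
        preds = (torch.sigmoid(model(x)) > 0.5).float()
        correct += (preds == y).sum().item()
        total += y.numel()
print("accuracy: {:.3f}".format(correct / total))

If the LSTM is actually learning the comparison, accuracy should climb well above the roughly 0.5 you'd get from always predicting a single class.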