What is the state-of-the-art way to build LSTM data: data_generator or tf.data.Dataset.window?


I have always written my own code to format my data (3D reshaping, normalization, ...) for my LSTM models. Now I have to work with bigger datasets and need to ingest many CSV files. What is the best way to do all of this fast (reducing I/O) and in a memory-efficient way?
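
To make the question concrete, here is a minimal sketch of what I mean by streaming many CSVs lazily with tf.data (the file pattern, column count, and header assumption are placeholders, not my real data):

import tensorflow as tf

CSV_PATTERN = "data/*.csv"   # placeholder file pattern
NUM_COLUMNS = 8              # placeholder: feature columns + label, all numeric

# List files deterministically so the time order across files is preserved.
files = tf.data.Dataset.list_files(CSV_PATTERN, shuffle=False)

# Read rows lazily, file by file, instead of loading everything into memory.
rows = files.flat_map(
    lambda path: tf.data.experimental.CsvDataset(
        path,
        record_defaults=[tf.float32] * NUM_COLUMNS,
        header=True,
    )
)

# Each element is a tuple of scalars (one per column); stack them into a row
# vector so windowing code can slice features and label by column index.
rows = rows.map(lambda *cols: tf.stack(cols), num_parallel_calls=tf.data.AUTOTUNE)

I used flat_map rather than interleave so the rows stay in time order across files; from there a windowing pipeline like the one below would apply, but I don't know whether this is actually the I/O- and memory-efficient way to do it.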

TensorFlow suggests using a data generator and finally converting the data to a tf.data.Dataset, and I found someone doing something like this:

WINDOW_SIZE = 72
BATCH_SIZE = 32
dataset = (
    tf.data.Dataset.from_tensor_slices(dataset_train)
    # Sliding windows of WINDOW_SIZE rows; drop_remainder avoids short trailing
    # windows that would break the fixed-shape batching below.
    .window(WINDOW_SIZE, shift=1, drop_remainder=True)
    # .window() yields a dataset of sub-datasets; flatten each window back into
    # a single (WINDOW_SIZE, n_columns) tensor.
    .flat_map(lambda seq: seq.batch(WINDOW_SIZE))
    # All columns but the last are features; the label is the last column of
    # the last timestep in the window.
    .map(lambda seq_and_label: (seq_and_label[:, :-1], seq_and_label[-1:, -1]))
    .batch(BATCH_SIZE)
)
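
For comparison, the data-generator route I'm referring to would, as far as I understand it, look roughly like this (the feature count is a placeholder, and it reuses WINDOW_SIZE, BATCH_SIZE, and dataset_train from above):

import tensorflow as tf

N_FEATURES = 7  # placeholder: number of feature columns in dataset_train

def window_generator(array):
    # `array` is a 2D array: feature columns plus the label as the last column.
    for start in range(len(array) - WINDOW_SIZE + 1):
        chunk = array[start:start + WINDOW_SIZE]
        yield chunk[:, :-1], chunk[-1:, -1]  # (WINDOW_SIZE, N_FEATURES), (1,)

dataset_gen = tf.data.Dataset.from_generator(
    lambda: window_generator(dataset_train),
    output_signature=(
        tf.TensorSpec(shape=(WINDOW_SIZE, N_FEATURES), dtype=tf.float32),
        tf.TensorSpec(shape=(1,), dtype=tf.float32),
    ),
).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

Both versions can be passed straight to model.fit(); what I can't judge is which one holds up better for many large CSV files.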

I really want to learn the best way; my goal is to use my code in production and to learn more about MLOps in the future. Thanks for your help, and if you have a well-explained example of setting up a 3D LSTM tf.data.Dataset, I'll take all suggestions.
