I'm trying to transform data read from CSV files using a tf.data pipeline with overlapping windows, and it's not working as expected. None of the documentation I have found clearly explains how to handle this case. The columns of the CSV files are 'timestamp', 'open', 'high', 'low', 'close', 'volume'.
dataset = tf.data.experimental.make_csv_dataset(
    file_pattern="/path/stock/*1min*.csv",
    batch_size=1,
    num_epochs=1,
    shuffle=False,
    header=False,
    column_names=['timestamp','open','high', 'low', 'close', 'volume'],
    column_defaults=[tf.string, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32]
).window(
    size=5, # Number of rows per window
    shift=1, # Stride for overlapping windows
    stride=1
)
This produces the following structure:
-WindowDataset
--OrderedDict
---VariantDataset
----Tensor (single element)
----Tensor...
This prevents me from transforming the data in a simple way: the OrderedDict has no batch method, so I cannot flatten the windows the way the documentation describes.
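To make the structure easier to inspect without the CSV files, here is a minimal sketch that reproduces it with in-memory data. The `rows` dictionary and its values are hypothetical stand-ins for the real columns, and the elements here are scalars rather than the length-1 batches that `batch_size=1` produces, so treat it as an illustration of the nesting only.

```python
import tensorflow as tf

# Hypothetical in-memory stand-in for a few CSV rows (values are made up).
rows = {
    "timestamp": ["t0", "t1", "t2", "t3", "t4", "t5"],
    "open": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "close": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5],
}

# Same windowing arguments as in the question.
windows = tf.data.Dataset.from_tensor_slices(rows).window(size=5, shift=1, stride=1)

# Each element of `windows` is a dict-like object, not a dataset:
# its values are the per-column window datasets.
first = next(iter(windows))
first_is_dict = isinstance(first, dict)
values_are_datasets = all(isinstance(v, tf.data.Dataset) for v in first.values())

print(type(first).__name__, first_is_dict, values_are_datasets)
print(windows.element_spec)
```

The point of the sketch is that `.batch` exists on the values of the dictionary, not on the dictionary itself, which is consistent with the nesting listed above.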
dataset = tf.data.experimental.make_csv_dataset(
    file_pattern="/path/stock/*1min*.csv",
    batch_size=1,
    num_epochs=1,
    shuffle=False,
    header=False,
    column_names=['timestamp','open','high', 'low', 'close', 'volume'],
    column_defaults=[tf.string, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32]
).window(
    size=5, # Number of rows per window
    shift=1, # Stride for overlapping windows
    stride=1
).flat_map(lambda window: window.batch(5))
Adding flat_map with window.batch(5) gives the following error:
AttributeError Traceback (most recent call last)
<ipython-input-47-46d1550f08a0> in <cell line: 1>()
11 shift=1, # Stride for overlapping windows
12 stride=1
---> 13 ).flat_map(lambda window: window.batch(5))
19 frames
/tmp/__autograph_generated_filersrgq3km.py in <lambda>(lscope)
3
4 def inner_factory(ag__):
----> 5 tf__lam = lambda window: ag__.with_function_scope(lambda lscope: ag__.converted_call(window.batch, (5,), None, lscope), 'lscope', ag__.STD)
6 return tf__lam
7 return inner_factory
AttributeError: in user code:
File "<ipython-input-47-46d1550f08a0>", line 13, in None *
lambda window: window.batch(5)
AttributeError: 'collections.OrderedDict' object has no attribute 'batch'
If I instead try to batch the datasets inside the OrderedDict, I get the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-59-16e14170c082> in <cell line: 25>()
23
24
---> 25 data = dataset.map(extract)
26
27
35 frames
/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/tensor.py in __getattr__(self, name)
259 tf.experimental.numpy.experimental_enable_numpy_behavior()
260 """)
--> 261 self.__getattribute__(name)
262
263 @property
AttributeError: in user code:
File "<ipython-input-59-16e14170c082>", line 3, in extract *
opens = data.get('open').flat_map(lambda x: x.batch(5))
AttributeError: 'SymbolicTensor' object has no attribute 'batch'
This is becoming extremely confusing.
What would be the right way to transform this structure so that I can later apply further transformations to build a time series dataset?
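For reference, one direction that may work is to batch each per-column dataset inside the window and recombine the columns with `tf.data.Dataset.zip`, which accepts a nested structure of datasets. The sketch below is not verified against the real CSV files: `rows`, `WINDOW`, and `to_dense` are hypothetical names, the data is in-memory, and `drop_remainder=True` is an added assumption so that every window has exactly 5 rows. With `make_csv_dataset(batch_size=1, ...)` an `.unbatch()` before `.window(...)` would presumably be needed to get scalar rows like these.

```python
import tensorflow as tf

WINDOW = 5  # rows per window, as in the question

# Hypothetical in-memory stand-in for a few CSV rows (values are made up).
rows = {
    "timestamp": ["t0", "t1", "t2", "t3", "t4", "t5"],
    "open": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "close": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5],
}

def to_dense(window):
    # `window` is a dict of per-column datasets. Batch each column into a
    # single tensor of WINDOW rows, then zip the columns back into one
    # dataset whose single element is a dict of tensors.
    return tf.data.Dataset.zip(
        {name: column.batch(WINDOW, drop_remainder=True)
         for name, column in window.items()}
    )

dense = (
    tf.data.Dataset.from_tensor_slices(rows)
    # drop_remainder=True is an assumption: it discards the short windows
    # that would otherwise appear at the end of the data.
    .window(size=WINDOW, shift=1, stride=1, drop_remainder=True)
    .flat_map(to_dense)
)

# Materialize the windows as dicts of NumPy arrays for inspection.
dense_windows = list(dense.as_numpy_iterator())
for w in dense_windows:
    print(w["open"], w["close"])
```

In this sketch each element of `dense` is a dictionary of tensors of shape `(5,)`, one per column, so ordinary `map` calls can be applied afterwards.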