Tensorflow Data Pipeline: Read text files from sub-directories for Seq2Seq Models


I am trying to create a data pipeline using TensorFlow's text_dataset_from_directory method to train a sequence-to-sequence model. The folder structure is as follows:

BBC News Summary
   |_ News Articles
      |_Business
          |_001.txt, 002.txt
      |_Sports
          |_001.txt, 002.txt
   |_Summaries
      |_Business
          |_001.txt, 002.txt
      |_Sports
          |_ 001.txt, 002.txt

How do I create a TensorFlow pipeline that reads the text files from these folders, where News Articles is the input and Summaries is the target? I have tried the following, but it doesn't preserve the pairing between each input and its target:

from tensorflow.keras import utils

articles_path = "./BBC News Summary/News Articles/"
summary_path = "./BBC News Summary/Summaries/"

batch_size = 32
seed = 42

articles_data = utils.text_dataset_from_directory(
                    articles_path,
                    labels=None,
                    batch_size=batch_size,
                    validation_split=0.2,
                    subset='training',
                    seed=seed)

summary_data = utils.text_dataset_from_directory(
                    summary_path,
                    labels=None,
                    batch_size=batch_size,
                    validation_split=0.2,
                    subset='training',
                    seed=seed)

If I instead pass "./BBC News Summary" as the path, it reads all the files into a single training set and treats the top-level folders as class labels, rather than pairing each article with its summary. Your help is much appreciated. Thank you.
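One way I have considered (a sketch, not tested on the full dataset) is to sidestep text_dataset_from_directory entirely and build the article/summary pairs myself by matching relative paths under the two roots, since the filenames mirror each other. The helper name matched_file_pairs is my own invention; the snippet uses only the standard library and builds a tiny copy of the folder layout above to demonstrate the pairing:

```python
import pathlib
import tempfile

def matched_file_pairs(articles_root, summaries_root):
    """Return (article_path, summary_path) pairs whose relative paths
    match under the two roots, in sorted order so pairing is deterministic."""
    articles_root = pathlib.Path(articles_root)
    summaries_root = pathlib.Path(summaries_root)
    pairs = []
    for article in sorted(articles_root.rglob("*.txt")):
        summary = summaries_root / article.relative_to(articles_root)
        if summary.exists():
            pairs.append((str(article), str(summary)))
    return pairs

# Recreate the folder layout from the question in a temp directory.
root = pathlib.Path(tempfile.mkdtemp())
for split in ("News Articles", "Summaries"):
    for topic in ("Business", "Sports"):
        d = root / split / topic
        d.mkdir(parents=True)
        for name in ("001.txt", "002.txt"):
            (d / name).write_text(f"{split}/{topic}/{name}")

pairs = matched_file_pairs(root / "News Articles", root / "Summaries")
print(len(pairs))  # 4 matched (article, summary) pairs
```

From there, if this approach is sound, the two path lists could presumably be fed to tf.data.Dataset.from_tensor_slices((article_paths, summary_paths)) and mapped through tf.io.read_file to load the text lazily, which would guarantee the input/target indices stay aligned.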
