I've created a lazy dataloader in PyTorch using linecache. It pulls from a tsv file that I also use to build the vocabulary with PyTorch's build_vocab, so the file needs a header line with a title for each column.
For the dataset I'm using __getitem__:
    def __getitem__(self, index):
        "Generates one sample of data"
        line = linecache.getline(self._filepath, index + 1)
However, since linecache fetches lines by number rather than iterating over the file, there's no obvious way to skip the first line (the header) of the tsv file. I tried "if index == 0: pass", but that just makes __getitem__ return None, which then threw a different error.
My current solution is just to have two tsvs, one with a header and one without.
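One alternative to the two-file workaround that might work is to shift the line number instead of special-casing index 0. linecache.getline uses 1-based line numbers, so with a header on line 1 the first sample sits on line 2, and the dataset length has to exclude the header. Below is a rough sketch of that idea, not a tested drop-in. The class name LazyTsvDataset, the has_header flag, and the tab-splitting of each row are assumptions made for illustration, and the class is left as a plain object so it runs without PyTorch installed.

```python
import linecache


class LazyTsvDataset:
    """Hypothetical map-style dataset that reads one TSV row per lookup.

    In real code this would subclass torch.utils.data.Dataset; it is left
    as a plain class here so the sketch has no PyTorch dependency.
    """

    def __init__(self, filepath, has_header=True):
        self._filepath = filepath
        # linecache.getline uses 1-based line numbers, so the first data
        # row is line 2 when a header is present and line 1 otherwise.
        self._first_data_line = 2 if has_header else 1
        # Count lines once so __len__ can exclude the header.
        # Assumption: the file has no trailing blank lines.
        with open(filepath, encoding="utf-8") as f:
            total_lines = sum(1 for _ in f)
        self._num_samples = max(total_lines - (self._first_data_line - 1), 0)

    def __len__(self):
        return self._num_samples

    def __getitem__(self, index):
        """Generates one sample of data."""
        if not 0 <= index < self._num_samples:
            raise IndexError(index)
        # Sample 0 maps to the first data line, so the header is never read.
        line = linecache.getline(self._filepath, index + self._first_data_line)
        # Assumption: each sample is returned as a list of column strings.
        return line.rstrip("\n").split("\t")
```

With this offset, index 0 returns the first data row and the header is only ever seen by the vocabulary-building step, so a single tsv file should be enough.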