I'm getting the following error when I try to create a NeighborLoader for my data:
AttributeError: 'EdgeStorage' object has no attribute 'num_nodes'
I load my graph and split it into training, validation and test sets using RandomNodeSplit. When I then pass the split data to NeighborLoader, I get the error above.
Code used:
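import pandas as pd
import torch
import torch_geometric.transforms as T
from torch_geometric.loader import ImbalancedSampler, NeighborLoader

# training_config is a dict of settings defined elsewhere in the script
data = torch.load(training_config['data_file'])
targets = pd.read_pickle(training_config['targets_file'])
print(type(data))

# Create the train, validation and test masks
split = T.RandomNodeSplit(num_val=training_config['validation_split'], num_test=training_config['test_split'])
data_split = split(data)
# Copy the total node count onto the split object
data_split.num_nodes = data.num_nodes
print(data_split)
print(type(data_split))

# Sample training nodes of type 'word' with class balancing
sampler = ImbalancedSampler(data_split['word'].y, input_nodes=data_split['word'].train_mask)
loader = NeighborLoader(
    data_split,
    num_neighbors=[10] * 2,
    batch_size=training_config['batch_size'],
    input_nodes=data_split['word'].train_mask,
    sampler=sampler,
)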
And here's the result from those print statements:
<class 'torch_geometric.data.hetero_data.HeteroData'>
HeteroData(
num_classes=2,
num_nodes=59565,
word={
y=[39566],
x=[39566, 2],
train_mask=[39566],
val_mask=[39566],
test_mask=[39566],
},
sentence={ x=[19999, 1] },
(word, depGraph, word)={ edge_index=[2, 934] },
(word, head, word)={ edge_index=[2, 934] },
(word, previousWord, word)={ edge_index=[2, 842] },
(word, fromSentence, sentence)={ edge_index=[2, 574] },
(word, nextWord, word)={ edge_index=[2, 842] },
(word, pos, pos)={ edge_index=[2, 39566] },
(word, edge, edge)={ edge_index=[2, 39566] },
(word, feat_aspect, feat_aspect)={ edge_index=[2, 1318] },
(word, feat_case, feat_case)={ edge_index=[2, 1251] },
(word, feat_conjtype, feat_conjtype)={ edge_index=[2, 708] },
(word, feat_definite, feat_definite)={ edge_index=[2, 5349] },
(word, feat_degree, feat_degree)={ edge_index=[2, 2735] },
(word, feat_foreign, feat_foreign)={ edge_index=[2, 19] },
(word, feat_gender, feat_gender)={ edge_index=[2, 169] },
(word, feat_mood, feat_mood)={ edge_index=[2, 307] },
(word, feat_number, feat_number)={ edge_index=[2, 14435] },
(word, feat_numtype, feat_numtype)={ edge_index=[2, 1588] },
(word, feat_person, feat_person)={ edge_index=[2, 1713] },
(word, feat_polarity, feat_polarity)={ edge_index=[2, 35] },
(word, feat_poss, feat_poss)={ edge_index=[2, 142] },
(word, feat_prontype, feat_prontype)={ edge_index=[2, 7914] },
(word, feat_punctside, feat_punctside)={ edge_index=[2, 1344] },
(word, feat_puncttype, feat_puncttype)={ edge_index=[2, 3603] },
(word, feat_tense, feat_tense)={ edge_index=[2, 1895] },
(word, feat_verbform, feat_verbform)={ edge_index=[2, 2457] },
(sentence, nextSentence, sentence)={ edge_index=[2, 39996] }
)
<class 'torch_geometric.data.hetero_data.HeteroData'>
Full error trace:
Traceback (most recent call last):
File "/path/PyTorchConvert.py", line 117, in <module>
loader = NeighborLoader(
File "/path/anaconda3/envs/graph_builder/lib/python3.8/site-packages/torch_geometric/loader/neighbor_loader.py", line 229, in __init__
neighbor_sampler = NeighborSampler(
File "/path/anaconda3/envs/graph_builder/lib/python3.8/site-packages/torch_geometric/sampler/neighbor_sampler.py", line 101, in __init__
colptr_dict, row_dict, self.perm = to_hetero_csc(
File "/path/anaconda3/envs/graph_builder/lib/python3.8/site-packages/torch_geometric/sampler/utils.py", line 101, in to_hetero_csc
out = to_csc(store, device, share_memory, is_sorted, src_node_time)
File "/path/anaconda3/envs/graph_builder/lib/python3.8/site-packages/torch_geometric/sampler/utils.py", line 66, in to_csc
colptr = index2ptr(col, data.size(1))
File "/path/anaconda3/envs/graph_builder/lib/python3.8/site-packages/torch_geometric/data/storage.py", line 489, in size
self._parent()[self._key[-1]].num_nodes)
File "/path/anaconda3/envs/graph_builder/lib/python3.8/site-packages/torch_geometric/data/storage.py", line 87, in __getattr__
raise AttributeError(
AttributeError: 'EdgeStorage' object has no attribute 'num_nodes'
Process finished with exit code 1
So both objects are HeteroData, yet NeighborLoader ends up with an EdgeStorage somewhere, and I can't use it to batch my data. Are there any fixes or alternatives I could try?
I am able to train a SAGEConv model without batching, but I'd like to try batching and see how it affects my results.
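A possible lead, which I have not confirmed: the failing line in storage.py looks up num_nodes on the destination node type of an edge type (self._key[-1]). The printout only shows node storage for word and sentence, while types such as pos, edge and the feat_* types appear only as edge endpoints. The sketch below is plain Python with the type lists copied, abbreviated, from the printout. endpoints_without_storage is a hypothetical helper name, not part of torch_geometric. It lists the endpoint node types that have no storage of their own.

```python
# Node types that show their own storage in the HeteroData printout.
node_types = ["word", "sentence"]

# A few of the edge types from the printout (abbreviated for the example).
edge_types = [
    ("word", "depGraph", "word"),
    ("word", "fromSentence", "sentence"),
    ("word", "pos", "pos"),
    ("word", "edge", "edge"),
    ("word", "feat_aspect", "feat_aspect"),
    ("sentence", "nextSentence", "sentence"),
]


def endpoints_without_storage(node_types, edge_types):
    """Return the sorted endpoint node types that appear in an edge type
    but are not listed among the node types (hypothetical helper)."""
    known = set(node_types)
    missing = set()
    for src, _rel, dst in edge_types:
        for endpoint in (src, dst):
            if endpoint not in known:
                missing.add(endpoint)
    return sorted(missing)


missing = endpoints_without_storage(node_types, edge_types)
print(missing)
```

On the real object, the same check could presumably be run with data_split.node_types and data_split.edge_types. If the list is non-empty, those node types may need an explicit num_nodes (or features) of their own before the loader can work, but that is an assumption on my part.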