How can I measure distances between categories using hierarchical clustering with Orange Data Mining?


I'm doing text mining with Orange Data Mining. I'm using a corpus of 24 PDF files divided into 5 categories. I want to use hierarchical clustering to measure distances between the categories rather than between the individual texts.

I tried doing so by concatenating the texts' content with the "Group By" widget. However, when I place the "Group By" widget after the "Preprocess Text" widget, it seems to cancel both the filtering and the normalization of my corpus: stopwords now appear in both "Word Cloud" and "Extract Keywords". How do I fix this problem? Is there another way of measuring distances between categories using hierarchical clustering?
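To make the goal concrete, here is a minimal sketch of the intended computation in plain Python rather than Orange widgets. It does not use Orange's API. The toy corpus, the stopword list, and all function names are hypothetical stand-ins. The point it illustrates is the order of operations: each text is preprocessed first, the cleaned tokens are then pooled per category, and only then are cosine distances and an average-linkage clustering computed on the category-level vectors.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: (category, text) pairs standing in for the PDF files.
DOCS = [
    ("law", "the court ruled on the contract dispute"),
    ("law", "a judge reviewed the contract"),
    ("business", "the company signed a contract"),
    ("medicine", "the patient received a new treatment"),
    ("medicine", "doctors tested the treatment"),
]

# Tiny illustrative stopword list (an assumption, not the list Orange uses).
STOPWORDS = {"the", "a", "on", "and", "of"}


def preprocess(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def category_counts(docs):
    """Preprocess each text first, then pool the cleaned tokens per category."""
    pooled = defaultdict(Counter)
    for category, text in docs:
        pooled[category].update(preprocess(text))
    return dict(pooled)


def cosine_distance(a, b):
    """1 - cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(pooled):
    """Pairwise category distances, keyed by (category, category)."""
    names = sorted(pooled)
    return {(x, y): cosine_distance(pooled[x], pooled[y])
            for x in names for y in names}


def average_linkage(names, dist):
    """Naive agglomerative clustering; returns the merges in order."""
    clusters = [(n,) for n in sorted(names)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[(x, y)] for x in clusters[i] for y in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


counts = category_counts(DOCS)
dist = distance_matrix(counts)
merges = average_linkage(counts.keys(), dist)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

In this sketch the stopwords never reach the pooled category vectors because filtering happens before concatenation, which is the behavior I expected from placing "Group By" after "Preprocess Text".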
