Assessing the Appropriateness of Labels via Unsupervised Clustering of Cognitive Distortion Sentences

For my project, I have a dataset of sentences that are already correctly classified as cognitive distortions, each with its appropriate label (one of 18 distortion types). I want to cluster the sentences without using the labels, to see whether the labels are justified. To do this, I embedded the sentences with a pre-trained sentence-transformer model and reduced the dimensionality of the embeddings. I then clustered them with k-means (k = 18) and, more exploratorily, with HDBSCAN. Finally, I looked at how consistent the resulting clusters were.
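One way to make the "consistency" step concrete is to score the unsupervised partition against the held-out gold labels with external validation metrics. Below is a minimal, dependency-free sketch, assuming `labels` is the list of gold distortion types and `clusters` is the list of cluster ids from k-means or HDBSCAN, in the same sentence order. All function names are hypothetical helpers for illustration; in practice, scikit-learn's `sklearn.metrics.adjusted_rand_score` and `normalized_mutual_info_score` compute ARI and NMI directly. It also assumes HDBSCAN's convention of labelling noise points `-1`.

```python
from collections import Counter
from math import comb


def contingency(labels, clusters):
    """Count how many sentences of each gold label fall into each cluster."""
    if len(labels) != len(clusters):
        raise ValueError("labels and clusters must have the same length")
    return Counter(zip(labels, clusters))


def drop_noise(labels, clusters, noise=-1):
    """Remove points HDBSCAN marked as noise (-1) before scoring."""
    kept = [(lab, c) for lab, c in zip(labels, clusters) if c != noise]
    return [lab for lab, _ in kept], [c for _, c in kept]


def purity(labels, clusters):
    """Fraction of sentences that share the majority gold label of their cluster."""
    best = Counter()
    for (_, cluster), n in contingency(labels, clusters).items():
        best[cluster] = max(best[cluster], n)
    return sum(best.values()) / len(labels)


def label_concentration(labels, clusters):
    """For each gold label, share of its sentences found in its most common cluster.

    A low value suggests that label is spread over several clusters.
    """
    per_label = {}
    for (label, cluster), n in contingency(labels, clusters).items():
        per_label.setdefault(label, Counter())[cluster] = n
    return {lab: max(c.values()) / sum(c.values()) for lab, c in per_label.items()}


def adjusted_rand_index(labels, clusters):
    """Adjusted Rand index: 1.0 for identical partitions, about 0.0 for chance."""
    pairs = comb(len(labels), 2)
    if pairs == 0:
        raise ValueError("need at least two sentences to compare partitions")
    table = contingency(labels, clusters)
    sum_cells = sum(comb(n, 2) for n in table.values())
    sum_rows = sum(comb(n, 2) for n in Counter(labels).values())
    sum_cols = sum(comb(n, 2) for n in Counter(clusters).values())
    expected = sum_rows * sum_cols / pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Only happens when both partitions are trivial and identical.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # Hypothetical toy data: gold labels and HDBSCAN-style cluster ids.
    gold = ["labeling", "labeling", "mind_reading", "mind_reading",
            "catastrophizing", "catastrophizing"]
    found = [0, 0, 1, 1, 1, -1]
    print(label_concentration(gold, found))
    g, c = drop_noise(gold, found)
    print(f"purity={purity(g, c):.2f}  ARI={adjusted_rand_index(g, c):.3f}")
```

Purity tends to rise as the number of clusters grows (many small clusters are each pure), so it is worth reading alongside ARI. Because k-means forces every sentence into one of the 18 clusters while HDBSCAN can leave sentences as noise, it may also help to report the noise fraction next to HDBSCAN's scores. A low concentration for a distortion type could reflect the embedding model as much as the label itself.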

Does this seem like a correct pipeline to you, or would you have done things differently?