BERTopic: add legend to term score decline


I plotted the term score decline for a topic model I created on Google Colab with BERTopic. Great function, works neatly! But I need to add a legend, and topic_model.visualize_term_rank() has no parameter for that. Only the figure title, width, height, log transformation and the topics to be plotted can be adjusted.

The term score decline function outputs a plotly figure and is based on tmtoolkit, so I tried to tweak it there. But I cannot load tmtoolkit in Colab. I tried import tmtoolkit and from tmtoolkit import topicmod, patching things together from the reference. I found another Colab file that uses tmtoolkit, but it gave me the same error: Colab does not find tmtoolkit. So it is not a problem with my file but seems to be a general issue?

Another solution is to update the plotly figure. But how? I had a look through the reference and tried:

import plotly.graph_objects as go
import plotly.express as px

labels = topic_model.topic_labels_
fig = model.visualize_term_rank(title = 'Topic Coherence', custom_labels = True)
fig.update_legends(patch = labels)
fig

This throws the error below.

TypeError: object of type 'int' has no len()

What does it mean? It seems illogical to me that the length cannot be computed for an integer variable.

When I omit the line fig.update_legends(patch = labels) it produces the figure but without the legend I need.

Update 2023-12-15: It occurred to me that maybe tmtoolkit has to be installed like BERTopic. I was convinced it could be loaded like tm or nltk. Be that as it may, the solution below works and is fast. It took less than a second in Colab to produce the figure.
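As a side note on the import error: a quick way to tell "not installed" apart from "installed but broken" is to ask Python whether it can locate the module at all. The sketch below uses only the standard library; the helper name is_installed is made up for illustration, and the pip hint assumes the package is published under the same name as the module.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the module, without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for dotted names whose parent package is missing,
        # or for malformed module names.
        return False


if not is_installed("tmtoolkit"):
    # Assumption: the PyPI package name matches the module name.
    print("tmtoolkit not found; in a Colab cell, try: %pip install tmtoolkit")
```

If this prints the hint, the module is simply missing from the runtime and has to be installed first, just like BERTopic.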

EricLavault On

The simplest approach is to update the plotly figure, knowing that fig['data'] contains a list of scatter traces, one per topic.

Inspecting the source code also reveals that each trace is created with an empty name and that the topic labels are set as hovertext, so you can do the following:

fig = model.visualize_term_rank(title = 'Topic Coherence', custom_labels = True)

for trace in fig['data']:
    topic_label = trace['hovertext']

    # The trace name appears as the legend item for that trace.
    trace['name'] = topic_label 

    # Whether or not the item corresponding to this trace is shown in the legend.
    trace['showlegend'] = True

    # You could also override the color for the given topic, e.g. something like
    # trace['line']['color'] = _topic_color(topic_label)  # _topic_color is a placeholder for your own function

# Most importantly, tell the layout to actually draw the legend.
fig.update_layout(showlegend=True)

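If you need this for several figures, the same steps can be wrapped in a small helper. This is only a sketch: the function name add_legend_from_hovertext is made up, and it assumes each trace carries its topic label as a single hovertext string, as described above. It relies only on fig.data, the trace attributes hovertext, name and showlegend, and fig.update_layout, all of which plotly figures provide.

```python
def add_legend_from_hovertext(fig):
    """Copy each trace's hovertext into its name and switch the legend on.

    Assumes hovertext holds one label string per trace (hypothetical helper,
    not part of BERTopic or plotly).
    """
    for trace in fig.data:
        label = trace.hovertext
        if label:  # leave traces without a label untouched
            trace.name = str(label)    # the name is what the legend displays
            trace.showlegend = True    # include this trace in the legend
    fig.update_layout(showlegend=True)  # draw the legend itself
    return fig
```

Usage would then be fig = add_legend_from_hovertext(model.visualize_term_rank(custom_labels=True)), followed by fig.show().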