I have a dataset, texts, that I want to display as a scatterplot. The texts in this dataset have different authors and are written in different languages. I want to represent each text's language with colour and its author with marker shape.

The dataset is a numpy array of 2D vector coordinates:

import numpy as np
texts = np.array([[1, 2], [1, 1], [2, 1], [1, 11], [8, 1], [8, 7], [9, 7], [8, 9], [8, 8], [12, 11], [11, 12]])

I have lists of languages and author IDs that are matched by index to the data points in the texts array.

languages = [0, 0, 0, 0, 2, 1, 1, 1, 1, 2, 2]
authors = [0, 0, 0, 2, 0, 1, 1, 1, 1, 2, 2]

language_names = ["Irish", "English", "Latin"]
author_names = ["Seán", "James", "Paul"]

I can now plot the data points from texts on a scatterplot and colour code them to represent the language of the text.

import matplotlib.pyplot as plt
import matplotlib.colors as mplcol

col_dict = mplcol.TABLEAU_COLORS
col_list = [i for i in col_dict]

fig, (plot) = plt.subplots(1, 1, sharex=False, sharey=False, figsize=(10, 10))

for unique_colour_num in sorted(list(set(languages))):
    colour_data = list()
    for i, j in enumerate(texts):
        if languages[i] == unique_colour_num:
            colour_data.append(j)
    colour_data = np.array(colour_data)
    x = [i[0] for i in colour_data]
    y = [i[1] for i in colour_data]
    plot.scatter(x, y, s=30, c=col_dict.get(col_list[unique_colour_num]), label=language_names[unique_colour_num])

plot.legend(loc="best")

plot.set_title("Texts with Languages and Authors")

plt.show()

This creates a scatterplot of the data in texts, with each point coloured according to the language of its text. The languages also appear in the plot's legend.

What I would like to do now is add shapes to the data points on the plot to represent who the authors are for each text. Maybe a star for author 0, a square for author 1, and a triangle for author 2.

Unfortunately, I can't work out how to do this. Firstly, I'm not sure how to get shapes in the scatterplot. More importantly, though, it seems that when I plot an individual point on the graph, I can only enter one type of data label for it, in this case label=language_names[unique_colour_num]. I don't know how I can add a second type of label for the same data point.

So, how can I include shapes in the plot and legend, which correspond to the same data point but represent different information (authors) than the colours I'm already using to represent languages?
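For reference, here is a minimal sketch of one possible approach, not necessarily the best one. It relies on two things: scatter() accepts a marker argument (one marker per call, so each language/author group gets its own call), and a legend can be built from hand-made proxy handles instead of from label=. The colours and markers lists, the grey colour for the author handles, and the output filename texts_scatter.png are assumptions chosen for illustration. The languages and authors lists are converted to numpy arrays so that boolean masks can be used.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show() instead
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

texts = np.array([[1, 2], [1, 1], [2, 1], [1, 11], [8, 1], [8, 7],
                  [9, 7], [8, 9], [8, 8], [12, 11], [11, 12]])
languages = np.array([0, 0, 0, 0, 2, 1, 1, 1, 1, 2, 2])
authors = np.array([0, 0, 0, 2, 0, 1, 1, 1, 1, 2, 2])
language_names = ["Irish", "English", "Latin"]
author_names = ["Seán", "James", "Paul"]

# Assumed choices: one colour per language, one marker per author
colours = ["tab:blue", "tab:orange", "tab:green"]
markers = ["*", "s", "^"]  # star, square, triangle

fig, ax = plt.subplots(figsize=(10, 10))

# Draw each (language, author) combination that actually occurs in the data
for lang in np.unique(languages):
    for auth in np.unique(authors):
        mask = (languages == lang) & (authors == auth)
        if mask.any():
            ax.scatter(texts[mask, 0], texts[mask, 1], s=60,
                       c=colours[lang], marker=markers[auth])

# Proxy handles: empty lines that exist only to populate the legends
language_handles = [
    Line2D([], [], linestyle="none", marker="o", color=colours[i], label=name)
    for i, name in enumerate(language_names)
]
author_handles = [
    Line2D([], [], linestyle="none", marker=markers[i], color="grey", label=name)
    for i, name in enumerate(author_names)
]

# A second ax.legend() call replaces the first legend, so re-add the first as an artist
language_legend = ax.legend(handles=language_handles, title="Language", loc="upper left")
ax.add_artist(language_legend)
author_legend = ax.legend(handles=author_handles, title="Author", loc="lower right")

ax.set_title("Texts with Languages and Authors")
fig.savefig("texts_scatter.png")  # hypothetical output filename
```

Keeping the legends separate from the plotted points means each point can carry two kinds of information (colour and shape) without needing two label= values on the same scatter() call.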
