I have a dataset, texts, that I want to display using a scatterplot. The texts represented in this dataset have different authors, and are written in different languages. I want to represent the language of a text on the scatterplot using colours, and the authors using shapes.
The dataset is a numpy array of 2D vector coordinates:
import numpy as np
texts = np.array([[1, 2], [1, 1], [2, 1], [1, 11], [8, 1], [8, 7], [9, 7], [8, 9], [8, 8], [12, 11], [11, 12]])
I have lists of languages and author IDs that are matched by index to the data points in the texts array.
languages = [0, 0, 0, 0, 2, 1, 1, 1, 1, 2, 2]
authors = [0, 0, 0, 2, 0, 1, 1, 1, 1, 2, 2]
language_names = ["Irish", "English", "Latin"]
author_names = ["Seán", "James", "Paul"]
I can now plot the data points from texts on a scatterplot and colour code them to represent the language of the text.
import matplotlib.pyplot as plt
import matplotlib.colors as mplcol
col_dict = mplcol.TABLEAU_COLORS
col_list = [i for i in col_dict]
fig, (plot) = plt.subplots(1, 1, sharex=False, sharey=False, figsize=(10, 10))
for unique_colour_num in sorted(list(set(languages))):
    colour_data = list()
    for i, j in enumerate(texts):
        if languages[i] == unique_colour_num:
            colour_data.append(j)
    colour_data = np.array(colour_data)
    x = [i[0] for i in colour_data]
    y = [i[1] for i in colour_data]
    plot.scatter(x, y, s=30, c=col_dict.get(col_list[unique_colour_num - 1]), label=language_names[unique_colour_num])
plot.legend(loc="best")
plot.set_title("Texts with Languages and Authors")
plt.show()
This creates a scatterplot which displays the data from texts, coloured according to the language each text is in. The languages also show up in the scatterplot's legend.
What I would like to do now is add shapes to the data points on the plot to represent who the authors are for each text. Maybe a star for author 0, a square for author 1, and a triangle for author 2.
Unfortunately, I can't work out how to do this. Firstly, I'm not sure how to get shapes in the scatterplot. More importantly, though, it seems that when I plot an individual point on the graph, I can only enter one type of data label for it, in this case label=language_names[unique_colour_num]. I don't know how I can add a second type of label for the same data point.
So, how can I include shapes in the plot and legend, which correspond to the same data point but represent different information (authors) than the colours I'm already using to represent languages?
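One possible approach, offered as a sketch rather than a definitive answer: call scatter once per (language, author) combination, passing the colour via c and the shape via marker, then build two separate legends from proxy handles so that colours and shapes are explained independently. The names author_markers, language_colours and plot_texts are hypothetical, introduced only for this illustration, and the specific colours and legend positions are arbitrary assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

texts = np.array([[1, 2], [1, 1], [2, 1], [1, 11], [8, 1], [8, 7],
                  [9, 7], [8, 9], [8, 8], [12, 11], [11, 12]])
languages = np.array([0, 0, 0, 0, 2, 1, 1, 1, 1, 2, 2])
authors = np.array([0, 0, 0, 2, 0, 1, 1, 1, 1, 2, 2])
language_names = ["Irish", "English", "Latin"]
author_names = ["Seán", "James", "Paul"]

# Assumed choices: star, square and triangle, as suggested in the question.
author_markers = ["*", "s", "^"]
# Assumed colours, indexed by language ID.
language_colours = ["tab:blue", "tab:orange", "tab:green"]


def plot_texts():
    fig, ax = plt.subplots(figsize=(10, 10))

    # One scatter call per (language, author) pair that occurs in the data,
    # because a single scatter call accepts only one marker style.
    for lang, auth in sorted(set(zip(languages.tolist(), authors.tolist()))):
        mask = (languages == lang) & (authors == auth)
        ax.scatter(texts[mask, 0], texts[mask, 1], s=60,
                   c=language_colours[lang], marker=author_markers[auth])

    # Proxy handles: coloured circles for languages, grey shapes for authors.
    lang_handles = [Line2D([], [], linestyle="", marker="o", color=c, label=n)
                    for c, n in zip(language_colours, language_names)]
    auth_handles = [Line2D([], [], linestyle="", marker=m, color="grey", label=n)
                    for m, n in zip(author_markers, author_names)]

    # A second call to ax.legend replaces the first legend, so the first one
    # is re-added to the axes as an ordinary artist.
    first_legend = ax.legend(handles=lang_handles, title="Language", loc="upper left")
    ax.add_artist(first_legend)
    ax.legend(handles=auth_handles, title="Author", loc="lower right")

    ax.set_title("Texts with Languages and Authors")
    return fig, ax
```

Calling plot_texts() and then plt.show() should display the figure. Because the legends are built from proxy handles rather than from the label argument of scatter, each data point can carry both kinds of information without needing two labels.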