Plotting multiple grouped scatter graphs with regression lines on the same axes

Question

Plotting multiple grouped scatter graphs with regression lines on the same axes

34 Views Asked by Reptile Smile At 28 March 2024 at 20:19

I'm trying to create a data vis which is essentially a scattergraph.

A snip of the data looks like this:

Location	Year	A	B
Afghanistan	1990	48.3973	1377.85863
Afghanistan	2020	65.432	620.40751
Albania	1990	76.4334	32.87494
Albania	2020	79.6756	8.27645
Algeria	1990	68.5167	205.81979
Algeria	2020	75.9125	77.69479
Angola	1990	82.6756	1101.50579
Angola	2020	83.1922	221.90649
Antigua and Barbuda	1990	73.6497	58.0267
Antigua and Barbuda	2020	80.7452	21.24795
Argentina	1990	75.9699	87.81427
Argentina	2020	81.3155	44.90008

That is, each country has a pair of points. I'd like to plot A against B, and then ideally have each country's pair of points (in a perfect world) in a different colour (ideally with labels, showing which country they are and which year).

I then would like to plot a regression line of all the 1990 points, and a regression line on the same axes of the 2020 points.

In Excel would be ideal, but I could potentially use something else if it would be more appropriate. Any help would be very gratefully received.

Original Q&A

There are 1 best solutions below

**Tino D** · Answer 1 · 2024-03-30T14:28:03.037000

Since you are open to other options, I thought I can answer your question in python, if that's ok. Here's how you can do it with seaborn, a nice data visualization package that has a lot of helpful functions for statistics and ML.

First, to get you started with python, I would recommend anaconda, as their installation already includes a tonne of nice packages which are all compatible with each other.

After getting acquainted, you can import the following packages for the example:

import seaborn as sns # nice plotting
%matplotlib notebook
import matplotlib.pyplot as plt # package also for plottingm the above line is a magic command to define the backend
import pandas as pd # for data handling

I defined the data without Afghanistan for now, as it seems to be a bad outlier that negatively affects the regression models, I'll show that at the end. For now, here are the results without AFG:

data = {
    'Location': ['Albania', 'Albania', 'Algeria', 'Algeria', 'Angola', 'Angola', 'Antigua and Barbuda', 'Antigua and Barbuda', 'Argentina', 'Argentina'],
    'Year': [1990, 2020, 1990, 2020, 1990, 2020, 1990, 2020, 1990, 2020],
    'A': [76.4334, 79.6756, 68.5167, 75.9125, 82.6756, 83.1922, 73.6497, 80.7452, 75.9699, 81.3155],
    'B': [32.87494, 8.27645, 205.81979, 77.69479, 1101.50579, 221.90649, 58.0267, 21.24795, 87.81427, 44.90008]
} # define the data without afghanistan
df = pd.DataFrame(data) # generate dataframe
plt.figure() # figure
sns.scatterplot(data=df, x='A', y='B', hue='Location', style='Year') # scatter plot the data, changing style for year and colour with country
sns.regplot(data=df[df['Year'] == 1990], x='A', y='B', scatter=False, color='blue') # plot model for 1990
sns.regplot(data=df[df['Year'] == 2020], x='A', y='B', scatter=False, color='orange') # plot model for 2000
plt.grid() # add grid
plt.legend(loc = "lower right", ncols = 3) # and legend

This leads to the following plot:

Of course, if you have the data in a better way, you can use pd.read_excel() or pd.read_csv() to get the data from saved files.

Check the data with Afghanistan here:

As you see, the regression primarily captures the variance found between AFG and the other groups. This can be solves, for example, by using robust regression instead.

I hope this answer encourages you to start learning python and doing this. The advantage would be that there are a lot of resources these days to get started and to even master python. Also, chatGPT can help you get started (you will come back complaining about it anyway ;) )

Plotting multiple grouped scatter graphs with regression lines on the same axes

There are 1 best solutions below

Related Questions in EXCEL

Related Questions in REGRESSION

Related Questions in SCATTER-PLOT

Trending Questions

Popular # Hahtags

Popular Questions