Python Folium - Combine states in choropleth map of USA


I want to create a custom choropleth map of the USA in which, instead of showing every state individually, some states are combined into one region, e.g. Louisiana, Mississippi, Alabama, and Arkansas.

Below is some sample code that uses data from the Folium GitHub page. I tried adding a column named "region" to state_data, set to the same value for the states listed above and to a unique value for every other state, and changed the columns argument of folium.Choropleth to use it, but that didn't work. I'm open to using a package other than folium (plotly, etc.).

Sample Code:

import folium
import pandas as pd

sample_map = folium.Map(location=[40, -95], zoom_start=4)

url = (
    "https://raw.githubusercontent.com/python-visualization/folium/main/examples/data"
)
state_geo = f"{url}/us-states.json"
state_unemployment = f"{url}/US_Unemployment_Oct2012.csv"
state_data = pd.read_csv(state_unemployment)

folium.Choropleth(
    geo_data=state_geo,
    name="choropleth",
    data=state_data,
    columns=["Region", "Unemployment"],
    key_on="feature.id",
    fill_color="YlGn",
    fill_opacity=0.7,
    line_opacity=.1,
    legend_name="Unemployment Rate (%)",
).add_to(sample_map)

folium.LayerControl().add_to(sample_map)
sample_map.save('test.html')

There are 3 best solutions below

thetaco On

The file you are using is GeoJSON that describes the outline of each US state. You can modify it to merge the desired states into a single "South Central" region while keeping most of the code you have already written. By representing all four states as one feature and averaging their unemployment rates, you can get the desired outcome with Folium.

Merging GeoJSON geometries by hand is difficult, so I recommend using a GIS tool such as QGIS to combine the four states. Import the JSON into QGIS, select the desired states, and click "Merge Selected Features".

Merging the four states produces a single polygon covering all of them, which is what we want: the new quad-state. After this, you can export it as a GeoJSON file and use that as the geographical data for the US.

You can adjust your code to use the new GeoJSON data. Since I can't post a massive JSON file here, I have uploaded it to my GitHub for ease of download and reference, and my code points at that file. Along with merging the geometry, you also need to combine the unemployment rates for the four states. My code uses the new GeoJSON and the averaged rate:

import folium
import pandas as pd

southcentral_geo = 'https://raw.githubusercontent.com/samwilliamsprojects/southCentralRegion/main/southcentraljson.geojson' #location of adjusted geojson data
url = "https://raw.githubusercontent.com/python-visualization/folium/main/examples/data"
state_unemployment = f"{url}/US_Unemployment_Oct2012.csv"
state_data = pd.read_csv(state_unemployment)
# average the unemployment rate of the four merged states
states = ["AR", "AL", "LA", "MS"]
avg = state_data.loc[state_data["State"].isin(states), "Unemployment"].mean()
# the merged feature is the one called "Alabama" (AL) in the adjusted GeoJSON,
# so the averaged rate is stored on the AL row
state_data.loc[state_data["State"] == "AL", "Unemployment"] = avg

sample_map = folium.Map(location=[35, -85], zoom_start=5)

folium.Choropleth(
    geo_data=southcentral_geo,  # new geojson data
    name="choropleth",
    data=state_data,
    columns=["State", "Unemployment"],
    key_on="feature.properties.id",
    fill_color="YlGn",
    fill_opacity=0.7,
    line_opacity=0.1,
    legend_name="Unemployment Rate (%)",
).add_to(sample_map)

folium.LayerControl().add_to(sample_map)
sample_map.save('1states_combined_unemployment.html')

With the new GeoJSON data, the map shows the four states as one combined region shaded by their average unemployment rate.

Note that zooming in will still show the state names, but there will not be a hard-defined border between the four states, and the unemployment rate is the average of the specified states, so they all share the same value. Tennessee looks similar to the quad-state, but it has a thicker state line; it is purely a coincidence that the color is the same, as its unemployment rate is close to the quad-state average. Also, in the GeoJSON I provided, the merged feature is called "Alabama". I realize this may be confusing, and you may want to rename it to something like "South Central Region".
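
The renaming suggested above could be done with a few lines of standard-library Python rather than by hand. This is only a minimal sketch: the helper name rename_feature, the new id "SC", and the tiny sample collection are made up for illustration, and it assumes the merged file stores its key under properties.id (as key_on="feature.properties.id" suggests) alongside a name property.

```python
import json

def rename_feature(geojson, old_id, new_id, new_name):
    """Return a copy of a GeoJSON FeatureCollection with one feature relabelled.

    Hypothetical helper; assumes each feature keeps its key in properties["id"].
    """
    updated = json.loads(json.dumps(geojson))  # deep copy via a JSON round-trip
    for feature in updated["features"]:
        props = feature.get("properties", {})
        if props.get("id") == old_id:
            props["id"] = new_id
            props["name"] = new_name
    return updated

# Tiny stand-in for the merged file; the real one holds full state outlines.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"id": "AL", "name": "Alabama"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
        {"type": "Feature",
         "properties": {"id": "TN", "name": "Tennessee"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[2, 0], [3, 0], [3, 1], [2, 0]]]}},
    ],
}

renamed = rename_feature(sample, "AL", "SC", "South Central Region")
print([f["properties"] for f in renamed["features"]])
```

If the id is changed this way, the row in state_data that carries the averaged rate would need the same new id in its State column, otherwise the choropleth has nothing to match the merged feature against.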

Johnny Cheesecutter On

In Python you can use either geopandas or shapely to combine shapes. If you open the link to the states file, you'll see that the input data is in GeoJSON format, which can be imported into geopandas or shapely for further processing.

Here is a snippet using geopandas:

import geopandas as gpd
import pandas as pd
url = (
    "https://raw.githubusercontent.com/python-visualization/folium/main/examples/data"
)
state_geo = f"{url}/us-states.json"

gdf = gpd.read_file(state_geo)

# selecting three states
selected_rows = gdf[gdf['id'].isin({'AL','AK','AZ'})]

# merging polygons of the selected states
new_polygon = selected_rows.unary_union 

# creating new row with name "ZZ"
new_row = pd.Series(['ZZ', 'AL+AK+AZ', new_polygon], index=gdf.columns)

# add row to data frame
gdf.loc[gdf.shape[0]] = new_row
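
For the shapely route mentioned above, a rough sketch of the same union step might look like the following. The unit squares are made-up stand-ins for state outlines, not real data. It also illustrates a detail worth keeping in mind: shapes that touch merge into a single Polygon, while shapes that do not touch (as with AL, AK and AZ) come back as a MultiPolygon.

```python
from shapely.geometry import box, mapping
from shapely.ops import unary_union

# Three touching unit squares standing in for neighbouring states.
west = box(0, 0, 1, 1)
middle = box(1, 0, 2, 1)
east = box(2, 0, 3, 1)

# Shapes that share an edge dissolve into one outline.
merged = unary_union([west, middle, east])
print(merged.geom_type, merged.area)

# A shape that touches none of the others stays a separate part.
far_away = box(10, 10, 11, 11)
scattered = unary_union([west, far_away])
print(scattered.geom_type)

# mapping() converts the result to a GeoJSON-style geometry dict.
geometry = mapping(merged)
print(geometry["type"])
```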


Johnny Cheesecutter On

Following up on my previous answer, here is the full code for the solution:

import folium
import pandas as pd
import geopandas as gpd


# read geometries into geopandas
url = (
    "https://raw.githubusercontent.com/python-visualization/folium/main/examples/data"
)
state_geo = f"{url}/us-states.json"
gdf = gpd.read_file(state_geo)

# read unemployment data
state_unemployment = f"{url}/US_Unemployment_Oct2012.csv"
state_data = pd.read_csv(state_unemployment)

# rename the id column to 'State' so it matches the unemployment data
gdf.rename(columns = {"id":'State'}, inplace=True)


# create region column
gdf['region'] = gdf['State']

# group several states together
filt = gdf['name'].isin(['Louisiana','Mississippi','Alabama', 'Arkansas'])
gdf.loc[filt, 'region'] = 'LMAA'
gdf.loc[filt, 'name'] = 'union'

# averaging Unemployment rate by region 
state_data = state_data.merge(gdf[['State','region']])
state_data = state_data.groupby('region', as_index=False)['Unemployment'].mean()


# Create polygons for Regions by doing groupby union on State polygons 
gdf = gdf.dissolve(by='region',aggfunc='first')



sample_map = folium.Map(location=[40, -95], zoom_start=4)

folium.Choropleth(
    geo_data=gdf,
    name="choropleth",
    data=state_data,
    columns=["region","Unemployment"],
    key_on="feature.id",
    fill_color="YlGn",
    # fill_opacity=0.7,
    # line_opacity=.1,
    legend_name="Unemployment Rate (%)",
).add_to(sample_map)


folium.LayerControl().add_to(sample_map)
sample_map
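
The choropleth can only shade a shape when its key also appears in the data, which is the likely reason a region column on the data side alone did not work in the question. Before building the map, it can help to compare the two sets of keys. The helper below is a hypothetical sketch with made-up sample values; in the code above the inputs would come from something like list(gdf.index) after the dissolve and state_data["region"].

```python
def unmatched_keys(feature_ids, data_keys):
    """Return (feature ids with no data row, data keys with no feature)."""
    feature_ids, data_keys = set(feature_ids), set(data_keys)
    return sorted(feature_ids - data_keys), sorted(data_keys - feature_ids)

# Made-up sample values for illustration only.
feature_ids = ["LMAA", "TN", "TX"]   # keys present in the geometries
data_keys = ["LMAA", "TN", "AL"]     # keys present in the data table

missing_data, missing_shape = unmatched_keys(feature_ids, data_keys)
print(missing_data)    # ['TX']
print(missing_shape)   # ['AL']
```

Two empty lists mean every region has both a shape and a value.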