Dynamic label with number of points in scatterplot based on transform selection in Altair?


tl;dr: is there a way to get a dynamic count of points in filtered plots in Altair?


In the 'filtering by selection' example for Altair, it is possible to update a scatterplot based on a selection in a linked bar chart.

import altair as alt
from vega_datasets import data

cars = data.cars()

click = alt.selection_point(encodings=['color'])

scatter = alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N'
).transform_filter(
    click
)

hist = alt.Chart(cars).mark_bar().encode(
    x='count()',
    y='Origin',
    color=alt.condition(click, 'Origin', alt.value('lightgray'))
).add_params(
    click
)

scatter & hist

[Image: unfiltered charts]

[Image: filtered charts]

However, it would be useful to show how many data points remain in the scatterplot after filtering. How can this be achieved?

It is possible to compute a static count from a DataFrame and use it as a label, which I tried in a separate example:


import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({'values': [1, 2, 3, 4, 5, 1, 2, 3]})
total_count = len(data)  # number of rows, i.e. data points

# Create data frame for total count
count_data = pd.DataFrame({'label': ['Total Count'], 'count': [total_count]})

# Combine data
combined_data = pd.concat([data, count_data])

# Create histogram with text for total count
hist = alt.Chart(combined_data).mark_bar().encode(
    x='values:Q',
    y='count()',
)

text = hist.mark_text(
    color='black',
    baseline='middle'
).encode(
    x=alt.value(200),  # Adjust position as needed
    y=alt.value(40),
    text='count:Q',
)

# Display the chart
hist + text

This adds a rather ugly label:

[Image: labelled chart]

Is there a way to append a dynamic label with the number of points present in the filtered scatterplot? An external element would be fine, but I am quite new to Altair and haven't figured that bit out yet despite some searching.

Answer by bertieb

To show a dynamic count of filtered data in Altair, use transform_aggregate

Showing how many data points remain on a plot after filtering can be done by combining transform_aggregate, to compute the count, with a text mark to display it.

The aggregate transform can be added to the plot that is being filtered, e.g. filtered_plot.transform_aggregate(count='count(*)'). A text mark can then be applied as you would for any other label.

Example using iris sepal data

Note: adding the text with the count adds an 'undefined' entry to the legend, which I have yet to find a way to remove.

[Image: filtering with count]

Code:

import altair as alt
from vega_datasets import data  # Example dataset library

# Sample data - iris sepal length
source = data('iris')

# Create the scatterplot - eg length by width, coloured by species
scatterplot = alt.Chart(source).mark_point().encode(
    x='sepalLength:Q',
    y='sepalWidth:Q',
    color='species:N'
)

# Create a selection to filter points
selection = alt.selection_point(encodings=['color'], resolve='global')

# Bar chart of counts per species; clicking a bar selects that species
countplot = alt.Chart(source).mark_bar().encode(
    y='species:N',
    x='count()',
    color=alt.condition(selection, 'species:N', alt.ColorValue('gray'))
).add_params(selection)


# Filter scatterplot based on selection
filtered_scatterplot = scatterplot.transform_filter(selection)

# Calculate count of filtered points
filtered_count = filtered_scatterplot.transform_aggregate(
    count='count(*)')

# Display count as text
text = filtered_count.mark_text(
    color='black',
    fontSize=14,
    baseline='middle'
).encode(
    x=alt.value(100),  # Adjust position as needed
    y=alt.value(10),
    text='count:Q',
)

# Display the chart with linked selection and count
filtered_scatterplot & text & countplot