Bokeh checkboxes with multiple dataframe columns


TLDR: I want to create an interactive visualization with Bokeh where I can toggle the appearance of individual bars in a bar plot based on the values of multiple categorical dataframe columns.

The data

I have a Pandas dataframe with 5 columns. One column contains sample ID numbers (x), one column contains quantitative output data (y), and the other three have categorical data used to classify each sample as big or small, A or B, and blue or red.

import pandas as pd

data = dict(size=['big', 'big', 'big', 'big', 'small', 'small', 'small', 'small'],
            design=['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
            color=['blue', 'red', 'blue', 'red', 'blue', 'red', 'blue', 'red'],
            x=['1', '2', '3', '4', '5', '6', '7', '8'],
            y=['10', '20', '10', '30', '10', '40', '10', '30'])
data = pd.DataFrame(data)
print(data)

Output:

    size design color  x   y
0    big      A  blue  1  10
1    big      A   red  2  20
2    big      B  blue  3  10
3    big      B   red  4  30
4  small      A  blue  5  10
5  small      A   red  6  40
6  small      B  blue  7  10
7  small      B   red  8  30

The problem

I want to plot the above data as a bar graph, with the x values plotted along the x axis, and the y values plotted along the y axis.

[Figure: data from the above dataframe plotted as a bar graph]

I also want to toggle the visibility of the bars with something like Bokeh's CheckboxGroup, with six checkboxes in total, one for each value in the three categorical columns (big, small, A, B, blue, and red). If all boxes are checked, all bars are shown. If every box except A is checked, only half of the data is shown (the half with design value B). If every box except A and blue is checked, no bar with design value A or color value blue is shown.
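In pandas terms, the filtering I have in mind would look roughly like the sketch below. The `checked` dictionary is a made-up stand-in for the state of the six checkboxes, not anything Bokeh provides; a row is shown only if its value in every categorical column is still checked.

```python
import pandas as pd

df = pd.DataFrame(dict(
    size=['big', 'big', 'big', 'big', 'small', 'small', 'small', 'small'],
    design=['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
    color=['blue', 'red', 'blue', 'red', 'blue', 'red', 'blue', 'red'],
    x=['1', '2', '3', '4', '5', '6', '7', '8'],
    y=['10', '20', '10', '30', '10', '40', '10', '30'],
))

# Hypothetical checkbox state: everything checked except A and blue.
checked = {'size': ['big', 'small'], 'design': ['B'], 'color': ['red']}

# Keep a row only if each of its categorical values is still checked.
mask = pd.Series(True, index=df.index)
for column, allowed in checked.items():
    mask &= df[column].isin(allowed)

shown = df[mask]
print(shown['x'].tolist())
```

With A and blue unchecked, only samples 4 and 8 (design B, color red) remain.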

The solution posted to this older Stack Overflow question is close to what I want to achieve. However, the dataframe in the linked post had only 3 columns (an X column, a Y column, and a single categorical column tied to the Bokeh CheckboxGroup), whereas mine has 5 columns, 3 of which are categorical columns that I want tied to selectable checkboxes.

I am not very familiar with JavaScript, so I'm not sure how I could achieve what I am describing with Bokeh.

Answer by mosc9575

The solution below is based on the simpler check boxes for lines example.

Explanation

Each renderer in Bokeh has a visible attribute, which is True by default. To show or hide each bar on its own, we need one renderer per bar. Therefore we loop over the rows of the DataFrame and draw each bar separately.

Inside the JavaScript callback, we first mark all bars as visible, which is the correct state when every box is checked. Then, for each unchecked checkbox, we remove the bars that belong to it. The mapping from each checkbox to its bars is built by hand in Python from the row indexes of the DataFrame.

The last step is to set the visible attribute of each bar.
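The logic described above can be tried out in plain Python before it is written in JavaScript. This is only a sketch: `visible_bars` is a hypothetical helper, not part of Bokeh, and `indexes` is hard-coded here to the row indexes that the question's DataFrame gives for the checkbox order big, small, A, B, blue, red.

```python
# Plain-Python mirror of the callback logic (illustrative; not Bokeh API).
# indexes[i] lists the DataFrame rows that carry the label of checkbox i.
indexes = [
    [0, 1, 2, 3],  # big
    [4, 5, 6, 7],  # small
    [0, 1, 4, 5],  # A
    [2, 3, 6, 7],  # B
    [0, 2, 4, 6],  # blue
    [1, 3, 5, 7],  # red
]

def visible_bars(active, indexes, n_bars):
    """Return the sorted bar indexes that stay visible for the active checkboxes."""
    visible = set(range(n_bars))       # start with every bar visible
    for i, rows in enumerate(indexes):
        if i not in active:            # checkbox i is unchecked
            visible -= set(rows)       # hide every bar it covers
    return sorted(visible)

print(visible_bars([0, 1, 2, 3, 4, 5], indexes, 8))  # all boxes checked
print(visible_bars([0, 1, 3, 4, 5], indexes, 8))     # A unchecked
print(visible_bars([0, 1, 3, 5], indexes, 8))        # A and blue unchecked
```

With A unchecked, bars 2, 3, 6 and 7 remain; with A and blue unchecked, only bars 3 and 7 remain. A bar is hidden as soon as any one of its three labels is unchecked.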

Example Code

import pandas as pd
from bokeh.plotting import show, figure, output_notebook
from bokeh.models import CheckboxGroup, CustomJS
from bokeh.layouts import row
output_notebook()

data = dict(size=['big', 'big', 'big', 'big', 'small', 'small', 'small', 'small'],
            design=['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
            color=['blue', 'red', 'blue', 'red', 'blue', 'red', 'blue', 'red'],
            x=['1', '2', '3', '4', '5', '6', '7', '8'],
            y=['10', '20', '10', '30', '10', '40', '10', '30'])

df = pd.DataFrame(data)
df['x'] = df['x'].astype(float)
df['y'] = df['y'].astype(float)

# map each categorical column to its unique values
selections = {k: list(df[k].unique()) for k in ['size','design','color']}

# collect the checkbox labels and, for each label, the row indexes it applies to
names = []
indexes = []
for col, items in selections.items():
    names += items
    for item in items:
        # .tolist() yields plain Python ints for passing to CustomJS
        indexes.append(df[df[col]==item].index.tolist())

p=figure(width=300, height=300)
bar_renderer = []
for i, item in df.iterrows():
    bar_renderer.append(
        p.vbar(x=item['x'], top=item['y'], width=0.7, color=item['color'])
    )

checkbox = CheckboxGroup(labels=names, active=list(range(len(names))), width=100)
callback = CustomJS(args=dict(bars=bar_renderer,checkbox=checkbox, indexes=indexes),
    code="""
    function removeItems(arr, values){
      for (let value of values){
        const index = arr.indexOf(value);
        if (index > -1) {
          arr.splice(index, 1);
        }
      }
      return arr;
    }
    // initialize all bars as active
    let active = [...Array(bars.length).keys()];

    // loop over all checkboxes (indexes has one entry per checkbox),
    // remove indexes from active if checkbox is inactive
    for(var i=0; i<indexes.length; i++){
        if (!checkbox.active.includes(i)){
            active = removeItems(active, indexes[i])
        }
    }
    // set bar to visible if value is in active, else invisible
    for(var i=0; i<bars.length; i++){
        bars[i].visible = active.includes(i);
    }
    """
)
checkbox.js_on_change('active', callback)
show(row(p,checkbox))

Output

[Figure: bar plot with toggled visibility]