Goal: I am having trouble creating a histogram of normalized frequencies in Weights and Biases (W&B) custom charts, which are implemented in Vega-Lite. I would love some community help to resolve this.
I modified the default Vega-Lite code from the W&B custom chart histogram to produce this plot:
[Image: plot of unnormalized histogram frequencies in Weights and Biases]
I want to normalize the histograms per-group such that the bin heights add up to one. (Note that because the bin-width is set to one, this is both a valid PDF and PMF.)
I am surprised that this is so difficult to do -- normalized histograms are so common! -- and would immensely appreciate any help to get this to work.
Current Approach: I am following this example from the Vega-Lite documentation that creates a normalized frequency histogram. In its transform block, the example aggregates by count, uses joinaggregate to sum the counts into a total, and then calculates datum.Count / datum.TotalCount to get the normalized frequencies.
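For reference, here is a minimal standalone sketch of that documented pattern, extended with a per-group groupby. The field names value and group and the inline data are hypothetical stand-ins, not the W&B template fields, and this is not the W&B chart spec itself. Since JSON has no comment syntax, the assumption is repeated in the description field.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "Sketch only: 'value' and 'group' are hypothetical field names, not W&B template fields",
  "data": {
    "values": [
      {"value": 0.3, "group": "group1"},
      {"value": 2.8, "group": "group1"},
      {"value": 1.7, "group": "group2"},
      {"value": 0.8, "group": "group2"}
    ]
  },
  "transform": [
    {"bin": {"step": 1}, "field": "value", "as": ["bin_start", "bin_end"]},
    {
      "aggregate": [{"op": "count", "as": "Count"}],
      "groupby": ["group", "bin_start", "bin_end"]
    },
    {
      "joinaggregate": [{"op": "sum", "field": "Count", "as": "TotalCount"}],
      "groupby": ["group"]
    },
    {"calculate": "datum.Count / datum.TotalCount", "as": "RelativeFrequency"}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "bin_start", "type": "quantitative", "bin": {"binned": true, "step": 1}},
    "x2": {"field": "bin_end"},
    "y": {"field": "RelativeFrequency", "type": "quantitative", "stack": null},
    "color": {"field": "group", "type": "nominal"},
    "opacity": {"value": 0.6}
  }
}
```

In this sketch, binning and counting happen explicitly in the transform block, so a field named Count exists by the time joinaggregate runs. With the four sample rows, each group should end up with two bars of height 0.5.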
When I try adding this functionality to my Vega-Lite code, no plot appears in the editor, indicating some sort of error.
Code I Used: More specifically, I got an error when adding the following Vega-Lite code to the bottom of my transform block:
{
  "joinaggregate": [
    {"op": "sum", "field": "Count", "as": "TotalCount"}
  ],
  "groupby": ["newGroupKeys", "color", "grouped"]
},
{
  "calculate": "datum.Count / datum.TotalCount",
  "as": "RelativeFrequency"
}
Here is the working Vega-Lite code that produces the plot above. Once I add the changes above to normalize the frequencies, it no longer renders.
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "A simple histogram",
  "data": {
    "name": "wandb"
  },
  "transform": [
    {
      "calculate": "if('${field:groupKeys}' === '' || datum['${field:groupKeys}'] === '', false, true)",
      "as": "grouped"
    },
    {
      "calculate": "if('${field:groupKeys}' === '' || datum['${field:groupKeys}'] === '', datum.name, datum['${field:groupKeys}'])",
      "as": "newGroupKeys"
    },
    {
      "calculate": "if('${field:groupKeys}' === '' || datum['${field:groupKeys}'] === '', datum.color, datum['${field:groupKeys}'])",
      "as": "color"
    },
    {
      "aggregate": [
        {
          "op": "average",
          "field": "${field:value}",
          "as": "${field:value}"
        }
      ],
      "groupby": ["newGroupKeys", "color", "grouped", "${field:value}"]
    }
  ],
  "selection": {
    "grid": {
      "type": "interval", "bind": "scales"
    }
  },
  "title": "${string:title}",
  "layer": [
    {
      "transform": [
        {"filter": "datum.grouped == false"}
      ],
      "mark": {"type": "bar", "tooltip": {"content": "data"}},
      "encoding": {
        "x": {
          "bin": {"binned": false, "step": 1},
          "type": "quantitative",
          "field": "${field:value}"
        },
        "y": {
          "aggregate": "count",
          "stack": null
        },
        "opacity": {"value": 0.6},
        "detail": [{"field": "newGroupKeys"}, {"field": "color"}],
        "color": {
          "type": "nominal",
          "field": "newGroupKeys",
          "scale": {"range": {"field": "color"}},
          "legend": {"title": null}
        }
      }
    },
    {
      "transform": [
        {"filter": "datum.grouped == true"}
      ],
      "mark": {"type": "bar", "binSpacing": 0, "tooltip": {"content": "data"}, "clip": true},
      "encoding": {
        "x": {
          "bin": {"binned": false, "step": 1},
          "type": "quantitative",
          "scale": {"domain": [0, 30]},
          "field": "${field:value}"
        },
        "y": {
          "aggregate": "count",
          "stack": null
        },
        "opacity": {"value": 0.6},
        "detail": [{"field": "newGroupKeys"}, {"field": "color"}],
        "color": {
          "field": "newGroupKeys",
          "type": "nominal",
          "scale": {"range": "category"},
          "legend": {"title": null}
        }
      }
    }
  ],
  "resolve": {"scale": {"color": "independent"}}
}
My wandb data looks something like:
{"data": {"values": [
  {"uturns/uturns": 0.3, "groupKeys": "group1", "grouped": true},
  {"uturns/uturns": 2.8, "groupKeys": "group1", "grouped": true},
  {"uturns/uturns": 1.7, "groupKeys": "group2", "grouped": true},
  {"uturns/uturns": 0.8, "groupKeys": "group2", "grouped": true}
]}}
Any help to diagnose this issue would be immensely appreciated. Thanks.