I am working in a Jupyter notebook and have a pandas DataFrame from which I would like to fill a ROOT TH3F histogram and save it to a ROOT file using uproot. I haven't been able to find many examples that illustrate how to do this, but here is what I assume the procedure is:
- Declare a ROOT TH3F and iterate over the dataframe to fill the histogram.
- Open ("recreate") a new ROOT file with uproot and write this histogram to it.
Below is some example code that shows how I tried to go about it (incorrectly, because it segfaults).
import ROOT as R
import uproot as ur
import numpy as np
import pandas as pd
# Example dataframe
data = {
'x': [9.5, 5.0, 2.2, 8.1, 5.5, 1.4, 2.5, 9.2, 3.0, 7.9],
'y': [2.0, 5.7, 1.3, 9.1, 6.0, 6.2, 5.8, 1.8, 5.8, 3.1],
'z': [7.5, 4.1, 3.1, 1.6, 2.4, 8.2, 1.3, 4.4, 2.3, 5.0]
}
df = pd.DataFrame(data)
# Fill TH3F
xyz_hist = R.TH3F('xyz', 'xyz', 100, 0, 10, 100, 0, 10, 100, 0, 10)
for index, row in df.iterrows():
    xyz_hist.Fill(row['x'], row['y'], row['z'])
# Open file and write histogram
outfile = ur.recreate('outfile.root')
outfile['xyz'] = xyz_hist
Could someone please clarify what is the correct way to go about it? Or is this wrong because I am trying to use uproot for something that it wasn't intended/built for, and the solution is to just use ROOT for opening the file, storing the histogram, etc.?
I executed exactly your code and encountered no issues, regardless of whether I read the histogram back into ROOT or into Uproot and hist, so you might just have an old version of one of the packages and be seeing a bug that has since been fixed.
Here are the versions that successfully tested the above:
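To compare against your own environment, a small sketch like the following prints what is installed. The helper name installed_versions is hypothetical, and the sketch assumes the pip or conda distribution names match the import names; ROOT is queried separately because it is often installed outside of Python packaging.

```python
from importlib import metadata


def installed_versions(packages=("uproot", "hist", "numpy", "pandas")):
    """Return a dict mapping package name to version string (None if missing)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    # ROOT may not be visible to importlib.metadata, so ask ROOT itself.
    try:
        import ROOT
        versions["ROOT"] = str(ROOT.gROOT.GetVersion())
    except ImportError:
        versions["ROOT"] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version}")
```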
More generally, I'd like to point out a few things.
You should avoid iterating over the rows with df.iterrows(); doing so defeats the purpose of putting data into arrays that can be manipulated with precompiled routines. I'll show an example in a moment.
Here's a way that it can be done entirely with Uproot and hist:
and here's how it can be done entirely with ROOT:
(In both cases, the Pandas DataFrame is also superfluous; both the Hist.fill and the ROOT.RDF.FromNumpy methods actually want NumPy arrays. In the ROOT case, I have to explicitly pull NumPy arrays out of the DataFrame. However, I assume that you have a reason for wanting to use Pandas that goes beyond this example.)