Access to ICARUS data products in Python via gallery
Note and disclaimer: this is the type of information that becomes outdated fast. Update it freely, or contact the author if it has stopped working. It was tested with icarusalg v09_91_02_01 under the SL7 container provided by Vito Di Benedetto at that time.
It is possible to have almost full access to data products stored in art/ROOT files with Python.
gallery is a light-weight library provided by the authors of art, which understands the art data format.
It is written in C++, but the amazing cppyy library can make it available in Python.
Setup
At the time of writing, LArSoft is only compatible with Scientific Linux 7. If you are using a different operating system, like Alma Linux 9 on the ICARUS GPVM, you’ll need to use an SL7 container.
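A quick way to tell whether the current shell is already on SL7 is to look at /etc/os-release. The following is a minimal sketch, not an official tool: the function names are made up for this page, and it assumes that SL7 reports ID=scientific with a VERSION_ID starting with "7".

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dictionary."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def needs_sl7_container(info):
    """Return True unless the system looks like Scientific Linux 7.

    Assumption: SL7 reports ID=scientific and a VERSION_ID starting with "7".
    """
    is_sl7 = (info.get("ID") == "scientific"
              and info.get("VERSION_ID", "").startswith("7"))
    return not is_sl7


if __name__ == "__main__":
    try:
        with open("/etc/os-release") as release_file:
            host = parse_os_release(release_file.read())
    except OSError:
        host = {}  # no os-release file: assume that a container is needed
    print("SL7 container needed:", needs_sl7_container(host))
```

If it reports that a container is needed, start the SL7 container before proceeding with the setup below.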
The required setup is:
- gallery, for accessing the art/ROOT files
- icarusalg, for accessing the ICARUS data products
  - if any data product class is still defined only in icaruscode, it should be migrated to icarusalg; but in the meanwhile, you’ll need to set up icaruscode too (it automatically pulls icarusalg in, so only icaruscode setup would be needed);
  - if all needed data products are in SBN (as opposed to ICARUS) code, setting up sbnobj will suffice;
  - likewise, if all needed data products are in LArSoft, setting up larsoftobj will suffice.
Currently, icarusalg pulls in larsoftobj, and larsoftobj pulls in gallery, so no explicit setup of gallery is needed unless going the sbnobj way.
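The chain of "pulls in" relations can be written down as a small table and followed to see what a single setup makes available. This is only an illustration of the reasoning on this page: the table lists just the relations stated here (the real UPS products have many more dependencies), the empty entry for sbnobj reflects the statement that gallery needs an explicit setup in that case, and the function names are hypothetical.

```python
# Dependencies as stated on this page only; the real products depend on more.
PULLS_IN = {
    "icaruscode": ["icarusalg"],
    "icarusalg": ["larsoftobj"],
    "larsoftobj": ["gallery"],
    "sbnobj": [],  # assumption: gallery must be set up explicitly in this case
    "gallery": [],
}


def products_set_up(product):
    """Return the set of products made available by setting up `product`."""
    available = set()
    pending = [product]
    while pending:
        current = pending.pop()
        if current in available:
            continue
        available.add(current)
        pending.extend(PULLS_IN.get(current, []))
    return available


def needs_explicit_gallery(product):
    """Return True if setting up `product` does not bring gallery along."""
    return "gallery" not in products_set_up(product)


for name in ("icaruscode", "icarusalg", "sbnobj"):
    print(name, "-> explicit gallery setup needed:", needs_explicit_gallery(name))
```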
In summary:
source /cvmfs/icarus.opensciencegrid.org/products/icarus/setup_icarus.sh
setup icarusalg v09_91_02_01 -q e26:prof
is all that is needed to access gallery.
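To verify from Python that the setup took effect, one can inspect the environment. The sketch below assumes that UPS exports the variables <PRODUCT>_DIR and <PRODUCT>_VERSION for each product that is set up; the helper name is hypothetical and not part of any ICARUS package.

```python
import os


def ups_product_version(product, environ=None):
    """Return the version of a set-up UPS product, or None if it is not set up.

    Assumption: UPS exports <PRODUCT>_DIR and <PRODUCT>_VERSION per product.
    """
    environ = os.environ if environ is None else environ
    prefix = product.upper()
    if f"{prefix}_DIR" not in environ:
        return None
    return environ.get(f"{prefix}_VERSION", "unknown")


if __name__ == "__main__":
    for name in ("icarusalg", "larsoftobj", "gallery"):
        print(f"{name}: {ups_product_version(name) or 'not set up'}")
```

Passing a dictionary as `environ` allows checking a saved environment instead of the current one.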
Python environment setup
For Python work, it is recommended to create a Python virtual environment, where the needed versions of the Python libraries can be installed.
Python area creation
The simplest way to create a Python working area is via the venv and pip Python modules.
For example, this is a Bash script creating a new Python environment
and pulling in enough packages to work with Jupyter notebooks and with Pandas library:
#! /usr/bin/env bash
#
# Usage: createPandasEnv.sh [WorkingAreaPath]
#
# If not specified, the area is created in the current directory.
#
# will install via pip the following "additional" Python modules:
declare -a PythonModules=(
'urllib3<2' # urllib3 v2 requires an OpenSSL newer than the one installed with SLF 7
'numpy' 'matplotlib' 'pandas'
'tables' # PyTables: required by Pandas to support HDF5 I/O
'jupyter-server' 'notebook' 'nbconvert'
)
declare WorkDir="${1:-.}"
echo "Creating a new Python environment in '${WorkDir}'"
python -m venv --upgrade-deps "$WorkDir" || exit $?
# let's enter the environment immediately, before installing the modules
source "${WorkDir}/bin/activate" || exit $?
pip --require-virtualenv install "${PythonModules[@]}"
This example script is provided here as createPandasEnv.sh.
For example, bash createPandasEnv.sh pythonAnalysis will create a ./pythonAnalysis working area. At this time, it downloads and installs 350 MB worth of software.
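After the installation, a short check from within the new environment can confirm that the modules are importable. This is a minimal sketch with a hypothetical helper name; note that the import name may differ from the pip package name (for example, jupyter-server is imported as jupyter_server).

```python
import importlib.util

# Import names of the packages installed by the script above.
EXPECTED_MODULES = [
    "urllib3", "numpy", "matplotlib", "pandas", "tables",
    "jupyter_server", "notebook", "nbconvert",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```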
Regular Python area setup
Once the environment is created (in $WorkDir),
source "${WorkDir}/bin/activate"
cd "$WorkDir"
will activate the area and enter its directory.
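Whether the area is active can also be checked from Python itself: inside a venv environment, sys.prefix points to the environment while sys.base_prefix still points to the base Python installation. The function name below is just for illustration.

```python
import sys


def in_virtual_environment():
    """Return True when Python runs inside an activated venv environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_environment())
    print("Python prefix:", sys.prefix)
```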
Example
Assuming that the current area is already set up as described above, art/ROOT files can be accessed via the galleryUtils module (provided in icarusalg).
In an interactive python
session (or equivalent),
import galleryUtils
help(galleryUtils)
will print an example of how to access an event.
This example is a refurbished version of the one provided in galleryUtils:
import galleryUtils
import ROOT
sampleEvents = galleryUtils.makeEvent("sample.root")
LArG4tag = ROOT.art.InputTag("largeant")
for event in galleryUtils.forEach(sampleEvents):
    particles = event.getValidHandle[ROOT.std.vector[ROOT.simb.MCParticle]](LArG4tag).product()
    nMuons = sum(abs(part.PdgCode()) == 13 for part in particles)
    print(f"{event.eventAuxiliary().id()}: {nMuons} muons")
# for all events
This can also access official production files: it is as simple as using XRootD URLs in the makeEvent() argument, and having all the access permissions correctly configured.
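The counting expression in the loop uses abs() because PDG code 13 denotes the negative muon and -13 the positive one. The logic can be tried without any art/ROOT file by using stand-in objects; FakeParticle below is a hypothetical class that mimics only the PdgCode() accessor of simb::MCParticle.

```python
class FakeParticle:
    """Stand-in for simb::MCParticle: only PdgCode() is mimicked."""

    def __init__(self, pdg):
        self._pdg = pdg

    def PdgCode(self):
        return self._pdg


def count_muons(particles):
    """Count particles whose PDG code is 13 (mu-) or -13 (mu+)."""
    return sum(abs(part.PdgCode()) == 13 for part in particles)


# mu-, mu+, electron, proton, photon
particles = [FakeParticle(pdg) for pdg in (13, -13, 11, 2212, 22)]
n_muons = count_muons(particles)
print(f"{n_muons} muons")
```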
Some notes:
- The whole known C++ namespace is exposed in the ROOT module namespace; for example, geo::CryostatID becomes ROOT.geo.CryostatID.
- The syntax of cppyy for template arguments is to enclose them in square brackets, expressing them either as classes (ROOT.std.vector[ROOT.geo.CryostatID]) or as strings ("std::vector<geo::CryostatID>"). [Author preference: the former, since it allows more control to the user.]
- event.eventAuxiliary().id() returns an art::EventID object; in galleryUtils its class is given a method to convert it into a string (R:1 S:0 E:1), which is why we can seamlessly use print() on it. Similar tricks are performed by ROOTutils.py on TVector-like objects, and by LArSoftUtils.py on several geometry objects (including geo::CryostatID, geo::Point_t, etc.) when it loads geo::Geometry.
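The string conversion trick mentioned in the last note can be illustrated in plain Python: a conversion function is attached to an existing class as its __str__ method, after which print() uses it. The sketch below uses a hypothetical FakeEventID stand-in rather than the real art::EventID, and the actual implementation in galleryUtils may differ.

```python
class FakeEventID:
    """Stand-in for art::EventID, with run(), subRun() and event() accessors."""

    def __init__(self, run, subRun, event):
        self._run, self._subRun, self._event = run, subRun, event

    def run(self):
        return self._run

    def subRun(self):
        return self._subRun

    def event(self):
        return self._event


def event_id_to_string(self):
    """Format the ID in the same style as shown above (R:1 S:0 E:1)."""
    return f"R:{self.run()} S:{self.subRun()} E:{self.event()}"


# Attach the conversion to the already defined class.
FakeEventID.__str__ = event_id_to_string

print(FakeEventID(1, 0, 1))  # prints "R:1 S:0 E:1"
```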
Known limitations
gallery suffers some limitations compared to art:
- it is not able to access art::Run and art::SubRun data products, but only art::Event ones.
- its interface is behind compared to art::Event; for example, it does not yet support art::Event::getProduct().
cppyy also suffers severe limitations:
- overload resolution is tricky; apparently cppyy attempts to call all the possible functions/methods with the same name in the attempt to figure out which one is the correct one, and captures the exceptions from errors.
  - The error message when failing to find the appropriate function is usually not of much use.
  - If the call itself throws an exception, it may be impossible to access it in Python.
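The following plain Python sketch mimics the behaviour described above, to show why the resulting errors are hard to read. It is not how cppyy is actually implemented; all names are made up for the illustration. Each candidate is tried in turn and every failure is captured, so a genuine error raised by the right candidate ends up buried among the signature mismatches.

```python
def call_first_matching(overloads, *args):
    """Try each candidate in turn; return the first result that does not raise."""
    failures = []
    for candidate in overloads:
        try:
            return candidate(*args)
        except Exception as error:
            # A wrong signature and a genuine failure look alike at this point.
            failures.append(f"{candidate.__name__}: {type(error).__name__}")
    raise TypeError("none of the overloads matched: " + "; ".join(failures))


def from_int(value):
    if not isinstance(value, int):
        raise TypeError("expected an integer")
    return f"int overload: {value}"


def from_positive_float(value):
    if not isinstance(value, float):
        raise TypeError("expected a float")
    if value < 0:
        raise ValueError("negative values are not supported")  # genuine failure
    return f"float overload: {value}"


overloads = [from_int, from_positive_float]
print(call_first_matching(overloads, 2))
print(call_first_matching(overloads, 1.5))
try:
    call_first_matching(overloads, -1.5)
except TypeError as error:
    print("Error:", error)  # the original ValueError object is no longer available
```

In the last call the right candidate was found and failed for a real reason, but the caller only sees a generic "no match" error.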
Other resources
A guide, SBN DocDB 4339, by now ancient, describes how to use gallery in C++.
Compared to using Python, C++ requires a careful and sometimes painstaking preparation of build instructions (e.g. CMake);
a middle ground is the use of the ROOT interpreter (Cling), which does for C++ a good deal of the magic that cppyy does for Python.
For questions or feedback, contact Gianluca Petrillo.