SBN analysis code conventions — summary


Quick links to the introductory sections:

The guidelines with explanations are on their own page.


[N] Naming conventions

Rationale: naming should:

  • be predictable to facilitate code discovery and automatic maintenance
  • facilitate the understanding of the code
  • not replace the documentation
  • bear consistent style

[NL] Libraries and packages

Naming of libraries can be almost completely automated by cet_build_tools. A “package” is a branch of the source tree in a repository.

[NL.01] [+++] The implicit rules of cet_build_tools must be followed for all code that is built under it (MRB).

[NL.02] [++] The same implicit rules are encouraged for all code in SBN repositories.

[NL.03] [++] Only one library should generated per source code directory (i.e. per package); if multiple libraries are desired, they should be placed in their own subdirectories (sub-packages).

[NL.04] [+++] Package and library names must use CamelCase and single string (no - nor _ ).

A summary of cet_build_tools naming convention

  • Plugin library name (module, service, tool) matches its main file name and the plugin class: PCAngleKinkFinder_module.cc contains a class (art module plugin) called PCAngleKinkFinder (its namespace is not constrained).
  • Library name matches the directory where the source code is stored, and it encodes the path of the source within the repository; for example, code in sbncode/TPCReco/KinkExp is all wrapped up in a single library sbncode_TPCReco_KinkExp (libsbncode_TPCReco_KinkExp.so).

[NS] Source files

explanations

[NS.01] [++] Use .cc suffix for C++ source files containing the definition of art plugin classes (modules, services, tools).

[NS.02] [+++] For the other files, stick to the existing convention in the source directory or its parent, if any is present.

[NS.03] [++] Match names of source files hosting a main algorithm or class with the name of that algorithm or class (e.g. sbn::Track source file should be called Track.h/Track.cxx).

[NS.04] [+] File suffixes:

type of source name pattern example
art plugin source file *_module.cc, *_service.cc, … PCAngleKinkFinder_module.cc
C++ header *.h Track.h
C++ source *.cxx Track.cxx
C++ template implementation *.txx Track.txx

[NC] Capitalization

explanations

[NC.01] [++] “CamelCase” is encouraged for composite names (e.g. PhotonLibrary).

[NC.01] [+++] art plugin name is required to match the file it is defined in.

[NV] Variables and other identifiers

explanations

[NV.01] [+++] Descriptive control variable name in any loop longer than five lines.

[NV.02] [−−] Avoid identifiers starting with an underscore (e.g. _i).

[NV.03] [−−−] No declaration of identifiers starting with two or more underscores (e.g. __i).

[NV.04] [−−−] No identifiers with different capitalization in the same scope except ([−−]) if the capitalization follows a physics formula.

[NV.05] [++] Private data member names start with f and use CamelCase. Public data members and local variables follow simple camelCase (as above; e.g. trackLength).

Example

Summary of identifier name recommendations

[NV.t1]

context name style example
local variable camelCase nearestTrackIndex
function camelCase findNearestTrack
class/struct name CamelCase IsolatedTrack
public member variable camelCase nTracks
private member variable f + CamelCase fNTracks
global variable CamelCase GlobalVariablesAreEvil
namespace short, lower case track
library name CamelCase TrackFinder

[NA] CAF-Specific Naming Conventions

explanations

[NA.01] [++] When adding branches or data products to the CAF files, follow the standards for nomenclature and numbering already existing in the file (e.g.: initialize empty variables to -5).

[NA.02] [+++] Use k and CamelCase for names for Cuts and Vars in CAFAna macros is required.

[NA.03] [++] Use namespaces to tag the names/versions of the cuts. For example:

  namespace SBNworkshop2020 {
    
    Cut kEnergy;
    
  } // namespace SBNworkshop2020

instead of Cut kEnergy_SBNworkshop2020.

[NA.04] [+++] End the name of a cut with a corresponding _tag unless defined in a namespace (e.g. kEnergyCut_2020PAC).

[NA.05] [+++] When editing cuts and vars from frozen analyses, create a new copy of the cut or var and leave the old one untouched.

[NA.06] [++] Store cuts and vars in sensibly corresponding scripts (e.g. keep numu analysis cuts in Cuts/NumuCuts.cxx).

[NA.07] [++] Remove unused CAF branches.

[C] Coding

[CO] Organization, layout and style

explanations

Rationale: protect the modularity of the code and control the dependency tree.

[CO.01] [++] One header and source file per class.

[CO.02] [+++] Indent via spaces (2 per level suggested).

[CO.03] [−−] Avoid editor-specific directives to describe the indentation settings.

[CO.04] [++] Use K&R style of brackets.

Specific for CAF libraries and tools

explanations

[CO.21] [+++] CAFMaker_module.fcl is intended to access art data products, create StandardRecord objects, and call filling functions only.

[CO.22] [+++] All computations for filling CAF branches and calculations live in the corresponding Fill<specifier>Vars.cxx script.

[CO.23] [−−−] No dependence to LArSoft packages in StandardRecord.

[CS] Source file metadata

explanations

Rationale: we want every piece of code associated to one or more authors, both to facilitate its use and maintenance by allowing to ask to its authors, and as a recognition and acknowledgement.

[CS.01] [+++] Report author(s) and possibly a contact mean on every source file.

[CS.02] [++] Use Doxygen format for file metadata.

  /**
   * @file   TrackBloating/TrackBloatAlg.h
   * @brief  Algorithm to double the memory required by a track.
   * @author Mark Johnson (mjohnson@fnal.gov), John Markson (jmarkson@fnal.gov)
   */

[CS.03] [++] Authors of major changes add their name (and contact).

[CE] Error handling and message logging

explanations

Rationale: users should be immediately reported errors stemming from faulty configuration or input. Code should prioritize reporting dangerous or dubious conditions over automatic mitigation.

[CE.01] [++] Use assert() to document assumptions (e.g. assert(!tracks.empty());).

[CE.02] [++] Use C++ exceptions to reporting errors (cet::exception where available)

The guideline above is the original thought by G. Petrillo. C. Backhouse proposes the exact opposite. Which practice is best for us should be considered under discussion.

[CE.03] [−−] Avoid “catch-all” constructs (catch (...)). Really.

[CE.04] [++] Route reports of unusual conditions into specific message streams (use mf::LogError/mf::LogProblem where available, or std::cerr).

[CE.05] [++] Use message facility library for message logging in art code (e.g. LOG_MF_INFO(), mf::LogVerbatim(), …).

[CE.06] [−−] Avoid C/C++ output streams to console (std::cout, std::cerr) when logging libraries are available.

[CE.07] [−−−] Do not include <iostream> in header files.

[CE.08] [++] With message facility (or Python logging module):

  • DEBUG level: messages that may help tracking bugs
  • INFO level: one-time (terse) configuration messages, summary messages at the end of the job, and possibly per-event or per-run summary (e.g. 67 tracks produced from 1267 hits in C:0 T:1.).
  • WARNING level: exceptional conditions that might compromise the result.
  • ERROR level: exceptional conditions which compromise the result. Most often, the program should be interrupted.
  • FATAL level: avoid; the program should be interrupted.

[CE.09] [+++] A failure to retrieve configured input data must trigger a fatal error. This behaviour may be bypassed with a separate configuration parameter, provided that additional diagnostic information is then reported.

[CC] Configuration and experiment-specific features

explanations

Rationale: code must work for all SBN experiments with the least possible changes, ideally limited to configuration files. The explicit lack of support for one of the experiments is still preferable to code that gives for that experiment wrong results.

[CC.01] [−−−] No constants describing the specific detectors in SBN-shared code, and avoid those in experiment-specific coed too.

[CC.02] [+++] Attempt to use a feature of shared code with a detector/experiment that does not support it must trigger a fatal error.

[CC.03] [++] Use LArSoft services to gain information and functionality specific to the detector if possible.

[CL] LArSoft and art

explanations

[CL.01] [++] In general, try to adopt the practices recommended by art if not conflicting with these guidelines.

[CL.02] [++] Depending on the context, read a art data product using:

  • if no art associations are needed:
    auto const& hits = event.getByLabel<std::vector<recob::Hit>>(fHitTag);
    
  • if associations are needed, and therefore art::Ptr or art::Handle:
    auto const hitHandle = event.getValidHandle<std::vector<recob::Hit>>(fHitTag);
    std::vector<recob::Hit> const& hits = *hitHandle;
    

[CL.03] [++] Use art::InputTag data type to identify data products.

[CL.04] [−] Consider alternatives to art::FindManyP if available.

[CL.05] [+] Use of art::ProductToken for simple data product reading.

[CL.06] [++] Declare input data products to art by calling consumes() in module constructors.

[CL.07] [++] Use configuration validation in place of plain fhicl::ParameterSet access.

[CL.08] [++] Use the containers provided by LArSoft Geometry service to store data by geometry plane, TPC etc.

[CL.09] [++] Use Geometry service methods to iterate through detector TPC, planes etc.

[CF] Language features

explanations

[CF.001] [++] Motivated adoption of well-supported new features is encouraged; still, if the feature is considered too obscure a clarification comment is suggested.

auto keyword

explanations

The auto keyword has the magic ability of decreasing with its sole appearance the readability of the code by a few marks. It should be used judiciously (and sparsely).

[CF.005] [−−] In general, avoid using auto.

[CF.006] [+] As an exception, use auto keyword when the underlying type is obvious from the code in the same line or the previous one. In doubt, spell the type out instead.

examples

Namespaces

explanations

[CF.011] [−−] Avoid importing a namespace (e.g. using namespace std;), especially in global scope.

[CF.012] [−−−] Never ever use using namespace directives in global scope of header files.

[CF.013] [++] Consider using a namespace to enclose a large self-contained group of utilities and to enclose implementation details.

Variables and parameters

explanations

[CF.021] [+++] Define variables in the lowest scope they are needed in, and as close as where they are needed as possible

[CF.022] [+] Prefer a single collection of structured elements over multiple collections of simple elements when no other performance issue is involved.

C++ best practices

explanations

[CF.051] [−−−] Pointer variables never own their memory. Do not use new operator.

[CF.052] [++] Manage a resource by using a wrapping object that follows RAII pattern (examples).

[CF.053] [++] Profusely use decorations explaining your intentions to the compiler (e.g. const qualifier, [[fallback]] annotation on switch, override methods).

[CF.054] [+++] Use the most appropriate and descriptive type for each variable and quantity (e.g. use an unsigned type for a counter — many examples here —, use const for all variables that are not supposed to be modified, etc.).

[CF.055] [++] Qualify as const all “variables” that are not expected to changed value.

[CF.056] [++] Use temporary variables to avoid code repetitions.

[CF.057] [−−−] Do not use const_cast nor reinterpret_cast.

[CF.058] [−−] Avoid dynamic_cast.

[CF.059] [−−−] Avoid goto statements.

[CF.060] [+] Consider using braces as initialization syntax (e.g. DataEntry data { 5.0, 7.0 };).

[CF.061] [+++] Do not use word-spelling of logical operators (use &&, ||, ! rather than and, or, not).

examples

References vs. pointers

explanations

Rationale: references and pointers share the same performance, but have different features that make them appropriate for different usage patterns.

[CF.101] [++] Choose a pointer type if a variable is allowed to point to “no object” (nullptr), or if reassignments are required.

[CF.102] [+] In class definitions, consider mixed use to gain the safety of a reference and the flexibility of a pointer (e.g. presenting a reference in class interface but internally storing a pointer).

Memory usage and avoiding unnecessary copy of data

explanations

Rationale: copying large chunks of data has deleterious effects, and has been a common cause of excessive use of memory (for example, a temporary copy of a 1-GB data structure may be almost negligible in the overall memory balance, but it may make a remote job cross a 2 GB memory hard limit and have the job killed). There are well known patterns that prevent unnecessary copies.

[CF.111] [++] Prevent unnecessary copy of values returned by functions (e.g. auto const& digits = event.getByLabel<std::vector<raw::RawDigits>>(WaveformTag);).

[CF.112] [+++] Declare function arguments of large non-trivial types as constant references. Conversely, arguments of fundamental data types and small trivial data types should always be passed by value.

[CF.113] [+] Consider, as pattern for initialization of large data members from function arguments, to pass the argument by value and then std::move() that value.

[CF.114] [++] When the final size of a collection is known in advance, allocate its memory at once.

Checked vs. unchecked element access (i.e. at() vs. [])

explanations

Rationale: most often, the unchecked access should be preferred because faster, but it is important to avoid indices out of range. Conversely, the choice of one method over the other reflects the underlining consideration.

[CF.121] [++] Use unchecked access when the element is already known to be in range.

[CF.122] [++] Use unchecked access when the element is expected by protocol to be present.

User class design

explanations

[CF.151] [−−] You should not need destructors in your classes.

[CF.152] [+] For objects that will exist in large collections, and especially for data structures, consider ordering data members from the largest to the smallest to avoid waste of memory from data alignment.

[CF.153] [++] Initialize data members holding configuration information (e.g. a threshold for an algorithm) in the constructor initializer list (and mark them const if feasible).

[CF.154] [+++] Mark all member functions that do not modify the object as const, and consider factoring out the parts of non-const methods which do not modify the object as new const methods.

[CF.155] [++] Initialize all non-const data members in their declaration in the class.

[CF.156] [++] All data members of a class should be assigned a known value on construction.

[CF.157] [−−−] Never write empty constructors.

[CF.158] [++] Consider using delegated constructors to improve maintainability.

[CF.159] [+] Consider declaring a type a class if there is an “invariant” (i.e. constraints on the data members) to be preserved, struct otherwise.

[CT] Libraries and tools

Rationale: libraries evolve fast and what was best practice may become obsolete or detrimental. But it often also becomes an acquired pattern that is hard to break.

[CT.001] [++] Use geo::Point_t and geo::Vector_t in place of ROOT’s TVector3 for locations and displacements.

[CT.002] [–] Avoid using std::pow() with integral exponents; use std::hypot() or cetlib cet::pow() or similar as appropriate. Also avoid taking the square root for distance when not essential.

[CQ] Quantity types and their units

Rationale: clarity and predictability are essential when interpreting data values, and relying heavily on conventions facilitate it.

Units

[CQ.01] [+++] Mandatory units for data quantities are described in StandardRecord documentation in sbnanaobj. Existing exceptions should be treated as a bug rather than a precedent.

Data types

[CQ.11] [++] The following C++ data types are encouraged for storage of some quantities:

Unit type data type
momentum, energy, energy density float
charge, charge density float
space coordinate
(added precision reduces rounding errors of geometry calculations)
double
time, relative to a reference within the event double
absolute time
(achieves (barely) nanosecond precision on UTC times)
long double

[D] Documentation

[DF] Functionality documentation

explanations

Rationale: plain-English documentation should allow the use of an algorithm without falling back to interpret the code, and should especially include the assumptions and considerations that are relevant to the users and are not expressed by the code.

[DF.01] [++] Algorithm classes and function should include also

  1. description of their function
  2. explanation of the input format
  3. explanation of the requirements and assumptions on the input
  4. an explanation of the features of the output
  5. plus a reference to external documentation (e.g. SBN DocDB document)

[DF.02] [+++] Add documentation in the source code, in Doxygen format, attached to the object being described.

[DC] Tracking of changes

explanations

Rationale: discovery of major changes to the functionality of the code or “relevant” changes of working parameters should be achievable without a systematic comparison the different versions of the code.

[DC.01] [+++] Each repository contains in its main directory a changes.md file; the start of the file includes information about the format, like the pattern used for each entry and the tag for breaking changes.

[DC.02] [+++] Authors are required to update the changes.md log when…

  • … a new algorithm is added: mentioning the purpose of the new feature.
  • … a change in parameters that may affect future results.
  • … a change makes existing samples unreadable or misinterpreted.

[DC.03] [+++] In the last case, the change is defined as breaking and the entry is required to explicitly state that with a standard tag (**BREAKING CHANGE** is the recommendation).

[DC.04] [+] Consider keeping the format of the change log simple.

[DC.05] [++] The information about the release version of a change, should be added by the release manager.

[L] Linking and building

[LB] Build diagnostics (“warnings”)

explanations

Rationale: compilation warnings have proven to be a powerful tool in early detection of program mistakes. Experience shows that a single ignored warning both creates a habit, and makes it harder to spot additional ones.

[LB.01] [+++] Building warnings must be addressed. All of them must be treated as errors, except for the deprecation one.

[LB.02] [++] If the compiler reports an intended behaviour as problematic, the code should be reviewed with other people to identify whether the report is correct. If the review concludes the diagnostic message is spurious, the diagnostic message can be “acknowledges”.

[LB.03] [+++] Acknowledgement of a diagnostic message, when decided, must target only the line of code triggering it, must disable only the diagnostic message that needs to be acknowledged, and must be exhaustively documented in a comment.

[LB.04] [+++] Deprecation warnings on new code must be resolved.