Computing Resources
ICARUS has a number of computing resources available, many of which are supplied by the Fermilab Scientific Computing Division (SCD).
Here we list some of them.
The Fermilab Computing Division has produced a very short flyer explaining in a nutshell the different credentials used at Fermilab. It can be found as document 5892 in the SCD DocDB. It is strongly recommended reading for everybody!
SCD liaison
The Scientific Computing Division Liaison is a person from Fermilab SCD who acts as a bridge between SCD and an experiment. They are typically fairly informed about what is going on, computation-wise, both in the experiment and in the computing division [1]. They are also able to route any request to the proper people, but they are not necessarily required to do the actual work of solving the issues. Also, they are not required to be a member of the experiment.
The current liaison for ICARUS can be found on the ICARUS Roles and people wiki page.
[1] This is defined as knowing more about what goes on in the computing division than most experiment members do, and more about what goes on in the experiment than most computing division people do.
Computing Access
To get a Fermilab computing account follow the instructions at this site: https://fermi.service-now.com/kb_view_customer.do?sysparm_article=KB0010797. Our experiment number is E-1052 (ICARUS).
Write down the initial Kerberos and Services account passwords before submitting: they will be needed once your accounts have been created.
You will receive an email once this is complete, which gives you your Fermilab email account and username (you will need this for ServiceNow). Once your computer is SSH-ready and Kerberized, you should be able to log on by typing:
kinit ${USER}@FNAL.GOV
ssh -Y ${USER}@icarusgpvm01.fnal.gov
(where `${USER}` should be replaced by your Fermilab user name).
If this does not work, please submit a Service Desk ticket to request an ICARUS project account (note that someone has to authorise this, so it will not be processed immediately):
- ServiceNow -> Request something;
- search for something like "Experiment Computing Account" and you'll be offered an "Experiment/Project/Collaboration Computing Account" match, which you can pick;
- enter your name, then select E-1052 (ICARUS);
- verify that the correct Fermilab principal is displayed when you enter your name;
- enter your home institution, then submit the request.
You can reset your Fermilab passwords for ServiceNow here: https://password-reset.fnal.gov/showLogin.cc. If you can’t remember your password because you didn’t write it down, then you will have to contact the Service Desk (+1 (630) 840-2345).
SSH access to remote servers
Access to servers via terminal is obtained via secure shell (SSH).
In most cases you access a remote server authenticating as yourself by providing a Kerberos 5 ticket: see above for how to get registered for this type of authentication. The ticket can be obtained on the local machine (e.g. your laptop) using a Kerberos 5 client (usually MIT’s):
kinit -F -l 26h -r 7d username@FNAL.GOV
asks the Kerberos realm `FNAL.GOV` (which must be described somewhere in your system, typically in the `/etc/krb5.conf` that you copied from Fermilab) to authenticate you as `username`, which must be the Fermilab user name (it can be omitted if it is the same as the local user name on the laptop). The other options:
- `-F`: the ticket is forwardable: you can send it with your request for connection; this is the first of the steps to avoid errors like `Could not chdir to home directory /nashome/u/username: Permission denied`;
- `-l 26h`: the lifetime of the ticket is 26 hours: for 26 hours SSH can reuse this ticket, then you have to get a new one or renew this one;
- `-r 7d`: the renewable lifetime is 7 days: after 7 days, this ticket can’t be renewed any more and you need to get a new one.
You can see the list of current Kerberos 5 tickets with `klist -f` (`-f` shows the flags of the tickets: `F` means forwardable, `f` means it has been forwarded, `R` means renewable, `i` means invalid, and for the others you need to read the manual, `man klist`).
If you do not have a valid ticket, you may be able to renew an expired one with:
kinit -R username
and if it works you save yourself typing a password. A practical command, `kinit -R username || kinit -F -l 26h -r 7d username`, will attempt the renewal and get a new ticket on failure.
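Since this renew-or-reacquire pattern gets typed often, you may want to wrap it in a small shell function in your profile. This is just a convenience sketch (the function name `ftkt` is our invention, not a Fermilab tool):

```
# sketch: renew the Fermilab ticket if possible, otherwise get a fresh one
# (function name is hypothetical; pass your Fermilab user name if it differs
#  from your local one)
ftkt() {
  local user="${1:-${USER}}"
  kinit -R "${user}@FNAL.GOV" 2>/dev/null \
    || kinit -F -l 26h -r 7d "${user}@FNAL.GOV"
}
```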
Next and last, SSH needs not only to use that ticket, but also to forward it to the remote server. This is called “delegating credentials” in SSH lingo. As a one-time command:
ssh -K username@icarusgpvm01.fnal.gov
- `-K` tells SSH to forward the Kerberos credentials; this is the other step to avoid errors like `Could not chdir to home directory /nashome/u/username: Permission denied`;
- other common options can be added (`-X`, `-Y`, …).
For a permanent solution, you can change the SSH client configuration of
your user (OpenSSH stores it in ${HOME}/.ssh/config
by default) by
adding something like:
Host cdcvs.fnal.gov
    ForwardX11 no                  # for Redmine GIT connections, don't try graphics
    ForwardX11Trusted no

Host *.fnal.gov 131.225.*
    User username                  # Fermilab user name (allows something like `ssh icarusgpvm01.fnal.gov`)
    ForwardX11 yes                 # establish an X11 connection to get graphics on your laptop (via an X server like XQuartz)
    ForwardX11Trusted yes
    GSSAPIAuthentication yes       # enable authentication via Kerberos 5
    GSSAPIDelegateCredentials yes  # forward Kerberos 5 credentials
    ConnectTimeout 60              # bail out if no answer for one minute
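With that configuration in place, a plain `ssh icarusgpvm01.fnal.gov` should work. A quick way to check that your credentials actually made it to the remote side (a sketch, using the host name from the examples above):

```
ssh icarusgpvm01.fnal.gov
# once logged in, list the tickets: the 'f' flag marks a forwarded ticket
klist -f
```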
Hint: you can edit the Kerberos 5 configuration file so that `FNAL.GOV` is the default realm, and then you don’t need to type `@FNAL.GOV` in Kerberos 5 commands. That might not be welcome if your institution also uses Kerberos 5 with a different realm, which opens a whole new realm of complications.
Using a VNC connection to view windows
A good wiki page on how to set up a VNC connection can be found at:
Note that those instructions are valid if you are opening an SSH connection from a macOS laptop/computer. If you are connecting from a laptop/computer running a different operating system (e.g. Linux), then you will need to use a remote desktop program supported by that operating system. For example, on Ubuntu one can use Remmina.
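Whatever the client, the usual pattern is to run the VNC server on the GPVM and reach it through an SSH tunnel. A minimal sketch under those assumptions (the display number, port, and host are illustrative; follow the wiki page above for the supported recipe):

```
# on the GPVM: start a VNC server and note the display number
# (display :1 corresponds to TCP port 5901)
# on your laptop: tunnel that port through SSH...
ssh -K -N -L 5901:localhost:5901 username@icarusgpvm01.fnal.gov
# ...then point your VNC client (e.g. Remmina) at localhost:5901
```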
Where to work: interactive nodes (“GPVM”)
We have some dedicated “nodes” where we can log in and work interactively.
These are General Purpose Virtual Machines (GPVM); they are shared by
all ICARUS collaborators, so they may not be the fastest way to develop
and run analyses.
But they see all the resources that we have available: you can read data
from ICARUS disks and tapes, and you can submit ICARUS jobs to the
grid.
Access is via SSH with `FNAL.GOV` Kerberos credentials (see above).
The nodes are called `icarusgpvm0X.fnal.gov` and `icarusbuild0X.fnal.gov` (see the ICARUS GPVM page for more details).
With SL7 reaching its end of life on June 30, 2024, the GPVMs are being migrated to EL9. Users who still need SL7 can use the SL7 dev container; more details are on the GPVM migration wiki page.
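As a rough illustration, starting such a container on an EL9 node might look like the following; the image path and bind mounts here are our assumptions, and the GPVM migration wiki page has the supported procedure:

```
# sketch: open an SL7 shell via Apptainer, binding the usual experiment areas
# (image path and mount list are assumptions; check the migration wiki page)
apptainer shell \
    -B /cvmfs,/exp/icarus,/pnfs \
    /cvmfs/singularity.opensciencegrid.org/fermilab/fnal-dev-sl7:latest
```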
The GPVM interactive machines have no relevant local storage; the areas in `/exp/icarus` (“CEPH”, formerly “BlueArc”) and `/pnfs` (“dCache”) should be used instead (the home directory works too, but it has a small quota).
The interactive “build nodes” `icarusbuild01.fnal.gov` and `icarusbuild02.fnal.gov` have roughly the same environment as the other GPVMs.
The policy is to use these machines only to build code and run tests;
this excludes job submission and local running of job campaigns,
and of single jobs beyond the size of a test job (say, 100 events).
To maximise the benefit, build from the local disk; you can make your own scratch area with:
cd /scratch
./createMyScratchArea.sh
Information about all ICARUS GPVM’s is on a separate page.
Opening a ticket in Fermilab Service Desk
When something goes wrong, open a ticket! It’s fun, it’s easy, and most importantly it is how you engage experts to solve your problems (well, your software problems).
You want to open a Service Desk ticket if your job submission that used
to work yesterday fails today, if the GPVMs are very slow (if they are
just slow, that’s normal), or if you are denied access to some resource you
believe you are entitled to.
You don’t open a Service Desk ticket if your jobs crash (most of the
time, you just have to fix them), if there is a bug in ICARUS code
(e-mail the ICARUS mailing list, write about it on the SBN Slack channel (`#icarus_general`)
and/or open a GitHub issue),
or if you need help with using LArSoft
(LArSoft wiki, LArSoft mailing list, and ICARUS mailing list).
You need to log in with your Fermilab “Services” password (the same one you use to access Fermilab e-mail and Redmine), and then go to:
https://fermi.servicenowservices.com/wp
That will bring you to the “web portal” interface, where you can generically “request something” and a human being will sort it out in due time. Another way is to access the older service catalog: there, on the left, you’ll have a list of “Self service” items; selecting “Scientific Computing Services” or “Core Computing Services” will present you with a list of topics that should direct you to the right type of request. Most requests live under “Scientific Computing Services”, with a relevant exception being the creation of computing accounts, which is under “Core Computing Services”.
These tickets are completely unrelated to the issue-tracking tickets opened via Redmine against LArSoft or `icaruscode`, which pertain to LArSoft or ICARUS software bug reports and feature requests.
Accessing resources (DocDB, VOMS, …) via certificates
Note: this section is about “personal” certificates, not the kind of certificate proxies you need to work on the grid.
A CILogon certificate gives access to a number of Fermilab resources. After you get one, you will gain access to:
- SBN DocDB (see also some specific ICARUS instructions)
- Fermilab Virtual Organization (VOMS) server
- Jenkins build server
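On Fermilab interactive nodes, one common way to obtain such a certificate is the `cigetcert` tool, driven by your Kerberos ticket. A sketch (we assume `cigetcert` is installed, as it is on the GPVMs):

```
# sketch: obtain a CILogon certificate using your existing Kerberos ticket
kinit username@FNAL.GOV
cigetcert -i 'Fermi National Accelerator Laboratory'
# the certificate typically lands in /tmp/x509up_u$(id -u)
```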
Accessing resources via Virtual Private Network
Some resources can be accessed only from within the Fermilab network. For some others, access from outside the Fermilab network is still allowed but limited.
Users can access these resources in full by connecting via a Virtual Private Network (VPN), which needs to be set up on the client node.
Submitting jobs: Virtual Organisation
The name of our virtual organisation is `icarus`.
Submitting jobs: grid resources
This is about where your job gets actually shipped for execution, and who runs it.
We can use different resources:
- FermiGrid is the grid dedicated to Fermilab users
- Open Science Grid is an international grid, with strong contributions from physics but not limited to it
- Wilson Cluster is a Fermilab cluster offering Graphics Processing Units (GPU)
To submit jobs, you’ll also need a certificate proxy.
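As an illustration, on a GPVM a proxy has traditionally been obtained from the Kerberos ticket via `kx509` and `voms-proxy-init`. A sketch only: the exact VOMS group/role below is our assumption, and token-based authentication is progressively replacing this workflow:

```
# sketch: turn the Kerberos ticket into a certificate proxy with a VOMS
# extension (group/role string is an assumption; check current guidance)
kx509
voms-proxy-init -noregen -rfc -voms 'fermilab:/fermilab/icarus/Role=Analysis'
voms-proxy-info -all   # check the proxy and its remaining lifetime
```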
Note: to ensure that your LArSoft jobs are executed in a complete environment,
it is strongly recommended that the jobs are executed in the proper Singularity container
(FIFE documentation here).
One such container featuring Scientific Linux Fermi 7 (SL7) is available in CVMFS,
and it can be enabled by adding to jobsub_submit
the options:
--append_condor_requirements='(TARGET.HAS_SINGULARITY=?=true)' -l '+SingularityImage="/cvmfs/singularity.opensciencegrid.org/fermilab/fnal-wn-sl7:latest"'
(when using LArBatch `project.py` from `larbatch` `v01_55_01` or newer, Singularity containers are already enabled in most configurations, as documented by the author, and often no action is needed, besides maybe adding an `<os>` element for good measure).
Also note that FIFE recommends that Operating System selection not be enforced via the `--os` option when using containers (after all, we are explicitly shipping the whole OS).
Speaking of FIFE recommendations: Project-py is the recommended job submission tool (SBN-specific wiki documentation is pending).
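Putting the pieces together, a complete submission might look like the sketch below (the script path is a placeholder; `-G icarus` selects the virtual organisation mentioned above):

```
# sketch: submit a test job to the grid inside the SL7 container
jobsub_submit -G icarus \
    --append_condor_requirements='(TARGET.HAS_SINGULARITY=?=true)' \
    -l '+SingularityImage="/cvmfs/singularity.opensciencegrid.org/fermilab/fnal-wn-sl7:latest"' \
    file:///exp/icarus/app/users/${USER}/myjob.sh
```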
Software development and distribution
ICARUS code and the “software stack” it relies on are distributed in binary form for a few supported platforms.
Here is where you can find those packages pre-installed: CVMFS and FermiApp. LArSoft/ICARUS code UPS products precompiled for selected platforms can also be downloaded from the SciSoft repository. The content of these areas is described on its own web page.
CVMFS
ICARUS has a CVMFS area under `icarus.opensciencegrid.org`.
CVMFS is a file system that mirrors locally a remote storage area (not dissimilarly from NFS), and it is available on the Open Science Grid nodes (and Fermigrid as well). So it is a safe bet.
You can also install it on your laptop; LArSoft links to instructions (currently from MicroBooNE). The good news is that it’s not hard.
The full path to access the area is `/cvmfs/icarus.opensciencegrid.org/products/icarus`. The `products/icarus` directory contains our software distribution.
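For example, setting up the ICARUS software from the CVMFS distribution typically looks roughly like this (the setup script name, version, and qualifiers below are assumptions on our part; check the software distribution page for the current ones):

```
# sketch: set up ICARUS software from the CVMFS distribution
source /cvmfs/icarus.opensciencegrid.org/products/icarus/setup_icarus.sh
ups list -aK+ icaruscode | tail -5        # see which versions are available
setup icaruscode v09_00_00 -q e26:prof    # placeholder version and qualifiers
```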
[Instructions on how to deploy software in there are elsewhere.]
Source code
The ICARUS software codebase is stored as part of SBN, in the GitHub organization SBNSoftware (which should not come as a surprise, since it’s where this wiki is also hosted).
The development follows a pull request model. While pull requests can be made from personal forks, it is also possible to push pull request branches into SBNSoftware
itself. To join the organization, follow the instructions in SBN wiki and in particular the request form therein.
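For instance, checking out the main ICARUS repository and starting a pull-request branch goes as usual with git (the repository name `icaruscode` is as on GitHub; the branch name is just an example):

```
# clone the main ICARUS code repository from the SBNSoftware organization
git clone https://github.com/SBNSoftware/icaruscode.git
cd icaruscode
git checkout -b feature/my_fix   # work on a branch, then open a pull request
```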
Storing data
We have two types of data storage available, which we’ll call “CEPH” (the replacement of “BlueArc”) and “dCache”. FIFE explains their characteristics in detail. Production-level data are stored on tape (“Enstore”) and tracked with a database (“SAM”).
Local storage: CEPH disks
CEPHFS is the new network-attached storage area for Fermilab experiments. It replaced the BlueArc disk servers between September 2023 and May 2024. It is a multi-terabyte disk area split between experiments; ICARUS has its own slice of it, in two partitions:
- `/exp/icarus/data`: where you can write a few large files (e.g. ROOT data); if your data is larger than a hundred GB, people start muttering, as the space is limited and shared among all of us. Time to go to dCache then.
- `/exp/icarus/app`: where you can keep a few software builds; this is also a shared disk, and you should keep no large data here. But you do need to keep your code here, as content in the `/exp/icarus/data` partition can’t be executed.
These data areas can be accessed directly only from the GPVM nodes. If your grid job needs data from them, you have to copy the data to an area the job can reach first.
You should create your own directory under `/exp/icarus/data/users/${USER}` and `/exp/icarus/app/users/${USER}`, and stick to them.
Space available to the ICARUS collaboration (updated February 15, 2024):

| path | space | user quota |
|---|---|---|
| `/exp/icarus/app` | 6 TiB | 100 GiB |
| `/exp/icarus/data` | 25 TiB | 300 GiB |
To check these limits: from any node with access (e.g. a GPVM),
df -h /exp/icarus/{app,data}
returns the space allocated for the whole experiment and the global usage,
while the easy-to-remember:
getfattr -n ceph.quota.max_bytes -n ceph.dir.rbytes "/exp/icarus/app/users/${USER}"
getfattr -n ceph.quota.max_bytes -n ceph.dir.rbytes "/exp/icarus/data/users/${USER}"
show the usage (`ceph.dir.rbytes`) and the limit (`ceph.quota.max_bytes`) of the specified directory, in bytes.
World-visible storage: dCache
dCache area consists of a huge data space, made of tapes and disks.
We have three types of dCache areas:
- `scratch`: files in this area can be deleted automatically; you can store a grid job result there while waiting to transfer it into a safe place, but don’t rely on leaving your important stuff there for long;
- `persistent`: files in this area will not be deleted;
- `resilient`: an area used for files with intense access (like a working area snapshot which is accessed by thousands of jobs).
We have a limited quota of space as well, so fill it responsibly.
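As a sketch of how these areas are typically reached (the path layout below is assumed from the usual FIFE convention, so verify it on a GPVM; prefer `ifdh` over plain `cp` for copies, especially from grid jobs):

```
# typical dCache user areas (layout assumed; check on a GPVM)
ls /pnfs/icarus/scratch/users/${USER}
ls /pnfs/icarus/persistent/users/${USER}
# copy a job result into scratch via ifdh rather than plain cp
ifdh cp myresult.root /pnfs/icarus/scratch/users/${USER}/myresult.root
```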
We also have some StashCache storage.
Tape storage
Tape storage is available to ICARUS. Instructions to store the data are provided in the specific wiki page.
File database (SAM)
Fermilab provides a service, SAM, to track where files are currently stored. An introduction to the system can be found on User Guide for SAM, and users interact with it via sam-web-client. In short, SAM not only stores metadata of the files (e.g., that a file is simulation, and how many events are in it) and the current locations (that is, the “original” tape location, the copies cached in dCache, the copies spread across worker nodes…), but it can also queue the files for delivery.
ICARUS is assigned a “station” that SAM clients will communicate with to access the database.
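A few everyday `sam-web-client` commands, for flavour (we assume the experiment name passed to `-e` is `icarus`; the file and dataset names are placeholders):

```
# sketch: common SAM queries via sam-web-client
samweb -e icarus get-metadata some_file.root        # show a file's metadata
samweb -e icarus locate-file some_file.root         # show where copies live
samweb -e icarus list-files "defname: my_dataset"   # list files in a dataset
```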
Web server pages
ICARUS and SBN have a number of web pages. A few are listed here.
ICARUS at work web page
The page https://icarus-exp.fnal.gov, managed by Donatella Torretta, contains information about ICARUS working groups, `icaruscode` software documentation, and more.
Authentication is currently needed for access, in the form of a username (`icarus`) and password (the same as for SBN DocDB).
SBN online web gate
The SBN online web gate is at https://sbn-online.fnal.gov. That is not a public page.
Its content is also readable directly from ICARUS GPVMs and FNALU as `/web/sites/s/sbn-online.fnal.gov`. Direct write access is restricted, and can be requested by asking William Badgett.
Currently the site has almost no content, and we are looking for volunteers to help populate it with useful detector status information.
ICARUS data web directory
The ICARUS data web directory is at https://icarus-data.fnal.gov.
The site is currently accessible via a browser only when connected to the FNAL network (VPN included); we are in the process of changing this.
Its content is also readable directly from ICARUS GPVMs and FNALU as `/web/sites/s/icarus-data.fnal.gov`. Direct write access is restricted, and can be requested by asking the “owner” of the site. At the time of writing, the site owners are Wes Ketchum and Gianluca Petrillo.
Communication tools
Slack channels
ICARUS shares the Slack workspace `shortbaseline.slack.com` with the whole of SBN.
A contact person for that workspace is Wesley Ketchum.
Mailing lists
ICARUS owns some mailing lists for specific purposes.
Some mailing lists owned by ICARUS and hosted at Fermilab ListServ:
| Mailing list | Owner | Purpose |
|---|---|---|
| icarus_reconstruction@fnal.gov | Daniele Gibin | General software mailing list |
Other obscure services and resources
ICARUS uses more services, which are usually hidden from most users. Their documentation is a well-kept secret.