How to launch grid jobs
SBND now has its own VO, which is part of the Fermilab VOMS and allows us to run on FermiGrid and the Open Science Grid.
Launching jobs is done using the jobsub_client package:
https://cdcvs.fnal.gov/redmine/projects/jobsub/wiki/Using_the_Client
but a quick way to launch your job is the following:
- Make sure you create your user directory on the dCache disk space mounted under /pnfs:

      mkdir -p "/pnfs/sbnd/scratch/users/${USER}"
      chmod -R g+w "/pnfs/sbnd/scratch/users/${USER}"

  This will be the directory through which your jobs communicate, as it is visible from the OSG. Hence, all your input files and executables (unless they are UPS products already available in CVMFS) should be placed there and copied over to the worker nodes.
- To run a job, you then need to set up jobsub_client (no version is required, since there is a "current" one):

      source /grid/fermiapp/products/common/etc/setup  # already included in setup_sbnd.sh
      setup jobsub_client
- Then you need a wrapper script. For now you can use the one attached here: copy it over to your working directory (e.g. the /sbnd/app/users/${USER} above) and edit the following lines (starting from l. 55); a minimal sketch of a complete wrapper follows this quick-start list:

      ####################################
      ###### setup your needed products here, e.g. geant4 etc...
      ####################################
      # source /grid/fermiapp/products/larsoft/setup
      # setup geant4 v4_9_6_p03e -q debug:e6
      # setup geant4 v4_9_6_p03e -q e6:prof  # no debug information, faster.
      ####################################
      #### This is where you copy all of your executable/necessary files to the worker node
      #### ( If applicable )
      ####################################
      ###### this is where you copy your executable - I have a simple hello.out code here.
      ifdh cp /pnfs/sbnd/scratch/users/andrzejs/hello.out .
      #######
      ####### ifdh cp does not preserve permissions, so need to add executable.
      #######
      chmod u+x hello.out
      #######
      ####### launch executable
      #######
      ./hello.out
      #######
      ####### Copy results back
      #######
      ifdh mkdir ${SCRATCH_DIR}/${GRID_USER}/output_${CLUSTER}.${PROCESS}
      ifdh cp test_hello.txt ${SCRATCH_DIR}/${GRID_USER}/output_${CLUSTER}.${PROCESS}/
- This is how you would launch the job from your working directory (e.g. BlueArc space like /sbnd/app/users/):

      jobsub_submit -G sbnd --role=Analysis -N 3 -M --resource-provides="usage_model=DEDICATED,OPPORTUNISTIC" --OS="SL6" "file://$(pwd)/run_job.sh"
  Note:
  - -N specifies the number of jobs you want
  - -M will send you an email every time a subprocess finishes
  - usage_model=OPPORTUNISTIC tells the script to use any available slots, while DEDICATED,OPPORTUNISTIC gives first priority to our dedicated slots
  - --OS specifies the Scientific Linux version, if you need to care about that
The results of your job will end up in:

    /pnfs/sbnd/scratch/users/${USER}/output_${CLUSTER}.${PROCESS}
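For orientation, here is a minimal sketch of what a complete run_job.sh might look like, assembled from the excerpt above. The SCRATCH_DIR value, the CVMFS setup line and the hello.out payload are assumptions for illustration; CLUSTER, PROCESS and GRID_USER are environment variables that the jobsub job wrapper normally exports on the worker node.

    #!/bin/bash
    # Minimal run_job.sh sketch, pieced together from the excerpt above.
    # Assumptions: CLUSTER, PROCESS and GRID_USER are exported by the jobsub
    # job wrapper on the worker; SCRATCH_DIR and hello.out are examples.

    SCRATCH_DIR=/pnfs/sbnd/scratch/users

    # make ifdh available for the file transfers (setup location is an assumption)
    source /cvmfs/fermilab.opensciencegrid.org/products/common/etc/setup
    setup ifdhc

    # pull the executable from dCache scratch; ifdh cp drops permissions
    ifdh cp "${SCRATCH_DIR}/${GRID_USER}/hello.out" .
    chmod u+x hello.out

    # run the payload
    ./hello.out

    # ship the results back to a per-(sub)job output directory
    ifdh mkdir "${SCRATCH_DIR}/${GRID_USER}/output_${CLUSTER}.${PROCESS}"
    ifdh cp test_hello.txt "${SCRATCH_DIR}/${GRID_USER}/output_${CLUSTER}.${PROCESS}/"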
Some other useful commands (for more details, go to the jobsub client documentation):
- See how your jobs are doing:

      jobsub_q -G sbnd --user="$USER"

- Remove a job:

      jobsub_rm -G sbnd --jobid=<job id, e.g. obtained from the previous command>

- Fetch the log files:

      jobsub_fetchlog -G sbnd --jobid <job id reported at submission, e.g. 77457.0@fifebatch2.fnal.gov>
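For example, a typical monitor-and-fetch cycle might look like this (the job id is purely illustrative):

    jobsub_q -G sbnd --user="$USER"                               # list my jobs and their ids
    jobsub_fetchlog -G sbnd --jobid 77457.0@fifebatch2.fnal.gov   # grab the logs once it is done
    jobsub_rm -G sbnd --jobid=77457.0@fifebatch2.fnal.gov         # or remove it if it misbehaves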
Launching LArSoft jobs (also in large numbers)
To be written. Sorry.
Using project.py to launch LArSoft jobs (especially in large numbers)

project.py is a wrapper script that takes much of the tedious grid setup out of the user's hands, using an XML configuration file instead.
The general instructions for project.py are in larbatch's user guide.
To use this package with SBND we need to set up the sbndcode and sbndutil UPS products. An example shell session could look like this:

    source /grid/fermiapp/products/sbnd/setup_sbnd.sh
    setup sbndcode v05_12_01 -q e9:prof
    setup sbndutil v05_12_01 -q e9:prof
Now you should be able to run project.py. To actually run grid jobs you need two things:

- an xml file that will configure your job, present locally, and
- a FHiCL configuration file that will be used by the LArSoft instance, which needs to be present in your FHICL_FILE_PATH.
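As a quick sanity check of the second requirement, you can scan FHICL_FILE_PATH by hand to confirm a configuration file is visible; a small sketch (the file name is the example used further below):

    # look for a .fcl on FHICL_FILE_PATH (file name is illustrative)
    for d in ${FHICL_FILE_PATH//:/ }; do
        ls "$d/prod_eminus_0.1_0.9_lar1nd.fcl" 2>/dev/null
    done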
xml file preparation
Example xml files can be found in the lar1ndutil repository, which you can download by doing:

    mrb g lar1ndutil

in your srcs directory. The xml files are e.g. in lar1ndutil/xml/test/.
These are currently untested. The instructions on the xml file format can be found at:
https://cdcvs.fnal.gov/redmine/projects/larbatch/wiki/User_guide
but the relevant part for lar1ndcode is:

    <!ENTITY release "v00_05_00">

i.e. the release number should be the version of lar1ndcode you are using. It therefore has to be present in /grid/fermiapp/products/lar1nd or in your localProducts directory (in the latter case you need to use the tarball option in the file; you can make the tarball using /lar1ndutil/scripts/make_tar_lar1nd.sh).
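For orientation, a stripped-down project.py xml file might look roughly like this. This is a sketch based on the larbatch user guide; all names, paths and counts are placeholders, so treat the lar1ndutil examples and the guide as authoritative:

    <?xml version="1.0"?>
    <!DOCTYPE project [
      <!ENTITY release "v00_05_00">
      <!ENTITY name "prod_eminus_example">
      <!ENTITY user "yourusername">
    ]>
    <project name="&name;">
      <numevents>100</numevents>
      <larsoft>
        <tag>&release;</tag>
        <qual>e9:prof</qual>
      </larsoft>
      <stage name="gen">
        <fcl>prod_eminus_0.1_0.9_lar1nd.fcl</fcl>
        <outdir>/pnfs/sbnd/scratch/users/&user;/&name;/out</outdir>
        <workdir>/pnfs/sbnd/scratch/users/&user;/&name;/work</workdir>
        <numjobs>10</numjobs>
      </stage>
    </project>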
.fcl file preparation
The .fcl files need to be in your FHICL_FILE_PATH, so again they need to be either present in a tagged release or in your localProducts directory. This also means that after each modification of a .fcl file you need to run make install for project.py to pick it up.

An example .fcl file is in:

    lar1ndcode/lar1ndcode/JobConfigurations/prod_eminus_0.1_0.9_lar1nd.fcl
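A typical edit-and-reinstall cycle could look like this, assuming an mrb-based development area (MRB_BUILDDIR is the environment variable mrb sets for your build directory):

    # after editing the .fcl in your source area:
    cd "$MRB_BUILDDIR"
    make install    # reinstalls the .fcl into localProducts, which FHICL_FILE_PATH points at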
Running

Once you have these elements you can run:

    project.py --xml <path to your xml file> --stage <your defined stage> --submit

and look for the results.
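After submitting you can watch the batch jobs with the jobsub commands above; larbatch also offers a checking mode for validating the outputs of a stage, along these lines (see its user guide for the exact options):

    jobsub_q -G sbnd --user="$USER"                                    # watch the batch jobs
    project.py --xml <path to your xml file> --stage <stage> --check   # validate outputs once done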