TreeMaker Ntuple Production V17 Instructions

1. Log into CMS Connect (replace `[username]` with your CMS Connect username):
    ```
    ssh [username]@login.uscms.org
    ```
2. Make a working directory:
    ```
    mkdir -p /local-scratch/`whoami`/work/v17
    cd /local-scratch/`whoami`/work/v17
    ```
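    Production working areas can grow fairly large, so it may be worth confirming that the scratch filesystem has room before building anything. A minimal check with standard utilities (nothing specific to CMS Connect is assumed here):
    ```
    # free space on the filesystem holding /local-scratch
    df -h /local-scratch
    # current size of your work area
    du -sh /local-scratch/`whoami`/work
    ```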
3. Follow the TreeMaker installation instructions:
    ```
    wget https://raw.githubusercontent.com/TreeMaker/TreeMaker/Run2_2017/setup.sh
    chmod +x setup.sh
    ./setup.sh
    cd CMSSW_10_2_11_patch1/src/
    cmsenv
    cd TreeMaker/Production/test
    ```
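    If you come back later in a fresh shell, there is no need to rerun `setup.sh`; a small sketch of re-entering the existing release (same pattern used in step 8 below, assuming the CMS environment is available in your login shell):
    ```
    # return to the release created by setup.sh and reload the CMSSW environment
    cd /local-scratch/`whoami`/work/v17/CMSSW_10_2_11_patch1/src
    cmsenv
    cd TreeMaker/Production/test
    ```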
4. Set up for job submission:
    ```
    ./lnbatch.sh myProd
    cd myProd
    cp /home/pedrok/assignmentsV17/`whoami`/.prodconfig .
    ln -s $CMSSW_BASE/src/Condor/Production/python/manageJobs.py .
    python $CMSSW_BASE/src/Condor/Production/python/cacheAll.py
    ```
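    Before moving on, it can help to confirm that the assignment configuration and the script link actually landed in the new directory (plain shell; the exact contents of `.prodconfig` depend on your assignment):
    ```
    # verify the assignment config was copied and the symlink resolves
    ls -l .prodconfig manageJobs.py
    # inspect your production configuration
    cat .prodconfig
    ```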
5. Get a grid proxy (lasts for 7 days by default, then you must renew it), make a CMSSW tarball, and copy it to EOS:
    ```
    ./checkVomsTar.sh -i root://cmseos.fnal.gov//store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV17/`whoami`
    ```
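    To see how much lifetime is left on the proxy, or to renew it by hand once the 7 days are up, the standard VOMS client commands should work. A sketch only: `checkVomsTar.sh` may already handle renewal for you, and the `--valid` length below is an assumption matched to the 7-day lifetime quoted above:
    ```
    # remaining proxy lifetime in seconds
    voms-proxy-info --timeleft
    # request a fresh 7-day proxy for the CMS VO
    voms-proxy-init -voms cms --valid 168:00
    ```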
6. Prepare and submit the assigned jobs:
    Submitting jobs on CMS Connect may be slow. Do not worry if it takes longer than you expect.
    ```
    python submitJobs.py -p -o root://cmseos.fnal.gov//store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV17 -k --cpus 4 -t root://cmseos.fnal.gov//store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV17/`whoami`/
    python submitJobs.py -s -o root://cmseos.fnal.gov//store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV17 -k --cpus 4 -t root://cmseos.fnal.gov//store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV17/`whoami`/
    ```
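    Once jobs start to finish, you can spot-check that output files are appearing under the EOS area given to `-o`. A minimal sketch using the standard XRootD client (the exact subdirectory layout under `Run2ProductionV17` depends on your assigned samples):
    ```
    # list the top of the output area on FNAL EOS
    xrdfs root://cmseos.fnal.gov/ ls /store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV17 | head
    ```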
7. As often as you prefer (at least several times per day), check on the status of the jobs:
    ```
    condor_q `whoami` | tail -n 1
    ```
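    The `tail -n 1` keeps only the summary line; if you want per-job detail instead, a couple of standard HTCondor views (plain `condor_q` options, nothing specific to this workflow):
    ```
    # full listing of your jobs
    condor_q `whoami`
    # only the jobs currently in the hold state, with hold reasons
    condor_q `whoami` -hold
    ```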
8. If there are any held jobs, check why they are held:
    ```
    cd /local-scratch/`whoami`/work/v17/CMSSW_10_2_11_patch1/src/TreeMaker/Production/test/myProd
    cmsenv
    python manageJobs.py -howm
    ```
    It is important to `cd` to your `myProd` directory and run `cmsenv` before using `manageJobs.py`, in order to pick up the proper configuration from your `.prodconfig` file. This specific command lists, for every held job (`-h`): the output log name (`-o`), why it is held (`-w`), and the site and machine where it ran (`-m`). If you see a large number of failed jobs at a specific site, it may be a "black hole". Please report this to the list so others can avoid it. Common error codes for xrootd failures are 84 and 85. Other exit codes most often indicate transient failures and can be ignored.
9. Release any held jobs to run again:
    ```
    python manageJobs.py -hs
    ```
    If you need to remove a black hole site, you can use an extra argument (filling in `[site1,site2,...]` with a comma-separated list of black hole sites):
    ```
    python manageJobs.py -hs --rm-sites [site1,site2,...]
    ```
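    For reference, the same inspection and release can also be done with plain HTCondor commands, bypassing `manageJobs.py` (a sketch only; `manageJobs.py` remains the preferred route because it knows about your `.prodconfig`):
    ```
    # hold reasons for your held jobs, one line per job
    condor_q `whoami` -hold -af ClusterId ProcId HoldReason
    # release all of your held jobs
    condor_release `whoami`
    ```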
10. Every day, you should also check for stuck jobs, which are still running on a worker node but no longer active. To do this, replace `-h` with `-t` in the commands from steps 8 and 9 (a concrete example is sketched after this list).
11. Reply to the list when most of your jobs are finished to receive further instructions.
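As a concrete illustration of step 10, swapping `-h` for `-t` in the step-8 and step-9 commands gives the following (assuming `manageJobs.py` accepts the combined short options in the same way as above):

```
cd /local-scratch/`whoami`/work/v17/CMSSW_10_2_11_patch1/src/TreeMaker/Production/test/myProd
cmsenv
# list stuck jobs with their output logs and the site/machine where they run
python manageJobs.py -towm
# act on the stuck jobs so they run again
python manageJobs.py -ts
```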