These are tales of An Admin getting APEL+HTCondor-CE to work at GridKa/FZK Tier 1.
As background information, see
You need a pristine HTCondor-CE. That part is easy – just follow the docs. (For real. Don't be clever, just do as they say.)
Install the CE APEL plugin, plus the APEL dependency dirq:
$ yum install htcondor-ce-apel python-dirq
This provides scripts that extract job information from the Batch System for APEL.
It also provides a timer to run these automatically: enable condor-ce-apel.timer via systemd.
These scripts have some knobs. Do not touch the knobs.
On each workernode, you may add a ScalingFactor.
Put this in a condor configuration file in /etc/condor/config.d/:
# to be adjusted per machine
ApelScaling = 1.062
STARTD_ATTRS = $(STARTD_ATTRS) ApelScaling
This tells APEL that the CPU on this workernode is 1.062 times as powerful as the average workernode in your cluster. There is little harm in scaling everything by 1.0 for a start.
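If you have per-core benchmark scores for your machines, one way to get a per-machine value is to divide the node's score by the cluster-wide average. This is only a minimal sketch of that ratio: the function name `apel_scaling` and the numbers are made up for illustration, and how you measure the scores is up to you.

```python
def apel_scaling(node_score_per_core, cluster_avg_per_core):
    """Relative power of one worker node compared to the cluster average.

    Both arguments are per-core benchmark scores in the same unit
    (hypothetical inputs, e.g. HEPSPEC per core).
    """
    if cluster_avg_per_core <= 0:
        raise ValueError("cluster average must be positive")
    return round(node_score_per_core / cluster_avg_per_core, 3)


# Hypothetical scores: this node reaches 13.54 per core, the cluster averages 12.75.
print(f"ApelScaling = {apel_scaling(13.54, 12.75)}")
```

A node that matches the cluster average comes out at 1.0, which is the "scale everything by 1.0" starting point.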
APEL currently (1.8.1) cannot query the performance of your cluster from BDII. You must explicitly tell APEL the spec of your CE.
In /etc/apel/client.cfg, add an entry to the [spec_updater] section such as:
manual_spec1 = <CE FQDN>:9619/<CE FQDN>-condor,<SPEC Type>,<SPEC Value>
Where <SPEC Type> is one of HEPSPEC or Si2k.
For example, for a machine htcondor-ce-4-kit.gridka.de it looks like this:
manual_spec1 = htcondor-ce-4-kit.gridka.de:9619/htcondor-ce-4-kit.gridka.de-condor,HEPSPEC,12.75
This tells APEL the average spec of the resources of this CE. The ApelScaling is used to scale this per slot.
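The entry repeats the CE FQDN twice, which is easy to get wrong by hand. As a rough sketch, the line can be assembled from its parts like this; `manual_spec_entry` is a hypothetical helper, not something APEL ships, and it only mirrors the format shown above.

```python
def manual_spec_entry(index, ce_fqdn, spec_type, spec_value, port=9619):
    """Build one manual_spec line for the [spec_updater] section.

    Hypothetical helper that follows the format
    <CE FQDN>:9619/<CE FQDN>-condor,<SPEC Type>,<SPEC Value>.
    """
    if spec_type not in ("HEPSPEC", "Si2k"):
        raise ValueError(f"unknown spec type: {spec_type!r}")
    return (
        f"manual_spec{index} = "
        f"{ce_fqdn}:{port}/{ce_fqdn}-condor,{spec_type},{spec_value}"
    )


print(manual_spec_entry(1, "htcondor-ce-4-kit.gridka.de", "HEPSPEC", 12.75))
```

With the GridKa values this prints the same line as the example above.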
In /etc/apel/parser.cfg, make sure you have the following:
[blah]
enabled = true
dir = /var/lib/condor-ce/apel/
filename_prefix = blah
subdirs = false
[batch]
enabled = true
reparse = false
type = HTCondor
parallel = false
dir = /var/lib/condor-ce/apel/
filename_prefix = batch
subdirs = false
This tells APEL where to find the extracted job information.
You must announce your CE to APEL in order to authorise the local APEL client/sender. This requires an entry in GOCDB.
Field | Value | Example |
---|---|---|
Service Type | gLite-APEL | gLite-APEL |
Host name | <CE FQDN> | htcondor-ce-4-kit.gridka.de |
Contact E-Mail | ... | ... |
Host DN | <CE hostcert DN> | /C=DE/O=GermanGrid/OU=KIT/CN=htcondor-ce-4-kit.gridka.de |
Description | ... | glite-APEL for FZK Tier1 cluster |
See the entry for htcondor-ce-4-kit.gridka.de as an example.
Here be dragons. j/k, will add info on the general APEL setup if desired.