Submitting Jobs to HTCondor from a local Machine
The recommended way to use HTCondor is to log in to lxplus.cern.ch and submit jobs from there.
It is also possible to configure your own computer to manage HTCondor jobs, as described in this guide.
These notes refer to Ubuntu 16.04 LTS, 18.04 LTS and 20.04 LTS and include possible caveats. On a different Linux distribution the steps should be similar, but commands and package names may differ. Sudo rights are needed.
As a prerequisite, you need to install a Kerberos client on your desktop; afterwards, you can proceed with the installation of HTCondor.
Install kerberos
Install the user and developer packages and add the lxplus credential components (when asked, the default realm is CERN.CH):
sudo apt install krb5-user libkrb5-dev libauthen-krb5-perl
scp $USERNAME@lxplus.cern.ch:/usr/bin/batch_krb5_credential .
chmod +x batch_krb5_credential
sudo mv batch_krb5_credential /usr/bin/
scp $USERNAME@lxplus.cern.ch:/etc/ngauth_batch_crypt_pub.pem .
sudo mv ngauth_batch_crypt_pub.pem /etc/
scp $USERNAME@lxplus.cern.ch:/etc/krb5.conf.no_rdns .
sudo mv krb5.conf.no_rdns /etc/krb5.conf.no_rdns
scp $USERNAME@lxplus.cern.ch:/etc/sysconfig/ngbauth-submit .
sudo mkdir -p /etc/sysconfig/
sudo mv ngbauth-submit /etc/sysconfig/
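Before moving on, it can help to confirm that all four files copied from lxplus ended up in place. The following is a minimal sketch, not part of the official setup; the function name check_krb_files and its optional root-prefix argument are hypothetical helpers introduced here for illustration.

```shell
# Sketch: report which of the files copied from lxplus are missing.
# The optional first argument is a hypothetical root prefix (empty = real filesystem).
check_krb_files() {
    root="${1:-}"
    missing=0
    for f in /usr/bin/batch_krb5_credential \
             /etc/ngauth_batch_crypt_pub.pem \
             /etc/krb5.conf.no_rdns \
             /etc/sysconfig/ngbauth-submit; do
        if [ ! -e "${root}${f}" ]; then
            echo "missing: ${f}"
            missing=1
        fi
    done
    # Return 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

Calling check_krb_files with no argument checks the real paths; it prints nothing and returns 0 when every file is present.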
Confirm installation
Before this step, make sure you already have valid credentials (i.e. run kinit).
Then check that the Kerberos components are properly installed and set up (the script will tell you which perl packages are missing):
/usr/bin/batch_krb5_credential
There should be an output like:
-----BEGIN NGAUTH COMPOSITE-----
# LOTS OF LINES OF YOUR KEY
-----END NGAUTH COMPOSITE-----
and nothing else (i.e. no missing files or errors).
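The "one block and nothing else" condition can also be checked mechanically. The sketch below assumes the marker lines look exactly as shown above; is_clean_composite is a hypothetical helper name, and blank lines are deliberately ignored.

```shell
# Sketch: succeed only if stdin consists of a single NGAUTH COMPOSITE block
# with no other non-blank lines before the BEGIN marker or after the END marker.
is_clean_composite() {
    awk '
        NF == 0 { next }    # ignore blank lines
        { n++; if (n == 1) first = $0; last = $0 }
        END {
            ok = (n >= 3 &&
                  first == "-----BEGIN NGAUTH COMPOSITE-----" &&
                  last == "-----END NGAUTH COMPOSITE-----")
            exit(ok ? 0 : 1)
        }
    '
}
```

A possible use is `/usr/bin/batch_krb5_credential 2>&1 | is_clean_composite && echo OK`, where merging stderr into stdout makes warnings count as extra lines.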
Debugging the kerberos installation
If the last step does not produce the expected output, /usr/bin/batch_krb5_credential might have to be modified.
Some things to try:
- Change the line
      my $principalName = "ngauth/SOMESERVER";
  into
      my $principalName = "ngauth/ngauth.cern.ch";
- Install missing perl libraries: if there is an error like
      Can't locate Sys/Syslog.pm in @INC
  you need to install the missing perl libraries:
      yum install perl-CPAN perl-Sys-Syslog
- Install missing perl components:
      perl -MCPAN -e 'install Authen::Krb5'
- On Ubuntu 20.04 neither of these steps helped, as the Authen::Krb5 package was not available. Try getting a new version of batch_krb5_credential directly from lxplus8:
      scp $USERNAME@lxplus8.cern.ch:/usr/bin/batch_krb5_credential .
      chmod +x batch_krb5_credential
      sudo mv batch_krb5_credential /usr/bin/
- Or try a manual fix of the Authen::Krb5 issue by replacing the lines
      my $newCreds = Authen::Krb5::cc_resolve("FILE:".$tgt_fn);
      $newCreds->initialize($credCache->get_principal());
      Authen::Krb5::cc_copy_creds($credCache, $newCreds);
  with
      copy($tgt, $tgt_fn);
- Fix cache not found: if the message is
      ERROR: No AP_REQ message created by kerberos: No credentials cache found
  then the KRB5CCNAME variable is not set. Run klist to find your current cache (at the top of the output) and export the variable:
      export KRB5CCNAME=/tmp/krb5cc_####
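For the last item, the cache name can be taken from klist rather than copied by hand. This sketch assumes the MIT Kerberos output format, where the first line reads "Ticket cache: ..."; other Kerberos implementations may label the line differently. cache_from_klist is a hypothetical helper name.

```shell
# Sketch: print the credential cache named on the "Ticket cache:" line
# of klist output read from stdin (assumes MIT Kerberos formatting).
cache_from_klist() {
    sed -n 's/^Ticket cache: //p' | head -n 1
}
```

It could be used as `export KRB5CCNAME="$(klist | cache_from_klist)"`. The value keeps any type prefix that klist prints (for example FILE:), which Kerberos accepts in KRB5CCNAME.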
Install HTCondor
On Ubuntu 18.04+ it is usually enough to install HTCondor from the packaged version (the package is named htcondor or condor, depending on the repository in use):
sudo apt update
sudo apt install htcondor
sudo apt-get update
sudo apt-get install condor
If you need a more recent version, or are on Ubuntu 16.04, use the resources on the HTCondor web page.
Debugging the HTCondor installation
When running sudo apt-get update, you may get the message:
N: Skipping acquire of configured file 'contrib/binary-i386/Packages' as repository 'http://research.cs.wisc.edu/htcondor/ubuntu/stable trusty InRelease' doesn't support architecture 'i386'
If your system is actually 64-bit, a common solution is to restrict the package lookup to 64-bit only by adding [arch=amd64] to the corresponding entry in the list of sources (in /etc/apt/sources.list), e.g.
deb [arch=amd64] http://research.cs.wisc.edu/htcondor/ubuntu/stable/ trusty contrib
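The same change can be previewed with sed before touching the real file. This is a sketch under the assumption that the HTCondor entries start with "deb" followed directly by the research.cs.wisc.edu URL; add_amd64_arch is a hypothetical helper name, and it only prints the result instead of editing in place.

```shell
# Sketch: print the given sources list with [arch=amd64] added to HTCondor
# "deb" entries that do not yet carry an option block. The file is not modified.
add_amd64_arch() {
    sed -E 's#^deb[[:space:]]+(https?://research\.cs\.wisc\.edu/htcondor/)#deb [arch=amd64] \1#' "$1"
}
```

Running `add_amd64_arch /etc/apt/sources.list` shows the proposed content; entries for other repositories and entries that already have [arch=amd64] are left unchanged.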
Configure HTCondor
- Create the config file /etc/condor/config.d/10-local.config. Please set as scheduler (SCHEDD_HOST) the default one you get on lxplus, e.g. in your condor_q output. You can also find it out by running (on lxplus):
      condor_config_val SCHEDD_HOST
  An example content is provided here:
      CONDOR_HOST = tweetybird03.cern.ch, tweetybird04.cern.ch
      COLLECTOR_HOST = tweetybird03.cern.ch, tweetybird04.cern.ch
      SCHEDD_HOST = bigbirdXX.cern.ch
      SCHEDD_NAME = $(SCHEDD_HOST)
      SEC_CLIENT_AUTHENTICATION_METHODS = KERBEROS
      SEC_CREDENTIAL_PRODUCER = /usr/bin/batch_krb5_credential
      CREDD_HOST = $(SCHEDD_HOST)
      FILESYSTEM_DOMAIN = cern.ch
      UID_DOMAIN = cern.ch
- Restart HTCondor:
      /etc/init.d/condor restart
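To avoid typing the file by hand, the example content can be generated with the scheduler name filled in. This is a sketch only: render_condor_config is a hypothetical helper, and all values other than the scheduler are copied from the example above and may not match your setup.

```shell
# Sketch: print the example 10-local.config with the given scheduler host.
# The scheduler name is passed as the first argument (as reported on lxplus).
render_condor_config() {
    schedd="$1"
    # \$ keeps $(SCHEDD_HOST) literal so HTCondor expands it, not the shell.
    cat <<EOF
CONDOR_HOST = tweetybird03.cern.ch, tweetybird04.cern.ch
COLLECTOR_HOST = tweetybird03.cern.ch, tweetybird04.cern.ch
SCHEDD_HOST = ${schedd}
SCHEDD_NAME = \$(SCHEDD_HOST)
SEC_CLIENT_AUTHENTICATION_METHODS = KERBEROS
SEC_CREDENTIAL_PRODUCER = /usr/bin/batch_krb5_credential
CREDD_HOST = \$(SCHEDD_HOST)
FILESYSTEM_DOMAIN = cern.ch
UID_DOMAIN = cern.ch
EOF
}
```

For example, `render_condor_config bigbirdXX.cern.ch | sudo tee /etc/condor/config.d/10-local.config`, with bigbirdXX replaced by the scheduler you obtained on lxplus.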
Debugging the HTCondor configuration
- Useful: the full configuration can be checked with
      condor_config_val -dump
- If you have connection problems when running condor_q or condor_status, you might want to check your NETWORK_INTERFACE:
      condor_config_val NETWORK_INTERFACE
  In some cases it might be set to 127.0.0.1 or similar, whereas it should be set to *. If this is not the case, simply add the appropriate line at the end of your configuration file (from above):
      NETWORK_INTERFACE = *
  Don't forget to restart HTCondor.
- Pay attention to the pair COLLECTOR_HOST and SCHEDD_HOST: depending on the collector, you may be able to reach only a subset of the schedulers. To get the full lists, log in to lxplus.cern.ch and type:
      condor_status -sched
      condor_status -collector
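The NETWORK_INTERFACE fix described above can be scripted so that the line is appended only when needed. This sketch looks only at the single file it is given, not at the full merged configuration, and assumes the file ends with a newline; ensure_network_interface is a hypothetical helper name.

```shell
# Sketch: append "NETWORK_INTERFACE = *" to the given config file unless the
# last NETWORK_INTERFACE setting in that file is already "*".
ensure_network_interface() {
    conf="$1"
    current=$(sed -n 's/^[[:space:]]*NETWORK_INTERFACE[[:space:]]*=[[:space:]]*//p' "$conf" | tail -n 1)
    if [ "$current" != "*" ]; then
        printf '%s\n' 'NETWORK_INTERFACE = *' >> "$conf"
    fi
}
```

Writing to /etc/condor/config.d/10-local.config requires root rights, and HTCondor still has to be restarted afterwards.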