Condor is a batch job queueing system, which runs across multiple machines. It takes jobs from users, queues them up, decides where and when to run them, and then returns the results to the user once done. It enables you to turn any group of machines into a cluster-like system -- setting up a distributed-processing network with whatever resources you have available. You can use it on pretty much any setup, including dedicated clusters, but arguably its best use is as software enabling you to treat your desktops as a part-time cluster. You can set rules so that jobs are only run on idle desktops -- making the most of unused CPU cycles and power resources, especially if your site has an always-on policy.
The basic workflow is that the user submits a job (a resource request) from a Condor client. The job can specify its resource requirements and preferences, as well as what should be run and where the output should be sent. The central Condor server then examines its database to find a client that matches the job's requests. When an appropriate client comes up, the job is sent out and run, and the output is sent back to the user. Condor has a checkpointing system which can handle pausing or cancelling jobs on-the-fly -- e.g. if a desktop comes back into use halfway through a job -- and resuming them later if possible.
The first part of this series deals with installing the Condor server and client; the second part will show how to go about submitting jobs and specifying resources.
Thinking about your setup
You need to create a condor user on all machines running Condor: this user will own the files created by the Condor daemons (although the daemons themselves run as root). Ideally, the home directory for this user would be centralised, to simplify admin -- for ease of explanation, I'll assume this setup here. (You can check the documentation for how to handle it if you want to have separate home directories on each client.) Don't edit the config file until you've unpacked and installed the software (see below) -- just decide what you intend to do.
Each machine that Condor runs on (either server or client) needs to have its own spool, log, and execute directories. If you're using a centralised home directory, you can set both the home directory and the per-machine local directories in the global configuration file (condor_config).
It's also a good idea to have that condor_config global configuration file on a shared directory -- that way you only ever have to edit it once! You also have a local configuration file for each machine, which can override global config options -- you can keep this in the LOCAL_DIR directory. You set this in the global configuration file:
LOCAL_CONFIG_FILE = $(LOCAL_DIR)/condor_config.local
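To make the per-machine layout concrete, here is a minimal sketch of a helper that creates one machine's local directory under a shared Condor home. The function name, the hosts/ subdirectory pattern, and the scratch base directory are assumptions for illustration; condor_install may well create these directories for you when given --local-dir.

```shell
#!/bin/sh
# Sketch: create the per-machine directories Condor expects under a shared home.
# make_condor_local_dir is a hypothetical helper, not part of Condor.
# Assumption: the global config points LOCAL_DIR at <home>/hosts/<hostname>,
# e.g. with a line such as LOCAL_DIR = $(RELEASE_DIR)/hosts/$(HOSTNAME).

make_condor_local_dir() {
    base=$1   # shared Condor home, e.g. /dir/condor
    host=$2   # short host name of the machine
    local_dir="$base/hosts/$host"

    # Each machine needs its own spool, log, and execute directories.
    mkdir -p "$local_dir/spool" "$local_dir/log" "$local_dir/execute" || return 1

    # Empty per-machine override file, matching the LOCAL_CONFIG_FILE setting.
    touch "$local_dir/condor_config.local" || return 1

    echo "$local_dir"
}

# Example run against a scratch directory rather than the real /dir/condor.
demo_base=$(mktemp -d)
demo_local=$(make_condor_local_dir "$demo_base" myclient)
ls "$demo_local"
```

On a real system these directories would normally need to be owned by the condor user described earlier.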
Now you've thought about all that, untar the Condor download into an appropriate directory. Then run condor_install from this directory. This script needs a few options:
condor_install --prefix=/dir/condor --local-dir=/dir/condor/hosts/myserver --type=manager
This will do a "manager" type install (other types are submit and execute -- a manager-only machine won't be able to submit jobs or have jobs run on it), install Condor to the /dir/condor directory (the home directory you decided on in the previous section), and set /dir/condor/hosts/myserver as the local directory for this machine.
The global condor_config file is divided into 4 parts. The first part is the settings that you must change. Some of these are the variables you decided on earlier (e.g. LOCAL_DIR). You also need to specify an admin email address, your local domain (e.g. example.com), and a name for your system. The config file is well-documented.
Part 2 of the file is usually safe to leave as-is, but you do need to set the HOSTALLOW_READ and HOSTALLOW_WRITE variables. You can just set them to * (i.e. any machine at all can read/write to your pool), but this is a bit of a risk. More likely you want to set these to *.example.com or whatever your domain is.
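As a rough sanity check on those two settings, the sketch below scans a config file and warns if either variable is left as a bare wildcard. The helper name check_hostallow and the scratch file are hypothetical; only the HOSTALLOW_READ and HOSTALLOW_WRITE names and the *.example.com value come from the text above.

```shell
#!/bin/sh
# Sketch: warn if a Condor config leaves read or write access open to every host.
# check_hostallow is a hypothetical helper; the file used here is a scratch copy.

check_hostallow() {
    config=$1
    # Match HOSTALLOW_READ or HOSTALLOW_WRITE set to a bare "*".
    if grep -Eq '^[[:space:]]*HOSTALLOW_(READ|WRITE)[[:space:]]*=[[:space:]]*\*[[:space:]]*$' "$config"; then
        echo "warning: $config allows access from any host" >&2
        return 1
    fi
    return 0
}

# Demo against a scratch file restricted to one (example) domain.
demo_config=$(mktemp)
cat > "$demo_config" <<'EOF'
HOSTALLOW_READ = *.example.com
HOSTALLOW_WRITE = *.example.com
EOF

check_hostallow "$demo_config" && echo "host access is restricted"
```

Against a real install you would pass the path of your global condor_config instead of the scratch file.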
Once you're happy with the settings, make sure that you've set the environment variable CONDOR_CONFIG to the location of this file.
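For example, in a Bourne-style shell this might look like the following. The path is an assumption (a config file under etc/ in the /dir/condor prefix used above); substitute wherever your condor_config actually lives.

```shell
#!/bin/sh
# Assumed location of the global config file; adjust to your install.
CONDOR_CONFIG=/dir/condor/etc/condor_config
export CONDOR_CONFIG

# Quick check that the file is visible from this machine.
if [ ! -r "$CONDOR_CONFIG" ]; then
    echo "warning: $CONDOR_CONFIG is not readable from this machine" >&2
fi
```

Putting the export in a system-wide profile or in the start script saves having to set it by hand in every session.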
Next, execute /dir/condor/sbin/condor_master to start the Condor daemons. condor_master is the daemon that starts and monitors the other daemons; it also checks for updated binaries and restarts the daemons if necessary.
If you've unpacked the Condor tarball somewhere central, you can log on to the client, cd to that directory, and run this command:
condor_install --type=execute,submit --local-dir=/dir/condor/hosts/myclient --central-manager=myserver.example.com --verbose
Note that you need to specify the client name (for --local-dir) and the manager name. (If you didn't unpack the files centrally, you'll have to copy the release tarball somewhere appropriate and add a --install=/path/to/condorrelease.tar option.)
As with the server, set the CONDOR_CONFIG variable, and execute /dir/condor/sbin/condor_master to start the Condor daemon! Check it's running by grepping the process list for condor_* processes.
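One way to do that check is sketched below. The helper name list_condor_daemons is hypothetical, and the ps -ef flags are an assumption that may differ on your platform. On an execute-and-submit client you would typically expect to see condor_master alongside daemons such as condor_startd and condor_schedd.

```shell
#!/bin/sh
# Sketch: pick the condor_* processes out of "ps" output supplied on stdin.
# list_condor_daemons is a hypothetical helper name.

list_condor_daemons() {
    # Keep lines that mention a condor_* program, dropping the grep itself.
    grep -E 'condor_[a-z_]+' | grep -v grep
}

# Run it against the live process list; report if nothing matches.
ps -ef | list_condor_daemons || echo "no condor_* processes found"
```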
Once you've got everything up and running, you may want to set up a start script so that Condor starts automatically on boot in the future.
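A very small init-style wrapper might look like the sketch below. The paths, the function name condor_init, and the use of condor_off -master to stop the pool member are all assumptions here; check them against your Condor release and adapt the script to your distribution's boot system.

```shell
#!/bin/sh
# Minimal init-style wrapper for Condor (a sketch; paths are assumptions).
CONDOR_CONFIG=${CONDOR_CONFIG:-/dir/condor/etc/condor_config}
CONDOR_SBIN=${CONDOR_SBIN:-/dir/condor/sbin}
export CONDOR_CONFIG

condor_init() {
    case "$1" in
        start)
            # condor_master starts and supervises the other daemons.
            "$CONDOR_SBIN/condor_master"
            ;;
        stop)
            # Assumption: ask the master to shut itself and its children down.
            "$CONDOR_SBIN/condor_off" -master
            ;;
        *)
            echo "Usage: condor {start|stop}" >&2
            return 2
            ;;
    esac
}

# When saved as a script, dispatch on the first argument.
if [ $# -gt 0 ]; then
    condor_init "$1"
fi
```

Saved as, say, /etc/init.d/condor (a hypothetical location), it would be called with start at boot and stop at shutdown.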
You probably also want to have a look at the rules that govern when a job can be run on a client. These are in Part 3 of the global config file. The file sets the "UWisc - CS Dept" rules (look for the definition of UWCS_WANT_SUSPEND in the config file). The rules as defined here are probably a good start -- you can adjust them later if you start having problems, or you can override them for a particular machine if need be.
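As an illustration of a per-machine override, the sketch below appends a stricter START expression to a machine's local config file. The 30-minute threshold is invented for the example, and it assumes the stock global config defines the MINUTE macro and exposes KeyboardIdle as usual; a scratch file stands in for the real condor_config.local.

```shell
#!/bin/sh
# Scratch stand-in for a machine's condor_config.local (for illustration only).
local_config=$(mktemp)

# The quoted 'EOF' stops the shell from trying to expand $(MINUTE) itself.
cat >> "$local_config" <<'EOF'
# Illustrative override: only start jobs after 30 minutes of keyboard idle time.
START = KeyboardIdle > 30 * $(MINUTE)
EOF

cat "$local_config"
```

Because the local file is read after the global one, a setting like this would replace the global START expression on that machine only; the daemons need to re-read their configuration before it takes effect.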
OK, now you have your server and your first client set up. In the next part of this piece we'll look at how to submit a job.
This article was first published on LinuxPlanet.com.