Condor is a batch job queueing system, which runs across multiple machines. It takes jobs from users, queues them up, decides where and when to run them, and then returns the results to the user once done. It enables you to turn any group of machines into a cluster-like system -- setting up a distributed-processing network with whatever resources you have available. You can use it on pretty much any setup, including dedicated clusters, but arguably its best use is as software enabling you to treat your desktops as a part-time cluster. You can set rules so that jobs are only run on idle desktops -- making the most of unused CPU cycles and power resources, especially if your site has an always-on policy.
The basic workflow is that the user submits a job (a resource request) from a Condor client. The job can specify its resource requirements and preferences, as well as what should be run and where the output should be sent. The central Condor server then examines its database to find a client that matches the job requests. When an appropriate client comes up, the job is sent out, run, and the output sent back to the user. It has a checkpointing system which can handle pausing or cancelling jobs on-the-fly -- e.g. if a destkop comes back into use halfway through a job -- and resuming them if possible later.
The first part of this series deals with installing the Condor server and client; the second part will show how to go about submitting jobs and specifying resources.
Condor can be downloaded from the Condor website. Before installation, you need to consider where the files will live.
You need to create a condor user on all machines running condor: this user will own the files created by the Condor daemons (although the daemons themselves run as root). Ideally, the home directory for this user would be centralised, to simplify admin -- for ease of explanation, I'll assume this setup here. (You can check the documentation for how to handle it if you want to have separate home directories on each client.) Don't edit the config file until you've unpacked and installed the software (see below) -- just decide what you intend to do.
Each machine that condor runs on (either server or client) needs to have its own spool, log, and execute directories. If you're using a centralised home directory, you can set the home directory and local directories up in the configuration file (condor_config) as
It's also a good idea to have that condor_config global configuration file on a shared directory -- that way you only ever have to edit it once! You also have a local configuration file for each machine, which can override global config options -- you can keep this in the LOCAL_DIR directory. You set this in the global configuration file:
LOCAL_CONFIG_FILE = $(LOCAL_DIR)/condor_config.local