HTCondor Annex

HPC resources can temporarily be made available to your workflows with the HTCondor Annex tool, which sends pilot jobs (also called glideins) to the ACCESS resource providers. These pilots have the following properties:

  • A pilot can run multiple user jobs - it stays active until no more user jobs are available or until its end of life is reached, whichever comes first.

  • A pilot is partitionable - job slots are created dynamically based on the resource requirements of the user jobs. This means multiple user jobs can fit on a compute node at the same time.

  • A pilot will only run jobs for the user who started it.

As part of setting up the annex, you need to know your local username on the various resources. The easiest way to find it is to navigate to your ACCESS Profile on the Allocations Page. At the bottom of that page you will see a table titled “Resource Provider Site Usernames”.

You must have an allocation at the resource provider you want to use. The resources we currently support are:

| Resource | Nickname | OnDemand Instance | Project list command | Queues (tested) |
| --- | --- | --- | --- | --- |
| PSC Bridges2 | bridges2 | Log in (passwords/keys have to be registered - see the user guide) | projects | RM |
| Purdue Anvil | anvil | Log in | mybalance | standard |
| SDSC Expanse | expanse | Log in | module load sdsc; expanse-client user -p | compute |

Setting Up SSH Keys and Config

ACCESS resource providers have slightly different policies for logging in to their resources. We recommend that you create a separate SSH key for HTCondor Annex, and set up a ~/.ssh/config file containing your remote usernames and which SSH key to use. Log in to https://access.pegasus.isi.edu and start an interactive shell. Create a new SSH key:

$ ssh-keygen -f ~/.ssh/annex

Then open an editor and create ~/.ssh/config. You will have to specify the username you have been assigned at each resource:

Host anvil.rcac.purdue.edu *.anvil.rcac.purdue.edu
    User MYUSERNAME
    IdentityFile ~/.ssh/annex

Host bridges2.psc.edu *.bridges2.psc.edu
    User MYUSERNAME
    IdentityFile ~/.ssh/annex

Host expanse.sdsc.edu *.expanse.sdsc.edu
    User MYUSERNAME
    IdentityFile ~/.ssh/annex

Determining Project ID and Queue

To start an annex, you need the project identifier at the particular resource provider. Note that this might not be the same as your ACCESS allocation ID. You have to log in to the resource provider, via the Open OnDemand instances in the table above, and run a resource-provider-specific command to determine the ID. You can also use this login to authorize the SSH key from the previous step. For example, to get set up on Anvil, log in to https://ondemand.anvil.rcac.purdue.edu and start an interactive shell. In that shell, first run mybalance:

$ mybalance

Allocation    Type   SU Limit   SU Usage   SU Usage  SU Balance
Account                         (account)    (user)
============= ==== ========== ========== ========== ==========
abc12345      CPU    100000.0        0.0        0.0   100000.0

Take note of the allocation account name; you will need it when starting the annex.

Installing the SSH key

Install the ~/.ssh/annex.pub key from access.pegasus.isi.edu into ~/.ssh/authorized_keys on the resource:

Copy the contents of ~/.ssh/annex.pub (make sure it is the .pub one).
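A minimal sketch of what this looks like, assuming you log in to the resource through its Open OnDemand shell (the key text below is just a placeholder for your actual public key):

# On access.pegasus.isi.edu: display the public key and copy it
$ cat ~/.ssh/annex.pub

# On the resource (for example, in the Anvil Open OnDemand shell): append the copied key
$ mkdir -p ~/.ssh && chmod 700 ~/.ssh
$ echo "ssh-ed25519 AAAA...paste-your-key-here... annex" >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys

You can then verify the setup from access.pegasus.isi.edu with, for example, ssh anvil.rcac.purdue.edu. Note that Bridges2 requires registering the key as described in its user guide instead (see the table above).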

Provisioning Resources

You can create an annex with the annex create command. There is also an annex add command for when you already have an annex running and want to add more resources. You have to specify your allocation, and the last part of the command is the queue and resource. Note that $USER should be left as-is in the command - the shell will substitute the correct value there.

For example, if you want to run on Anvil, using the standard queue, and your project ID is abc12345, the command would be:
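A sketch of the invocation, assuming the htcondor annex CLI syntax of an annex name followed by queue@resource (the node count and lifetime below are illustrative placeholders; run htcondor annex create --help for the authoritative option list):

$ htcondor annex create --project abc12345 --nodes 1 --lifetime 7200 $USER standard@anvil

A similar htcondor annex add invocation, using the same annex name, would add more resources to an already running annex.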

The command will ask you to authenticate. For some resource providers, the SSH key will be enough; others might require a two-factor login.

Monitoring

The status of your annex can be displayed with the annex status command:
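For example, for an annex named $USER (a sketch; use whatever name you passed to annex create):

$ htcondor annex status $USER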

The command provides an overview of the provisioned resources and how long they will remain available.

Another tool to show your resources is condor_status. This will show the “slots” available, but note that these are partitionable, i.e. they can be dynamically created based on the size of your jobs. Example:
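Run it from the interactive shell on access.pegasus.isi.edu (output omitted here; typically you will see one partitionable slot per provisioned node, plus any dynamic slots created for running jobs):

$ condor_status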