Submit to your campus cluster from OSG Connect

Overview

Suppose you have access to a cluster at your home institution. You normally log in to that cluster and submit jobs to its local queue with your local identity. You can connect this cluster and local identity to OSG Connect. Once you have done that, you can submit and monitor jobs in the usual way, using condor_submit and condor_q from login.osgconnect.net.

To make this possible, OSG Connect uses the OSG BOSCO technology. The remote clusters connected this way are called BOSCO resources, and login.osgconnect.net is your BOSCO submit node. When you connect a BOSCO resource, only you will be able to send jobs to that resource, and the jobs will run there under your identity, exactly as if you had logged in via ssh and submitted them there.

Requirements

The BOSCO resource you want to connect must satisfy the following requirements:

  • You must be able to ssh to it (using username/password or ssh keys)
  • The resource must run a BOSCO-supported platform (currently RHEL5, RHEL6, Debian 6, or any of their derivatives, all in 64-bit versions)
  • The resource must run one of the following queue managers:
    • PBS flavors (Torque and PBSPro)
    • HTCondor (7.6 or later)
    • SGE (Sun Grid Engine)
    • LSF
    • SLURM (with Torque/PBS command wrappers installed)

Connecting your resource

If the requirements above are satisfied, you can go ahead and connect the resource.

You need to know the following values and substitute them into the commands below:

  • USER_NAME, your user name on the BOSCO resource, a.k.a. your login name
  • HOSTNAME.DOMAIN, the host name of the BOSCO resource, a.k.a. the fully qualified host name
  • QUEUE_MGR, the queue manager used on the BOSCO resource. This is the program used to submit jobs on the BOSCO resource; if you don't know what this is, ask the administrator of that cluster or try the quick check shown below. Use condor for HTCondor, lsf for LSF, sge for SGE (or Open Grid or other Grid Engine versions), and pbs for PBS (Torque and PBSPro) or SLURM
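
If you have shell access on the cluster but are not sure which queue manager it runs, a quick heuristic check (not part of BOSCO, and only a hint) is to see which submit commands are installed there:

[USER_NAME@HOSTNAME ~]$ which qsub            # found on PBS (Torque/PBSPro) and also on SGE
[USER_NAME@HOSTNAME ~]$ which condor_submit   # found on HTCondor
[USER_NAME@HOSTNAME ~]$ which bsub            # found on LSF
[USER_NAME@HOSTNAME ~]$ which sbatch          # found on SLURM (BOSCO still needs the Torque/PBS wrappers; use pbs)

If the result is ambiguous (for example, qsub is shipped by both PBS and SGE), ask the cluster administrator.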

Type:

bosco_cluster -a USER_NAME@HOSTNAME.DOMAIN QUEUE_MGR

BOSCO may then ask you to add the host key to the known hosts (answer yes) and will prompt you for the password that you normally enter when you log in on HOSTNAME.DOMAIN. The setup process adds some files to your home directory on login.osgconnect.net and a bosco folder to your home directory on the BOSCO resource. Once the setup completes, BOSCO prints two lines that you have to add to your submit files to send jobs to this resource (see below). Take note of these two lines.

Example of running bosco_cluster -a
[osgid@login01 ~]$ bosco_cluster --add cnetid@midway-login1.rcc.uchicago.edu pbs
Enter the password to copy the ssh keys to cnetid@midway-login1.rcc.uchicago.edu:
cnetid@midway-login1.rcc.uchicago.edu's password:
Detecting PBS cluster configuration...Done!
Downloading for cnetid@midway-login1.rcc.uchicago.edu.
Unpacking.
Sending libraries to cnetid@midway-login1.rcc.uchicago.edu.
You are not running as the factory user. BOSCO Glideins disabled.
Installing on cluster cnetid@midway-login1.rcc.uchicago.edu...
Installation complete
The cluster cnetid@midway-login1.rcc.uchicago.edu has been added to BOSCO
It is available to run jobs submitted with the following values:
> universe = grid
> grid_resource = batch pbs cnetid@midway-login1.rcc.uchicago.edu

You must be able to log in to the remote cluster. If password authentication is allowed, the script will ask you for your password. If only key-based login is allowed, then you must load your key into the ssh-agent. Here is an example of adding the key and testing the login:

[osgid@login01 ~]$ eval `ssh-agent`
Agent pid 17103;
[osgid@login01 ~]$ ssh-add id_rsa_bosco 
Enter passphrase for id_rsa_bosco: 
Identity added: id_rsa_bosco (id_rsa_bosco)
[osgid@login01 ~]$ ssh mm@itbv-ce-pbs.uchicago.edu
Last login: Thu Sep 13 13:49:33 2013 from login01.osgconnect.org
$ logout

Some clusters have multiple login nodes behind a round-robin DNS server. You can recognize them because when you log in to the node (e.g. ssh login.mydomain.org), it will show a name different from the one used to connect (e.g. hostname -f will return login2.mydomain.org). If this happens, you must add the BOSCO resource using the name of the actual host, not the DNS alias (e.g. bosco_cluster --add USER_NAME@login2.mydomain.org). This is because sometimes these multiple login nodes do not share all the directories, and BOSCO may be unable to find its files if different connections land on different hosts:

Note how midway-login2.rcc.uchicago.edu is used instead of midway.rcc.uchicago.edu:
[user@bosco ~]$ ssh  mm@midway.rcc.uchicago.edu
mm@midway.rcc.uchicago.edu's password: 
===============================================================================
                               Welcome to Midway
                           Research Computing Center
                             University of Chicago
...
===============================================================================
[mm@midway-login2 ~]$ hostname -f
midway-login2.rcc.uchicago.edu
[mm@midway-login2 ~]$ logout
Connection to midway.rcc.uchicago.edu closed.
[user@bosco ~]$ bosco_cluster --add mm@midway-login2.rcc.uchicago.edu
Warning: No batch system specified, defaulting to PBS
If this is incorrect, rerun the command with the batch system specified
Enter the password to copy the ssh keys to mm@midway-login2.rcc.uchicago.edu:
mm@midway-login2.rcc.uchicago.edu's password: 
Detecting PBS cluster configuration...
Done!
Downloading for mm@midway-login2.rcc.uchicago.edu....
Unpacking.........
Sending libraries to mm@midway-login2.rcc.uchicago.edu...
Creating BOSCO for the WN's....................................................................
Installation complete
The cluster mm@midway-login2.rcc.uchicago.edu has been added to BOSCO
It is available to run jobs submitted with the following values:
> universe = grid
> grid_resource = batch pbs mm@midway-login2.rcc.uchicago.edu

For further options, check the BOSCO manual.

You can connect multiple resources to your account. You can list the BOSCO resources currently connected to your account with bosco_cluster -l, and you can remove a BOSCO resource with bosco_cluster -r USER_NAME@HOSTNAME.DOMAIN.
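
For example, to list the connected resources and then disconnect the one added earlier:

[osgid@login01 ~]$ bosco_cluster -l                                          # list connected resources
[osgid@login01 ~]$ bosco_cluster -r cnetid@midway-login1.rcc.uchicago.edu   # remove one of them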

Submitting jobs to your resource

Jobs will not go automatically to your BOSCO resources; you have to send them there explicitly. To submit jobs to a connected BOSCO resource, replace the universe = vanilla line in the submit file with the two lines suggested at the end of the connection setup (see above). You can also add an optional line to specify the name of the queue. These three lines will look like:

universe = grid
grid_resource = batch QUEUE_MGR USER_NAME@HOSTNAME.DOMAIN
batch_queue = QUEUE_NAME
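
For instance, using the Midway resource connected in the first example above and a hypothetical queue named long, the lines would read:

universe = grid
grid_resource = batch pbs cnetid@midway-login1.rcc.uchicago.edu
batch_queue = long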

The example below is very similar to the quickstart example; the main difference is the first two lines of the submit file. As in the quickstart example, you need short.sh and the log directory; you can get them by setting up the quickstart tutorial:

$ tutorial quickstart
$ cd ~/osg-quickstart

Then edit tutorial01 to look like the example below:

In the example, remember to replace QUEUE_MGR, USER_NAME and HOSTNAME.DOMAIN with the values for the resource you just connected!

Example of submit file
universe = grid
grid_resource = batch QUEUE_MGR USER_NAME@HOSTNAME.DOMAIN
Executable = short.sh

Error = log/job.err.$(Cluster)-$(Process)
Output = log/job.out.$(Cluster)-$(Process)
Log = log/job.log.$(Cluster)

+ProjectName="con-train"

Queue 
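
Once the submit file is edited, submit it from login.osgconnect.net as usual (the file name tutorial01 follows the quickstart layout described above):

[osgid@login01 ~]$ condor_submit tutorial01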

Most options and techniques used in submit files can also be used for these jobs. One exception is that you cannot use the requirements attribute to select resources, because you send the job explicitly to the BOSCO resource. See the BOSCO manual for ways to pass custom submit properties and to change the maximum number of jobs submitted to a resource.

When you type condor_q, these jobs will appear together with all your other jobs running on OSG Connect resources.
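
If you want to look only at the jobs headed for your BOSCO resources, you can filter on the grid universe (a small sketch; in HTCondor the grid universe corresponds to JobUniverse 9):

[osgid@login01 ~]$ condor_q -constraint 'JobUniverse == 9'   # show only grid-universe (BOSCO) jobs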