Job Submission from a Laptop (BOSCO)
Overview
In this module we demonstrate job submission to the OSG Connect environment from your laptop with BOSCO. This lets you manage jobs running on OSG from your familiar environment. It does not have to be a laptop: any Linux or Mac host can be used, provided it runs RHEL5 or RHEL6 (including Scientific Linux distributions), Debian 6, or Mac OS X (10.5 or later).
Here is a diagram of how the different resources are connected when you install BOSCO on your laptop:
Jobs running on OSG Connect must have a project name set (+ProjectName in the HTCondor submit file). The best way to do that is to add a file with your default project name to your home directory: $HOME/.osg_default_project
For example, on login.osgconnect.net run
echo ConnectTrain > $HOME/.osg_default_project
to use the ConnectTrain account. Once you create a project, replace ConnectTrain with your project name.
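The project file can also be created only when it is missing, so that an existing default is not overwritten. The following is a minimal sketch: the function name `set_default_project` and its optional directory argument are hypothetical helpers for illustration, not part of OSG Connect.

```shell
# set_default_project NAME [DIR]
# Write NAME to DIR/.osg_default_project (DIR defaults to $HOME),
# unless that file already exists and is non-empty.
# Hypothetical helper for illustration; not an OSG Connect command.
set_default_project() {
    name=$1
    dir=${2-$HOME}
    file=$dir/.osg_default_project
    if [ -s "$file" ]; then
        echo "keeping existing project: $(cat "$file")"
    else
        echo "$name" > "$file"
        echo "default project set to: $name"
    fi
}

# Example (run on login.osgconnect.net):
#   set_default_project ConnectTrain
```

The optional second argument exists only so the function can be tried in a scratch directory before touching the real home directory.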
In the examples below, substitute user with your actual user ID.
In the rest of the tutorial we'll assume that you'll be working on a terminal on your laptop or on whichever host you choose as BOSCO submit host.
Install and configure BOSCO
The following example is on a host named laptop. On other machines your prompt will look more like [yourname@yourhost ~], showing your host instead of [user@laptop ~]. The rest will be the same.
Download the BOSCO Quickstart Multi-Platform installer from this download page. If you prefer to work in a terminal window, you can also copy the URL shown on the download page and use cURL:
[user@laptop Downloads]$ curl -o bosco_quickstart.tar.gz ftp://ftp.cs.wisc.edu/GET_THE_URL_FROM_THE_PAGE/bosco_quickstart.tar.gz
If curl is not installed, you can use wget to download the file:
wget -O ./bosco_quickstart.tar.gz ftp://ftp.cs.wisc.edu/GET_THE_URL_FROM_THE_PAGE/bosco_quickstart.tar.gz
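The choice between curl and wget can be automated. This is a minimal sketch: `pick_downloader` and `download` are hypothetical helper names, and the URL must still be copied from the download page.

```shell
# pick_downloader: print "curl" or "wget", whichever is found first on PATH.
# Prints "none" and returns non-zero when neither tool is installed.
pick_downloader() {
    if command -v curl >/dev/null 2>&1; then
        echo curl
    elif command -v wget >/dev/null 2>&1; then
        echo wget
    else
        echo none
        return 1
    fi
}

# download URL OUTFILE: fetch URL into OUTFILE with the available tool,
# using the same options as the two commands shown above.
download() {
    case $(pick_downloader) in
        curl) curl -o "$2" "$1" ;;
        wget) wget -O "$2" "$1" ;;
        *)    echo "neither curl nor wget found" >&2; return 1 ;;
    esac
}

# Example (paste the URL shown on the download page):
#   download "<URL from the download page>" bosco_quickstart.tar.gz
```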
- Untar the bosco_quickstart script from a terminal whose current working directory is the ~/Downloads folder, or the folder in which you saved the file:
[user@laptop Downloads]$ tar xvzf ./bosco_quickstart.tar.gz
- Run the quickstart script and answer the questions.
[user@laptop Downloads]$ ./bosco_quickstart
- When prompted "Do you want to install Bosco? Select y/n and press [ENTER]:" press "y" and ENTER.
- When prompted "Type the cluster name and press [ENTER]:" type login.osgconnect.net and press ENTER.
- When prompted "Type your name at login.osgconnect.net (default YOUR_USER) and press [ENTER]:" enter your user name on OSG-Connect and press ENTER.
- When prompted "Type the queue manager for login.osgconnect.net (pbs, condor, lsf, sge, slurm) and press [ENTER]:" enter condor and press ENTER.
- Then, when prompted "user@login.osgconnect.net's password:" enter your OSG-Connect user password.
- After a successful installation, before changing directory, you can remove the installer and its log file:
[user@laptop Downloads]$ rm bosco_quickstart*
- Set up the environment:
[user@laptop ~]$ source ~/bosco/bosco_setenv
- BOSCO has been started for you, but in the future you may need to restart it with:
[user@laptop ~]$ bosco_start
BOSCO Started
At this point, submission to login.osgconnect.net, which gives access to the full OSG-Connect environment, is ready. The BOSCO services will remain running even if you log out, unless you explicitly shut them down.
Set up the BOSCO environment each time
Each time you log in or start a new shell, set up the environment and invoke bosco_start (bosco_start is a no-op if the services are already running):
$ source ~/bosco/bosco_setenv
$ bosco_start
BOSCO Started
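To avoid typing these two commands in every new shell, they can be wrapped in a small function, for example in a shell startup file. This is a sketch under assumptions: the function name `bosco_env` and the `BOSCO_DIR` variable are hypothetical, and `~/bosco` is the install location used in this tutorial.

```shell
# bosco_env: set up the BOSCO environment and make sure BOSCO is running.
# BOSCO_DIR is a hypothetical override; it defaults to ~/bosco as used above.
bosco_env() {
    setenv_file=${BOSCO_DIR-$HOME/bosco}/bosco_setenv
    if [ ! -r "$setenv_file" ]; then
        echo "BOSCO environment file not found: $setenv_file" >&2
        return 1
    fi
    # "." is the portable spelling of "source".
    . "$setenv_file"
    # Start BOSCO only if the command is available after the setup;
    # bosco_start does nothing if the services are already running.
    if command -v bosco_start >/dev/null 2>&1; then
        bosco_start
    fi
}
```

After defining the function, running `bosco_env` in a new shell replaces the two manual steps.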
Create a tutorial directory
Create a new directory to run this tutorial and the log directory for the jobs:
$ mkdir -p tutorial-bosco/log
$ cd tutorial-bosco
Submit a job to OSG-Connect
Now run a simple job, like Job 1 of the Quickstart tutorial. The workload is the same; the submit description file is slightly different.
Create a workload
Inside the tutorial directory that you created or installed previously, let's create a test script to execute as your job (remember to make the script executable!):
$ vi short.sh
$ chmod +x short.sh
Here is the content of short.sh:
#!/bin/bash
# short.sh: a short discovery job
printf "Start time: "; /bin/date
printf "Job is running on node: "; /bin/hostname
printf "Job running as user: "; /usr/bin/id
echo "Environment:"
/bin/env | /bin/sort
echo "Dramatic pause..."
sleep ${1-15} # Sleep 15 seconds, or however much we're told to sleep
echo "Et voila!"
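The `${1-15}` expansion in short.sh is what makes the sleep time optional: it uses the first argument when one is given and falls back to 15 otherwise. A minimal sketch of that behaviour (the function name `sleep_seconds` is only for illustration):

```shell
# sleep_seconds: print the value that short.sh would pass to sleep.
# ${1-15} expands to $1 when an argument is supplied, and to 15 when none is.
sleep_seconds() {
    echo "${1-15}"
}

sleep_seconds       # prints 15
sleep_seconds 40    # prints 40
```

Passing an argument to the script therefore lengthens or shortens the pause without editing the script itself.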
Create a condor submit file:
The next step is to create a submission file for the job.
$ vi bosco01.sub
Here is the content of bosco01.sub, configured to use a special project name on login01.osgconnect.net. This is a general-purpose project name, and you are encouraged to use one of the projects that you are a member of. You can list those projects with the osgconnect_show_projects command. This is very nearly the minimal content of a submission file.
Note that, unlike in the previous examples, the Universe of the job is now grid. This tells BOSCO to run the job on the resource added during the setup.
########################
# Submit description file for short test program
########################
Universe = grid
Executable = short.sh
Error = log/job.err.$(Cluster)-$(Process)
Output = log/job.out.$(Cluster)-$(Process)
Log = log/job.log.$(Cluster)
+ProjectName="ConnectTrain"
Queue 1
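If you prefer not to type the file in an editor, the same content can be generated from the shell. This is a minimal sketch: `write_submit_file` is a hypothetical helper, and the project name and job count are passed in so that ConnectTrain can be swapped for your own project.

```shell
# write_submit_file FILE PROJECT [COUNT]
# Write a submit description file like the one above to FILE.
# COUNT is the number of jobs to queue and defaults to 1.
write_submit_file() {
    out=$1
    project=$2
    count=${3-1}
    {
        # The quoted delimiter keeps $(Cluster) and $(Process) literal,
        # so that HTCondor, not the shell, expands them.
        cat <<'EOF'
Universe = grid
Executable = short.sh
Error = log/job.err.$(Cluster)-$(Process)
Output = log/job.out.$(Cluster)-$(Process)
Log = log/job.log.$(Cluster)
EOF
        printf '+ProjectName="%s"\n' "$project"
        printf 'Queue %s\n' "$count"
    } > "$out"
}

# Example:
#   write_submit_file bosco01.sub ConnectTrain
```

The quoted here-document delimiter matters: with an unquoted one the shell would try to run `Cluster` and `Process` as commands.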
Submit the job using condor_submit.
$ condor_submit bosco01.sub
Submitting job(s).
1 job(s) submitted to cluster 2.
Note the cluster number in the "submitted to cluster" message: it is the ID of the job group you have just created (2 in the output above; on a fresh installation of BOSCO the first one is 1). You'll use this ID for monitoring the status of your jobs.
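The cluster ID can be picked out of the submission message with a small filter. This is a sketch that relies only on the message format shown above; `cluster_id` is a hypothetical helper name, and capturing a live submission assumes that condor_submit prints this message on standard output.

```shell
# cluster_id: read condor_submit output on stdin and print the cluster ID
# taken from the "N job(s) submitted to cluster ID." line.
cluster_id() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\.*$/\1/p'
}

# Sample output copied from the submission above:
printf 'Submitting job(s).\n1 job(s) submitted to cluster 2.\n' | cluster_id   # prints 2

# With BOSCO set up, the same filter could capture the ID of a real submission:
#   id=$(condor_submit bosco01.sub | cluster_id)
```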
Check job status
The condor_q command shows the status of the jobs currently in the queue. Generally you will want to limit it to your own jobs:
$ condor_q
-- Submitter: laptop : <127.0.0.1:11000?sock=44111_3112_3> : laptop
ID    OWNER  SUBMITTED    RUN_TIME    ST  PRI  SIZE  CMD
2.0   user   8/20 16:12   0+00:00:00  I   0    0.0   short.sh
1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Here condor_q lists only your jobs even without specifying your user ID. Only you can submit to your BOSCO: it is your personal HTCondor installation. If you notice that your jobs are being held after submitting them from BOSCO, double-check your ~/.osg_default_project to make sure that your project is listed in that file.
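A quick way to notice held jobs is to read the count from the summary line that condor_q prints last. This is a minimal sketch based only on the summary format shown above; `held_jobs` is a hypothetical helper name.

```shell
# held_jobs: read condor_q output on stdin and print the number of held jobs
# taken from the summary line ("... 0 running, 0 held, 0 suspended").
held_jobs() {
    sed -n 's/.* \([0-9][0-9]*\) held.*/\1/p'
}

# Summary line copied from the condor_q output above:
summary='1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended'
echo "$summary" | held_jobs    # prints 0

# With BOSCO set up:
#   condor_q | held_jobs
```

A non-zero result is the cue to check the project file as described above.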
Submit more jobs to OSG-Connect
This example submits 20 jobs to OSG-Connect. To make the jobs easier to observe, we'll increase the sleep time to 40 seconds.
Edit the submit file bosco01.sub: add the line Arguments = 40 and change the last line to Queue 20:
$ vi bosco01.sub
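The same edit can be made without opening an editor. This is a minimal sketch using awk: `set_sleep_and_count` is a hypothetical helper, and it assumes the keywords are spelled Arguments and Queue exactly as in this tutorial's submit file.

```shell
# set_sleep_and_count FILE SECONDS COUNT
# Rewrite FILE so that it contains "Arguments = SECONDS" followed by
# "Queue COUNT", replacing any earlier Arguments and Queue lines.
set_sleep_and_count() {
    awk -v secs="$2" -v count="$3" '
        /^Arguments[ \t]*=/ { next }             # drop an old Arguments line
        /^Queue/ { print "Arguments = " secs     # insert before the Queue line
                   print "Queue " count
                   next }
        { print }                                # copy everything else
    ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example:
#   set_sleep_and_count bosco01.sub 40 20
```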
Submit the set of 20 jobs:
$ condor_submit ./bosco01.sub
Submitting job(s)....................
20 job(s) submitted to cluster 3.
Watch the jobs go through the queue by using watch -n2 condor_q -grid. The -grid option changes the format of condor_q and provides more information about where the jobs run.
$ condor_q -grid
-- Submitter: laptop.local : <127.0.0.1:11000?sock=52977_4003_3> : laptop.local
ID    OWNER  STATUS  GRID->MANAGER  HOST                 GRID_JOB_ID
3.0   user   IDLE    batch->        user@login01.osgcon  /804//
3.1   user   IDLE    batch->        user@login01.osgcon  laptop.local_1100
3.2   user   IDLE    batch->        user@login01.osgcon  /809//
3.3   user   IDLE    batch->        user@login01.osgcon  laptop.local_1100
3.4   user   IDLE    batch->        user@login01.osgcon  laptop.local_1100
3.5   user   IDLE    batch->        user@login01.osgcon  /806//
3.6   user   IDLE    batch->        user@login01.osgcon  laptop.local_1100
3.7   user   IDLE    batch->        user@login01.osgcon  /810//
3.8   user   IDLE    batch->        user@login01.osgcon  /802//
3.9   user   IDLE    batch->        user@login01.osgcon  laptop.local_1100
3.10  user   IDLE    batch->        user@login01.osgcon  /807//
3.11  user   IDLE    batch->        user@login01.osgcon  laptop.local_1100
3.12  user   IDLE    batch->        user@login01.osgcon  /811//
3.13  user   IDLE    batch->        user@login01.osgcon  /803//
3.14  user   IDLE    batch->        user@login01.osgcon  laptop.local_1100
3.15  user   IDLE    batch->        user@login01.osgcon  /808//
3.16  user   IDLE    batch->        user@login01.osgcon  laptop.local_1100
3.17  user   IDLE    batch->        user@login01.osgcon  laptop.local_1100
3.18  user   IDLE    batch->        user@login01.osgcon  /805//
3.19  user   IDLE    batch->        user@login01.osgcon  laptop.local_1100
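With many jobs in the queue, a per-status tally is easier to read than the full listing. This is a minimal sketch that assumes the column layout shown above (job ID in the first column, status in the third); `count_by_status` is a hypothetical helper name.

```shell
# count_by_status: read "condor_q -grid" output on stdin and print one
# "STATUS COUNT" line per status, sorted by status name.
# Job rows are recognised by an ID of the form CLUSTER.PROCESS in column 1;
# the status is taken from column 3, as in the listing above.
count_by_status() {
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ { n[$3]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Two rows copied from the listing above:
printf '%s\n' \
    '3.0 user IDLE batch-> user@login01.osgcon /804//' \
    '3.1 user IDLE batch-> user@login01.osgcon laptop.local_1100' |
    count_by_status    # prints: IDLE 2

# With BOSCO set up:
#   condor_q -grid | count_by_status
```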
Note that condor_q on your BOSCO installation lists only your jobs. There may be other jobs queued on OSG Connect, but to see them you'll have to log in to login01.osgconnect.net and issue condor_q there.
Other BOSCO commands
You can check the resources connected to BOSCO:
$ bosco_cluster --list
user@login01.osgconnect.net/condor
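Each entry in that list has the form user@host/queue_manager. If you need the parts separately, for example in a script, shell parameter expansion is enough. This is a minimal sketch; `split_cluster_entry` is a hypothetical helper name, and the entry format is taken from the output above.

```shell
# split_cluster_entry ENTRY
# Split an entry of the form user@host/queue_manager into its three parts.
split_cluster_entry() {
    entry=$1
    user=${entry%%@*}        # text before the first "@"
    rest=${entry#*@}         # text after the first "@"
    host=${rest%%/*}         # text before the first "/"
    manager=${rest#*/}       # text after the first "/"
    echo "user=$user host=$host manager=$manager"
}

split_cluster_entry user@login01.osgconnect.net/condor
# prints: user=user host=login01.osgconnect.net manager=condor
```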
You can stop and uninstall BOSCO:
$ source ~/bosco/bosco_setenv
$ bosco_stop
Sending off command to condor_master.
Sent "Kill-Daemon" command for "master" to local master
Stopped HTCondor
BOSCO is now off.
$ bosco_uninstall
Ensuring Condor is stopped...
BOSCO is now off.
Removing BOSCO installation under /home/mmb/bosco
Done
All the HTCondor commands work from BOSCO. This document contains a detailed description of all the installation options and all the BOSCO commands.