ATLAS Connect Quickstart

This is a quick start page which should take only a few minutes to complete. It covers only the basics of job submission.

Login to the ATLAS Connect submit host

  • If not already registered with ATLAS Connect, go to the registration site and follow the SignIn/SignUp instructions.
  • Once registered you will be authorized to use login.usatlas.org (the HTCondor submit host) and faxbox.usatlas.org (the data host), in each case authenticating with your network ID (netid) and password:
$ ssh netid@login.usatlas.org

Remember:

Always replace netid with your own username.

Set up the tutorial

You may perform the examples in the tutorial by typing them in from the text below, or by using tutorial files already on login.usatlas.org. It's your choice; the tutorial is the same either way.

Pretyped setup

To save some typing, you can install the tutorial into your home directory from login.usatlas.org. This is highly recommended to ensure that you don't encounter transcription errors during the tutorials.

$ tutorial 
usage: tutorial name-of-tutorial
       tutorial info name-of-tutorial
Available tutorials:
FAX                Federated ATLAS XRootD tutorial
analysis           User analysis tutorial using RootCore
quickstart         Basic HTCondor job submission tutorial

Now, run the quickstart tutorial:

$ tutorial quickstart
Basic HTCondor job submission tutorial
Tutorial 'quickstart' is set up.  To begin:
     cd ~/tutorial-quickstart
$ cd ~/tutorial-quickstart 

Manual setup

Alternatively, if you want the full manual experience, create a new directory for the tutorial work:

$ mkdir tutorial-quickstart 
$ cd tutorial-quickstart 

Tutorial jobs

Job 1: A simple, nonparallel job

Create a workload

Inside the tutorial directory that you created or installed previously, let's create a test script to execute as your job:

$ vi short.sh
file: short.sh
#!/bin/bash
# short.sh: a short discovery job
printf "Start time: "; /bin/date
printf "Job is running on node: "; /bin/hostname
printf "Job running as user: "; /usr/bin/id
printf "Job is running in directory: "; /bin/pwd
echo
echo "Working hard..."
sleep ${1-15}
echo "Science complete!"

Now, make the script executable.

$ chmod +x short.sh 

If you used the tutorial command, all files are already in your workspace.

Run the job locally

When setting up a new job type, it's important to test your job outside of HTCondor before submitting it into HTCondor.

$ ./short.sh
Start time: Wed Aug 21 09:21:35 CDT 2013
Job is running on node: login.usatlas.org
Job running as user: uid=54161(netid) gid=1000(users) groups=1000(users),0(root),1001(osg-connect),1002(osg-staff),1003(osg-connect-test),9948(staff),19012(osgconnect)
Job is running in directory: /home/netid/tutorial-quickstart
Working hard...
Science complete!

Create an HTCondor submit file

So far, so good! Let's create a simple (if verbose) HTCondor submit file.

$ vi tutorial01
file: tutorial01
# The UNIVERSE defines an execution environment. You will almost always use VANILLA. 
Universe = vanilla 

# EXECUTABLE is the program your job will run. It's often useful 
# to create a shell script to "wrap" your actual work. 
Executable = short.sh 

# ERROR and OUTPUT are the error and output channels from your job
# that HTCondor returns from the remote host.
Error = job.error
Output = job.output

# The LOG file is where HTCondor places information about your 
# job's status, success, and resource consumption. 
Log = job.log

# +ProjectName is the name of the project reported to the accounting system
+ProjectName="AtlasConnect"

# QUEUE is the "start button" - it launches any jobs that have been 
# specified thus far. 
Queue 1

Choose the project name

You have two ways to set the project name for your jobs:

  1. Write the correct name of your project in a file: $HOME/.ciconnect/defaultproject 
  2. Add the +ProjectName="MyProject" line to the HTCondor submit file. Remember to quote the project name!

If you do not set a correct project name, either in the $HOME/.ciconnect/defaultproject file or in the HTCondor submit file, your job submission will fail.
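For the first option, a one-line shell command is enough (a sketch; substitute the name of a project you actually belong to, and the mkdir is only needed if the directory does not already exist):

$ mkdir -p $HOME/.ciconnect
$ echo AtlasConnect > $HOME/.ciconnect/defaultproject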

You can join projects after you log in at https://portal.usatlas.org/. Within minutes of joining and being approved for a project, you will have access via condor_submit as well. To see the projects you belong to, visit https://portal.usatlas.org/globus-app/groups.


If you decide to define the project name in the HTCondor submit file, set a correct project name using the +ProjectName = "project" parameter. A job with an incorrect ProjectName will fail with a message like:

No ProjectName ClassAd defined!
Please record your project ID in your submit file.
  Example:  +ProjectName = "AtlasConnect"

Based on your username, here is a list of projects you might have 
access to:
AtlasConnect

Note that project names are case sensitive.

Submit the job

Submit the job using condor_submit.

$ condor_submit tutorial01
Submitting job(s). 
1 job(s) submitted to cluster 823.

Check job status

The condor_q command tells the status of currently running jobs. Generally you will want to limit it to your own jobs:

$ condor_q netid
-- Submitter: login01.usatlas.org : <128.135.158.173:43606> : login.usatlas.org
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 823.0   netid           8/21 09:46   0+00:00:06 R  0   0.0  short.sh
1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended

If you want to see all jobs running on the system, use condor_q without any extra parameters.

You can also get status on a specific job cluster:

$ condor_q 823
-- Submitter: login.usatlas.org : <128.135.158.173:43606> : login.usatlas.org
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 823.0   netid           8/21 09:46   0+00:00:10 R  0   0.0  short.sh
1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended

Note the ST (state) column. Your job will be in the I state (idle) if it hasn't started yet. If it's currently scheduled and running, it will have state R (running). If it has completed already, it will not appear in condor_q.
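If a job stays idle longer than you expect, condor_q can usually tell you why it has not matched a slot. This is standard HTCondor functionality, though the exact wording of the report varies by version:

$ condor_q -analyze 823.0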

Let's wait for your job to finish – that is, for condor_q not to show the job in its output. A useful tool for this is watch – it runs a program repeatedly, letting you see how the output differs at fixed time intervals. Let's submit the job again, and watch condor_q output at two-second intervals:

$ condor_submit tutorial01
Submitting job(s). 
1 job(s) submitted to cluster 824
$ watch -n2 condor_q netid 
... 

When your job has completed, it will disappear from the list.

To close watch, hold down Ctrl and press C.

Job history

Once your job has finished, you can get information about its execution from the condor_history command:

$ condor_history 823
 ID      OWNER            SUBMITTED     RUN_TIME ST   COMPLETED CMD
 823.0   netid            8/21 09:46   0+00:00:12 C   8/21 09:46 /home/netid/

You can see much more information about your job's final status using the -long option.
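For example (a sketch; ExitCode is one of the standard job attributes recorded by HTCondor, and 823 is the cluster number from above):

$ condor_history -long 823
$ condor_history -format '%d\n' ExitCode 823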

Check the job output

Once your job has finished, you can look at the files that HTCondor has returned to the working directory. If everything was successful, it should have returned:

  • a log file from HTCondor for the job cluster: job.log
  • an output file for each job's output: job.output
  • an error file for each job's errors: job.error

Read the output file. It should be something like this:

$ cat job.output
Start time: Wed Aug 21 09:46:38 CDT 2013
Job is running on node: appcloud01
Job running as user: uid=58704(osg) gid=58704(osg) groups=58704(osg)
Job is running in directory: /var/lib/condor/execute/dir_2120
Working hard...
Science complete!
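The job.error file should normally be empty, and job.log holds HTCondor's own record of the run, including the termination event and a summary of resource usage. A quick way to check both (a sketch):

$ cat job.error
$ grep -A 4 'Job terminated' job.log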

Job 2: Submitting jobs concurrently

What do we need to do to submit several jobs simultaneously? In the first example, Condor returned three files: out, error, and log. If we want to submit several jobs, we need to track these three files for each job. An easy way to do this is to add the $(Cluster) and $(Process) macros to the HTCondor submit file. Since this can make our working directory really messy with a large number of jobs, let's tell HTCondor to put the files in a directory called log. Here's what the second (less verbose) submit file looks like:

file: tutorial02
Universe = vanilla 
Executable = short.sh 
Error = log/job.error.$(Cluster)-$(Process) 
Output = log/job.output.$(Cluster)-$(Process) 
Log = log/job.log.$(Cluster) 
+ProjectName="AtlasConnect"
Queue 10 

Before submitting, we also need to make sure the log directory exists.

$ mkdir -p log

You'll see something like the following upon submission:

$ condor_submit tutorial02
Submitting job(s)..........
10 job(s) submitted to cluster 837.
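After the cluster finishes, the per-job files appear in log/, named with the cluster and process numbers from the $(Cluster) and $(Process) macros (a sketch; 837 stands for whatever cluster number condor_submit reported for you):

$ ls log/
$ cat log/job.output.837-0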

Job 3: Passing arguments to executables

Sometimes it's useful to pass arguments to your executable from your submit file. For example, you might want to use the same job script for more than one run, varying only the parameters. You can do that by adding an Arguments line to your submit file. Let's try that with tutorial03.

We want to run many more instances for this example: 100 instead of only 10. To ensure that we don't collectively overwhelm the scheduler, let's also dial down our sleep time from 15 seconds to 5.

file: tutorial03
Universe = vanilla 
Executable = short.sh 
# pass 5 as the first argument so the script sleeps for 5 seconds
Arguments = 5 
Error = log/job.err.$(Cluster)-$(Process) 
Output = log/job.out.$(Cluster)-$(Process) 
Log = log/job.log.$(Cluster) 
+ProjectName="AtlasConnect"
Queue 100

And let's submit:

$ condor_submit tutorial03
Submitting job(s)....................................................................................................
100 job(s) submitted to cluster 938. 
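The same mechanism also lets each job in a cluster receive a different argument. Here is a variation (a sketch, not one of the pre-installed tutorial files) that uses the standard $(Process) macro, which expands to 0, 1, 2, ... for the jobs in the cluster:

file: tutorial03-varied (hypothetical)
Universe = vanilla 
Executable = short.sh 
# each job sleeps for a different number of seconds: 0, 1, 2, ... 9
Arguments = $(Process)
Error = log/job.err.$(Cluster)-$(Process) 
Output = log/job.out.$(Cluster)-$(Process) 
Log = log/job.log.$(Cluster) 
+ProjectName="AtlasConnect"
Queue 10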

Where did jobs run?

It might be desirable to find out what machines or sites ran our jobs. We can use the condor_history command to look at our job's complete history file. In this case, we're looking for a specific piece of information, the LastRemoteHost. Here's the appropriate condor_history command:

$ condor_history -format '%s\n' LastRemoteHost 938
...
506192@bl-6-2.aglt2.org
838967@bl-5-6.aglt2.org
834699@bl-5-6.aglt2.org
2543312@bl-5-4.aglt2.org
22471@pt3wrk4.atlas.csufresno.edu
22642@pt3wrk4.atlas.csufresno.edu
3293@pt3wrk7.atlas.csufresno.edu
11096@t3head.atlas.csufresno.edu
...

You can see the full job history with condor_history -long.
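If you just want a quick per-host tally of that raw output, standard shell tools are enough (a sketch; 938 is the cluster number from the example above):

$ condor_history -format '%s\n' LastRemoteHost 938 | cut -d@ -f2 | sort | uniq -c | sort -rn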

This output can be further parsed and put into a nice format. We've done this for you with the command connect histogram --last. It will parse and bin your most recently run jobs by site:

$ connect histogram --last
Val          |Ct (Pct)      Histogram
mwt2.org     |1887 (55.88%) ███████████████████████████████████████████████████▏
csufresno.edu|557 (16.49%)  ███████████████▏
aglt2.org    |319 (9.45%)   ████████▋

Important ClassAds

Here is a short list of the most important ClassAds:

  • HAS_CVMFS, HAS_CVMFS_xenon_opensciencegrid_org, HAS_CVMFS_spt_opensciencegrid_org, HAS_CVMFS_oasis_opensciencegrid_org, HAS_CVMFS_osgstorage_org: the worker node will have the corresponding CVMFS repository mounted. These can be combined, e.g.
    Requirements = ( HAS_CVMFS_osgstorage_org && HAS_CVMFS_oasis_opensciencegrid_org )
  • SL6, RedHat6, CentOS6: select the OS type. To combine them, use e.g.
    Requirements = ( OpSysAndVer is "CentOS6" || OpSysAndVer is "RedHat6" || OpSysAndVer is "SL6" )
  • Request_Cpus = <number of CPU cores>: request up to 8 cores for your job (default 1).
  • Request_Memory = <memory in MB>: request up to 16 GB of RAM (default 2 GB).
  • +RCC_MaxWallClockTime = <seconds>: the maximum time your job will need, in seconds (default 1440). Setting this guarantees that the job will get a slot in the MWT2 glidein of at least that duration. MWT2 will not accept jobs requesting more than 3 days.
  • +WANT_RCC_ciconnect = true: without this ClassAd, the job won't run on RCC.
  • Requirements = ( IS_RCC ): send jobs only to RCC.
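As an illustration, here is how several of these could be combined in a single submit file (a sketch, not one of the tutorial files; the resource requests and wall-time value are arbitrary examples within the limits listed above):

file: example-classads (hypothetical)
Universe = vanilla
Executable = short.sh
# run only on SL6-family nodes that have the OASIS CVMFS repository mounted
Requirements = ( HAS_CVMFS_oasis_opensciencegrid_org ) && ( OpSysAndVer is "CentOS6" || OpSysAndVer is "RedHat6" || OpSysAndVer is "SL6" )
Request_Cpus = 1
Request_Memory = 2048
# one hour of wall time, in seconds
+RCC_MaxWallClockTime = 3600
+WANT_RCC_ciconnect = true
+ProjectName = "AtlasConnect"
Error = log/job.error.$(Cluster)-$(Process)
Output = log/job.output.$(Cluster)-$(Process)
Log = log/job.log.$(Cluster)
Queue 1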


Getting help

If anything here didn't work, please email support@connect.usatlas.org.
