This section covers how to use the OASIS system to run a real application such as the R statistical package. For this example, we'll estimate the value of pi using a Monte Carlo method. We'll first run the program locally, then create a submit file, send it out to OSG-Connect, and collate our results.
Some background is useful here. Consider a unit circle inscribed in a square. Looking only at the first quadrant, we randomly sample points in the unit square and calculate the ratio of the points that fall inside the circle to the total number of points sampled. This ratio approaches pi/4.
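The sampling procedure can be sketched in a few lines of awk. This only illustrates the method and is not the R script used later in this section; the trial count and the seed are arbitrary choices made for the sketch.

```shell
#!/bin/sh
# Monte Carlo estimate of pi: sample points uniformly in the unit square
# and count how many land inside the quarter circle x^2 + y^2 <= 1.
trials=100000   # illustrative value, not taken from the tutorial's R code
estimate=$(awk -v n="$trials" -v seed=42 'BEGIN {
    srand(seed)
    inside = 0
    for (i = 0; i < n; i++) {
        x = rand(); y = rand()
        if (x * x + y * y <= 1) inside++
    }
    # inside / n approaches pi/4, so multiply by 4.
    printf "%.4f\n", 4 * inside / n
}')
echo "pi is approximately $estimate"
```

The exact digits depend on the awk implementation's random number generator, so expect a value near 3.14 rather than a specific number.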
This method converges extremely slowly, which makes it great for a CPU-intensive exercise (but bad for a real estimation!).
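A short worked estimate shows why convergence is slow. It treats each sample as an independent trial that lands inside the quarter circle with probability p = pi/4; this is a standard binomial argument, not something taken from the code in this section. With N samples, of which N_in fall inside, the estimate and its standard error are:

```latex
$$
\hat{\pi}_N = \frac{4\,N_{\text{in}}}{N}, \qquad
\operatorname{SE}(\hat{\pi}_N) = 4\sqrt{\frac{p(1-p)}{N}}
  = \sqrt{\frac{\pi(4-\pi)}{N}} \approx \frac{1.64}{\sqrt{N}}
$$
```

The error shrinks only as 1/sqrt(N): ten times as many samples reduces it by a factor of sqrt(10), about 3.2, so each additional correct digit costs roughly 100 times the work.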
First we'll need to create a working directory. You can either run "tutorial R" or type the following:
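A minimal sketch of the manual route, assuming a placeholder directory name (osg-R-demo is hypothetical; any name works):

```shell
# Create a fresh working directory for the tutorial files and move into it.
# The name osg-R-demo is a placeholder chosen for this sketch.
mkdir -p osg-R-demo
cd osg-R-demo
```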
Since R is installed into OASIS, it's not available in the normal system paths. We'll need to set up those paths so we can access R correctly. To do that we'll use PALMS:
Once we have the path set up, we can try to run R. Don't worry if you aren't an R expert; I'm not either.
Great! R works. You can quit out with "q()".
Now that we can run R, let's try using my Pi estimation code:
R normally runs as an interactive shell, but it's easy to run in batch mode too.
This should take a few seconds to run. Now edit the file and increase the number of trials tenfold (to 10000000). It will take a little over a minute to run, but the estimate still isn't very good. Fortunately, this problem is pleasingly parallel, since we're just sampling random points. So what do we need to do to run R on the campus grid?
The first thing we're going to need to do is create a wrapper for our R environment, based on the setup we did in previous sections.
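A wrapper along these lines might look like the sketch below. The file name R-wrapper.sh is hypothetical, and the environment setup is left as a comment because it should repeat whatever PALMS commands were used earlier.

```shell
#!/bin/sh
# Write a hypothetical wrapper, R-wrapper.sh, that runs the R script
# given as its only argument.
cat > R-wrapper.sh <<'EOF'
#!/bin/bash
# Usage: R-wrapper.sh script.R
if [ $# -ne 1 ]; then
    echo "Usage: $0 script.R" >&2
    exit 1
fi

# Repeat the environment setup from the PALMS step here so that
# Rscript is on the PATH of the worker node.

# Run the script non-interactively.
Rscript "$1"
EOF

# HTCondor runs the wrapper directly, so it must be executable.
chmod +x R-wrapper.sh
```

Taking the script name as an argument keeps the wrapper reusable for other R jobs.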
Notice that I've switched to Rscript (roughly equivalent to R --slave) instead of R --no-save. It accepts the script as a command-line argument and makes R much less verbose, so the output is easier to parse later.
Now that we've created a wrapper, let's build a Condor submit file around it.
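A minimal sketch of such a submit file follows, written out from the shell. The names R-wrapper.sh and mcpi.R, the output file patterns, and the job count of 100 are assumptions made for illustration.

```shell
#!/bin/sh
# Condor needs the log directory to exist before submission.
mkdir -p log

# Quote the heredoc delimiter so the shell leaves $(Cluster) and
# $(Process) untouched for HTCondor to expand.
cat > R.submit <<'EOF'
universe = vanilla

# Wrapper script and the R script it should run (hypothetical names).
executable = R-wrapper.sh
arguments = mcpi.R
transfer_input_files = mcpi.R
should_transfer_files = YES
when_to_transfer_output = ON_EXIT

# One set of files per job, distinguished by cluster and process id.
output = log/mcpi.out.$(Cluster).$(Process)
error = log/mcpi.err.$(Cluster).$(Process)
log = log/mcpi.log.$(Cluster).$(Process)

# Only match worker nodes that mount CVMFS.
requirements = (HAS_CVMFS =?= TRUE)

# Number of independent jobs; 100 is an illustrative choice.
queue 100
EOF
```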
Notice the requirements line? You'll need to put HAS_CVMFS =?= TRUE in the requirements any time you need software, such as R, from CVMFS. There's also one small gotcha here: make sure the "log" directory used in the submit file exists before you submit! Otherwise Condor will fail because it has nowhere to write the logs.
Finally, submit the job to OSG-Connect!
Since our jobs just output their results to standard out, we can do the final analysis from the log files. Let's see what one looks like:
I'm just going to use a bit of awk magic to do the average for me.
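The averaging step could be sketched as follows. It assumes each job's output file holds a single line shaped like "[1] 3.14", which is how R prints one number; the file names and the sample values in the demonstration are made up and are not real job output.

```shell
#!/bin/sh
# Average the estimates printed by many jobs.
# Assumption: each output file holds one line shaped like "[1] 3.14";
# adjust the field number if your output differs.
average_estimates() {
    awk '$1 == "[1]" { sum += $2; n++ }
         END { if (n > 0) printf "%.4f\n", sum / n }' "$@"
}

# Demonstration with made-up values (not real job output).
demo=$(mktemp -d)
echo "[1] 3.10" > "$demo/mcpi.out.1.0"
echo "[1] 3.20" > "$demo/mcpi.out.1.1"
echo "[1] 3.00" > "$demo/mcpi.out.1.2"
result=$(average_estimates "$demo"/mcpi.out.*)
echo "$result"
rm -rf "$demo"
```

On real results, the same function would be pointed at the job output files, for example the files in the log directory.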
That's pretty close!
What to do next?
The R.submit file may have included a few lines that you are unfamiliar with. For example, $(Cluster) and $(Process) are variables that will be replaced with the job's cluster and process id. This is useful when you have many jobs submitted in the same file. Each job's output and error files then get their own paths, so jobs do not overwrite each other's results.
Also, did you notice the transfer_input_files line? This tells HTCondor what files to transfer with the job to the worker node. You don't have to tell it to transfer the executable; HTCondor is smart enough to know that the job will need that. But any extra files, such as our MonteCarlo R file, will need to be explicitly listed to be transferred with the job. You can use transfer_input_files for input data to the job, as shown in Transferring data with HTCondor. If you have larger data requirements, you may want to look into Transferring your Stash'd data with HTCondor.