BoscoR: Distributed Computing with R


BoscoR combines Bosco and GridR to run R functions on remote resources.  With BoscoR, you can submit remote jobs from within your R environment, whether RStudio or the R command line, to clusters on your campus or on national infrastructure such as the OSG or XSEDE.

This tutorial is adapted from the GridR Wiki MonteCarlo example.

There are many ways to parallelize R code, such as the built-in parallel package (which incorporates multicore and snow).  However, GridR integrates better with the High Throughput Computing resources that are available on osgconnect.


In order to run BoscoR, you need on your laptop or Bosco submit host (not login01):

Installing GridR

Download GridR (list of releases).

After downloading, in your R environment (whether RStudio or R GUI, or the R command line) run the command:

> install.packages("~/Downloads/GridR_0.9.7.tar.gz", repos=NULL, type="source")

This will install GridR from the source package.
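
A quick way to confirm the installation is to ask R whether the package can be found. The following is a minimal sketch using only base R functions; the variable name gridr_installed is illustrative and not part of GridR.

```r
# Check whether GridR is installed without attaching it.
# requireNamespace() returns TRUE or FALSE instead of raising an error.
gridr_installed <- requireNamespace("GridR", quietly = TRUE)

if (gridr_installed) {
  # packageVersion() reports the installed version, which can be
  # compared with the version in the downloaded tarball's file name.
  message("GridR version: ", as.character(packageVersion("GridR")))
} else {
  message("GridR is not installed; re-run install.packages() with the path to your download.")
}
```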

Running the PI example

In the previous R example, we attempted to calculate PI using R.  It required us to create submission files and manage the HTCondor submissions ourselves.  This time we will use BoscoR (with help from Bosco and GridR) to automate this process.

First, we start with the original R code to estimate PI and use GridR to send the jobs to osgconnect.  The code is listed below, or you can download it here: monte-carlo.R

montecarloPi <- function(trials, inst) {
  count = 0
  for(i in 1:trials) {
    if((runif(1,0,1)^2 + runif(1,0,1)^2)<1) {
      count = count + 1
    }
  }
  # fraction of points inside the quarter circle, times 4, estimates PI
  return((count*4)/trials)
}

# load the GridR library
library("GridR")
grid.init(service="", localTmpDir="tmp")
# Send 10 instances of the montecarloPi
grid.apply("pi_estimate", montecarloPi, 10000000, c(1:10), batch=c(2))

Notice that you didn't have to write a Condor script.  The jobs take about 1 minute to complete; you can check the value of pi_estimate to see whether they have finished.
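
Before submitting to a remote cluster, it can help to rehearse the same call pattern locally with a small trial count. The sketch below assumes that batch=c(2) makes GridR run one job per element of the second function argument (here c(1:10)); that reading should be checked against the GridR documentation. The names montecarloPiLocal and local_estimate are hypothetical, and the function is a vectorized stand-in for the tutorial's estimator that does not use GridR at all.

```r
# Vectorized stand-in for the tutorial's estimator (hypothetical name).
# It draws all points at once and returns 4 * (fraction inside the quarter circle).
montecarloPiLocal <- function(trials, inst) {
  x <- runif(trials)
  y <- runif(trials)
  4 * sum(x^2 + y^2 < 1) / trials
}

set.seed(42)  # make the local rehearsal reproducible

# Local equivalent of running 10 instances: one call per value of inst,
# with a much smaller trial count than the 10000000 used on the cluster.
local_estimate <- lapply(1:10, function(inst) montecarloPiLocal(10000, inst))

# Same post-processing as for the remote result.
local_mean <- mean(unlist(local_estimate))
print(local_mean)
```

The result is a list with one estimate per instance, which is the shape that the averaging step in this tutorial expects from pi_estimate.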

Let's get the average PI from the output:

> mean(unlist(pi_estimate))
[1] 3.141534
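
To judge how good this estimate is, one can compare its error with the standard error expected from the total number of random points. The sketch below assumes the 10 instances are independent and each uses 10000000 trials, so the estimate behaves like 4 times a binomial proportion with success probability pi/4. The variable names are illustrative.

```r
# Total number of random points across all 10 instances.
total_trials <- 10 * 10000000

# Probability that a uniform point in the unit square lands inside the quarter circle.
p <- pi / 4

# Standard error of the estimator 4 * (hits / total_trials).
std_error <- 4 * sqrt(p * (1 - p) / total_trials)

# Absolute error of the value reported above.
observed_error <- abs(3.141534 - pi)

print(std_error)
print(observed_error)
```

The expected standard error is roughly 1.6e-4, and the reported value differs from pi by less than that, which is consistent with a correctly working estimator.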

Further examples and the full reference documentation can be found on the GridR Wiki.

Increasing Throughput

By default, Bosco protects the remote cluster by throttling submissions.  For Condor clusters, the limit is set very low for evaluation purposes: at most 10 jobs can run at a time.  To increase this limit, follow the instructions in the Bosco install document.