Using TensorFlow

OSG provides ready-to-use Singularity containers for TensorFlow workflows, for both CPU and GPU jobs [1]. These containers are based on Ubuntu and are distributed through CVMFS.

  1. /cvmfs/singularity.opensciencegrid.org/opensciencegrid/tensorflow:latest is the CPU-only version of TensorFlow. It runs slower than the GPU version, but OSG has many more CPU-only resources available than GPU resources.
  2. /cvmfs/singularity.opensciencegrid.org/opensciencegrid/tensorflow-gpu:latest is a modified TensorFlow-GPU image for use on the GPUs available on OSG.

CPU jobs

For CPU jobs, the following attributes need to be added to your submit file:

This will load the OSG-supported TensorFlow Singularity container. For more information about using Singularity, click here.

Note: For CPU jobs, both python2 and python3 have the TensorFlow module available.
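As a rough illustration of the CPU setup, the sketch below builds the kind of submit-file lines that select the CPU container. The image path is the one listed at the top of this page; the attribute names HAS_SINGULARITY and +SingularityImage are assumptions based on common OSG usage and are not stated on this page, and the helper name cpu_submit_attributes is hypothetical.

```python
# Minimal sketch: generate submit-file lines for a CPU TensorFlow job.
# The image path comes from this page. The attribute names below
# (HAS_SINGULARITY, +SingularityImage) are assumptions, not confirmed here.
CPU_IMAGE = (
    "/cvmfs/singularity.opensciencegrid.org/"
    "opensciencegrid/tensorflow:latest"
)


def cpu_submit_attributes(image=CPU_IMAGE):
    """Return the submit-file lines that request the CPU container."""
    return [
        # Only match slots that can run Singularity containers.
        "Requirements = HAS_SINGULARITY == True",
        # Custom job attribute naming the container image to load.
        '+SingularityImage = "{}"'.format(image),
    ]


submit_fragment = "\n".join(cpu_submit_attributes())
print(submit_fragment)
```

The printed lines would be pasted into an existing submit file next to the usual executable, output, and queue statements.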

GPU jobs

For GPU jobs, in addition to switching to the GPU Singularity image, the number of GPUs must be requested and the requirement "CUDACapability >= 3" must be added to ensure the GPU is new enough to support TensorFlow features. At present, a job can only use 1 GPU at a time. Note that the number of GPU resources is still limited, so matching these jobs can take a while.

Note: For GPU jobs, only python3 has the TensorFlow module available at present.
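The GPU requirements described above can be sketched in the same way. The image path and the "CUDACapability >= 3" expression come from this page, and the single-GPU limit is reflected in the request. The request_gpus command, the HAS_SINGULARITY and +SingularityImage attribute names, and the helper name gpu_submit_attributes are assumptions for illustration only.

```python
# Minimal sketch: generate submit-file lines for a GPU TensorFlow job.
# Image path and the CUDACapability threshold come from this page; the
# request_gpus, HAS_SINGULARITY and +SingularityImage names are assumptions.
GPU_IMAGE = (
    "/cvmfs/singularity.opensciencegrid.org/"
    "opensciencegrid/tensorflow-gpu:latest"
)


def gpu_submit_attributes(image=GPU_IMAGE, min_cuda_capability=3):
    """Return the submit-file lines that request one GPU and the GPU image."""
    requirements = (
        "HAS_SINGULARITY == True && CUDACapability >= {}"
    ).format(min_cuda_capability)
    return [
        # A job can only use one GPU at a time at present.
        "request_gpus = 1",
        # Match only container-capable slots with a recent enough GPU.
        "Requirements = " + requirements,
        # Custom job attribute naming the GPU container image.
        '+SingularityImage = "{}"'.format(image),
    ]


print("\n".join(gpu_submit_attributes()))
```

Keeping the CUDA threshold as a parameter makes it easy to tighten the requirement, at the cost of matching fewer of the already limited GPU slots.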


Tutorial Example

The following example shows how to submit a TensorFlow matrix multiplication example as a CPU job and as a GPU job. Type the following from the CMS Connect submit node to get the tutorial example code:

Then, submit a CPU job and a GPU job:

The above will submit jobs that will run the following python code with python3:


You can track your jobs via condor_q (check here for a quick Condor tutorial on submitting jobs and checking their status).

Once the jobs are finished, you will be able to see the following output:

The tutorial example can be found on GitHub here.


Working with TensorFlow and GPU resources interactively

The following link shows how to submit Condor jobs that request GPU resources and how to access them via SSH: Using GPU resources via SSH.

References

[1] https://support.opensciencegrid.org/support/solutions/articles/12000028940-tensorflow#about-tensorflow