Using TensorFlow

OSG provides ready-to-use Singularity containers for TensorFlow workflows, for both CPU and GPU jobs [1]. These containers are based on Ubuntu and are distributed through CVMFS.

  1. /cvmfs/singularity.opensciencegrid.org/opensciencegrid/tensorflow:latest is the CPU-only version of TensorFlow. It runs slower, but OSG has many more CPU-only resources available than GPU resources.
  2. /cvmfs/singularity.opensciencegrid.org/opensciencegrid/tensorflow-gpu:latest is a modified TensorFlow-GPU image for use on the GPUs available on OSG.

CPU jobs

For CPU jobs, the following attributes need to be added to your submit file:

file.submit
Requirements = HAS_SINGULARITY == True
+SingularityImage = "/cvmfs/singularity.opensciencegrid.org/opensciencegrid/tensorflow:latest"

This loads the OSG-supported TensorFlow Singularity container. For more information about using Singularity, click here.

Note: For CPU jobs, both python2 and python3 have the TensorFlow module available.

GPU jobs

For GPU jobs, in addition to switching to the GPU Singularity image, the submit file must request the number of GPUs and require "CUDACapability >= 3", which ensures the GPU is new enough to support TensorFlow features. At present, a job can use only one GPU at a time. GPU resources are still limited, so matching these jobs can take a while.

Note: For GPU jobs, only python3 has the TensorFlow module available at present.


file.submit
Requirements = HAS_SINGULARITY == True && CUDACapability >= 3
request_gpus = 1
+SingularityImage = "/cvmfs/singularity.opensciencegrid.org/opensciencegrid/tensorflow-gpu:latest"
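
The difference between the CPU and GPU attributes above can be summarized in a small helper. This is only an illustrative sketch: the function name singularity_attributes and the IMAGE_ROOT constant are hypothetical and not part of any OSG or HTCondor tool. The lines it returns are the TensorFlow-related attributes from this page only; a full submit file still needs its usual entries such as the executable and the queue statement.

```python
# Hypothetical helper (not an OSG or HTCondor API): builds the
# TensorFlow-related submit-file lines shown on this page.
IMAGE_ROOT = "/cvmfs/singularity.opensciencegrid.org/opensciencegrid"


def singularity_attributes(use_gpu=False):
    """Return the submit-file lines for a CPU job or a single-GPU job."""
    if use_gpu:
        return [
            # GPU jobs also require a recent enough GPU and request one GPU.
            "Requirements = HAS_SINGULARITY == True && CUDACapability >= 3",
            "request_gpus = 1",
            '+SingularityImage = "%s/tensorflow-gpu:latest"' % IMAGE_ROOT,
        ]
    return [
        "Requirements = HAS_SINGULARITY == True",
        '+SingularityImage = "%s/tensorflow:latest"' % IMAGE_ROOT,
    ]


cpu_lines = singularity_attributes()
gpu_lines = singularity_attributes(use_gpu=True)
print("\n".join(gpu_lines))
```

Keeping both variants in one place makes it harder to forget the request_gpus line or the CUDACapability requirement when converting a CPU job into a GPU job.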

Tutorial Example

The following example shows how to submit a TensorFlow matrix multiplication as both a CPU job and a GPU job. Run the following on the CMS Connect submit node to get the tutorial code:

bash
$ tutorial tensorflow-matmul
$ cd tutorial-tensorflow-matmul

Then, submit a CPU job and a GPU job:

bash
$ condor_submit tf_matmul.submit
Submitting job(s).
1 job(s) submitted to cluster 649698.
$ condor_submit tf_matmul_gpu.submit
Submitting job(s).
1 job(s) submitted to cluster 649699.

The commands above submit jobs that run the following Python code with python3:

tf_matmul.py
# Simple example: multiply a matrix by its inverse with TensorFlow.
# Note: tf.matrix_inverse and tf.Session are TensorFlow 1.x APIs.

import tensorflow as tf

# Build the graph: a 2x2 matrix, its inverse, and their product.
matrix1 = tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2])
matrix2 = tf.matrix_inverse(matrix1)
product = tf.matmul(matrix1, matrix2)

# Run the graph; the result should be close to the identity matrix.
with tf.Session() as sess:
    result = sess.run(product)
    print("result of matrix multiplication")
    print("===============================")
    print(result)
    print("===============================")


You can track your jobs with condor_q (see here for a quick HTCondor tutorial on submitting jobs and checking their status).
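
The cluster ID printed by condor_submit is what ties a submission to its output file. The sketch below extracts that ID from the condor_submit message shown earlier and builds the matching file name. The helper name cluster_id is hypothetical, and the "<cluster>.0.output" pattern is an assumption inferred from the file names in this tutorial; the actual name depends on the output setting in the submit file.

```python
import re


def cluster_id(submit_output):
    """Extract the cluster ID from the text printed by condor_submit."""
    match = re.search(r"submitted to cluster (\d+)\.", submit_output)
    if match is None:
        raise ValueError("no cluster ID found in condor_submit output")
    return int(match.group(1))


# Sample text copied from the condor_submit output shown in this tutorial.
sample = "Submitting job(s).\n1 job(s) submitted to cluster 649698.\n"

cid = cluster_id(sample)
# Assumed naming pattern: <cluster>.<process>.output, with process 0.
output_file = "%d.0.output" % cid
print(output_file)
```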

Once the jobs are finished, you will see the following output:

bash
$ cat 649698.0.output
result of matrix multiplication
===============================
[[  1.00000000e+00   0.00000000e+00]
 [ -4.76837158e-07   1.00000024e+00]]
===============================
$ cat 649699.0.output
result of matrix multiplication
===============================
[[  1.00000000e+00   0.00000000e+00]
 [ -4.76837158e-07   1.00000024e+00]]
===============================
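
The product of a matrix and its inverse is the identity matrix, so the small off-diagonal value and the 1.00000024 in the output above are consistent with single-precision rounding rather than an error in the job. The following plain-Python sketch recomputes the product in double precision and checks that the reported values agree with it within a small tolerance. The helper names and the 1e-6 tolerance are illustrative assumptions, not part of the tutorial code.

```python
import math


def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(x, y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


matrix1 = [[1.0, 2.0], [3.0, 4.0]]
product = matmul_2x2(matrix1, inverse_2x2(matrix1))

# Values copied from the job output shown above.
reported = [[1.0, 0.0], [-4.76837158e-07, 1.00000024]]

# Compare element by element with an absolute tolerance (assumed: 1e-6).
matches = all(
    math.isclose(reported[i][j], product[i][j], abs_tol=1e-6)
    for i in range(2)
    for j in range(2)
)
print(product)
print(matches)
```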

The tutorial example can be found on GitHub here.


Working with TensorFlow and GPU resources interactively

The following link shows how to submit HTCondor jobs that request GPU resources and how to access them via SSH: Using GPU resources via SSH.

References

[1] https://support.opensciencegrid.org/support/solutions/articles/12000028940-tensorflow#about-tensorflow