Deep Learning on the SCC

Tensorflow on the SCC

Tensorflow is available on the SCC with support for both GPU-accelerated and CPU-only computations. This page provides examples and guidance on how to use Tensorflow on the SCC.

Modules

To see which versions of Tensorflow are available, run the command:

 module avail tensorflow

The SCC has two modules for Tensorflow. They are labeled to distinguish between Tensorflow for GPU nodes and CPU-only nodes:

Here is an example of loading the release 1.10 Tensorflow module. This module supports Python 2.7.x and 3.6.x and will automatically load CPU or GPU compiled versions based on the availability of a GPU. It is compiled with CUDA 9.2 and cuDNN 7.2 support. The following commands will therefore work on GPU or CPU nodes:


module load python/3.6.2 
module load tensorflow/r1.10 

Queue Resources

When requesting GPUs, it is important to specify that the assigned GPUs have a CUDA compute capability of at least 3.5, as this is the minimum requirement for Tensorflow. This is done using the -l gpu_c=3.5 option for queue jobs. The following is an example of specifying the compute capability when requesting a GPU and 2 CPU cores. Two cores is the minimum that should be requested for a Tensorflow job (see the next section for details):

#interactive job 
qrsh -l gpu_c=3.5,gpus=0.5 -pe omp 2
#queue job
qsub -l gpu_c=3.5,gpus=0.5 -pe omp 2 my_script.qsub  

Configuring the Tensorflow Session object for the SCC

When a job runs on the SCC it has resources (some number of cores and GPUs) assigned to it. For Tensorflow code to use the assigned resources correctly, the Session object must be configured as described in this section. The Session object is configured when it is initialized, using a Tensorflow ConfigProto object. A code example follows the description of the ConfigProto options.

allow_soft_placement=True

The allow_soft_placement option causes Tensorflow to search for a compatible device if the requested one is not available. Without it, if the Python code requests the first GPU on the compute node (using the with tf.device('/gpu:0'): syntax) but the job is assigned the second or third GPU on the node, the job will crash. With allow_soft_placement, Tensorflow identifies the GPU that was actually assigned and uses it in place of gpu:0 automatically.

An additional effect of the allow_soft_placement option is that Tensorflow code that is requested to be run on the GPU will be automatically run on the CPU if no GPUs are available. This allows you to test or debug Tensorflow code on a non-GPU compute node or the login node without any code changes provided the CPU version of the Tensorflow module is loaded.

Set intra_op_parallelism_threads and inter_op_parallelism_threads

These two options control the number of CPU cores that Tensorflow will use. If Tensorflow attempts to use more cores than the job requested, the job will be killed. To make sure that Tensorflow only uses the assigned number of cores, inter_op_parallelism_threads should always be set to 1 and intra_op_parallelism_threads should be set to 1 less than the requested number of cores. See the example below for a way to do this automatically.
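The rule above can be written as a small helper. This is a minimal sketch and not part of the SCC example files: the function name thread_settings is hypothetical, and it assumes the core count has already been read from NSLOTS. It rejects a 1-core job because that would give intra_op_parallelism_threads=0, which Tensorflow interprets as "let the system pick the number of threads" rather than as a limit.

```python
def thread_settings(n_cores):
    """Return ConfigProto thread options for a job with n_cores cores.

    Hypothetical helper: applies the rule of 1 inter-op thread and
    (n_cores - 1) intra-op threads described in the text.
    """
    if n_cores < 2:
        # n_cores - 1 would be 0, which does not limit Tensorflow at all.
        raise ValueError('Request at least 2 cores (-pe omp 2) for Tensorflow jobs.')
    return {
        'intra_op_parallelism_threads': n_cores - 1,
        'inter_op_parallelism_threads': 1,
    }


if __name__ == '__main__':
    # Example: a job submitted with -pe omp 4 has NSLOTS=4.
    print(thread_settings(4))
```

The returned dictionary can then be passed on when building the session configuration, for example tf.ConfigProto(allow_soft_placement=True, **thread_settings(n_cores)).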

Request at least 2 cores

As a corollary to the settings for intra_op_parallelism_threads, always request at least 2 cores for Tensorflow jobs. The number of cores requested is set with the -pe omp N flag. When requesting multiple cores, the value for the gpus flag is the number of GPUs divided by the number of cores:

# interactive job with 2 cores and 1 GPU
qrsh -l gpus=0.5 -l gpu_c=3.5 -pe omp 2
#queue job with 8 cores and 1 GPU
qsub -l gpus=0.125 -l gpu_c=3.5 -pe omp 8 script.qsub  
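The arithmetic behind the gpus value can be checked with a few lines of Python. This is an illustrative sketch only: the function names gpus_flag and qsub_command are hypothetical, and the sketch simply formats the option string shown above rather than submitting anything.

```python
def gpus_flag(n_gpus, n_cores):
    """Value for the '-l gpus=' option: number of GPUs divided by number of cores."""
    if n_cores < 1:
        raise ValueError('n_cores must be at least 1')
    # float() keeps the division correct under Python 2.7 as well as 3.x.
    return float(n_gpus) / n_cores


def qsub_command(n_gpus, n_cores, script, gpu_c=3.5):
    """Build a qsub command line in the same form as the examples above."""
    return 'qsub -l gpus={:g} -l gpu_c={} -pe omp {} {}'.format(
        gpus_flag(n_gpus, n_cores), gpu_c, n_cores, script)


if __name__ == '__main__':
    # 1 GPU shared across 8 cores gives gpus=0.125.
    print(qsub_command(1, 8, 'script.qsub'))
```

For 1 GPU and 2 cores the same function gives 0.5, matching the interactive example.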


The following example Python code properly configures the Tensorflow Session object for running on the SCC. A function called get_n_cores() is defined to read the NSLOTS variable from the environment so that intra_op_parallelism_threads can be set correctly:


# Saved as matmul.py
import os
import tensorflow as tf

# Get the assigned number of cores for this job. This is stored in
# the NSLOTS variable. If NSLOTS is not defined, throw an exception.
def get_n_cores():
  nslots = os.getenv('NSLOTS')
  if nslots is not None:
    return int(nslots)
  raise ValueError('Environment variable NSLOTS is not defined.')

# Now start the Tensorflow code...
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
# If an op is not assigned to a device then Tensorflow will pick one, which is
# typically the GPU if one is available.
c = tf.matmul(a, b)

# Create the configuration for the Session.
# With allow_soft_placement this will work if the assigned GPU is not
# gpu:0 or if run on a node without a GPU.
# Note that if only 1 core was requested then this code will not work.
session_conf = tf.ConfigProto(
      intra_op_parallelism_threads=get_n_cores()-1,
      inter_op_parallelism_threads=1,
      allow_soft_placement=True, 
      log_device_placement=True)

sess = tf.Session(config=session_conf)
# Runs the op.
print(sess.run(c))


Example queue submission script

This is an example queue submission script that runs the above Python code. It is saved as test_t.qsub:


#!/bin/bash -l

# Request 4 cores. This will set NSLOTS=4
#$ -pe omp 4
# Request 1 GPU
#$ -l gpus=0.25
# Request at least compute capability 3.5
#$ -l gpu_c=3.5

# Give the job a name
#$ -N TF_test

# load modules
module load python/3.6.2
module load tensorflow/r1.10

# Run the test script
python matmul.py

Keras with the Tensorflow backend

By default Keras will keep Tensorflow limited to a single core, which does not cause any issues with the SCC queue. If multiple cores are desired, the following code can be used to configure the Tensorflow session for the Keras backend to take advantage of them. This is included in the example file test_keras.py.


import tensorflow as tf
import keras.backend.tensorflow_backend as ktf
import os
import sys

# Get the number of cores assigned to this job.
def get_n_cores():
    # On a login node run Python with:
    # export NSLOTS=4
    # python mycode.py
    #
    nslots = os.getenv('NSLOTS')
    if nslots is not None:
        return int(nslots)
    raise ValueError('Environment variable NSLOTS is not defined.')

# Get the Tensorflow backend session.
def get_session():
    try:
        nthreads = get_n_cores() - 1
        if nthreads >= 1:
            session_conf = tf.ConfigProto(
                intra_op_parallelism_threads=nthreads,
                inter_op_parallelism_threads=1,
                allow_soft_placement=True)
            return tf.Session(config=session_conf)
    except ValueError:
        sys.stderr.write('NSLOTS is not set, using default Tensorflow session.\n')
        sys.stderr.flush()
    return ktf.get_session()

# Assign the configured Tensorflow session to keras
ktf.set_session(get_session()) 
# Rest of your Keras script starts here....

Multiple GPUs

It is possible to use multiple GPUs with Tensorflow. In many cases a Tensorflow run will not fully utilize a single GPU, so check the GPU utilization of your job before requesting more than one. Requesting multiple GPUs frequently results in little to no improvement in your program's runtime.
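One way to check utilization is to query NVIDIA's nvidia-smi tool from Python while the job is running. The sketch below is an assumption-laden illustration, not an SCC-provided script: it assumes nvidia-smi is on the PATH of the GPU node, that the queried fields come back as plain numbers, and the function names parse_gpu_stats and gpu_stats are hypothetical.

```python
import subprocess

# Ask nvidia-smi for one CSV line per GPU: index, utilization (%), memory used (MiB).
QUERY = ['nvidia-smi',
         '--query-gpu=index,utilization.gpu,memory.used',
         '--format=csv,noheader,nounits']


def parse_gpu_stats(csv_text):
    """Parse the CSV text produced by QUERY into a list of dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        # Assumes all three fields are numeric on this GPU model.
        index, util, mem = [field.strip() for field in line.split(',')]
        stats.append({'index': int(index),
                      'util_percent': int(util),
                      'memory_mib': int(mem)})
    return stats


def gpu_stats():
    """Run nvidia-smi and return the parsed statistics (GPU nodes only)."""
    output = subprocess.check_output(QUERY).decode()
    return parse_gpu_stats(output)
```

Calling gpu_stats() a few times during a run on a GPU node gives a rough picture of how busy the GPU is; if util_percent stays well below 100, a second GPU is unlikely to help.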

Here's the process:

Contact Information

Help: help@scc.bu.edu

Note: RCS example programs are provided "as is" without any warranty of any kind. The user assumes the entire risk of quality, performance, and repair of any defect. You are welcome to copy and modify any of the given examples for your own use.