Deep Learning on Rescale
Deep Learning is a sub-field of machine learning that focuses on predictive models that have large numbers of parameters, typically organized as a layered computational graph. It is fast becoming the preferred model choice for large datasets with samples that have many features.
Rescale provides GPU-based HPC nodes and clusters for training deep learning models in the cloud. Rescale supports batch training of models as well as interactive data analysis through Rescale Desktops. A wide variety of GPU configurations are available, from lower-cost previous-generation K80s to the latest multi-GPU P100s with NVLink interconnect. Clusters can be preconfigured with your choice of the most popular deep learning frameworks.
On this page, we present Rescale example jobs for four different applications. Click the Import Job Setup button to clone an example job into your account, which you can then submit. Click the Get Job Results button to review the full setup and results of a completed example.
For more information on how to set up and submit a Basic Job, please refer to the tutorial here.
For more information on how to set up and launch a Desktop Session, please refer to the tutorial here.
Supported Frameworks and Applications
TensorFlow is the popular open source C++ and Python framework for high-performance computation over dataflow graphs. It is typically used to train deep neural network models and then use them to perform inference.
Here are some benchmark results comparing Rescale’s NVLink-connected P100 system (Rescale Amethyst) with the high-performance DGX-1 deep learning server. They show that Rescale achieves performance comparable to high-end on-premises GPU servers.
MNIST Softmax Regression Example
Here is an example training a classification model on the classic MNIST handwritten digit dataset. We train a simple multi-layer perceptron model using this input training script.
Inception V3 Example
The second example is TensorFlow's image recognition model, Inception V3. This job corresponds to the benchmark results in the table above, using 4 P100s.
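To illustrate the idea behind the MNIST example, here is a minimal softmax-regression training loop written in plain NumPy on synthetic stand-in data; the actual Rescale job uses a TensorFlow training script and the real MNIST images, so treat this as a sketch of the technique rather than the job's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for MNIST: 512 samples, 64 features, 10 classes.
n_samples, n_features, n_classes = 512, 64, 10
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, n_classes, size=n_samples)

W = np.zeros((n_features, n_classes))
b = np.zeros(n_classes)
lr = 0.1

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for step in range(200):
    probs = softmax(X @ W + b)
    onehot = np.eye(n_classes)[y]
    grad = probs - onehot                 # gradient of cross-entropy w.r.t. logits
    W -= lr * (X.T @ grad) / n_samples
    b -= lr * grad.mean(axis=0)

accuracy = (softmax(X @ W + b).argmax(axis=1) == y).mean()
```

A real training script would stream mini-batches of image data rather than taking full-batch gradient steps, but the softmax/cross-entropy update is the same.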
Super Resolution Example
This example trains and then uses a model to perform “super-resolution”: scaling up an image while minimizing noise.
Super Resolution image
DCGAN Example
Next is an example of training a Deep Convolutional Generative Adversarial Network (DCGAN), which generates realistic fake images similar to the input training images. This example is trained on Rescale’s Amethyst NVLink-connected P100 systems using the LSUN bedroom image dataset.
Synthesized fake images
LSTM DOE Example
The examples so far have trained a single model with parameters selected by the user building the model. This example is instead a Design of Experiments (DOE): a sensitivity analysis over one or more parameters. It uses the Rescale DOE framework to build many models, randomly sampling the hyper-parameters that define each one. This case builds an LSTM model to perform word-level language modeling.
In this case, we are doing a Monte Carlo sampling of the following LSTM model parameters:
The embed_size and n_hidden parameters control how many nodes are in the network. The dropout parameter controls overfitting to the input data. Finally, batch_size determines how many examples we train on at a time.
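A Monte Carlo sampling of these hyper-parameters can be sketched as follows; the value ranges here are illustrative assumptions, not the ones used in the actual DOE job, where the Rescale DOE framework handles the sampling and launches one training run per sample.

```python
import random

random.seed(42)

def sample_run():
    """Draw one random hyper-parameter setting for an LSTM training run."""
    return {
        "embed_size": random.choice([128, 256, 512]),   # embedding width
        "n_hidden":   random.choice([128, 256, 512]),   # LSTM hidden units
        "dropout":    round(random.uniform(0.0, 0.5), 2),
        "batch_size": random.choice([16, 32, 64]),
    }

# Each sampled dict would become one training run in the DOE.
runs = [sample_run() for _ in range(8)]
for i, params in enumerate(runs):
    print(f"run {i}: {params}")
```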
Singularity Container Example
Singularity containers are a tool for packaging applications and running them reproducibly on different host systems. Singularity can import most Docker containers without issue and runs as a user application, without requiring any administrative privileges.
As of version 2.3, Singularity supports GPU pass-through for containers running CUDA applications, making it a useful choice for running packaged deep learning jobs.
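A command of the following shape launches such a container; the image name and benchmark flags here are illustrative placeholders, not the exact files from this job.

```shell
# --nv passes the host NVIDIA GPU devices and driver libraries
# through to the container so CUDA code can run inside it.
singularity exec --nv tensorflow-gpu.simg \
    python tf_cnn_benchmarks.py --num_gpus=1
```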
The “--nv” flag instructs Singularity to pass the host GPU interface through to the container, enabling CUDA applications to run inside it. This particular example runs the TensorFlow CNN benchmarks in a container on one or more GPUs.