
MPI Benchmarking on Google Compute Engine, Revisited

Three years ago we evaluated the networking performance of Google's IaaS service, Google Compute Engine (GCE), and Ryan posted the results in his blog post. Back then, the conclusion was that GCE instances were better suited to typical web-hosting workloads, and that there was still room for performance tuning for HPC applications. Recently, we revisited GCE instances with their latest offerings.

Benchmark Tools
To keep the results comparable with the old ones, we are still using the OSU Micro-Benchmarks, now at the latest version, 5.3.2. Of all the benchmarks in the suite, we picked the two most critical ones: osu_latency for the latency test and osu_bibw for the bidirectional bandwidth test.
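For reference, here is a typical way to build the suite and run these two benchmarks across a pair of VMs. This is a sketch, not the exact commands from our runs: the host names (vm-a, vm-b) are placeholders, and it assumes MPICH's compiler wrappers and Hydra launcher are on the PATH.

```shell
# Fetch and build OSU Micro-Benchmarks 5.3.2 with the MPICH wrappers
wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.3.2.tar.gz
tar xzf osu-micro-benchmarks-5.3.2.tar.gz
cd osu-micro-benchmarks-5.3.2
./configure CC=mpicc CXX=mpicxx
make

# Run each benchmark with one rank on each of the two VMs
# (vm-a and vm-b are placeholder host names)
mpirun -np 2 -hosts vm-a,vm-b ./mpi/pt2pt/osu_latency
mpirun -np 2 -hosts vm-a,vm-b ./mpi/pt2pt/osu_bibw
```

Both point-to-point tests need exactly two ranks; pinning one rank per host is what makes the traffic cross the network rather than stay inside a single machine.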

Test Environment
Operating System: Debian GNU/Linux 8 (Jessie)
MPI Flavor: MPICH3

Test Instances
Since we are testing the interconnect performance between VM instances, we want to make sure the instances we launch actually sit on different physical hosts, so that traffic goes through the underlying network rather than through the host machine's memory.
We therefore picked the biggest instance of each series:
n1-standard-32, n1-highmem-32, and n1-highcpu-32
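Launching a pair of test instances of one series could look roughly like the following. This is a hypothetical gcloud invocation, not the one used for the tests: the instance names and zone are placeholders, and choosing the largest machine type per series simply makes it unlikely that two such VMs share a physical host.

```shell
# Create two 32-vCPU Debian Jessie instances in the same zone
# (instance names and zone are placeholders)
gcloud compute instances create vm-a vm-b \
    --machine-type=n1-standard-32 \
    --image-family=debian-8 \
    --image-project=debian-cloud \
    --zone=us-central1-a
```

Repeat with --machine-type=n1-highmem-32 and n1-highcpu-32 for the other two series.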

Test Results
For latency (in microseconds, 0-byte messages):

Instance Type    Trial #1  Trial #2  Trial #3  Average
n1-standard-32   45.68     47.03     48.46     47.06
n1-highmem-32    43.17     43.08     36.87     41.04
n1-highcpu-32    47.11     48.51     48.17     47.93
For bidirectional bandwidth (in MB/s, 1,048,576-byte messages):

Instance Type    Trial #1  Trial #2  Trial #3  Average
n1-standard-32   808.28    864.91    872.36    848.52
n1-highmem-32    1096.35   1077.33   1055.20   1076.29
n1-highcpu-32    847.68    791.16    900.32    846.39
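The averages in the tables are plain three-trial means; for instance, the n1-standard-32 latency average can be checked with a quick awk one-liner (the three values are taken from the latency table above):

```shell
# Mean of the three latency trials for n1-standard-32, rounded to 2 decimals
avg=$(echo "45.68 47.03 48.46" | awk '{ printf "%.2f", ($1 + $2 + $3) / 3 }')
echo "$avg"   # 47.06
```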

Summary of Results
For network latency, we can see the average is around 40-45 microseconds, roughly 4x faster than the previous result of around 180 microseconds. The new latency is also fairly consistent across the smaller instance types.

For bandwidth, we don’t have a previous result to compare to, but among all the GCE instance types we found that n1-highmem-32 has the best performance, reaching as high as about 1,070 MB/s. This result aligns with GCE’s official documentation.
