Separation Anxiety: I’m Away From My Cluster

Distractions, noise, kids, interruptions. All part of the modern experience, it would seem. Here at Rescale, like others, we have been stuck working from home for 67 days and counting. There are a million reasons it might be challenging. However, we feel strongly that access to your cluster and the software licenses you need to do your job should not be one of them.
Unfortunately, losing that access seems to be the reality for most companies. In fact, according to Accenture, only 10% of companies have comprehensive plans and resources for situations like this. The rest of us are stuck with a cobbled-together distributed work environment. This comes with numerous pains: software licenses tied to workstations we can’t get to, VPNs that were only designed to support 10-15% of our workforce, slow network interconnect speeds, broken engineering workflows, underpowered home computing, and more.
Together, these pains generally lead to a widespread drop in productivity. In challenging macroeconomic conditions, most companies simply cannot afford that drop. Without a way for distributed teams to work efficiently, a company risks entering a downward spiral.
The infrastructure that was built to enable the next generation of science, engineering, and technology was built on the assumption that the work would largely be done from the office. So what does the next generation of infrastructure look like? How are companies investing to enable the new environment?
Beefing up our VPN
Many companies already have a VPN designed to allow people remote access to their cluster, or sometimes even virtual desktops and more. The problem? Most companies have configured their VPN to handle a very small portion of their company’s workforce at one time.
A number of companies we have talked to plan to roll out improvements to their VPNs: adding support for hundreds of concurrent connections, improving network I/O, and more. However, a multi-layered process like this takes time. In the meantime, we see IT teams assigning scheduled hours during which each group is allowed to access the VPN.

“I have from 9-9:30 AM and 1-1:30 PM to access my cluster, download or upload whatever data I need, or submit a job. If I miss that window I have to wait another day to get to the next step. It’s a huge waste of time,” said one engineer we spoke with.
This might be a strategy for solving this problem one brick at a time, but it will be months before the average engineer sees an improvement in the way they work.
Send our compute-intensive jobs to a third party
Companies that know they can’t improve network access fast enough to maintain productivity have turned to another option: outsourcing.
These companies turn to independent firms whose data centers have better access and ask them to complete simulations and other compute-heavy workloads on their behalf. When the third party can run a simulation on exactly the right software and version, and has the available capacity, this can be a viable option for running jobs you simply can’t run within your own infrastructure.
However, working with a separate environment does present some risks. IT teams commonly run into issues when they manage multiple environments without strict and visible controls.
One of our customers told us this story about working with two environments: “The two sites ran simulations on different versions of the same software. Each one had a slightly different calculation for the stretch of a cable. When it came time to assemble our final plane, all the cables from the two versions were about six inches too short. This delayed our project by a year.”
Outsourcing makes sense with a trusted partner and very clear visibility, but in our experience, managing an environment through a person is fertile ground for mistakes that can add up to big delays.
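One way to get that kind of visibility is to compare the software versions each environment reports before accepting results from both. The following is a minimal sketch, assuming each site can export a simple mapping of software name to version. The site inventories, the package names, and the `find_version_mismatches` helper are all hypothetical and are not part of any Rescale API.

```python
from typing import Dict, List, Tuple


def find_version_mismatches(
    site_a: Dict[str, str], site_b: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (software, version_at_a, version_at_b) for every package
    that both sites have installed at different versions."""
    mismatches = []
    # Only packages present at both sites can disagree on version.
    for name in sorted(site_a.keys() & site_b.keys()):
        if site_a[name] != site_b[name]:
            mismatches.append((name, site_a[name], site_b[name]))
    return mismatches


# Hypothetical inventories for an in-house cluster and a partner site.
in_house = {"cable_solver": "4.2", "mesher": "1.9"}
partner = {"cable_solver": "4.3", "mesher": "1.9"}

for name, ours, theirs in find_version_mismatches(in_house, partner):
    print(f"{name}: in-house {ours} vs partner {theirs}")
```

Running a check like this before each handoff would flag the kind of mismatch in the story above while it is still cheap to fix.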
Deploy a cluster in the public cloud
Knowing the timelines and risks of the options above, some companies we have talked to have opted to move certain workloads to public cloud infrastructure. This option offers flexibility in compute scale, specialized hardware, and connectivity. Essentially, companies can create as many resources as their teams require. The ability to scale is built in and delivered on demand.
The main challenges companies face when implementing a public cloud deployment are operational in nature. This solution depends entirely on IT, systems integrators, and others. Designing and managing it is usually a larger undertaking than expected.
A quality cloud deployment needs to be able to control, report, and manage individual budgets for teams and projects. It needs to address license hosting and allocation of jobs. It needs to be smart about choosing ideal hardware for any given job so that resources are used well. It needs to intelligently kill clusters when they are not needed. It needs to be built with limits so IT doesn’t get a surprise bill at the end of the quarter.
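Two of these requirements, killing clusters that are no longer needed and enforcing spending limits, can be sketched in a few lines. This is an illustration only, assuming a scheduler can report idle time per cluster and spend per team. The `Cluster` and `TeamBudget` types, the `clusters_to_stop` helper, the 30-minute idle threshold, and all sample figures are hypothetical and do not describe any particular cloud provider or platform.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    name: str
    team: str
    idle_minutes: int  # minutes since the cluster last ran a job


@dataclass
class TeamBudget:
    team: str
    limit: float  # spending cap for the period, in dollars
    spent: float  # spend so far in the period, in dollars


def clusters_to_stop(
    clusters: List[Cluster],
    budgets: List[TeamBudget],
    max_idle_minutes: int = 30,
) -> List[str]:
    """Return names of clusters that are idle too long or whose team
    has reached its spending cap."""
    over_budget = {b.team for b in budgets if b.spent >= b.limit}
    return [
        c.name
        for c in clusters
        if c.idle_minutes >= max_idle_minutes or c.team in over_budget
    ]


# Hypothetical state reported by a scheduler.
clusters = [
    Cluster("cfd-1", "aero", idle_minutes=45),       # idle too long
    Cluster("cfd-2", "aero", idle_minutes=2),        # busy, within budget
    Cluster("fea-1", "structures", idle_minutes=5),  # team is at its cap
]
budgets = [
    TeamBudget("aero", limit=5000.0, spent=1200.0),
    TeamBudget("structures", limit=3000.0, spent=3000.0),
]

print(clusters_to_stop(clusters, budgets))
```

A real deployment would run a check like this on a schedule and would probably warn users before stopping anything. The point of the sketch is that the limits are enforced by the system rather than discovered on the invoice.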
Even when a system addresses all these issues, it sometimes can’t handle specific types of information because of regulatory requirements. A system that is not ITAR, FedRAMP, or even SOC 2 compliant is a non-option for certain sensitive workloads.
Investing in a managed platform
Managed platforms allow companies to gain the benefits of a cloud practice as they need them, while solving the operational and compliance issues described above. These systems weren’t built specifically to enable distributed teams, but they are the only systems that were built to be agnostic to where team members are located.
Because they are built on cloud resources, access to them is inherently remote. That makes them robust to situations like the one we find ourselves in today, and equally useful when things return to whatever the new normal is.
Specifically, Rescale provides some unique approaches to the challenges that companies face today:

Budget management – Rescale is built so teams can self-serve resources in the cloud, which means it is tailored to allow scale on demand. For this to work practically in an organization, administrators also need to be able to set specific hard and soft budgets, allocate spend, and even split billing in some cases. This is native functionality on Rescale, and resource allocation takes only a few clicks.
Access management – Teams need to be able to access the cluster and data stored on the system. Rescale’s centralized management of user access allows for a secure and controlled way to give people access to the resources they need with ease.
Software license management – With options to bring your own existing licenses, purchase new licenses, or in some cases use on-demand licenses, Rescale offers a variety of ways to help your team get up and running, no matter your current situation. Over 2,000 software versions are already installed and available, just one license key away from running jobs. The system also allows centralized control of versions to avoid version mismatches between environments.
Security and data management – Rescale is the most secure option when it comes to cloud HPC. Rescale is ITAR, FedRAMP, and SOC 2 compliant and has been built with solid security from day one. Its system uses state-of-the-art encryption and data management protections. Additionally, Rescale gives admins the controls to ensure users only have permissions to the data they are supposed to access.
Architecture management – Rescale has developed a system that can not only spin up and kill clusters on every public cloud, but also recommend the best hardware, regardless of location. Rescale knows the speed and robustness of every core type in the system and can guide users to the hardware type that provides the cost, performance, and scalability they need: millions of cores and hundreds of core types, intelligently mapped to customer needs.

2020 will forever be the year that our systems and teams were put to the ultimate test. From now on, any discussion of systems will need to ask whether they are pandemic-proof, so that technology empowers its users rather than restricting them. In what ways do you think HPC will change in the wake of COVID-19?
Looking for ways to optimize remote teams? Join our webinar to learn more and hear applicable solutions from actual engineers. Here are the details.

Tanner Ham

