Supercomputers in the Cloud Erode Another Case for Owning Data Centers
Critical Thinking: With HPC stalwarts such as Cray now embracing cloud, on-premises compute is becoming harder to justify.
October 26, 2017
Critical Thinking is a weekly column on innovation in data center infrastructure design and management. More about the column and the author here.
It’s an accepted trend that enterprise-owned facilities are shrinking as a percentage of total data center capacity. Colocation and cloud account for an increasing proportion of workloads, with more than one-third expected to be in the public cloud by 2019, according to 451 Research.
Enterprises that continue to build new facilities often cite cost, security, and data privacy as justifications for investment. But the need for specialized IT, including high-performance compute (HPC), also frequently plays a part. For example, Ford recently announced that it was investing more than $200 million to build a new data center in Flat Rock, Michigan. One of the reasons Ford is building its own facilities is to deal with the data deluge it expects to face as a result of growth in highly connected and autonomous vehicles. A good proportion of the compute at the sites will be dedicated to HPC and big data analytics, which Ford believes it can manage more economically on its own.
But the financial viability of Ford’s on-premises investment is also partly due to the low energy rate it can negotiate because it buys energy at massive scale for its adjacent manufacturing business. For other organizations, the cost and complexity of HPC may have them looking for other options, including cloud.
Intel delivered one of the more notable talks at this year’s ISC High Performance event in Frankfurt, Germany. Dr. Rajeeb Hazra, corporate VP and general manager in Intel’s data center group, predicted that up to 30 percent of high-performance compute CPUs will be in cloud data centers by 2021. Two years ago, the chipmaker put that figure at just 15 percent over the same timeframe.
The main reason for this rise, according to Intel, is the dramatic growth expected in AI and deep learning workloads. Intel’s message was simultaneously a boon and a potential slap in the face for the traditional HPC and supercomputing community. The good news is that AI and deep learning will make HPC-like applications more ubiquitous; HPC will no longer be an engineering-heavy sideshow estranged from the mainstream compute business. The downside is that a good proportion of those workloads won’t be going into traditional scientific and industry-owned HPC facilities. Instead, they are likely to end up in cloud data centers operated by a small group of public cloud service providers.
Other prognosticators are less bullish about how much HPC will move to the cloud, and by when, but they have reached conclusions similar to Intel’s about the direction of travel. According to Hyperion Research, less than 10 percent of HPC workloads are currently run in public clouds. The firm says public clouds are cost-effective for some jobs but up to ten times more expensive for others. Even so, it also reports that up to 64 percent of HPC sites already run some jobs in public clouds, up from just 13 percent in 2011.
Alphabet’s Google, Amazon, and Microsoft have been keen to get a slice of the new HPC demand and have been investing heavily in infrastructure and software to support it for some time. Microsoft, for example, recently acquired HPC software specialist Cycle Computing to complement the work it has already done bulking up Azure’s infrastructure for HPC-like applications, including support for InfiniBand networking and GPUs offered as a cloud service. Google Cloud Platform and Amazon Web Services have also developed HPC services. Some smaller HPC-focused providers have emerged as well, including Rescale, Nimbix, and Cirrascale.
Microsoft made another HPC announcement this week, this time together with supercomputing stalwart Cray. Cray will now enable customers to access its flagship supercomputing systems – the Cray XC series and Cray CS series – via Azure, integrated with the cloud’s network and services. Cray is positioning the partnership in the context of “a whole new class of customers” that will use its supercomputing capabilities for machine and deep learning applications.
Cray has to find new customers as its existing base and revenue have declined recently. According to its 2016 annual report, total revenue decreased by $94.9 million in 2016 compared to 2015, from $724.7 million to $629.8 million. The company attributed this decline to a slowdown in the traditional HPC market.
The deal Cray has struck with Microsoft gives the supercomputer maker some flexibility. Rather than having its systems simply assimilated into the Azure cloud, Cray is providing them for “customer-specific provisioning.” In other words, the machines will effectively be hosted in Microsoft’s data centers and made available on a case-by-case basis. Cray says this colocation model enables customers to get the most from its supercomputers without the performance and access issues that come with shared compute resources.
This is not the first cloud initiative Cray has embarked on. Earlier this year it announced a partnership with data center services provider Markley. The companies described the model as “supercomputing as a service,” enabling users to share access to a Cray Urika-GX big data appliance. Again, although the organizations describe it as a cloud model, each system is effectively only available to one end-user organization at a time.
But while Cray may be hedging its bets somewhat as it engages with cloud providers, it is still engaging. If even the most specialized high-end supercomputers are now available via public cloud, then another justification for enterprise-owned compute looks more precarious.
Many companies will continue to take a hybrid approach, but the distribution of workloads across cloud, colocation, and enterprise facilities will keep shifting. The tipping point, where public cloud becomes the default and enterprise-owned compute the exception, can’t be too far in the future.