A Look at Royal Bank of Canada's Homegrown GPU Farm
Royal Bank of Canada's use of IT already put it ahead of the curve compared to other banks – and then it decided to build its own AI-focused GPU farm.
September 10, 2020
When Royal Bank of Canada made news about a month ago for firing up its own on-premises GPU farm for artificial intelligence and machine learning purposes, it put Canada's largest bank ahead of the curve in fintech, an area where banks have long been seen as overly cautious and slow to change.
This also wasn't a typical upgrade, which would usually involve purchasing some ready-to-rack GPU hardware to augment existing servers and then buying off-the-shelf AI/ML software to get the ball rolling. Instead, RBC bought GPUs directly from Nvidia, built its own GPU servers, and began developing its own AI/ML software in-house to meet its unique needs.
Going from a legacy monolithic environment to building its own GPU-focused data center didn't happen in a single step. When RBC decided to get into the GPU farm AI/ML arena, it had already made many of the moves that made this next step possible. The company had already containerized many of its workloads and built its own private cloud using Red Hat's OpenShift.
One Step at a Time
In 2016, the company began exploring the potential of artificial intelligence when it hired Foteini Agrafioti as RBC's chief science officer to head its newly formed Borealis AI project.
"I think we're pretty much leading edge there," said Mike Tardif, RBC's senior vice president of global technology infrastructure, speaking of the company's AI/ML efforts leading into this latest GPU farm project. "Foteini's team has had some pretty good requirements for GPUs, next generation computing and that stuff, so we've done two or three different implementations over the last few years."
The next step, according to Tardif, was to get "more elasticity in our GPUs," which do most of the heavy lifting in AI. RBC figured this was best accomplished by building custom-designed GPU servers and bringing the cloud-native container technology the bank was already using to its AI workloads.
"We're a pretty big OpenShift customer and we think containers are the future," he said, adding that he made building the GPU farm contingent on container use.
"If we deliver this platform and you are willing to move to a container and microservices strategy, we'd be able to keep up with your demand for more and more processing power," he told Agrafioti and her team. "We would be able to keep up with it in a more elastic and frictionless way by being able to incrementally grow if we built this GPU farm with OpenShift and containers."
Working With Partners
After the decision was made to go forward with the project, RBC didn't go it alone. It enlisted the expertise of Nvidia, Red Hat and Pure Storage, which supplies RBC's flash storage.
"At a science level, we're actually very connected with Nvidia," Agrafioti said. "They are extremely active in machine learning research as well, so we make sure we're very connected with their teams.
"They are doing research that showcases how they're leveraging their own hardware," she added. "We're learning from that. We're looking at how we can do the same with infrastructure we now have in house as well. So it's just a great partnership in regards that we talk about what's coming next, what changes are happening in the hardware, and how we can leverage those so we prepare for it."
From an infrastructure point of view, the partnership has been great for RBC, Tardif added. "We did find a bug between their processor and Red Hat Linux, but it was just kind of fun in a way, and interesting, because we actually were doing something other people weren't doing yet."
Tardif said the bug involved RHEL's kernel crashing because of excessive load being placed on it by the GPUs.
"We were just exhausting the full amount of the capability, which was a fairly straightforward fix for Red Hat to do," he said. "On one hand you never want to run into bugs, but it does mean you're out there being the first doing something. They did a hot fix for us. They were lockstep with us, which was good."
Tardif said that Red Hat had already been working with RBC for about three years by this time, having been brought in for one of the bank's first on-prem clouds and for its move to containers. Working with Nvidia on the GPU farm was a new experience, however.
"Internal IT wasn't used to dealing with Nvidia because they've always just been a chip provider," he said. "Our first few implementations of Nvidia chips were with Lenovo, Intel boxes and IBM boxes – the normal vendors that we do business with. Going directly with Nvidia as a computer and software supplier was new and it worked pretty well. The partnership of us; Red Hat, who we've always had a good tight relationship with; and now Nvidia was pretty good. No complaints there."
Up And Running With an Eye on the Future
Agrafioti said that the data center is being harnessed for various AI projects, including a language-processing project that analyzes text from news articles and blogs in real time for the company's analysts and financial advisors, as well as the development of fraud models.
"You can then launch them and run them live so that we can detect fraud behind products like credit cards," she said. "We're also working on trading. We're looking into models that can make autonomous buy/sell decisions in the market by modeling market conditions and finding the right time to execute client orders. These are also very complex environments and there's a lot of data in the markets that are extremely complex as well."
She also said that the new GPU-based hardware can complete in 20 minutes some workloads that previously would have taken a day.
So far, what's missing from the equation is the public cloud.
"We're a bank; we're cautious," Tardif said. "We're using [the public cloud] just for non-critical data, for cheap CPU if capital markets want to do risk calcs or something they can spin up and spin down. But we're growing. We know there's advantages to public cloud and even in this space with Foteini there'll probably be a point where it makes sense to have GPUs and public cloud."