Is SONiC, the Open Source Network OS, Ready for Mainstream?
With key features being added and support from traditional vendors growing, the disaggregated-network OS may be ready to grow beyond hyperscale platforms.
When Microsoft recently asked to join the Linux security mailing list, one of the Linux distributions it highlighted was SONiC, or Software for Open Networking in the Cloud.
The Microsoft-born, open source, cloud-scale network operating system, built on the SAI switch-programming API (SAI stands for Switch Abstraction Interface), is one of the brightest-shining stars in the open networking movement. That movement has been spreading beyond hyperscale data center operators into the worlds of telcos and large enterprises, and someday maybe even small and medium-size businesses.
Using its purchasing might, Microsoft has induced networking hardware vendors to support SONiC. Nearly 70 percent of hardware platforms now support the OS, including Arm-based systems. The first adopters other than Microsoft have been other hyperscalers, such as Alibaba and Tencent, but that circle is starting to widen as more organizations look for the benefits of network disaggregation. Adding fuel to the momentum are the introduction of new features to the open source project, such as configuration management and Kubernetes integration, and the spread of commercial and enterprise support from traditional enterprise-networking vendors.
Open, software-driven networking is growing. According to analysts at the 650 Group, it will be a $1.35 billion market by 2023 (not counting the hyperscalers), growing 33 percent per year on average between now and then. These broad market trends, together with the activity around the SONiC project, position it well to become a mainstream technology soon.
Owning the Stack
LinkedIn started using SONiC before Microsoft acquired the social networking company in 2016. Its data centers may have been organized differently from Microsoft’s Azure infrastructure, but they faced similar problems, and LinkedIn had the in-house expertise to use SONiC to solve them.
“We had the vision of building our own networking OS, and we really want to own the whole stack to take advantage of ASIC capabilities to run our own applications,” LinkedIn software engineer Zhengen Xu said at this year’s OCP Global Summit.
Owning the stack would accelerate debugging, help control risk, and save on switch hardware. The team wanted a Linux-based system, which would let them use Linux debugging tools, with support for containers so they could pick the best components available. Because Linux is platform-agnostic, moving to a different device or a different vendor wouldn’t require changing the software stack.
“When we're running a multi-vendor environment, we are limited to the common denominator across the vendors in order to be able to support the features that we want to do,” added Shawn Zandi, LinkedIn’s principal architect for infrastructure engineering.
Microsoft, it turned out, wanted all the same things when creating SONiC. “The SONiC philosophy fits our philosophy very well,” Xu said.
Disaggregation Improves Availability
One of the reasons Microsoft built the OS was to improve its network’s resiliency.
“We were finding that the firmware, the software that runs on the switches in our network, was getting in our way in the case of reliability and availability,” Microsoft distinguished engineer David Maltz, who runs the Azure physical network team, explained in an OCP session.
Azure uses network hardware from multiple vendors, and even a single vendor may supply multiple firmware versions. When Microsoft found a bug on one device, it was probably on others too. “We would have to work with each of the other vendors to find and fix that, prolonging the time that our customers could be impacted by it,” Maltz said.
Another example is “gray” failures, where a switch advertises that it’s healthy but is dropping a fraction of packets. Handling these quickly is key for reliability. “Doing that without support from the switch firmware itself was very hard,” Maltz said.
Finally, adding new features to a traditional monolithic network OS took too long. “Working across multiple companies meant that it could take … months to design, develop, test, and deploy a new feature.” Now, he said, that time is down to weeks.
Shedding the Hyperscale Image
To date, SONiC has been running in the Azure core network, enabling Microsoft to keep legacy 40G switches in an older data center, upgrade to new zero-packet-loss 100G switches in a new data center, and run the same software on both.
But the company is in the process of broadening its use. “We're taking all the roles inside of our data centers and moving those over to SONiC,” Maltz said. “We're extending SONiC to new scenarios like gateways, like our management networks, our wide-area networks, and others.” In the future, SONiC will support AI clusters, gaming, streaming, and other new use cases.
The OS is starting to appeal to service providers and Software-as-a-Service companies, especially with integrations like the ones done by Big Switch Networks. Big Switch has integrated SONiC with its own Open Network Linux (ONL) to create a commodity NOS stack designed for multi-tier L3 BGP fabrics, complete with configuration automation and monitoring through Ansible and Big Switch’s SDN controller.
The going is slower in the enterprise space. There, SONiC’s flexibility, its core strength, might hold back adoption, even by enterprises already putting effort into Kubernetes and service meshes, warned Roy Illsley, a distinguished analyst at Ovum.
While it may look like “cool tech to break lock-in, it is needing some big use cases to demonstrate it is real and represents a future direction, and is not just another open source goo concept that lacks any applicability to an enterprise that does not have an army of people to work on it,” Illsley told us.
SONiC is “the first solution to break monolithic switch software into multiple containerized components,” and before adopting it more widely, enterprises need more time “to understand all the elements of managing, securing, protecting, and operating containers at scale,” he said.
Hyperscale and Enterprise Network Needs Converge
One of the major contributors to the SONiC project is Mellanox, the networking vendor Nvidia agreed earlier this year to acquire for $7 billion. Mellanox uses SONiC on its Spectrum switches and ConnectX NICs to make connecting non-virtualized data centers to the Azure cloud simpler.
John Kim, Mellanox’s director of storage marketing, said he thinks SONiC is ready to move beyond the cloud market and into the enterprise space. “It’s a combination of both the networking OS becoming more suited for enterprise and the enterprises becoming more ready for open Ethernet networking solutions like SONiC,” he told Data Center Knowledge.
Tier-two cloud vendors and very large enterprises that have strong networking expertise, because networking is critical to their business, are already adopting SONiC, Kim said, while telcos and somewhat smaller enterprises are considering it.
Enterprise and cloud network needs, traditionally different, “have been moving closer together,” he said. As enterprises use more multi-cloud and hybrid cloud architectures, hyperscalers have been adding more traditional enterprise-networking features, he explained. At the same time, enterprises are switching to “a more cloud-like networking model, running fewer protocols and building large leaf-spine networks that scale to support large numbers of servers with virtualization and containers.”
SONiC started out mainly as software for top-of-rack (ToR) switches but can now also support leaf and spine switches. Support for “superspine” switches is on the roadmap. Kim also pointed to recent container-based upgrades to SONiC, including VLAN trunking, Virtual Routing and Forwarding (VRF), and RoCE (RDMA over Converged Ethernet), all of which can be added without changing other functions such as BGP, IPv6, or SNMP. “This gives data center architects the flexibility to deploy the networking features they want exactly where and when they want them.”
Enterprises Want Traditional Support
SONiC may have the features enterprises need, but the self-service, community-supported approach taken by Alibaba, LinkedIn, and Tencent isn’t one most of them want to take. Instead, they look to familiar vendors: Mellanox; Juniper, which added native SONiC support this year; Dell EMC, which recently started offering commercial support for SONiC to customers other than Microsoft; and Apstra, which integrated SONiC with its intent-based networking last year. Dell EMC says it has already run several trials with Fortune 100 companies in the service provider, financial services, and web services markets.
Apstra supports SONiC in mixed network vendor environments which include traditional hardware vendors, open source, and white box network hardware. The company sees growing interest in the enterprise and is already involved in several customer engagements, Mike Wood, its chief marketing officer, told us.
Adding enterprise-class management capabilities with automation and analytics to a system that previously required a high level of expertise will help adoption, Wood said. Apstra is seeing most of the traction for SONiC in North America and Asia Pacific, driven largely by enterprise digital-transformation initiatives, the Internet of Things, and the promise of cost savings, he said.
“We anticipate that soon enterprises will be able to buy modern 25/50/100/200 Gigabit Ethernet switches with the SONiC NOS installed and get support from their switch vendor, just as Red Hat offers support for Linux, and Cloudera/Hortonworks offers support for Hadoop,” Kim predicted. “Increasing numbers of switch vendors are whole-heartedly supporting SAI and SONiC, and both vendors and customers are contributing substantially to the code and the community.”
SONiC “is changing the way customers and vendors look at networking,” he added.