2023 SRE Report Identifies Site Reliability Engineering Best Practices

The SRE report doesn't see AIOps as the solution for ITOps, and tool sprawl apparently isn't a terrible thing after all.

Sean Michael Kerner, Contributor

November 9, 2022


The practice of site reliability engineering (SRE) has become increasingly central to IT operations in recent years.

SRE is all about having the right tools and processes in place to ensure the reliability and resilience of the applications and services that IT operations deliver and support. According to the 2023 SRE Report, produced by SRE vendors Catchpoint and Blameless, IT operations teams can use many different tools, and that's not a bad thing. The report found that 54% of organizations use three or more tools to collect telemetry from their operations, including application, network and infrastructure resources.

Also of note, the SRE report found that about 46% of organizations said they get little or no value from AIOps tools.

"The low value received from AIOps was not a surprise to me, but it may be a surprise to some readers," Leo Vasiliou, director of product marketing at Catchpoint, told ITPro Today. "We did caution to not ignore underlying AIOps capabilities, but we did caution to ignore the hype as people consider those capabilities as part of larger observability implementations."

The Challenge of SRE Tool Sprawl

Across multiple segments of IT operations, tool sprawl is often identified as a primary challenge. For example, the recent GitLab DevSecOps survey and a survey from ESG and Mezmo both identified tool sprawl as a challenge, specifically for DevSecOps.

Practitioners need different tools to accomplish different tasks at different points in time, and as long as the value received from the tools in the stack is greater than their cost, there is no tool sprawl problem, Vasiliou said.

In the report's conclusions, Steve McGhee, reliability advocate, SRE, at Google Cloud, wrote that people who go to a mechanic don't look for the shop with the fewest tools on the wall.

McGhee suggests that SREs should not be forced to rationalize every tool in an attempt to prevent overlap.

"When it comes to skilled labor, or operations perhaps, you want teams to be able to reach for the right tool at the right time, not to be impeded by earlier decisions about what they think they might need in the future," McGhee wrote.

Identifying the Top SRE Challenges

Vasiliou said the biggest challenges identified empirically in this year's report are:

  • finding talent

  • complex architectures

  • realizing business value

  • lack of end-to-end visibility

  • alignment or prioritization

Addressing these top issues largely comes down to managing bias and predisposition, he said.

"Too many other research papers unidirectionally say, SREs/IT need to add business value," Vasiliou said. "Saying SREs/IT need to add business value is nefarious nothingness and does not help SREs/IT know which speeds and feeds are important."

On the other hand, he noted, SREs also need to recognize that geeking out over speeds and feeds does not help executives understand why SREs are valuable. Bridging this gap requires new or better conversations about capabilities, which form an important middle ground between the two ends of the spectrum.

"The only way to have conversations around required capabilities will be if involved parties let go of their bias; otherwise, the IT to business gap will always remain," he said.

About the Author

Sean Michael Kerner

Contributor

Sean Michael Kerner is an IT consultant, technology enthusiast and tinkerer. He consults to industry and media organizations on technology issues.

https://www.linkedin.com/in/seanmkerner/
