FanGraphs Tags a Cloud Database to Keep Up with the Big Show

Website for baseball analysis turned to MariaDB SkySQL as it looks to take on more game data from domestic and international sources.

Baseball data analysis website FanGraphs adopted the MariaDB SkySQL cloud database recently to work with fluctuating and ever-growing information coming out of the sport. FanGraphs, which gathers granular data including the velocity of pitches thrown during games, is using the cloud database to process statistics, complex queries, projections, and models of playoff odds.

“Anything that’s baseball, we’re taking a look at,” says David Appelman, CEO and founder of FanGraphs.

Now that the 2021 season of Major League Baseball is underway, he says there is new Statcast data introduced by the league that must be accommodated. “The data can be pretty wide,” Appelman says. “There’s a lot of records for each individual event that happens in baseball. On a season-level, there’s something in the realm of a million records a season for data for every individual pitch thrown.”

There is also data from minor league teams as well as baseball leagues overseas to be ingested by FanGraphs, he says. “It’s a fairly sizeable amount of data.” FanGraphs tends to run thousands of queries per second on its database to serve its audience, Appelman says. Adding more international data is a priority for FanGraphs, he says, along with more Statcast data from MLB.

Appelman says he personally managed the database for FanGraphs, which was founded in 2005, until 2019. Over the years, his company has tried different resources to improve its efficiency, with varied results. FanGraphs first migrated to MariaDB about seven years ago, Appelman says, then considered a migration to Linux, but that brought up several potential headaches. “I didn’t want to deal with migration,” he says. “Optimizing the database for Windows is one thing. Optimizing it on a Linux box is a completely different thing.”

Appelman says he did not have time to devote to sorting that out while other operations required attention. FanGraphs considered other options, such as moving the database to a turnkey solution. “I looked at Amazon Relational Database Service and Cloud SQL,” he says.

About the time FanGraphs was looking to move and offload all its database administration, Appelman got a tech briefing for MariaDB SkySQL that opened up new possibilities. “It was fast. It seemed it would handle all my needs,” he says.

FanGraphs entered a contract with MariaDB to migrate first to Linux, and then in February of this year migrated to SkySQL. This also led to FanGraphs moving from dedicated servers to the Google Cloud Platform. “We just needed more flexibility,” Appelman says. The infrastructure migration to GCP included app servers and data loading servers.

This was not FanGraphs’ first attempt at taking advantage of the cloud. In 2017, the company tried to migrate to a smaller cloud provider, Appelman says, trying to match exact resources such as RAM and processing power. “We ran into big problems,” he says. “The next morning, I had to migrate back. What I didn’t quite realize was that with the service I moved to, the hypervisor was causing really bad I/O. The database became this huge bottleneck.”

Appelman says he was also reluctant to move his infrastructure to AWS because of the learning curve he faced with its resources. He needed another option. “GCP fit a nice middle ground,” Appelman says. “I found it a little bit easier to set up than AWS.”

There were still performance questions raised with the move. The migration of FanGraphs from a 4xSSD RAID 10 array in a dedicated machine to the cloud, Appelman says, seemed at first to be a downgrade in raw power. “That doesn’t seem to be the case anymore,” he says. “Things are running great. We had no problems migrating to SkySQL and GCP this time.”

FanGraphs is now considering additional SkySQL resources it might tap into, Appelman says, such as its data warehousing technology. “We need second or low-second or sub-second responses for a lot of our queries,” he says. “We want people to be able to do very fast, ad hoc data analysis. With certain types of MLB data, there’s now a lot more than it used to be -- we’re hoping to take advantage of that to bring our users a lot more granular and customizable analysis without having to wait a while to get the results.” Other resources from SkySQL might be leveraged in the future to run multithreaded, single queries for more efficient processing time, Appelman says.

There are a few wish-list items he wants to explore now that FanGraphs has committed to the cloud. Appelman says he has yet to scratch the surface with GCP’s resources that might be of interest, such as machine learning. So far, he is eager to see continued development of reporting tools on the SkySQL database. “Knowing exactly where the bottlenecks are in our application makes a big difference for me,” Appelman says. “I’ve used some third-party tools to figure out which queries I’ve botched. Having that available in the reporting section would be useful.”
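For readers curious what that kind of bottleneck report involves, here is a minimal sketch in Python. It is not FanGraphs' or SkySQL's tooling. It assumes query timings have already been collected as (query text, duration in milliseconds) pairs, and the function name `rank_slow_queries` and the sample queries are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def rank_slow_queries(samples, top_n=3):
    """Rank queries by total time spent, highest first.

    `samples` is an iterable of (query_text, duration_ms) pairs.
    This input format is an assumption made for the sketch, not
    the output of any particular database or monitoring tool.
    """
    totals = defaultdict(lambda: {"calls": 0, "total_ms": 0.0})
    for query_text, duration_ms in samples:
        entry = totals[query_text]
        entry["calls"] += 1
        entry["total_ms"] += duration_ms

    # Sort by cumulative time so a cheap query that runs constantly
    # can still surface next to a single very slow one.
    ranked = sorted(
        totals.items(), key=lambda item: item[1]["total_ms"], reverse=True
    )
    return [
        {
            "query": query_text,
            "calls": stats["calls"],
            "total_ms": stats["total_ms"],
            "avg_ms": stats["total_ms"] / stats["calls"],
        }
        for query_text, stats in ranked[:top_n]
    ]


if __name__ == "__main__":
    # Hypothetical timing samples for illustration only.
    example = [
        ("SELECT * FROM pitches WHERE season = ?", 1200.0),
        ("SELECT name FROM players WHERE id = ?", 5.0),
        ("SELECT * FROM pitches WHERE season = ?", 1300.0),
        ("SELECT name FROM players WHERE id = ?", 5.0),
        ("SELECT name FROM players WHERE id = ?", 5.0),
    ]
    for row in rank_slow_queries(example):
        print(row)
```

Ranking by total time rather than average time is a design choice: it highlights the queries that cost the most overall, which is usually where tuning pays off first.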

About the Authors

Joao-Pierre S. Ruth

Senior writer, InformationWeek

Joao-Pierre S. Ruth has spent his career immersed in business and technology journalism. He first covered local industries in New Jersey and later became the New York editor for Xconomy, where he delved into the city's tech startup community. He also freelanced for such outlets as TheStreet, Investopedia and Street Fight. Joao-Pierre earned his bachelor's in English from Rutgers University. 
