Google Dumps MapReduce in Favor of New Hyper-Scale Analytics System

Says old distributed computing system does not handle petabyte-scale analytics well enough

June 25, 2014



Urs Hölzle, Google’s senior VP of technical infrastructure, at the 2014 Google I/O conference in San Francisco.

Google has abandoned MapReduce, the system it developed for running data analytics jobs spread across many servers and whose design it later published, in favor of a new cloud analytics system it has built, called Cloud Dataflow.

MapReduce has been a highly popular infrastructure and programming model for parallelized distributed computing on server clusters. It is the basis of Apache Hadoop, the Big Data infrastructure platform that has seen widespread deployment and become the core of many companies’ commercial products.
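In the MapReduce model, a job is split into a map step that emits key/value pairs from each input record and a reduce step that combines all values sharing the same key, with the framework grouping ("shuffling") the intermediate pairs by key in between. The sketch below simulates that flow in a single process using the classic word-count example. It is a minimal illustration under that assumption only: the function names are hypothetical, and this is not Google's or Hadoop's actual API, which distribute map and reduce tasks across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Hypothetical mapper: emit (word, 1) for every word in one input record.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, the step the framework performs between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Hypothetical reducer: combine all values for one key into a single result.
    return key, sum(values)


def map_reduce(documents: Iterable[str]) -> dict[str, int]:
    # Single-process simulation; a real cluster runs many map and reduce tasks in parallel.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    counts = map_reduce(["the quick brown fox", "the lazy dog", "The fox"])
    print(counts["the"], counts["fox"])
```

Because mappers only see one record at a time and reducers only see one key's values, each phase can be spread across many servers independently, which is what made the model attractive for large clusters.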

The technology is unable to handle the amounts of data Google wants to analyze these days, however. Urs Hölzle, senior vice president of technical infrastructure at the Mountain View, California-based giant, said it got too cumbersome once the size of the data reached a few petabytes.

“We don’t really use MapReduce anymore,” Hölzle said in his keynote presentation at the Google I/O conference in San Francisco Wednesday. The company stopped using the system “years ago.”

Cloud Dataflow, which Google will also offer as a service for developers using its cloud platform, does not have the scaling restrictions of MapReduce.

“Cloud Dataflow is the result of over a decade of experience in analytics,” Hölzle said. “It will run faster and scale better than pretty much any other system out there.”

It is a fully managed service that is automatically optimized, deployed, managed and scaled. It enables developers to easily create complex pipelines using a unified programming model for both batch and streaming data, he said.

All these characteristics address what Google thinks does not work in MapReduce: it is hard to ingest data rapidly, it requires a lot of different technologies, batch and streaming processing are unrelated, and users must always deploy and operate MapReduce clusters themselves.

Hölzle announced other new services on Google’s cloud platform at the show:

Cloud Save is an API that enables an application to save an individual user’s data in the cloud or elsewhere and use it without requiring any server-side coding. Developers using App Engine, Google’s Platform-as-a-Service offering, and Compute Engine, its Infrastructure-as-a-Service offering, can build apps with this feature.
Cloud Debugging makes it easier to sift through lines of code deployed across many servers in the cloud to identify software bugs.
Cloud Tracing provides latency statistics across different groups (for example, the latency of database service calls) along with analysis reports.
Cloud Monitoring is an intelligent monitoring system that results from integration with Stackdriver, a cloud monitoring startup Google bought in May. The feature monitors cloud infrastructure resources, such as disks and virtual machines, as well as service levels for Google’s services and for more than a dozen non-Google open source packages.

About the Author

Data Center Knowledge

Data Center Knowledge, a sister site to ITPro Today, is a leading online source of daily news and analysis about the data center industry. Areas of coverage include power and cooling technology, processor and server architecture, networks, storage, the colocation industry, data center company stocks, cloud, the modern hyper-scale data center space, edge computing, infrastructure for machine learning, and virtual and augmented reality. Each month, hundreds of thousands of data center professionals (C-level, business, IT and facilities decision-makers) turn to DCK to help them develop data center strategies and/or design, build and manage world-class data centers. These buyers and decision-makers rely on DCK as a trusted source of breaking news and expertise on these specialized facilities.
