MapR Hadoop Distribution Now Shipping With Apache Drill
Apache Drill allows self-service SQL analytics without requiring predefined schema definitions, enabling real-time insights without first having to prepare the data
May 19, 2015
Apache Drill 1.0 is now shipping with the MapR Apache Hadoop distribution. Apache Drill is a schema-free SQL engine for big data, which opens up self-service data exploration capabilities to a wider audience.
Drill makes self-service SQL analytics available without requiring predefined schema definitions. It eliminates dependence on IT to get schemas ready prior to exploration: the tool works directly on the data, without the need to first process and transform it into a table-like structure.
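As a rough illustration of what querying a raw file with no schema step can look like, here is a minimal Python sketch that builds a SQL request for Drill's REST interface. It assumes a Drill instance whose web endpoint is reachable at localhost:8047 and a JSON file readable through the `dfs` storage plugin; the file path and the helper names `build_query_payload` and `run_query` are hypothetical and chosen only for this example.

```python
import json
import urllib.request

# Assumption: a local Drill instance exposing its REST endpoint on port 8047.
DRILL_URL = "http://localhost:8047/query.json"


def build_query_payload(path, limit=5):
    """Build a Drill REST request body that queries a raw file in place.

    No table or schema is declared first; the file path is referenced
    directly in the FROM clause through the dfs storage plugin.
    """
    sql = f"SELECT * FROM dfs.`{path}` LIMIT {int(limit)}"
    return {"queryType": "SQL", "query": sql}


def run_query(payload, url=DRILL_URL):
    """POST the payload to Drill and return the decoded JSON response.

    Requires a running Drill instance; it is not called in this sketch.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical click-stream file; substitute a path that exists on your system.
    payload = build_query_payload("/tmp/clickstream.json")
    print(payload["query"])
```

With a Drill instance running, passing the payload to `run_query` would submit the query over HTTP. The point of the sketch is only that the SQL names the file itself rather than a table defined in advance.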
The big pillars of big data are massive analytics at scale, real-time data and interactive querying on the data. “What Apache Drill represents is the interactive query,” said Jack Norris, chief marketing officer, MapR. “While there have been different SQL-on-Hadoop offerings, what makes this so powerful is it’s a schema-free engine.”
Drill interacts with data both in legacy transaction systems and new data sources, such as Internet of Things (IoT) sensors, web click-streams and other semi-structured data. It supports popular business intelligence (BI) and data visualization tools.
Data volumes are growing, and this is becoming a big issue. Customers need to analyze the data, but with traditional tools they cannot see what is in it until a structure has been set up. Drill frees up the data for self-service.
“The availability of Apache Drill in the MapR Distribution is a major milestone for the SQL-on-Hadoop project, which is significant in delivering real-time insights from complex data formats without requiring any data preparation,” said Matt Aslett, research director, data platforms and analytics, 451 Research, in a press release. “Apache Drill is an example of MapR collaborating with others as part of the Apache development process on new technologies to expand the Hadoop portfolio.”
Drill also includes granular security and governance controls required for multi-tenant data lakes or enterprise data hubs.
“If you’re going to have data exploration, being able to provide security is imperative,” said Norris. “As you look at servicing the broad population and moving it into different use cases, often there’s a security aspect. The ability for granular security control is a big deal. The same file can be accessed by different users, and users can access different portions with different permissions.”
MapR growth has doubled year over year, according to Norris. Perhaps more importantly, the company is seeing acceleration in mission-critical and real-time deployments, with several customers having multiple use cases on a single cluster.
“That really points to the journey we’ve seen. Many companies start with a cluster for data scientists and experimental use, and then move into production use and real-time applications. This growth isn’t about reporting or asking bigger questions, it’s about companies impacting business as it happens. We recognized this at the beginning for production and real-time uses.”
Earlier this year, the company rolled out on-demand training for Hadoop, which has drawn more than 20,000 participants.