ELT with Pig: Managing Data Transformations in "Load First" Environments

ITPro Today

September 14, 2015

Speaker: Joshua Fennessy

One of the key distinctions of so-called data lake (or enterprise data hub) scenarios is that data is first loaded into the environment, then transformed into useful information. This is often the opposite of traditional EDW systems, where the data must be transformed before it's loaded. In this Hadoop-focused session, we'll use an Apache project called Pig to load unformatted data and wrangle it into a form that can be used by data architects who might be building an EDW. We'll look at several different types of data, along with a few common cleansing scenarios and methods that apply to many different projects. Practical code examples will be shared and explained to help Hadoop developers, from beginner to advanced, make sense of all the raw data living in HDFS.
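The "load first, transform later" ordering described above can be illustrated with a minimal sketch. This is plain Python rather than Pig Latin, and it is not taken from the session: the sample lines, the three-field layout (date, user, amount), and the function names `load_raw` and `transform` are all assumptions made up for illustration.

```python
# Hypothetical raw lines as they might sit in a landing area (assumed sample data).
RAW_LINES = [
    "2015-09-14, alice ,42",
    "2015-09-14,BOB,not_a_number",
    "",
    "2015-09-15,carol,7",
]


def load_raw(lines):
    """Load step: keep every line exactly as received, with no schema applied."""
    return list(lines)


def transform(raw):
    """Transform step: apply an assumed 3-field schema after loading,
    trim whitespace, normalize case, and drop rows that do not fit."""
    clean = []
    for line in raw:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3:
            continue  # empty or malformed line
        date, user, amount = fields
        if not amount.isdigit():
            continue  # amount is not a non-negative integer
        clean.append({"date": date, "user": user.lower(), "amount": int(amount)})
    return clean


landed = load_raw(RAW_LINES)   # everything lands, including bad rows
cleaned = transform(landed)    # the schema is enforced only at this point
```

The point of the ordering is that `landed` still holds all four original lines, so a different cleansing rule can be applied later without reloading the source.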
