ELT with Pig: Managing Data Transformations in "Load First" Environments
September 14, 2015
Speaker: Joshua Fennessy
One of the key distinctions of a so-called data lake (or enterprise data hub) is that data is loaded into the environment first and only then transformed into useful information. This is the opposite of a traditional EDW system, where data must be transformed before it is loaded. In this Hadoop-focused session, we'll use Apache Pig to load unformatted data and wrangle it into a form that data architects can use when building an EDW. We'll look at several types of data, along with a few common scenarios and cleansing methods that apply to many different projects. We'll share and explain practical code examples to help Hadoop developers, from beginner to advanced, make sense of the raw data living in HDFS.