Loading QWI Data from the U.S. Census Bureau into Hadoop
Step-by-step instructions on how to download U.S. Census Bureau Quarterly Workforce Indicators data files and load them into a Hadoop cluster.
October 16, 2014
Quarterly Workforce Indicators (QWI) data can be downloaded from the U.S. Census Bureau, as shown in Figure 1.
My example uses files representing a state-level summary of private workforce data by employee sex and age, firm size, and industry group. The direct links for Texas, California, and Nebraska are here:
Note that there are additional, smaller files that describe the various age, firm, and industry group categories. These files could also be downloaded and inserted into Hadoop to represent additional tables. In my example, I simply downloaded these additional files directly into an Excel PowerPivot workbook.
Figure 1: QWI Data Download
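Before moving the downloaded files anywhere, it can help to confirm what they contain. The following is a minimal Python sketch, assuming the .gz files are gzip-compressed, comma-delimited text with a header row. The function name and parameters are illustrative and not part of any Census Bureau tooling.

```python
import csv
import gzip
from itertools import islice


def peek_gz_csv(path, max_rows=5):
    """Return the header and the first few data rows of a gzip-compressed CSV file."""
    # "rt" reads the compressed stream as text; newline="" is what the csv module expects.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows at all
        rows = list(islice(reader, max_rows))
    return header, rows
```

Calling peek_gz_csv with the local path of one downloaded file returns its column names and a handful of rows without decompressing the whole file to disk.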
Once you have downloaded the three .gz files, you need to get them into your Hadoop cluster. For HDInsight, upload the files to an Azure Blob container within the storage account associated with the cluster. I used Azure Storage Explorer, a free tool from CodePlex, to upload the files (see Figure 2). In a production environment, you would likely use the Azure Storage APIs, PowerShell, or both.
Figure 2: Azure Storage Explorer
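For a scripted alternative to the graphical tool, the upload could be sketched in Python as follows. This is only an illustration: it assumes the azure-storage-blob package (version 12 or later), which is newer than the tooling described in this article, and the "qwi" prefix, the container name, and the connection string are placeholders you would supply yourself.

```python
import os


def blob_name_for(local_path, prefix="qwi"):
    """Map a local file path to a blob name under a virtual folder (prefix is hypothetical)."""
    return f"{prefix}/{os.path.basename(local_path)}"


def upload_files(connection_string, container_name, local_paths, prefix="qwi"):
    """Upload each local file to the container, overwriting blobs with the same name."""
    # Imported here so the naming helper above works without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    container = service.get_container_client(container_name)
    uploaded = []
    for path in local_paths:
        name = blob_name_for(path, prefix)
        with open(path, "rb") as data:
            container.upload_blob(name=name, data=data, overwrite=True)
        uploaded.append(name)
    return uploaded
```

Keeping the three files under one prefix makes it straightforward to point a single external table at that folder later.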
Using HDP Sandbox
If you are using the HDP Sandbox, you can use the Hadoop command line interface, or you can upload files by using Hue, an included Web interface for Hadoop. (Note: Hue is not available for an HDP installation on Windows.) Figure 3 shows Hue, accessed from my host machine's browser (the sandbox is running as a guest VM).
Figure 3: Hue
Using HDP Installed on Windows
If you've chosen to install HDP on a Windows operating system, you can use the Hadoop command line to load files into a folder. Figure 4 shows the steps needed to create a folder and then upload the three files.
Figure 4: Hadoop Command Line
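The create-folder-then-upload sequence can also be scripted. The sketch below is not a transcription of Figure 4. It builds the standard hadoop fs commands (-mkdir, -put, -ls) as argument lists and runs them in order. The folder and file names are up to you, the -p flag assumes a Hadoop 2.x command line, and on Windows the launcher name passed as hadoop_bin may need to be adjusted (for example, to a .cmd script).

```python
import subprocess


def build_hdfs_commands(hdfs_dir, local_files, hadoop_bin="hadoop"):
    """Return argument lists that create an HDFS folder and copy files into it."""
    # -p creates parent folders as needed and does not fail if the folder exists.
    commands = [[hadoop_bin, "fs", "-mkdir", "-p", hdfs_dir]]
    for path in local_files:
        commands.append([hadoop_bin, "fs", "-put", path, hdfs_dir])
    # List the folder at the end to confirm the files arrived.
    commands.append([hadoop_bin, "fs", "-ls", hdfs_dir])
    return commands


def run_hdfs_commands(commands):
    """Run each command in order, raising CalledProcessError at the first failure."""
    for args in commands:
        subprocess.run(args, check=True)
```

Separating command construction from execution lets you print and review the commands before anything touches the cluster.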
Main article: Integrating Hadoop with SQL Server