Improving Hive Performance
October 16, 2014
Like SQL Server, Hive has a number of performance-related features. Hive supports indexes and, like the clustered columnstore index introduced in SQL Server 2014, an optimized columnar table format called Optimized Row Columnar (ORC). Note: using ORC likely means abandoning the external table format and explicitly loading the data into a table stored as ORC.
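As a rough illustration of that explicit load, the sketch below copies an existing table into a new ORC-backed table with a CREATE TABLE AS SELECT statement. The target name censusdb.qwi_orc is hypothetical, and the source table is assumed to be the censusdb.qwi2 table used later in this article; adjust both to your own schema.

```sql
-- Minimal sketch: materialize an existing (external) table as a managed ORC table.
-- censusdb.qwi_orc is a hypothetical name chosen for this example.
CREATE TABLE censusdb.qwi_orc
STORED AS ORC
AS
SELECT *
FROM censusdb.qwi2;  -- assumed source table; any readable Hive table works here
```

Because the new table is managed by Hive rather than external, new source files have to be loaded into it again (for example with INSERT INTO ... SELECT) before they show up in queries against the ORC copy.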
Hive also supports table partitioning. Partitions can be applied to external tables: store the files in subfolders, then issue ALTER TABLE statements after creating the table. For example, we can store each state's QWI file in its own subfolder and then add a partition for each one as follows:
ALTER TABLE censusdb.qwi2 ADD PARTITION (state = 'TX') LOCATION '/user/hadoop/censusqwi/TX';
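The partition column (here, state) is declared in the table's PARTITIONED BY clause rather than stored in the data files. Once a partition is added, the column appears in the table schema like any other and is available for use in queries.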
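To make the whole sequence concrete, here is a minimal end-to-end sketch. The table name censusdb.qwi_demo, its columns (yr, qtr, emp), and the comma-delimited file format are all assumptions made for illustration; the actual QWI table definition is not shown in this article. Only the folder layout under /user/hadoop/censusqwi comes from the example above.

```sql
-- Hypothetical table and columns, for illustration only.
-- The partition column (state) is declared in PARTITIONED BY, not in the column list.
CREATE EXTERNAL TABLE censusdb.qwi_demo (
  yr  INT,
  qtr INT,
  emp BIGINT
)
PARTITIONED BY (state STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','  -- assumed file format
LOCATION '/user/hadoop/censusqwi';

-- Register the subfolder that holds the Texas file as a partition.
ALTER TABLE censusdb.qwi_demo ADD PARTITION (state = 'TX')
LOCATION '/user/hadoop/censusqwi/TX';

-- List the partitions Hive currently knows about.
SHOW PARTITIONS censusdb.qwi_demo;

-- Filtering on the partition column lets Hive read only the matching subfolder.
SELECT yr, qtr, SUM(emp) AS total_emp
FROM censusdb.qwi_demo
WHERE state = 'TX'
GROUP BY yr, qtr;
```

The performance benefit comes from the last query: because the filter is on the partition column, Hive can skip the folders for every other state instead of scanning the full data set.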
Main article: Integrating Hadoop with SQL Server