Improving Hive Performance

Similar to SQL Server, Hive has a number of performance-related features.

Tyler Chessman

October 16, 2014


Similar to SQL Server, Hive has a number of performance-related features. Hive supports indexes and, much like the clustered columnstore index introduced in SQL Server 2014, an optimized columnar table format called Optimized Row Columnar (ORC). Note that using ORC likely means abandoning the external table format and explicitly loading the data into a table designated as ORC.
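
As a rough illustration of that load step, the sketch below creates an ORC table and copies rows into it from an existing external table. The table names censusdb.qwi_ext and censusdb.qwi_orc, the index name, and the two columns are hypothetical, since the article does not list the real QWI schema. The index statements apply only to older Hive releases; index support was removed in Hive 3.0.

```sql
-- Hypothetical ORC table; column names are placeholders, not the real QWI schema.
CREATE TABLE censusdb.qwi_orc (
  geography STRING,
  emp       BIGINT
)
STORED AS ORC;

-- Explicitly load the data by selecting from the (hypothetical) external table.
INSERT OVERWRITE TABLE censusdb.qwi_orc
SELECT geography, emp
FROM censusdb.qwi_ext;

-- Optional, Hive versions before 3.0 only: a compact index on one column.
CREATE INDEX qwi_geo_idx
ON TABLE censusdb.qwi_orc (geography)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- A deferred index is empty until it is rebuilt.
ALTER INDEX qwi_geo_idx ON censusdb.qwi_orc REBUILD;
```

Because the ORC table is populated by a query, the source files stay where they are and the external table can be kept or dropped independently.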

Hive also supports table partitioning. Partitions may be applied to external tables: store the files in subfolders, and then issue ALTER TABLE statements after creating the table. For example, we can store each state's QWI file in its own subfolder, and then add partitions as follows:

ALTER TABLE censusdb.qwi2 ADD PARTITION (state = 'TX') LOCATION '/user/hadoop/censusqwi/TX';

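The partition column (e.g., state), which is declared in the table's PARTITIONED BY clause, appears in the table schema. After the partition is created, the column is available for use in queries, and filtering on it lets Hive read only the matching subfolders.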
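
For context, here is a minimal sketch of how the external table behind that ALTER TABLE statement might be declared, followed by a query that benefits from the partition. The two data columns and the comma delimiter are assumptions made for illustration; only the table name, the state partition column, and the folder paths come from the example above.

```sql
-- Sketch of an external, partitioned table. The data columns and the
-- delimiter are hypothetical; the real QWI file layout is not shown here.
CREATE EXTERNAL TABLE censusdb.qwi2 (
  geography STRING,
  emp       BIGINT
)
PARTITIONED BY (state STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hadoop/censusqwi';

-- Register the TX subfolder as a partition (same statement as in the text).
ALTER TABLE censusdb.qwi2 ADD PARTITION (state = 'TX')
LOCATION '/user/hadoop/censusqwi/TX';

-- Filtering on the partition column limits the scan to the TX subfolder.
SELECT state, SUM(emp) AS total_emp
FROM censusdb.qwi2
WHERE state = 'TX'
GROUP BY state;
```

The state value is not stored in the data files themselves; Hive derives it from the partition definition, so each additional state needs its own ADD PARTITION statement.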

Main article: Integrating Hadoop with SQL Server
