Wednesday, May 13, 2015

PARTITIONING vs BUCKETING IN HIVE

Both partitioning and bucketing slice a table's data so that queries can run more efficiently than they would over unsliced data. The major difference is in how many slices there are. With partitioning, the number of slices keeps changing as data is added, because Hive creates one partition for each distinct value of the partition column(s). With bucketing, the number of slices is fixed: it is specified when the table is created.
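To make the difference concrete, here is a small Python sketch (an illustration, not Hive code) that loads two batches of hypothetical (country, user_id) rows. It treats country like a partition column and user_id like a bucketing column with 4 buckets. The rule user_id % NUM_BUCKETS is a simplified stand-in for Hive's hash, assumed only for illustration.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # chosen once, when the (hypothetical) table is created


def load(rows, partitions, buckets):
    """Add (country, user_id) rows to a partitioned view and a bucketed view."""
    for country, user_id in rows:
        # Partitioning: one slice per distinct value of the partition column,
        # so a previously unseen value creates a brand-new slice.
        partitions[country].append(user_id)
        # Bucketing: the slice is picked from a fixed set of NUM_BUCKETS.
        # user_id % NUM_BUCKETS is a simplified stand-in for Hive's hash.
        buckets[user_id % NUM_BUCKETS].append(user_id)


partitions = defaultdict(list)
buckets = [[] for _ in range(NUM_BUCKETS)]

load([("US", 1), ("IN", 2), ("US", 3)], partitions, buckets)
print(len(partitions), len(buckets))  # 2 4

load([("UK", 4), ("DE", 5), ("FR", 6)], partitions, buckets)
print(len(partitions), len(buckets))  # 5 4
```

The partition count grows from 2 to 5 as new countries arrive, while the bucket count stays at 4; the new rows simply fill the existing buckets.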


Bucketing works by applying a hash function to the bucketing column and then taking the result modulo the number of buckets. Each row therefore goes into the bucket determined by its column value, and rows with the same value always land in the same bucket. Bucketing can be used for sampling data, and also for joining two data sets more efficiently.
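The sketch below illustrates hash-then-modulo bucket assignment in Python. The Java-style String.hashCode is an assumption for illustration only; the exact hash Hive uses depends on the Hive version and the column type. The names bucket_for, bucketize, orders and customers are hypothetical. The sketch also shows why bucketing helps joins: when two tables are bucketed on the same key into the same number of buckets, matching keys land in buckets with the same index, so bucket i of one table only has to be compared with bucket i of the other.

```python
NUM_BUCKETS = 4


def java_string_hash(s: str) -> int:
    """Java-style String.hashCode() for ASCII keys, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as signed, the way Java would hold it.
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_for(key: str, num_buckets: int) -> int:
    """Hash the key, clear the sign bit, then take the modulo."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_buckets


def bucketize(rows):
    """Split (key, value) rows into NUM_BUCKETS lists by the key's bucket."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for key, value in rows:
        buckets[bucket_for(key, NUM_BUCKETS)].append((key, value))
    return buckets


# Hypothetical tables, both bucketed on the customer name.
orders = [("alice", 10), ("bob", 20), ("alice", 30)]
customers = [("alice", "US"), ("bob", "IN")]

order_buckets = bucketize(orders)
customer_buckets = bucketize(customers)

# Bucket-wise join: bucket i of orders is only matched against bucket i of
# customers, because equal keys always hash to the same bucket index.
joined = []
for i in range(NUM_BUCKETS):
    lookup = dict(customer_buckets[i])
    for key, amount in order_buckets[i]:
        if key in lookup:
            joined.append((key, amount, lookup[key]))

print(joined)  # [('alice', 10, 'US'), ('alice', 30, 'US'), ('bob', 20, 'IN')]
```

Reading just one bucket (for example order_buckets[0]) instead of the whole table is the same idea Hive builds on for bucket sampling with TABLESAMPLE(BUCKET x OUT OF y).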
