Wednesday, May 13, 2015

Hadoop Hive: CLUSTER BY

CLUSTER BY
==========
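1. The CLUSTER BY clause is shorthand for DISTRIBUTE BY together with SORT BY on the same columns: rows are distributed to reducers by the listed columns and sorted by those columns within each reducer.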
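
To make the equivalence concrete, the query used in this post can be written both ways. This is a minimal sketch, assuming the same disttab table with the empid, ename and esal columns; the long form is written out here for comparison and was not run as part of this post.

```sql
-- Short form: distribute rows to reducers by ename,
-- then sort by ename within each reducer.
SELECT empid, ename, esal FROM disttab CLUSTER BY ename;

-- Long form that should behave the same way.
SELECT empid, ename, esal FROM disttab DISTRIBUTE BY ename SORT BY ename;
```

The long form is still needed when the distribution columns and the sort columns differ, or when a descending sort is wanted, since CLUSTER BY uses one column list for both steps.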

Example: the DISTRIBUTE BY with SORT BY query rewritten using CLUSTER BY
------------------------------------------------------------------------


hive> SELECT empid, ename, esal FROM disttab CLUSTER BY ename;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201304160610_0009, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201304160610_0009
Kill Command = /usr/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=localhost:8021 -kill job_201304160610_0009
2013-04-16 07:07:57,918 Stage-1 map = 0%,  reduce = 0%
2013-04-16 07:08:05,362 Stage-1 map = 100%,  reduce = 0%
2013-04-16 07:08:14,526 Stage-1 map = 100%,  reduce = 33%
2013-04-16 07:08:15,535 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201304160610_0009
OK
NULL  NULL  NULL
108   Gopal 22000
109   Gopal 22500
110   Gopal 23500
111   Gopal 24340
105   Kanth 56000
113   Karan 45600
112   Karan 45000
104   Madan 45555
103   Murali      23000
102   Prakash     34000
101   Ravi  46000
106   Varma 33333
Time taken: 32.426 seconds
hive>
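
The job above ran with a single reducer ("Estimated from input data size: 1"), which is why the whole result appears sorted by ename. With more than one reducer, all rows with the same ename still go to the same reducer and each reducer's output is sorted, but there is no total order across reducers. Note also that rows with the same ename are not ordered by empid (113 is listed before 112 for Karan). A sketch of both points, using the setting named in the job log; the value 2 is an arbitrary choice for illustration:

```sql
-- Force two reducers (arbitrary value, for illustration only).
set mapred.reduce.tasks=2;

-- Each reducer's output is sorted by ename; the combined result is not globally sorted.
SELECT empid, ename, esal FROM disttab CLUSTER BY ename;

-- To also order rows that share an ename, fall back to the long form
-- and add empid as a second sort column.
SELECT empid, ename, esal FROM disttab DISTRIBUTE BY ename SORT BY ename, empid;
```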

 
