Tuesday, April 28, 2015

Counters in MapReduce

Counters:
======

  Counters are a useful channel for gathering statistics about a job:

* for quality control or application-level statistics
* for problem diagnosis.
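As a rough illustration of an application-level (user-defined) counter used for quality control, here is a minimal plain-Java sketch with no Hadoop dependency. The enum `RecordQuality`, its values, and the rule that a well-formed line has exactly two comma-separated fields are all hypothetical. In a real Hadoop mapper the counting step would be `context.getCounter(RecordQuality.MALFORMED).increment(1)` rather than a local map.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of user-defined counters (not Hadoop code).
public class QualityCounters {

    // Hypothetical user-defined counters for quality control.
    enum RecordQuality { WELL_FORMED, MALFORMED }

    static Map<RecordQuality, Long> tally(List<String> lines) {
        Map<RecordQuality, Long> counters = new EnumMap<>(RecordQuality.class);
        for (RecordQuality q : RecordQuality.values()) {
            counters.put(q, 0L);                 // start every counter at zero
        }
        for (String line : lines) {
            // Assumed rule: a well-formed line has exactly two comma-separated fields.
            boolean ok = line.split(",", -1).length == 2;
            RecordQuality q = ok ? RecordQuality.WELL_FORMED : RecordQuality.MALFORMED;
            // In a Hadoop mapper this would be: context.getCounter(q).increment(1);
            counters.merge(q, 1L, Long::sum);
        }
        return counters;
    }

    public static void main(String[] args) {
        Map<RecordQuality, Long> counters =
                tally(List.of("alice,3", "bob,5", "garbage", "carol,1,extra"));
        System.out.println(counters); // {WELL_FORMED=2, MALFORMED=2}
    }
}
```

A non-zero MALFORMED count at the end of a job is the kind of signal that points you at a problem in the input data.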

Built-in Counters:
---------------------

Hadoop maintains some built-in counters for every job, which report various metrics for that job.

Ex: they let you confirm that the expected amount of INPUT was consumed and the expected amount of OUTPUT was produced.

Some built-in counters:

1. Map input records --> number of input records consumed by all the maps in the JOB. Incremented every time a record is
    read from an InputSplit (through the RecordReader), before it is passed to the map() method of the Mapper.

2. Map output records --> number of output records produced by all the maps in the JOB. Incremented every time a map
    emits a record (OutputCollector.collect() in the old API, Context.write() in the new API).

Likewise:

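3. Reduce input records --> number of input records consumed by all the reducers in the JOB. Incremented every time a
    value is read from the reducer's iterator over values.

4. Reduce output records --> number of output records produced by all the reducers in the JOB. Incremented every time
    a reducer emits a record.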
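To make the four record counters concrete, here is a small plain-Java simulation (not Hadoop code) of a word-count job with no combiner. The static fields only mirror the counter labels; the word-count logic and the sample input are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a word-count job (no Hadoop dependency, no combiner).
// The static fields only mirror the labels of Hadoop's built-in record counters.
public class RecordCounterDemo {

    static long mapInputRecords;
    static long mapOutputRecords;
    static long reduceInputRecords;
    static long reduceOutputRecords;

    static Map<String, Integer> run(List<String> lines) {
        mapInputRecords = 0;
        mapOutputRecords = 0;
        reduceInputRecords = 0;
        reduceOutputRecords = 0;

        // Map phase: one map() call per input record (here, one line of text).
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            mapInputRecords++;                   // a record was read from the split
            for (String word : line.split("\\s+")) {
                if (word.isEmpty()) {
                    continue;                    // an empty line or leading space yields an empty token
                }
                emitted.add(word);               // stands in for emitting (word, 1)
                mapOutputRecords++;
            }
        }

        // Shuffle/sort: group the emitted values by key.
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String word : emitted) {
            groups.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }

        // Reduce phase: one reduce() call per key; each value read is an input record.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : groups.entrySet()) {
            int sum = 0;
            for (int one : entry.getValue()) {
                reduceInputRecords++;
                sum += one;
            }
            result.put(entry.getKey(), sum);     // stands in for emitting (word, sum)
            reduceOutputRecords++;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = run(List.of("the cat", "the dog", ""));
        System.out.println(counts);                                          // {cat=1, dog=1, the=2}
        System.out.println("Map input records=" + mapInputRecords);         // 3
        System.out.println("Map output records=" + mapOutputRecords);       // 4
        System.out.println("Reduce input records=" + reduceInputRecords);   // 4
        System.out.println("Reduce output records=" + reduceOutputRecords); // 3
    }
}
```

Note that the empty line still counts as a map input record even though it emits nothing, and that in this sketch (as in a job without a combiner) Reduce input records equals Map output records.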


** Counters are maintained by the task with which they are associated and are periodically sent to the TaskTracker and
    then to the JobTracker, so they can be globally aggregated.

    The built-in Job Counters are actually maintained by the JobTracker, so, unlike all other counters (including the
    user-defined ones), they do not need to be sent across the network.
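The global aggregation step can be sketched in plain Java as follows. This is not how the TaskTracker or JobTracker is actually implemented. It only assumes that each task periodically reports a cumulative snapshot of its counters, and that the master keeps the latest snapshot per task and sums across tasks. Task ids and counter names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of global counter aggregation (not Hadoop's implementation).
public class CounterAggregation {

    // Latest cumulative snapshot reported by each task, keyed by task id.
    private final Map<String, Map<String, Long>> latestByTask = new HashMap<>();

    // A task reports its cumulative counter values; a newer report from the
    // same task replaces the older one instead of being added to it.
    void report(String taskId, Map<String, Long> snapshot) {
        latestByTask.put(taskId, new HashMap<>(snapshot));
    }

    // Job-wide totals: sum the latest snapshot of every task.
    Map<String, Long> totals() {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> snapshot : latestByTask.values()) {
            snapshot.forEach((name, value) -> totals.merge(name, value, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        CounterAggregation job = new CounterAggregation();
        job.report("map_0", Map.of("Map input records", 40L));
        job.report("map_0", Map.of("Map input records", 100L)); // later, cumulative report
        job.report("map_1", Map.of("Map input records", 250L, "MALFORMED", 2L));
        System.out.println(job.totals()); // {MALFORMED=2, Map input records=350}
    }
}
```

Keeping only the latest snapshot per task, rather than adding up every report, avoids double counting when the same task reports more than once.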
