Sunday, December 26, 2010

Hadoop Log

When Hadoop starts, it sets the Java system property hadoop.log.dir by passing -Dhadoop.log.dir=$HADOOP_LOG_DIR to the JVM.
If the environment variable HADOOP_LOG_DIR is not set explicitly, it defaults to $HADOOP_HOME/logs. If HADOOP_HOME is not set either, Hadoop infers it from the path of the script used to start Hadoop. So if Hadoop is installed in <hadoop_dir> and HADOOP_LOG_DIR is not set, the log directory is <hadoop_dir>/logs.
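
The fallback order described above can be sketched as a small shell function. This is only an illustration, not Hadoop's actual startup code: the function name resolve_hadoop_log_dir is made up, and the assumption that the start script lives in <hadoop_dir>/bin is mine.

```shell
# Illustrative sketch of the fallback order; not taken from Hadoop's scripts.
# $1: path of the script used to start Hadoop (used only if HADOOP_HOME is unset).
resolve_hadoop_log_dir() (
  script_path="$1"
  if [ -n "${HADOOP_LOG_DIR:-}" ]; then
    # 1. An explicitly set HADOOP_LOG_DIR wins.
    printf '%s\n' "$HADOOP_LOG_DIR"
  elif [ -n "${HADOOP_HOME:-}" ]; then
    # 2. Otherwise fall back to $HADOOP_HOME/logs.
    printf '%s\n' "$HADOOP_HOME/logs"
  else
    # 3. Otherwise guess the install dir from the script location
    #    (assumption: the script sits in <hadoop_dir>/bin).
    home=$(cd "$(dirname "$script_path")/.." && pwd)
    printf '%s\n' "$home/logs"
  fi
)
```

For example, with HADOOP_LOG_DIR unset and HADOOP_HOME=/opt/hadoop, calling resolve_hadoop_log_dir with any script path prints /opt/hadoop/logs.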

To change the root log directory, edit the file 'conf/hadoop-env.sh' and add a line similar to:

export HADOOP_LOG_DIR=/your/local/log/dir

In the following table, replace the variables enclosed in angle brackets:
<jobid>: ID of a job.
<username>: name of the user who starts Hadoop.
<host>: host name of the node that runs the process.

Directory | Description | Related config parameters

<hadoop.log.dir> | Logs of the various daemons
  hadoop-<username>-jobtracker-<host>.log | Log of the jobtracker daemon
  hadoop-<username>-namenode-<host>.log | Log of the namenode daemon
  hadoop-<username>-secondarynamenode-<host>.log | Log of the secondarynamenode daemon
  hadoop-<username>-tasktracker-<host>.log | Log of the tasktracker daemon
  hadoop-<username>-datanode-<host>.log | Log of the datanode daemon
  job_<jobid>_conf.xml | Configuration file of a job. Exists only while the job is running.

<hadoop.log.dir>/history | | mapreduce.jobtracker.jobhistory.location
  job_<jobid>_conf.xml | Exists only while the job is running.
  job_<jobid>_<username> | Exists only while the job is running.

<hadoop.log.dir>/history/done | Logs of completed jobs | mapreduce.jobtracker.jobhistory.completed.location
  job_<jobid>_<username> | Event log. It includes all events of the job (e.g. job started, task started).
  job_<jobid>_conf.xml | Job conf file. It includes all configuration settings of the job.

<hadoop.log.dir>/userlogs | Logs of task attempts. Stored on each task tracker.
  job_<jobid> | Each directory contains the logs of all attempts of the job.

/jobtracker/jobsInfo (in HDFS) | Job status store | mapreduce.jobtracker.persist.jobstatus.active, mapreduce.jobtracker.persist.jobstatus.hours, mapreduce.jobtracker.persist.jobstatus.dir
  <jobid>.info | Status of a job
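
The daemon log file names listed in the table can be put together with a small helper. This is a rough sketch: the function name daemon_log_file is hypothetical, and defaulting to the current user and the local host name is an assumption that only holds when you run it as the user who started Hadoop, on the node in question.

```shell
# Illustrative helper; not part of Hadoop.
# $1: daemon name (e.g. namenode, datanode, jobtracker)
# $2: user who started Hadoop (assumed default: current user)
# $3: host name of the node (assumed default: local host name)
daemon_log_file() (
  daemon="$1"
  user="${2:-$(id -un)}"
  host="${3:-$(hostname)}"
  # Follows the pattern hadoop-<username>-<daemon>-<host>.log from the table.
  printf 'hadoop-%s-%s-%s.log\n' "$user" "$daemon" "$host"
)
```

For instance, daemon_log_file namenode alice node1 prints hadoop-alice-namenode-node1.log, which can then be looked up under the log directory.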

Job logs in the <hadoop.log.dir>/history/done directory are kept for the period set by mapreduce.jobtracker.jobhistory.maxage. The default value is 1 week.
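
To get a rough idea of which completed-job files are already past that age, something like the following could be used. It is only an approximation of the retention rule, not what Hadoop itself runs: the function name list_expired_job_logs is made up, it goes by file modification time, and find counts age in whole days.

```shell
# Illustrative sketch; Hadoop's own cleanup logic may differ.
# $1: directory holding completed job logs (e.g. the history/done directory)
# $2: age threshold in days (default 7, matching the 1-week default above)
list_expired_job_logs() (
  done_dir="$1"
  max_age_days="${2:-7}"
  # -mtime +N matches files last modified more than N whole days ago.
  find "$done_dir" -type f -mtime +"$max_age_days"
)
```

Pass the directory as the first argument and, optionally, a different number of days as the second.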