Gerald Guo CGL Report: December 2010

When Hadoop starts, it sets the Java system property hadoop.log.dir by passing -Dhadoop.log.dir=$HADOOP_LOG_DIR.
If you don't set the environment variable HADOOP_LOG_DIR explicitly, it defaults to $HADOOP_HOME/logs. If you don't set HADOOP_HOME, Hadoop infers it from the path of the script you use to start Hadoop. So if Hadoop is installed in <hadoop_dir> and HADOOP_LOG_DIR is not set, the log directory is <hadoop_dir>/logs.
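
The fallback order described above can be sketched as a small shell function. This is only an illustration, not Hadoop's actual startup script: the function name resolve_log_dir is hypothetical, and it assumes the start script lives in <hadoop_dir>/bin, so that the script's grandparent directory is <hadoop_dir>.

```shell
#!/bin/sh
# Illustrative sketch of the log-directory fallback; not Hadoop's real script.
# Usage: resolve_log_dir SCRIPT_PATH   (prints the effective log directory)
resolve_log_dir() {
    script_path=$1
    if [ -z "${HADOOP_HOME:-}" ]; then
        # Assumption: the start script sits in <hadoop_dir>/bin,
        # so stripping two path components yields <hadoop_dir>.
        hadoop_home=$(dirname "$(dirname "$script_path")")
    else
        hadoop_home=$HADOOP_HOME
    fi
    # An explicit HADOOP_LOG_DIR wins; otherwise fall back to <hadoop_home>/logs.
    echo "${HADOOP_LOG_DIR:-$hadoop_home/logs}"
}
```

For example, with neither variable set, `resolve_log_dir /opt/hadoop/bin/start-all.sh` prints /opt/hadoop/logs (the path /opt/hadoop is made up for the example).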

To change the root log directory, edit the file 'conf/hadoop-env.sh' and add a line similar to:

export HADOOP_LOG_DIR=/your/local/log/dir

In the following table, replace the placeholders enclosed in angle brackets:
<jobid>: ID of a job.
<username>: name of the user who starts Hadoop.
<host>: host name of the node that runs the process.

Directory	Description	Related config parameters
<hadoop.log.dir>	Logs of the various daemons
hadoop-<username>-jobtracker-<host>.log	Log of jobtracker daemon
hadoop-<username>-namenode-<host>.log	Log of namenode daemon
hadoop-<username>-secondarynamenode-<host>.log	Log of secondarynamenode daemon
hadoop-<username>-tasktracker-<host>.log	Log of tasktracker daemon
hadoop-<username>-datanode-<host>.log	Log of datanode daemon
job_<jobid>_conf.xml	Configuration file of a job	Only exists when the job is running.

<hadoop.log.dir>/history		mapreduce.jobtracker.jobhistory.location
job_<jobid>_conf.xml		Only exists when the job is running.
job_<jobid>_<username>		Only exists when the job is running.

<hadoop.log.dir>/history/done	Logs of completed jobs	mapreduce.jobtracker.jobhistory.completed.location
job_<jobid>_<username>	Event log. It includes all events of the job (e.g. job started, task started).
job_<jobid>_conf.xml	Job conf file. It includes all configurations of the job.

<hadoop.log.dir>/userlogs	Logs of task attempts. Stored on each tasktracker.
job_<jobid>	Each directory contains the logs of all attempts of the job.

/jobtracker/jobsInfo (in HDFS)	Job status store	mapreduce.jobtracker.persist.jobstatus.active, mapreduce.jobtracker.persist.jobstatus.hours, mapreduce.jobtracker.persist.jobstatus.dir
<jobid>.info	Status of a job
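
The daemon log naming pattern from the table can be expressed as a short helper. This is a minimal sketch: the function name daemon_log_path is hypothetical, and it assumes that <username> matches the output of `id -un` and <host> matches the output of `hostname` on the node, which may not hold on every setup.

```shell
#!/bin/sh
# Illustrative helper: build the expected path of a daemon log file.
# Usage: daemon_log_path LOG_DIR DAEMON [USER] [HOST]
# DAEMON is one of: jobtracker, namenode, secondarynamenode, tasktracker, datanode
daemon_log_path() {
    log_dir=$1
    daemon=$2
    # Assumption: default to the current user and this machine's host name.
    user=${3:-$(id -un)}
    host=${4:-$(hostname)}
    echo "$log_dir/hadoop-$user-$daemon-$host.log"
}
```

For instance, `daemon_log_path /opt/hadoop/logs namenode alice node1` prints /opt/hadoop/logs/hadoop-alice-namenode-node1.log (user, host, and directory are made-up values).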

Job logs in the <hadoop.log.dir>/history/done directory are kept for the period set by mapreduce.jobtracker.jobhistory.maxage. The default is one week.
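
To check which completed-job logs have passed the default one-week retention, something like the following could be used. It is only an inspection aid built on the standard `find` command, not how Hadoop itself expires files; the function name list_expired_history is hypothetical, and the 7-day default simply mirrors the one-week value mentioned above.

```shell
#!/bin/sh
# Illustrative helper: list files in a "done" directory older than a given age.
# Usage: list_expired_history DONE_DIR [MAX_AGE_DAYS]
list_expired_history() {
    done_dir=$1
    max_age_days=${2:-7}
    # -mtime +N matches files last modified more than N days ago
    # (find rounds file ages down to whole days).
    find "$done_dir" -type f -mtime "+$max_age_days"
}
```

A call such as `list_expired_history "$HADOOP_LOG_DIR/history/done"` would print the paths of any job logs older than a week, assuming HADOOP_LOG_DIR is set in the current shell.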

Sunday, December 26, 2010

Hadoop Log
