Sunday, December 26, 2010

Hadoop Log

When Hadoop starts, it sets the hadoop.log.dir property by passing -Dhadoop.log.dir=$HADOOP_LOG_DIR to the JVM.
If you don't set the environment variable HADOOP_LOG_DIR explicitly, it defaults to $HADOOP_HOME/logs. If you don't set HADOOP_HOME either, Hadoop infers it from the path of the script you use to start Hadoop. So if you install Hadoop in <hadoop_dir> and HADOOP_LOG_DIR is not set, the log directory is <hadoop_dir>/logs.
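
The fallback order described above can be sketched as a small shell function. This is only an illustration, not the code from Hadoop's own startup scripts: the function name resolve_log_dir is hypothetical, and it assumes the start script lives in <hadoop_dir>/bin, so that the parent of the script's directory is the install directory.

```shell
#!/bin/sh
# Sketch of the log dir fallback order (not Hadoop's actual startup code).
# resolve_log_dir is a hypothetical helper; $1 is the path of the script
# used to start Hadoop, assumed to live in <hadoop_dir>/bin.
resolve_log_dir() {
  script_path="$1"
  # If HADOOP_HOME is unset or empty, guess it as the parent of the
  # directory that contains the start script.
  home="${HADOOP_HOME:-$(dirname "$(dirname "$script_path")")}"
  # An explicit HADOOP_LOG_DIR wins; otherwise fall back to <home>/logs.
  printf '%s\n' "${HADOOP_LOG_DIR:-$home/logs}"
}
```

For example, with neither variable set, resolve_log_dir /opt/hadoop/bin/start-all.sh prints /opt/hadoop/logs (the path /opt/hadoop is just a sample install location).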

To change the root log directory, edit the file 'conf/hadoop-env.sh' and add a line similar to:

export HADOOP_LOG_DIR=/your/local/log/dir

In the following table, replace the placeholders enclosed in angle brackets:
<jobid>: id of a job
<username>: username of the user who starts up Hadoop.
<host>: host name of the node which runs the process.

Directory    Description    Related config parameters

<hadoop.log.dir> Log of various daemons  
  hadoop-<username>-jobtracker-<host>.log Log of jobtracker daemon  
  hadoop-<username>-namenode-<host>.log Log of namenode daemon  
  hadoop-<username>-secondarynamenode-<host>.log Log of secondarynamenode daemon  
  hadoop-<username>-tasktracker-<host>.log Log of tasktracker daemon  
  hadoop-<username>-datanode-<host>.log Log of datanode daemon  
  job_<jobid>_conf.xml Configuration file of a job. Only exists while the job is running.
 
<hadoop.log.dir>/history  
mapreduce.jobtracker.jobhistory.location

  job_<jobid>_conf.xml Only exists while the job is running.
  job_<jobid>_<username> Only exists while the job is running.
 
<hadoop.log.dir>/history/done Logs of completed jobs
mapreduce.jobtracker.jobhistory.completed.location
  job_<jobid>_<username> Event logging. It includes all events of the job (e.g. job started, task started).  
  job_<jobid>_conf.xml Job conf file. It includes all configurations of the job.  
 
<hadoop.log.dir>/userlogs Logs of task attempts. Stored on each task tracker.
  job_<jobid> Each directory contains the logs of all task attempts of the job.
 
/jobtracker/jobsInfo (in HDFS) Job Status Store
mapreduce.jobtracker.persist.jobstatus.active
mapreduce.jobtracker.persist.jobstatus.hours
mapreduce.jobtracker.persist.jobstatus.dir
  <jobid>.info Status of a job

Job logs in the <hadoop.log.dir>/history/done directory are kept for the period set by mapreduce.jobtracker.jobhistory.maxage. The default is one week.
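
To make the naming patterns in the table concrete, here is a minimal sketch that builds the expected log paths from the placeholders. The function names (daemon_log_path, list_daemon_logs, job_userlog_dir) are hypothetical helpers written for this post, not part of Hadoop; they only assemble strings and do not check that the files exist.

```shell
#!/bin/sh
# Hypothetical helpers that assemble log paths from the table's patterns.

# daemon_log_path <log_dir> <username> <daemon> <host>
daemon_log_path() {
  printf '%s/hadoop-%s-%s-%s.log\n' "$1" "$2" "$3" "$4"
}

# list_daemon_logs <log_dir> <username> <host>
# Prints one expected log path per daemon listed in the table.
list_daemon_logs() {
  for daemon in jobtracker namenode secondarynamenode tasktracker datanode; do
    daemon_log_path "$1" "$2" "$daemon" "$3"
  done
}

# job_userlog_dir <log_dir> <jobid>
# Directory holding the attempt logs of one job on a task tracker.
job_userlog_dir() {
  printf '%s/userlogs/job_%s\n' "$1" "$2"
}
```

For example, daemon_log_path /var/log/hadoop alice namenode node1 prints /var/log/hadoop/hadoop-alice-namenode-node1.log, where alice and node1 are sample values for <username> and <host>.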
