Hadoop
Installation
Hadoop installation instructions: http://hadoop.apache.org/core/docs/current/quickstart.html and http://hadoop.apache.org/core/docs/current/cluster_setup.html.
To set up a Hadoop cluster, two configuration files generally need to be modified:
hadoop-site.xml and slaves.
(1) My hadoop-site.xml looks like:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>pg3:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>pg3:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Read the file hadoop-default.xml for all available options.
(2) My slaves file looks like:
localhost
pg1
pg2
I need to install Hadoop on three machines, and I use rsync to keep the configuration on these machines synchronized with each other.
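The synchronization step might look like the following. This is only a sketch: the post does not show the actual command, the install path $HADOOP_HOME is hypothetical, and only the host names pg1 and pg2 are taken from the slaves file above.

```shell
# Push the local Hadoop configuration directory to the two worker nodes.
# $HADOOP_HOME is an assumed install path, not from the original post.
for host in pg1 pg2; do
  rsync -az "$HADOOP_HOME/conf/" "$host:$HADOOP_HOME/conf/"
done
```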
Commands
(*) Format a new file system: hadoop namenode -format
(*) Start/stop Hadoop:
start-dfs.sh/stop-dfs.sh: start/stop the distributed file system (HDFS)
start-mapred.sh/stop-mapred.sh: start/stop the MapReduce service
start-all.sh/stop-all.sh: start/stop both HDFS and the MapReduce service
Hadoop reads the slaves file to get the list of worker nodes and then starts the daemons on each of those nodes over ssh.
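Because the start scripts log in to every listed node over ssh, passwordless ssh from the master to each slave is typically set up first. This is a common prerequisite not covered in the post; the key type and target host are placeholders:

```shell
# Generate a key with no passphrase and install it on a worker node.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id pg1   # repeat for each host in the slaves file
```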
Check the status of the services:
HDFS (NameNode web UI): http://domain:50070/
MapReduce (JobTracker web UI): http://domain:50030/
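Putting the commands above together, a typical first start of the cluster might look like this (a sketch assuming the Hadoop bin directory is on the PATH):

```shell
hadoop namenode -format   # one-time: format a new HDFS file system
start-all.sh              # start HDFS and MapReduce on all nodes
# ... later, to shut everything down:
stop-all.sh
```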
HBase
Installation instructions: http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#overview_description
The configuration file is hbase-site.xml. My hbase-site.xml looks like:
<configuration>
  <property>
    <name>hbase.master</name>
    <value>pg3:60000</value>
    <description>The host and port that the HBase master runs at.</description>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://pg3.ucs.indiana.edu:9000/hbase</value>
    <description>The directory shared by region servers.</description>
  </property>
</configuration>
Commands
(*) start-hbase.sh: start the HBase service
(*) stop-hbase.sh: stop the HBase service
Note: HBase builds its functionality on top of Hadoop, so sometimes HBase needs to know Hadoop's configuration. The following statements, excerpted from the HBase documentation, are important:
"Of note, if you have made HDFS client configuration on your hadoop cluster, hbase will not see this configuration unless you do one of the following:

- Add a pointer to your HADOOP_CONF_DIR to CLASSPATH in hbase-env.sh
- Add a copy of hadoop-site.xml to ${HBASE_HOME}/conf, or
- If only a small set of HDFS client configurations, add them to hbase-site.xml

An example of such an HDFS client configuration is dfs.replication. If for example, you want to run with a replication factor of 5, hbase will create files with the default of 3 unless you do the above to make the configuration available to hbase."
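For the dfs.replication example in the quote, the third option would mean adding a property like the following to hbase-site.xml. This is only a sketch of the quoted advice, not a fragment from the original post:

```xml
<property>
  <name>dfs.replication</name>
  <value>5</value>
</property>
```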