Sunday, December 26, 2010

Hadoop Log

When Hadoop is started, it sets hadoop.log.dir using -Dhadoop.log.dir=$HADOOP_LOG_DIR.
If you don't set the environment variable HADOOP_LOG_DIR explicitly, it defaults to $HADOOP_HOME/logs. If you don't set HADOOP_HOME either, Hadoop guesses it from the path of the script you use to start Hadoop. So if you install Hadoop in directory <hadoop_dir> and HADOOP_LOG_DIR is not set, the log dir is <hadoop_dir>/logs.

To change the root log directory, edit 'conf/hadoop-env.sh' and add a line similar to

export HADOOP_LOG_DIR=/your/local/log/dir

In the listing below, replace the variables enclosed in angle brackets:
<jobid>: id of a job
<username>: username of the user who started Hadoop
<host>: host name of the node that runs the process

<hadoop.log.dir>: logs of the various daemons
  hadoop-<username>-jobtracker-<host>.log: log of the jobtracker daemon
  hadoop-<username>-namenode-<host>.log: log of the namenode daemon
  hadoop-<username>-secondarynamenode-<host>.log: log of the secondarynamenode daemon
  hadoop-<username>-tasktracker-<host>.log: log of the tasktracker daemon
  hadoop-<username>-datanode-<host>.log: log of the datanode daemon
  job_<jobid>_conf.xml: configuration file of a job; exists only while the job is running

<hadoop.log.dir>/history: job history of running jobs (config: mapreduce.jobtracker.jobhistory.location)
  job_<jobid>_conf.xml: configuration file of the job; exists only while the job is running
  job_<jobid>_<username>: event log of the job; exists only while the job is running

<hadoop.log.dir>/history/done: logs of completed jobs (config: mapreduce.jobtracker.jobhistory.completed.location)
  job_<jobid>_<username>: event log; it records all events of the job (e.g. job started, task started)
  job_<jobid>_conf.xml: job conf file; it records all configuration parameters of the job

<hadoop.log.dir>/userlogs: logs of task attempts, stored on each tasktracker
  job_<jobid>: each directory contains the logs of all attempts of that job

/jobtracker/jobsInfo (in HDFS): job status store (config: mapreduce.jobtracker.persist.jobstatus.active, mapreduce.jobtracker.persist.jobstatus.hours, mapreduce.jobtracker.persist.jobstatus.dir)
  <jobid>.info: status of a job

Job history logs in the <hadoop.log.dir>/history/done directory are kept for mapreduce.jobtracker.jobhistory.maxage; the default value is one week.
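
As a quick sanity check, the layout above can be inspected directly. A minimal sketch, assuming the default locations (HADOOP_LOG_DIR not overridden) and that job status persistence is enabled:

ls $HADOOP_HOME/logs                 # daemon logs, plus history/ and userlogs/
ls $HADOOP_HOME/logs/history/done    # event logs and conf files of completed jobs
hadoop fs -ls /jobtracker/jobsInfo   # job status store kept in HDFS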

Monday, November 29, 2010

Schedulers in supercomputers

LoadLeveler

http://www.ccs.ornl.gov/eagle/LL.html
http://www.bu.edu/tech/research/training/scv-software-packages/loadleveler/
http://kb.iu.edu/data/azvs.html

llstatus   # cluster status
llq        # list jobs in the queues/classes
llclass    # list information about classes/queues
llqres     # check reservations (may or may not work; you may need to use showres instead)

PBS

pbsnodes

Moab

http://www.clusterresources.com/products/mwm/docs/a.gcommandoverview.shtml
http://www.clusterresources.com/products/maui/docs/16.1simulationoverview.shtml

mshow -a  # show available resources

Saturday, November 27, 2010

IOMeter

Document: http://iometer.cvs.sourceforge.net/viewvc/iometer/iometer/Docs/Iometer.pdf

Windows: install IOMeter (it includes both dynamo and iometer GUI)

Linux: the Linux package only includes dynamo (the runtime engine); it does not include the GUI. So you need to connect dynamo to the Windows box on which IOMeter is installed. Use the following command to start dynamo:

sudo ./dynamo -i iometer_computer_host -n manager_name -m manager_computer_host

Parameters:
iometer_computer_host: host/ip where IOMeter GUI is run (Windows box)
manager_computer_host: host/ip where dynamo is run
manager_name: name of the dynamo manager (arbitrary; used to distinguish different dynamo managers connecting to the same IOMeter instance)
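
For example (host names here are hypothetical), if the IOMeter GUI runs on a Windows machine named winbox and dynamo runs on a Linux machine named linuxbox1:

sudo ./dynamo -i winbox -n linuxbox1_mgr -m linuxbox1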

Several options are confusing:

  1. Disk Targets > # of Outstanding I/Os
    "specifies the maximum number of outstanding asynchronous I/O operations per disk the selected worker(s) will attempt to have active at one time."
    "This control can be overridden by the # of Outstanding I/Os control group in the Test
    Setup tab (depending on the Cycling Options)."
  2. Test Setup > Cycling Options
    "The Cycling Options control group specifies the series of tests that is run for each access specification."
    • Number of workers per manager
      Workers are added in the order they are shown in the Topology panel
    • Number of targets per worker
      selected targets are added to each worker in the order they are shown in the Disk Targets or Network Targets tab for that worker.
    • Number of outstanding I/O operations per target (disk workers only).
    Cycling Options
    1. Normal - run all selected targets for all workers.
      In all managers, all workers are active and each worker uses all of its selected targets. "The number of outstanding I/Os per disk is specified by the # of Outstanding I/Os field in the Disk Targets tab."
    2. Cycle Workers -- add step workers using all selected targets at a time
      "This test type increases the number of workers for each manager in each test." In each test, the first N (depending on iteration and step) workers listed in the Topology panel are used.
    3. Cycle Targets -- add step targets for all workers at a time.
      This test type increases the number of targets for each worker in each test.
    4. Increment Targets Parallel -- add step targets to all managers at a time.
      Test #   Mgr 1/Wkr 1   Mgr 1/Wkr 2   Mgr 2/Wkr 1   Mgr 2/Wkr 2   Targets per Manager
      1        1 target      0 targets     1 target      0 targets     1 target
      2        2 targets     0 targets     2 targets     0 targets     2 targets
      3        2 targets     1 target      2 targets     1 target      3 targets
      4        2 targets     2 targets     2 targets     2 targets     4 targets
    5. Increment Targets Serial -- add step targets at a time.
       Test #   Mgr 1/Wkr 1   Mgr 1/Wkr 2   Mgr 2/Wkr 1   Mgr 2/Wkr 2   Total Targets
       1        1 target      0 targets     0 targets     0 targets     1 target
       2        2 targets     0 targets     0 targets     0 targets     2 targets
       3        2 targets     1 target      0 targets     0 targets     3 targets
       4        2 targets     2 targets     0 targets     0 targets     4 targets
       5        2 targets     2 targets     1 target      0 targets     5 targets
       6        2 targets     2 targets     2 targets     0 targets     6 targets
       7        2 targets     2 targets     2 targets     1 target      7 targets
       8        2 targets     2 targets     2 targets     2 targets     8 targets
    6. Cycle Targets and Workers -- add step targets at a time spread across workers
      Test #   Worker 1    Worker 2    Worker 3    Targets per Manager
      1        1 target    0 targets   0 targets   1 target
      2        2 targets   0 targets   0 targets   2 targets
      3        1 target    1 target    0 targets   2 targets
      4        2 targets   2 targets   0 targets   4 targets
      5        1 target    1 target    1 target    3 targets
      6        2 targets   2 targets   2 targets   6 targets
    7. Cycle # Outstanding I/Os -- run step outstanding I/Os on all disks at a time.
    8. Cycle # Outstanding I/Os and Targets -- run step outstanding I/Os on step targets at a time

Not all managers have the same number of workers, and not all workers have the same number of targets. Read the following paragraph to see how IOMeter handles this.

"The ending value for each sequence is determined differently for each quantity. For the number of workers per manager, it is determined by the manager with the most workers in the Topology panel. For the number of targets per worker, it is determined by the worker with the most targets selected in the Disk Targets or Network Targets tab. For the outstanding I/Os per target, it is determined by the # of Outstanding I/Os control group in the Test Setup tab.
    If not all managers have the same number of workers, or not all workers have the same number of targets, those managers/workers with less than the maximum number of workers/targets will repeat at their highest value while other managers/workers continue to increase.  "

In Disk Targets Tab, "A yellow icon with a red slash through it means that the drive needs to be prepared before the test starts".

"Yellow disk icons represent logical drives (i.e. those with a drive letter). They are shown only if they are writable. Disk workers access logical drives by reading and writing a file called iobw.tst in the root directory of the drive. If this file exists, the drive is shown with a plain yellow icon; if the file does not exist, the drive is shown with a red slash through the icon. (If this file exists but is not writable, the drive is considered read-only and is not shown at all.)
    If you select a drive that does not have an iobw.tst file, Iometer will begin the test by creating this file and expanding it until the drive is full (this is shown as “Preparing Drives” in the status bar). You can change the size of this file to control how much of the disk Iometer can use by setting the Maximum Disk Size control to the desired size (in sectors).
    Blue disk icons represent physical drives. They are shown only if they contain nothing but free space (no defined partitions). Disk workers access physical drives by writing direct to the raw disk. Physical drives do not use an iobw.tst file. Running with physical drives is recommended."

Friday, November 12, 2010

JsUnit Maven Plugin

The documentation is at http://jsunit.berlios.de/maven2.html. It's too brief, especially the following paragraph:

The type of the test suite, one of the following values:

ALLTESTS
Looks for a class AllTests derived from TestSuite and runs its suite.
TESTSUITES
Looks for all classes ending with TestSuite and that are derived from TestSuite and run their suites.
TESTCASES
Looks for all classes ending with TestCase and that are derived from TestCase and runs them (the default).

The problem is what "derived from" means and how to do that. Below I show in detail how to use the JsUnit plugin.

1) sample test file

The following is a dummy test file. It should be put into src/test/js.

var dummyobj = dummyobj || {};

function DummyTest(name) {
  TestCase.call(this, name);
};

DummyTest.inherits(TestCase);

DummyTest.prototype.setUp = function() {
    dummyobj.name = "Gerald";
};

DummyTest.prototype.tearDown = function() {
    delete dummyobj.name;
};

DummyTest.prototype.testDummy = function() {
  this.assertEquals('Gerald', dummyobj.name);
};

DummyTest is the test case. It "inherits" from the class TestCase. All of its functions whose names start with "test" will be run as tests.

2) Inherit implementation

The following implementation of inherits is borrowed from the Shindig code. It can be put in a file inherit_implementation.js under the directory src/main/js.

Function.prototype.inherits = function(parentCtor) {
    function tempCtor() {};
    tempCtor.prototype = parentCtor.prototype;
    this.superClass_ = parentCtor.prototype;
    this.prototype = new tempCtor();
    this.prototype.constructor = this;
};

3) Pom.xml

<plugin>
    <groupId>de.berlios.jsunit</groupId>
    <artifactId>jsunit-maven2-plugin</artifactId>
    <executions>
        <execution>
            <id>test</id>
            <configuration>
                <sourceDirectory>${basedir}/src/main/js</sourceDirectory>
                <sources>
                    <source>inherit_implementation.js</source>
                    <source>file_to_be_tested.js</source>
                </sources>
                <testSourceDirectory>${basedir}/src/test/js</testSourceDirectory>
                <testSuites>
                    <testSuite>
                        <name>SampleSuite</name>
                        <type>TESTCASES</type>
                        <includes>
                            <include>*/*test.js</include>
                        </includes>
                    </testSuite>
                </testSuites>
            </configuration>
            <goals>
                <goal>jsunit-test</goal>
            </goals>
        </execution>
    </executions>
</plugin>
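
With this configuration in place, the JsUnit suites should run as part of the normal build. A minimal invocation, assuming the plugin's jsunit-test goal binds to Maven's test phase by default:

mvn test    # should execute the SampleSuite defined above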

Resources

For some tests, you need to provide a fake window object, XMLHttpRequest object, DOM objects, etc. Project env-js is designed exactly for this purpose: it provides a simulated browser environment.

Monday, November 08, 2010

How to run/debug Shindig in Eclipse

Instructions here (http://shindig.apache.org/developers/java/build.html#Setting_up_an_Eclipse_project_to_build_Apache_Shindig) are for old versions of Eclipse.

I am using Eclipse Helios. You should install the Maven plugin m2eclipse before following these instructions.

  1. Download or check out shindig code.
  2. File –> Import
    Maven –> Existing Maven Projects
    Specify the root directory of shindig code.
  3. Right-click the top-level imported project (*-project): Debug As –> Debug Configuration
    Create a new configuration for “Maven Build”.
    Base directory: root directory of shindig code
    Profiles: run
    Unselect “Skip Tests”.
    If you want to use a port number other than 8080, add a parameter "jetty.port" (see the command-line sketch after this list).
    Click "Debug".
  4. Jetty server should run successfully. Look at your console for possible error messages.

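For reference, a rough command-line equivalent of the Eclipse launch configuration above (only a sketch: the port is an arbitrary example, and you may need to append an explicit goal if the "run" profile does not define a default one):

cd /path/to/shindig              # root directory of the shindig code
mvn -Prun -Djetty.port=9090      # activates the "run" profile that starts Jetty; jetty.port overrides 8080
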
Add breakpoints, then send a request to the Jetty server. Eclipse complains that it cannot find the source code for the debugged app and prompts you to add source lookup directories.

Wednesday, September 15, 2010

OAuth in OGCE gadget container

The OGCE gadget container makes OAuth support as easy as possible.

The following two files are the most important ones related to OAuth configuration.

  1. config/shindig/shindig.properties
    The most important config parameter is shindig.signing.global-callback-url. It has already been set correctly, so end users don't need to worry about it.
  2. config/shindig/oauth.json
    This file contains information about all consumers for which the OGCE gadget container acts as a proxy.

If you want to write an OAuth gadget, follow the instructions on this page: http://code.google.com/apis/gadgets/docs/oauth.html.
More complicated cases: http://sites.google.com/site/oauthgoog/oauth-proxy/social-oauthproxy

Wednesday, January 06, 2010

Tomcat SSL configuration: importing a certificate chain into the keystore (error "keytool error: java.lang.Exception: Input not an X.509 certificate")

I seldom use mutual authentication in the context of SSL. Recently, in our derived project (integration with MyOSG), we needed to enforce mutual SSL authentication.

Enable client authentication in Tomcat server

First, I enabled it in the Tomcat server configuration file (server.xml).

An important option is "clientAuth".
Note: before this step, I had already generated and imported a certificate for the Tomcat server.
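
For context, a minimal sketch of that server-side setup, with a placeholder alias and path (not the actual values used):

# generate a key pair and certificate for the Tomcat server in its keystore
keytool -genkey -alias tomcat -keyalg RSA -keystore /path/to/tomcat.keystore
# the SSL <Connector> in server.xml then points at this keystore and sets clientAuth="true"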

Import user certificate

This is done by the end users who wish to access the protected services. They need to

  1. import the certificate received from the CA into their browser
    Actually, both the private key and the certificate need to be imported.
    Firefox only supports the PKCS#12 format for this. If the private key has already been imported, you just need to import the certificate, whose format can be PEM, binary, etc.
  2. import the server's certificate into the browser's trusted CA store
    The aim is to make the browser trust the certificate received from the remote service. This is useful when the service certificate is not issued by a well-known top-level CA.

After those two steps, I pointed my browser at the service URL. Unfortunately, the browser reported an SSL error (screenshot omitted).

After digging a little, I found that the cause was that the Tomcat server did not trust the certificate sent by my browser. So the solution is simple: add the certificate chain of my certificate to the Tomcat keystore.

I got the certificate chain from the issuer of my certificate. It is in PKCS#7 format and contains two certificates. You can view a PKCS#7-formatted file using the following commands:
1) openssl pkcs7 -print_certs -text < pkcs7_cert_chain.pem
2) keytool -printcert -file pkcs7_cert_chain.pem

When I tried to import it into the keystore using the following command
    keytool -importcert -file pkcs7_cert_chain.pem -keystore keystore -alias test-cert -trustcacerts
I got the following error
    keytool error: java.lang.Exception: Input not an X.509 certificate
I am sure keytool can recognize the file, because the following command prints out the information in the file correctly:
    keytool -printcert -file pkcs7_cert_chain.pem

Solution

After reading the keytool manual carefully, I found the following statements:

Importing a New Trusted Certificate

    Before adding the certificate to the keystore, keytool tries to verify it by attempting to construct
    a chain of trust from that certificate to a self-signed certificate (belonging to a root CA), using
    trusted certificates that are already available in the keystore.

Importing a Certificate Reply

    ……

      o If the reply is a PKCS#7 formatted certificate chain, the chain is first ordered (with the user
        certificate first and the self-signed root CA certificate last), before keytool attempts to
        match the root CA certificate provided in the reply with any of the trusted certificates in the
        keystore or the "cacerts" keystore file (if the -trustcacerts option was specified). If no match
        can be found, the information of the root CA certificate is printed out, and the user is
        prompted to verify it, e.g., by comparing the displayed certificate fingerprints with the
        fingerprints obtained from some other (trusted) source of information, which might be the root CA
        itself. The user then has the option of aborting the import operation. If the -noprompt option
        is given, however, there will be no interaction with the user.

So what I was doing was importing a trusted certificate chain. This is not allowed directly: when importing a trusted certificate, keytool only accepts a file that contains a single certificate.

So I extracted the two certificates into two separate files and fed them to keytool one by one. Details:
Use the command
    openssl pkcs7 -print_certs < pkcs7_cert_chain.pem
to display the two certificates in the original PKCS#7 file, then copy and paste each certificate into its own file. A sketch of the whole procedure is shown below.
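
A minimal sketch of the workaround (file names and aliases are made up for illustration):

openssl pkcs7 -print_certs < pkcs7_cert_chain.pem > all_certs.pem
# copy each BEGIN CERTIFICATE / END CERTIFICATE block from all_certs.pem into its own file,
# e.g. root_ca.pem and intermediate_ca.pem, then import them one at a time:
keytool -importcert -trustcacerts -alias root-ca -file root_ca.pem -keystore keystore
keytool -importcert -trustcacerts -alias intermediate-ca -file intermediate_ca.pem -keystore keystore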

Note: when you are importing a certificate reply from a CA, the certificate chain can be imported directly into the keystore. However, before doing that, you must make sure the corresponding private key is already in the same keystore.