Tuesday, November 27, 2007

Task Execution in CogKit

CogKit hides the complexity of backend grid services behind a uniform interface. The first question when using CogKit is how to submit jobs.
CogKit provides several ways to submit jobs, and together they are flexible enough to cover almost all kinds of requirements.
(1) API

This interface is intended for programmers. Note that in my case I need a built-in CogKit mechanism for capturing events in real time, so that progress can be reported to end users. Event support is therefore discussed for each kind of API.
(1.1) Build jobs in a program
With this set of classes, programmers can specify every aspect of a job in code. The main classes involved are Task, Specification (JobSpecification, ...), Service, TaskHandler, ServiceContact, SecurityContext, ... I have briefly described these classes at http://zhenhua-guo.blogspot.com/2007/10/first-cog-program.html.
Sample program that uses this interface:
//create a task
Task task = new TaskImpl("mytest", Task.JOB_SUBMISSION);

//build specification about the job
JobSpecification spec = new JobSpecificationImpl();
spec.setExecutable("/bin/ls");
spec.setStdInput(null);
spec.setRedirected(false);
spec.setStdOutput("abstractions-testOutput");
spec.setBatchJob(true);

//create a service object, which is the local representation of the remote service
Service service = new ServiceImpl(Service.JOB_SUBMISSION);
service.setProvider("GT2");

SecurityContext sc = null;
try{
    sc = AbstractionFactory.newSecurityContext("GT2");
}catch( Exception e ){
    //cannot continue without a security context; report the cause before exiting
    e.printStackTrace();
    System.exit(1);
}
sc.setCredentials(null);

ServiceContact scontact= new ServiceContactImpl("abc.com", 1234);

service.setSecurityContext(sc);
service.setServiceContact(scontact);

task.setSpecification(spec);
task.setService(Service.JOB_SUBMISSION_SERVICE,service);

TaskHandler handler = new GenericTaskHandler();

try {
    handler.submit( task );
} catch (Exception e){
    //submission failed; report the cause before exiting
    e.printStackTrace();
    System.exit(1);
}
Event:
To receive status updates, register a listener with the addStatusListener method. The listener class must implement the StatusListener interface. However, the granularity of the status reports does not meet my requirement: only the "started/completed/failed" status of the whole workflow can be captured. In other words, we cannot get detailed progress about what is happening inside the workflow.
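The listener mechanism described above can be sketched with a self-contained example. This is only an illustration of the callback pattern: DemoTask and DemoStatusListener are hypothetical stand-ins written for this post, not the CogKit Task and StatusListener types, and the status strings are simply the three coarse states mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a listener interface; NOT the CogKit StatusListener.
interface DemoStatusListener {
    void statusChanged(String taskName, String newStatus);
}

// Hypothetical stand-in for a task; NOT the CogKit Task.
class DemoTask {
    private final String name;
    private final List<DemoStatusListener> listeners = new ArrayList<>();

    DemoTask(String name) {
        this.name = name;
    }

    void addStatusListener(DemoStatusListener listener) {
        listeners.add(listener);
    }

    // Notify every registered listener of a status transition.
    void setStatus(String status) {
        for (DemoStatusListener listener : listeners) {
            listener.statusChanged(name, status);
        }
    }
}

class StatusListenerSketch {
    // Drive a task through coarse statuses and record what the listener sees.
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        DemoTask task = new DemoTask("mytest");
        task.addStatusListener((taskName, status) -> seen.add(taskName + ":" + status));
        task.setStatus("started");
        task.setStatus("completed");
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The listener only ever sees whole-task transitions, which is exactly the limitation noted above: nothing in this pattern exposes what happens between "started" and "completed".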
(1.2) Karajan workflow language support
With the interface described above, users spend more time writing and debugging programs than on the logical representation of the job, which is not what we want. Programming that way is neither convenient nor efficient. To solve this problem, the CogKit team provides additional support for workflow composition: the Karajan workflow engine. A workflow description can be written in either the native format or the XML format. The language supports all the basic elements a workflow engine should support: user-defined elements, variables, functions, conditional statements (if...else...), loop statements (for, while, ...), sequential execution, parallel execution, ... In short, the language is very powerful. Users can now focus on composing the workflow instead of writing and debugging programs. This raises another question: how do we submit a workflow description to the engine?
(1.2.1) class KarajanWorkflow
org.globus.cog.karajan.KarajanWorkflow can be used to submit jobs.
Sample code looks like this:
KarajanWorkflow workflow = new KarajanWorkflow();
String filename = "karajan.xml";
File workflowfile = new File(filename);
if( !workflowfile.exists() ){
 System.out.println("The karajan workflow file " + filename +" does not exist!!");
 return ;
}
workflow.setSpecification( workflowfile );
workflow.start();
workflow.waitFor();
Event:
However, there is a big drawback here: as far as I know, programmers have no way to capture the events generated while the workflow executes.
(1.2.2) class ExecutionContext
Actually, this class is used internally by class KarajanWorkflow. I figured this out when I read the source code. This class provides detailed event reports about the internal execution progress of a workflow.
Sample code looks like this:
//load workflow description from a file and construct a tree based
//on the content as logical representation.
ElementTree tree = Loader.load("karajan.xml");
//create execution context
ExecutionContext ec = new ExecutionContext(tree);
ec.start();
ec.waitFor();
Sample code with event handling:
//load workflow description from a file and construct a tree based
//on the content as logical representation.
ElementTree tree = Loader.load("karajan.xml");
//create execution context
ExecutionContext ec = new ExecutionContext(tree);
ec.addEventListener(this); //specify event listener
ec.setMonitoringEnabled(true);
ec.start();
ec.waitFor();
The class that handles events must implement the EventListener interface. The only method that must be implemented is:
public void event(Event e){
    if (e instanceof StatusMonitoringEvent) {
        //do some operations
    }else if (e instanceof ProgressMonitoringEvent) {
        //do other operations
    }
}
Generally, users want to know which element/node of the workflow generated an event. A special class called FlowElement represents a subpart (execution/transfer/echo/...) of a workflow. You can get the element corresponding to an event by calling event.getFlowElement(). In addition, FlowElement provides methods to get its children, so you can traverse the workflow.
Note: After the workflow is loaded, the system converts it into an internal format that is more complex and contains more elements than the ones you wrote. As a result, many events are generated even if the workflow description file is very simple, so some filtering is needed. My solution: all elements are stored in an ElementTree. When an event is received, check whether its target/subject is part of the ElementTree; if not, just ignore it.
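The filtering idea in the note above can be sketched as follows. This is a minimal illustration under stated assumptions, not CogKit code: Node is a hypothetical stand-in for a workflow element (it is not FlowElement or ElementTree), and membership is tested by object identity, which assumes the elements from the user's description are the same objects that later appear as event sources.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a workflow element; NOT the CogKit FlowElement class.
class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }

    Node add(Node child) {
        children.add(child);
        return this;
    }
}

class EventFilterSketch {
    // Walk the tree depth-first and collect every node by identity.
    static Set<Node> collect(Node root) {
        Set<Node> known = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        collectInto(root, known);
        return known;
    }

    private static void collectInto(Node node, Set<Node> known) {
        if (known.add(node)) {
            for (Node child : node.children) {
                collectInto(child, known);
            }
        }
    }

    // Report an event only if its source element belongs to the collected tree.
    static boolean shouldReport(Set<Node> known, Node source) {
        return known.contains(source);
    }

    public static void main(String[] args) {
        Node execute = new Node("execute");
        Node root = new Node("project").add(execute).add(new Node("transfer"));
        Set<Node> known = collect(root);

        // Same name as a user element, but a different object (engine-generated).
        Node internal = new Node("execute");
        System.out.println(shouldReport(known, execute));
        System.out.println(shouldReport(known, internal));
    }
}
```

Collecting the elements once up front keeps the per-event check to a single set lookup, which matters when the engine emits many events for even a small workflow.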
(2) Desktop interface
This interface serves ordinary users rather than programmers. CogKit provides both command-line tools and a graphical user interface. These are implemented as scripts: the script files first do some configuration (mainly setting up the CLASSPATH) and then execute built-in .class files from the CogKit package. In other words, they are just a thin wrapper around the API.
