Thursday, February 21, 2008

How to access results?

Now it is time to consider how users can access the results of their workflows easily and conveniently. There are several questions here:
(1) How to track output files in Karajan workflows?

The first option is to analyze the content of the Karajan workflow to figure out its output files. For example, for the execute element, the stdout attribute gives the name of the output file.
<execute executable="/bin/date" stdout="thedate" host="gf1.ucs.indiana.edu" provider="GT2" redirect="false"/>
However, tracking all output files this way is difficult and time-consuming, because many different elements can generate output files and we would have to capture the possible output files from every one of them.
Another option I can think of is a bit of a trick. Each newly submitted workflow is executed in a newly created directory. After execution, every file in that directory except the workflow file is an output file. This is the method I am using in my implementation.
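The directory-based approach can be sketched roughly as follows. This is only an illustration in Python, not my actual implementation: the function name collect_output_files and the file name workflow.xml are made up for the example, and it assumes that the workflow runs in its own fresh directory and writes all of its outputs there.

```python
import tempfile
from pathlib import Path


def collect_output_files(run_dir, workflow_file):
    """Return every file under run_dir except the workflow file itself.

    Hypothetical helper: assumes the workflow was executed in a directory
    created only for this run, so anything else found there is an output.
    Paths are returned relative to run_dir, sorted for stable results.
    """
    run_dir = Path(run_dir)
    workflow_path = (run_dir / workflow_file).resolve()
    outputs = []
    for path in run_dir.rglob("*"):
        # Skip directories and the submitted workflow description itself.
        if path.is_file() and path.resolve() != workflow_path:
            outputs.append(path.relative_to(run_dir).as_posix())
    return sorted(outputs)
```

If the run directory holds workflow.xml and a file named thedate once execution has finished, collect_output_files(run_dir, "workflow.xml") reports only thedate. The price of this shortcut is that temporary files left behind by the workflow are treated as outputs too.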
(2) How to organize output files?
Workflows can be categorized by different criteria, for example by the date on which a workflow was submitted, the date on which it completed, and so on. I would like to use the user id and the workflow id instead. All workflows submitted by a user belong to the same group, which that user can access. Within the group, the user picks out a specific workflow by its workflow id. Workflow ids are unique among the workflows belonging to a given user.
So, the directory layout may look like this:
users/user1/workflow_122/output_file1
users/user1/workflow_122/output_file2
users/user2/workflow_1/output_file1
...
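A small sketch of how such a path could be built from the user id and workflow id, written in Python. The helper name workflow_output_path and the check on path segments are my own additions for illustration; only the users/<user>/workflow_<id>/<file> layout comes from the listing above.

```python
from pathlib import PurePosixPath


def _check_segment(value):
    """Reject values that could escape the intended directory."""
    text = str(value)
    if not text or text in (".", "..") or "/" in text:
        raise ValueError("invalid path segment: %r" % text)
    return text


def workflow_output_path(user_id, workflow_id, file_name, root="users"):
    """Build users/<user_id>/workflow_<workflow_id>/<file_name> (hypothetical helper)."""
    return str(
        PurePosixPath(root)
        / _check_segment(user_id)
        / ("workflow_" + _check_segment(workflow_id))
        / _check_segment(file_name)
    )
```

For example, workflow_output_path("user1", 122, "output_file1") gives users/user1/workflow_122/output_file1. Because workflow ids are only unique per user, the user id has to be part of the path.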
(3) How can users access output files?
After talking with Marlon, I would like to provide a RESTful interface through which users can retrieve output files. In my implementation, the URLs for accessing output files look like this:
http://domain:port/resources/user_name/workflow_id/ This retrieves the list of all output files for the corresponding workflow.
http://domain:port/resources/user_name/workflow_id/output_file This retrieves the specified output file directly.
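To make the two URL forms concrete, here is a rough Python sketch of how the path part of a request could be mapped onto the directory layout. It is not the code of my service: the function name resolve_resource, the ("list", ...) and ("file", ...) return values, and the assumption that the workflow_id in the URL is the bare id (so 122 maps to the directory workflow_122) are all chosen just for this example.

```python
from pathlib import Path


def resolve_resource(url_path, users_root):
    """Map /resources/user_name/workflow_id/[output_file] to a result.

    Returns ("list", [file names]) for a workflow URL and
    ("file", Path) for an output file URL. Hypothetical helper.
    """
    parts = [p for p in url_path.split("/") if p]
    if len(parts) not in (3, 4) or parts[0] != "resources":
        raise ValueError("unsupported path: " + url_path)
    if any(p in (".", "..") for p in parts):
        raise ValueError("invalid path segment in: " + url_path)

    user_name, workflow_id = parts[1], parts[2]
    # Assumption: the URL carries the bare id, the directory adds "workflow_".
    workflow_dir = Path(users_root) / user_name / ("workflow_" + workflow_id)

    if len(parts) == 3:
        # Workflow URL: list the output files of this workflow.
        names = sorted(p.name for p in workflow_dir.iterdir() if p.is_file())
        return ("list", names)
    # Output file URL: point at the single requested file.
    return ("file", workflow_dir / parts[3])
```

A real service would still have to check that the requesting user is user_name before returning anything, since each user should only see his or her own group of workflows.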
