Sunday, January 20, 2008

Add Support for Job Dependency Edit

Lately, I have been working on support for client-side job queue management and job dependency management.
These two parts could be designed and implemented separately, but I think combining them is more user-friendly.
Before jobs are submitted, they are maintained on the client side.
Currently, the following functions are supported:
(1) adding a job to the job queue
(2) removing a job from the job queue
(3) editing dependencies between jobs
To make the system easy to use, I provide a visual widget interface.

Main interface:
[Figure: main interface (job_management_small)]
Job addition:
After a user enters the workflow description in the text area and the workflow name in the text field, he or she can add the workflow to the job queue by clicking the "Add to Queue" button. If a job with that name already exists, a prompt window pops up, and the user can choose either to overwrite the existing job or to change the name.
By clicking the "Job Management" button or the "Job Management" tab, the user is redirected to the job management panel.
Note: every workflow must have a name; in other words, the "Workflow Name" text field cannot be left blank. Moreover, no two workflows/jobs can have the same name, so every job name must be unique.
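The naming rules above map naturally onto a map keyed by job name. The following is a minimal Java sketch, not the actual client code: the class `JobQueue` and its methods are hypothetical, the workflow description is stored as a plain string, and the `overwrite` flag stands for the user's answer in the prompt window.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side job queue; job names are the map keys,
// so the map itself guarantees that no two jobs share a name.
class JobQueue {
    // LinkedHashMap remembers the order in which jobs were added.
    private final Map<String, String> jobs = new LinkedHashMap<>();

    /**
     * Adds a workflow under the given name.
     *
     * @param overwrite the user's answer to the "name already exists" prompt
     * @return false if the name is taken and the user chose not to overwrite
     */
    boolean add(String name, String workflow, boolean overwrite) {
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("Workflow Name must not be blank");
        }
        if (jobs.containsKey(name) && !overwrite) {
            return false; // the user will be asked to choose another name
        }
        jobs.put(name, workflow); // replacing an existing key keeps its position
        return true;
    }

    boolean remove(String name) {
        return jobs.remove(name) != null;
    }

    String get(String name) {
        return jobs.get(name);
    }

    int size() {
        return jobs.size();
    }
}
```

A `LinkedHashMap` is used so that the queue keeps the order in which jobs were added, which the default dependency chain relies on; re-inserting an existing key with `put` does not change its position.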

Job Management Panel:
As shown in the picture, each rectangle represents a job and each arrow represents a dependency between two jobs.
When a user adds job1, job2, ..., jobn, the default relation is that job2 depends on job1, job3 depends on job2, and so on, up to jobn depending on jobn-1.
When you hover the cursor over a rectangle for a few seconds, a pop-up window displays the content of that job.
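Assuming jobs are kept in the order they were added, the default chain can be built as below. `DefaultChain.build` is a hypothetical helper that returns, for each job, the set of jobs it depends on.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: builds the default dependencies, in which every job
// depends on the job that was added immediately before it.
final class DefaultChain {
    private DefaultChain() {}

    /** Returns job -> prerequisites for jobs given in the order they were added. */
    static Map<String, Set<String>> build(List<String> jobsInOrder) {
        Map<String, Set<String>> prerequisites = new LinkedHashMap<>();
        for (int i = 0; i < jobsInOrder.size(); i++) {
            Set<String> parents = new LinkedHashSet<>();
            if (i > 0) {
                parents.add(jobsInOrder.get(i - 1)); // the previous job is the only prerequisite
            }
            prerequisites.put(jobsInOrder.get(i), parents);
        }
        return prerequisites;
    }
}
```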
[Figure: job dependency panel (job_dependency_panel)]

When you right-click a rectangle, a context menu with several items is displayed.
[Figure: context menu of a job rectangle (job_dependency_panel_contextmenu)]
Currently, the items are "from", "to", "delete" and "edit".
(1) If the "delete" item is clicked, the corresponding job is permanently deleted from the job queue.
   When a job is deleted, all related dependencies are deleted as well. There are two kinds of dependency: other jobs that depend on this job, and jobs that this job depends on.
(2) If the "from" item is clicked, the corresponding job is marked as the starting point of a dependency. Call it parentJob.
(3) If the "to" item is clicked, the corresponding job is marked as the end point of the dependency. Call it childJob.
Then there are three possible cases:
  (3.1) parentJob is null.
    This means the user has not yet selected a job with the "from" item, so nothing happens.
  (3.2) parentJob is not yet a prerequisite of childJob.
    A line is drawn from the rectangle representing parentJob to the rectangle representing childJob, and parentJob becomes a prerequisite of childJob.
  (3.3) parentJob is already a prerequisite of childJob.
    In this case, a line from the rectangle representing parentJob to the rectangle representing childJob already exists. The relation is deleted and the line is removed from the display.
(4) If the "edit" item is clicked, the system redirects the user to the workflow edit panel.
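The "from", "to" and "delete" behaviour can be modelled with a map from each job to its prerequisites plus one remembered parentJob. This is a hypothetical sketch of the model only (no widget code), and the class `DependencyGraph` is illustrative. A few points the description leaves open are assumptions here: parentJob stays selected after "to" (so one parent can be linked to several children), unknown jobs and self-dependencies are ignored, and deleting the selected job clears the selection.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model behind the context menu (no widget code).
class DependencyGraph {
    // job -> jobs it depends on (its prerequisites)
    private final Map<String, Set<String>> prerequisites = new LinkedHashMap<>();
    // starting point chosen with "from"; null until the user picks one
    private String parentJob;

    void addJob(String name) {
        prerequisites.putIfAbsent(name, new LinkedHashSet<>());
    }

    // "from": remember the starting point of the dependency.
    void from(String job) {
        parentJob = job;
    }

    // "to": (3.1) no-op without a parent, (3.2) add a missing edge, (3.3) remove an existing one.
    void to(String childJob) {
        Set<String> parents = prerequisites.get(childJob);
        // Assumption: unknown jobs and self-dependencies are ignored.
        if (parentJob == null || parents == null || parentJob.equals(childJob)) {
            return;
        }
        if (!parents.remove(parentJob)) { // remove() returns false when there was no edge
            parents.add(parentJob);
        }
    }

    // "delete": remove the job and both kinds of related dependency.
    void delete(String job) {
        prerequisites.remove(job);            // dependencies of this job on other jobs
        for (Set<String> parents : prerequisites.values()) {
            parents.remove(job);              // dependencies of other jobs on this job
        }
        if (job.equals(parentJob)) {
            parentJob = null;                 // assumption: drop a stale "from" selection
        }
    }

    boolean dependsOn(String child, String parent) {
        Set<String> parents = prerequisites.get(child);
        return parents != null && parents.contains(parent);
    }
}
```

Because "to" toggles the edge, cases (3.2) and (3.3) collapse into a single `remove`-or-`add` step on the child's prerequisite set.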

Users can drag and drop the rectangles anywhere on the screen. Related lines (both lines that start from the job and lines that end at it) move along with the rectangle. Note: the lines themselves cannot be dragged.
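One simple way to get this behaviour is for each line to hold references to its two job rectangles rather than fixed coordinates, and to compute its endpoints whenever the panel repaints. This is a minimal sketch using `java.awt.Rectangle`; the `DependencyLine` class and the choice of rectangle centres as endpoints are assumptions.

```java
import java.awt.Point;
import java.awt.Rectangle;

// Hypothetical line between two job rectangles. It stores the rectangles
// themselves, so dragging either one moves the corresponding endpoint too.
class DependencyLine {
    private final Rectangle parent; // job the line starts from
    private final Rectangle child;  // job the line ends at

    DependencyLine(Rectangle parent, Rectangle child) {
        this.parent = parent;
        this.child = child;
    }

    // Endpoints are recomputed from the current positions on every call,
    // e.g. from the panel's paint method.
    Point start() {
        return center(parent);
    }

    Point end() {
        return center(child);
    }

    private static Point center(Rectangle r) {
        return new Point((int) r.getCenterX(), (int) r.getCenterY());
    }
}
```

Since the lines own no position of their own, there is nothing to drag, which matches the note that lines cannot be moved directly.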
The following picture shows a sample job dependency graph I created:
[Figure: sample job dependency graph (job_dependency_sample)]
Next step:
Currently, all the operations above are carried out on the client side; no interaction with the server is involved.
Issue:
The next step concerns how to send the job queue to the server side.
In Karajan, I don't think workflow-level composition is supported directly. However, Karajan provides two elements, parallel and sequential, which control the execution order of subtasks within a workflow.
So one idea is to put all jobs in a job queue into a single big workflow that uses the parallel and sequential elements to represent the original relationships. A natural question is whether parallel and sequential are enough to express every possible relationship among jobs.
My answer is no.
For the job dependency graph shown above, I cannot think of a way to represent it with Karajan's parallel and sequential elements.
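One way to check this answer: if the combined workflow uses only nested parallel and sequential elements and each job appears in it exactly once, the orderings it can express are the series-parallel partial orders. A finite partial order is series-parallel exactly when it contains no induced "N" pattern: job c depends on jobs a and b, job d depends on b, and there is no other ordering among the four (in particular, d does not depend on a). The Java sketch below tests a dependency graph for that pattern. The class name and the adjacency-matrix input (`dep[p][c]` is true when job c depends on job p) are assumptions, and the graph is assumed to be acyclic.

```java
// Hypothetical check: can the dependencies be written as nested
// sequential/parallel blocks with each job appearing exactly once?
// Uses the fact that such orders are exactly the N-free partial orders.
final class SeriesParallelCheck {
    private SeriesParallelCheck() {}

    /** dep[p][c] is true when job c depends on job p; the graph must be acyclic. */
    static boolean isExpressible(boolean[][] dep) {
        int n = dep.length;
        // Transitive closure (Warshall): before[x][y] is true if x must finish before y starts.
        boolean[][] before = new boolean[n][];
        for (int i = 0; i < n; i++) {
            before[i] = dep[i].clone();
        }
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                if (!before[i][k]) {
                    continue;
                }
                for (int j = 0; j < n; j++) {
                    if (before[k][j]) {
                        before[i][j] = true;
                    }
                }
            }
        }
        // Search for an induced N: a<c, b<c, b<d, and the pairs (a,b), (a,d), (c,d) unordered.
        for (int a = 0; a < n; a++) {
            for (int b = 0; b < n; b++) {
                for (int c = 0; c < n; c++) {
                    for (int d = 0; d < n; d++) {
                        boolean distinct = a != b && a != c && a != d && b != c && b != d && c != d;
                        if (distinct && before[a][c] && before[b][c] && before[b][d]
                                && unordered(before, a, b)
                                && unordered(before, a, d)
                                && unordered(before, c, d)) {
                            return false; // found an N: not expressible exactly
                        }
                    }
                }
            }
        }
        return true;
    }

    private static boolean unordered(boolean[][] before, int x, int y) {
        return !before[x][y] && !before[y][x];
    }
}
```

The four-job N shape returns false, while a chain or a diamond (one job, then two parallel jobs, then a final job) returns true. A false result means parallel and sequential can only cover the graph by adding ordering that was not asked for, which costs parallelism.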
Solution 1:
If my conclusion is correct, we can implement a subsystem that manages the order in which jobs are submitted to the underlying grid infrastructure. In my opinion, this subsystem is better placed on the server side.
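A minimal sketch of what such a subsystem might do, assuming the server receives the dependencies as a map from each job to its prerequisites and is notified when a job finishes. `SubmissionManager` and its methods are hypothetical, and actual submission to the grid is left out. Jobs that become ready at the same time can be submitted in parallel.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical server-side submission manager: a job is released for
// submission as soon as every job it depends on has finished.
class SubmissionManager {
    // job -> prerequisites that have not finished yet
    private final Map<String, Set<String>> waitingOn = new LinkedHashMap<>();
    // job -> jobs that depend on it
    private final Map<String, List<String>> dependents = new HashMap<>();

    /** prerequisites: job -> jobs it depends on; every job must appear as a key. */
    SubmissionManager(Map<String, Set<String>> prerequisites) {
        for (Map.Entry<String, Set<String>> e : prerequisites.entrySet()) {
            waitingOn.put(e.getKey(), new HashSet<>(e.getValue()));
            for (String parent : e.getValue()) {
                dependents.computeIfAbsent(parent, k -> new ArrayList<>()).add(e.getKey());
            }
        }
    }

    /** Jobs without prerequisites; these can all be submitted right away, in parallel. */
    List<String> initiallyReady() {
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : waitingOn.entrySet()) {
            if (e.getValue().isEmpty()) {
                ready.add(e.getKey());
            }
        }
        return ready;
    }

    /** Call when a job finishes; returns the jobs that have just become ready. */
    List<String> onFinished(String job) {
        List<String> ready = new ArrayList<>();
        for (String child : dependents.getOrDefault(job, Collections.emptyList())) {
            Set<String> pending = waitingOn.get(child);
            if (pending.remove(job) && pending.isEmpty()) {
                ready.add(child);
            }
        }
        return ready;
    }
}
```

For a graph where c depends on a and b while d depends only on b, `initiallyReady()` returns a and b, and `onFinished("b")` releases d without waiting for a. Unlike a fixed sequential order, this keeps independent jobs running in parallel.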
Solution 2:
Another solution is to simplify the problem at the cost of performance. We can obtain a job submission order with a topological sort and then submit all jobs sequentially in that order. Obviously, performance is not optimal, because some jobs could actually run in parallel.
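A topological sort such as Kahn's algorithm produces one such order. The helper `SubmissionOrder.topologicalSort` below is a hypothetical Java sketch; it assumes every job appears as a key in the prerequisites map and reports an error if the dependencies contain a cycle, since then no valid order exists.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Kahn's algorithm: repeatedly emit a job whose prerequisites have all been emitted.
final class SubmissionOrder {
    private SubmissionOrder() {}

    /** prerequisites: job -> jobs it depends on; every job must appear as a key. */
    static List<String> topologicalSort(Map<String, Set<String>> prerequisites) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();    // unmet prerequisites per job
        Map<String, List<String>> dependents = new HashMap<>();   // job -> jobs that depend on it
        for (Map.Entry<String, Set<String>> e : prerequisites.entrySet()) {
            inDegree.put(e.getKey(), e.getValue().size());
            for (String parent : e.getValue()) {
                dependents.computeIfAbsent(parent, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.poll();
            order.add(job);
            for (String child : dependents.getOrDefault(job, Collections.emptyList())) {
                int remaining = inDegree.merge(child, -1, Integer::sum);
                if (remaining == 0) {
                    ready.add(child);
                }
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalStateException("dependency cycle: no valid submission order");
        }
        return order;
    }
}
```

With a and b independent, c depending on a and b, and d depending on b (entries inserted in that order into a `LinkedHashMap`), the result is [a, b, c, d]. Submitting the jobs one by one in this order is always safe, but a and b, for example, could have run at the same time.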
