Thursday, February 21, 2008

DAG construction

After communicating with Mike, I got the dag.k file. I then implemented the construction of a DAG workflow from all the jobs in a workflow and the relationships between them. In other words, I put all of this information together into one large Karajan workflow by using a DAG. Note: this work is done on the client side, not on the server side. Once the large workflow has been constructed, it can be submitted to the server just like a small job, and the id of the newly submitted workflow is returned. With this workflow id, a user can query the state of the workflow. The user can also access the workflow's output files with ordinary HTTP GET requests.

More details about DAG construction:
Assume that we have five jobs in a workflow: job1, job2, job3, job4, job5.
Their dependencies are:
job1 -> job2 (this means job2 depends on job1)
job1 -> job3
job3 -> job4
job3 -> job5
job1 -> job5
These dependencies are represented in the following graph:
[Figure: sample_workflow]
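
Before building the workflow, it can be useful to check that the dependencies really form a DAG and to see one valid run order. The following is a minimal Python sketch, not part of Karajan or dag.k. It assumes the dependencies are available as (prerequisite, dependent) pairs, and the names `edges` and `execution_order` are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

# (prerequisite, dependent) pairs taken from the example dependencies
edges = [
    ("job1", "job2"),
    ("job1", "job3"),
    ("job3", "job4"),
    ("job3", "job5"),
    ("job1", "job5"),
]


def execution_order(edges):
    """Return one valid run order; raise CycleError if the graph has a cycle."""
    predecessors = {}
    for before, after in edges:
        predecessors.setdefault(before, set())
        predecessors.setdefault(after, set()).add(before)
    # TopologicalSorter expects a mapping of node -> set of its predecessors
    return list(TopologicalSorter(predecessors).static_order())


order = execution_order(edges)
```

Any order returned this way starts with job1 and places job3 before both job4 and job5. A cyclic input raises `CycleError` instead of producing an order.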
The constructed DAG workflow then looks like this:
<project>
<include file="cogkit.k"/>
<include file="dag.k"/>
<discard>
    <dag>
        <node>
            <string>job1</string>            <!-- name of the job -->
            <element>
                <quotedlist/>
                content of job1
            </element>
            <edges>
                <string>job2</string>         <!-- job1 is a prerequisite of job2, job3 and job5 -->
                <string>job3</string>
                <string>job5</string>
            </edges>
        </node>
        <node>
            <string>job2</string>
            <element>
                <quotedlist/>
                content of job2
            </element>
        </node>
        <node>
            <string>job3</string>
            <element>
                <quotedlist/>
                content of job3
            </element>
            <edges>
                <string>job4</string>
                <string>job5</string>
            </edges>
        </node>
        <node>
            <string>job4</string>
            <element>
                <quotedlist/>
                content of job4
            </element>
        </node>
        <node>
            <string>job5</string>
            <element>
                <quotedlist/>
                content of job5
            </element>
        </node>
    </dag>
</discard>
</project>
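
The client-side construction step can be sketched as follows. This is only an illustration in Python of how a listing like the one above could be generated from a job list and its dependencies. It is not the actual implementation. The function name `build_dag_workflow` and the `jobs` mapping are hypothetical, and job bodies are treated as plain placeholder text, so real Karajan job definitions would have to be inserted as elements instead.

```python
import xml.etree.ElementTree as ET


def build_dag_workflow(jobs, edges):
    """Build the <project> tree.

    jobs:  dict mapping job name -> placeholder body text (assumed format)
    edges: iterable of (prerequisite, dependent) pairs
    """
    dependents = {name: [] for name in jobs}
    for before, after in edges:
        dependents[before].append(after)

    project = ET.Element("project")
    ET.SubElement(project, "include", file="cogkit.k")
    ET.SubElement(project, "include", file="dag.k")
    dag = ET.SubElement(ET.SubElement(project, "discard"), "dag")

    for name, body in jobs.items():
        node = ET.SubElement(dag, "node")
        ET.SubElement(node, "string").text = name
        element = ET.SubElement(node, "element")
        # The body text follows the empty <quotedlist/>, as in the listing
        ET.SubElement(element, "quotedlist").tail = body
        if dependents[name]:
            # <edges> lists the jobs that depend on this one
            edges_el = ET.SubElement(node, "edges")
            for dep in dependents[name]:
                ET.SubElement(edges_el, "string").text = dep
    return project


jobs = {f"job{i}": f"content of job{i}" for i in range(1, 6)}
edges = [("job1", "job2"), ("job1", "job3"), ("job3", "job4"),
         ("job3", "job5"), ("job1", "job5")]
workflow_xml = ET.tostring(build_dag_workflow(jobs, edges), encoding="unicode")
```

Jobs without dependents (job2, job4 and job5 here) get no `<edges>` element, matching the listing.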

2 comments:

Unknown said...

Hi, your two posts about Karajan workflows were very helpful! I was also trying to describe DAGs in Karajan and got them working much faster after reading what you did. Thanks! :)

Gerald Guo said...

I am glad to hear that my post is helpful to you.