After talking with Mike, I obtained the dag.k file. I then implemented the construction of a DAG workflow from all the jobs in a workflow and the dependencies between them. In other words, all of this information is combined into one large Karajan workflow expressed as a DAG. Note that this work is done on the client side, not on the server side. Once the large workflow has been constructed, it can be submitted to the server just like a small job, and the ID of the newly submitted workflow is returned. With this workflow ID, a user can query the state of the workflow. The user can also retrieve the workflow's output files with an ordinary HTTP GET request.
More details about the DAG construction:
Assume that we have five jobs in a workflow: job1, job2, job3, job4 and job5.
Their dependencies are:
job1 -> job2 (this means job2 depends on job1)
job1 -> job3
job3 -> job4
job3 -> job5
job1 -> job5
These dependencies are represented in the following graph:
The constructed DAG workflow then looks like this:
<project>
<include file="cogkit.k"/>
<include file="dag.k"/>
<discard>
<dag>
<node>
<string>job1</string> <!-- name of the job -->
<element>
<quotedlist/>
content of job1
</element>
<edges>
<string>job2</string> <!-- job1 is a prerequisite of job2, job3 and job5 -->
<string>job3</string>
<string>job5</string>
</edges>
</node>
<node>
<string>job2</string>
<element>
<quotedlist/>
content of job2
</element>
</node>
<node>
<string>job3</string>
<element>
<quotedlist/>
content of job3
</element>
<edges>
<string>job4</string>
<string>job5</string>
</edges>
</node>
<node>
<string>job4</string>
<element>
<quotedlist/>
content of job4
</element>
</node>
<node>
<string>job5</string>
<element>
<quotedlist/>
content of job5
</element>
</node>
</dag>
</discard>
</project>
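The client-side construction described above can be sketched in a few lines. This is only an illustration in Python, not the code I actually used: the function name `build_dag_workflow`, the `JOBS` and `DEPENDENCIES` tables, and the use of plain strings as job bodies are all assumptions made for the example. In a real workflow the body of each job would be Karajan elements rather than placeholder text.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: job name -> placeholder body, in the order the nodes are emitted.
JOBS = {f"job{i}": f"content of job{i}" for i in range(1, 6)}

# (prerequisite, dependent) pairs, exactly as listed in the post.
DEPENDENCIES = [
    ("job1", "job2"),
    ("job1", "job3"),
    ("job3", "job4"),
    ("job3", "job5"),
    ("job1", "job5"),
]


def build_dag_workflow(jobs, dependencies):
    """Return a <project> element holding one <node> per job.

    Each node lists, under <edges>, the jobs that depend on it.
    """
    # Group the dependents of every job, keeping the input order.
    successors = {name: [] for name in jobs}
    for prerequisite, dependent in dependencies:
        if prerequisite not in jobs or dependent not in jobs:
            raise KeyError(f"unknown job in edge {prerequisite} -> {dependent}")
        successors[prerequisite].append(dependent)

    project = ET.Element("project")
    ET.SubElement(project, "include", file="cogkit.k")
    ET.SubElement(project, "include", file="dag.k")
    dag = ET.SubElement(ET.SubElement(project, "discard"), "dag")

    for name, body in jobs.items():
        node = ET.SubElement(dag, "node")
        ET.SubElement(node, "string").text = name
        element = ET.SubElement(node, "element")
        # The job body follows the empty <quotedlist/>, as in the listing above.
        ET.SubElement(element, "quotedlist").tail = body
        if successors[name]:  # jobs without dependents get no <edges> block
            edges = ET.SubElement(node, "edges")
            for dependent in successors[name]:
                ET.SubElement(edges, "string").text = dependent
    return project


if __name__ == "__main__":
    print(ET.tostring(build_dag_workflow(JOBS, DEPENDENCIES), encoding="unicode"))
```

Grouping the edges by prerequisite first means each job is written exactly once, which avoids ending up with a duplicated or missing node when the workflow is assembled by hand.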
2 comments:
Hi, your two posts about Karajan workflows were very helpful! I was also trying to describe DAGs in Karajan and got them working much faster after reading what you did. Thanks! :)
I am glad to hear that my post is helpful to you.