Thursday, March 27, 2008

Workaround for a bug in the integration of db4o and Axis2

I have been trying to solve the problem described in my last post.

I tried different versions of Tomcat and Db4o, but that did not help. I posted the problem in the Db4o forum to ask for help, but no one offered a working solution. After trying countless possible fixes, I finally got it to work. The following is how I arrived at my solution.

To speed up development, I looked for tool support. I have been using Eclipse as my IDE, so WTP (Web Tools Platform) is a natural choice of plug-in for developing web applications in Eclipse. WTP supports Tomcat and Axis2, both of which are used in my project. Here (http://www.eclipsecon.com/webtools/community/tutorials/BottomUpAxis2WebService/bu_tutorial.html) is a great tutorial on how to deploy and start an Axis2 web service in Eclipse.

Then I suspected that the problem might come from Axis2, so I wrote a simple servlet that did a similar job without using Axis2. The test showed that everything worked correctly with Db4o, which meant that Axis2 was very likely the cause of the problem. I investigated further to uncover what was wrong under the hood. I built the same Axis2 service in Eclipse and deployed it to Eclipse's temporary publish directory. Surprisingly, it worked! However, if I built the web service using ADB (Axis2 Data Binding), wrapped it in a .aar archive file and deployed it into the services directory, it did not work! So the way Eclipse deploys an Axis2 web service MUST be different from what I did before. Finally, I found that Eclipse does not wrap the web service implementation in a .aar archive file. Instead, the deployment directory layout is:

axis2
    - WEB-INF
        - lib
        - conf
        - services
            - Sample                        //Sample is the name of this web service
                - META-INF
                    - services.xml       //this file describes the information of this web service.
        - classes
            - package                     //this is path corresponding to package
                - *.class                    //These .class files are implementation of web service.
    - META-INF
    - axis2-web

The original layout, with the .aar archive, is:

axis2
    - WEB-INF
        - lib
        - conf
        - services
            - Sample.aar               //the .aar archive file
        - classes
    - META-INF
    - axis2-web

The two layouts differ under the services and classes directories.
The services.xml is also different. The new services.xml is:

<service name="Sample" >
    <description>
        Please Type your service description here
    </description>
    <messageReceivers>
        <messageReceiver mep="http://www.w3.org/2004/08/wsdl/in-only" class="org.apache.axis2.rpc.receivers.RPCInOnlyMessageReceiver" />
        <messageReceiver  mep="http://www.w3.org/2004/08/wsdl/in-out"  class="org.apache.axis2.rpc.receivers.RPCMessageReceiver"/>
    </messageReceivers>
    <parameter name="ServiceClass" locked="false">package.Sample</parameter>
</service>

Besides, in my previous use of Axis2, I used the tool provided by Axis2 to automatically generate a stub class and some other auxiliary classes. Then I added my implementation code to that stub class (which contains Axis2-specific stuff). In this new deployment, I did not rely on any Axis2-specific functionality. I just wrote my implementation and compiled it into .class files, and then copied these .class files to the directory ${TOMCAT_ROOT}/axis2/WEB-INF/classes/package-path/. In addition, I needed to create and edit the corresponding services.xml file by hand. In my previous deployment, this file was generated automatically by the Axis2 tool.
To sum up, this new deployment method eases development of the web service because no Axis2-specific stuff is involved during development. The drawback is that it is not compact: the files are scattered in different places. I did not find much useful information about this kind of deployment on the web. Maybe this method is not recommended, who knows...

However, I have no choice because only one of them works well.
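
To give a rough idea of what "no Axis2-specific stuff" means, the service class can be a plain Java class like the sketch below. The method names and the in-memory map are hypothetical and only stand in for my real implementation; the class name Sample matches the ServiceClass parameter in the services.xml above. As far as I understand, the RPC message receivers expose the public methods of the class as operations of the web service.

```java
import java.util.HashMap;
import java.util.Map;

// Plain Java service class: no Axis2 imports and no generated code.
// Hypothetical sketch. It is package-private here so that it compiles in any
// file; for deployment it would presumably have to be a public class in
// Sample.java, in the package named by the ServiceClass parameter.
class Sample {
    // Hypothetical in-memory state (the real service would use Db4o).
    // Static on the assumption that Axis2 may create a new service object
    // for each request.
    private static final Map<String, String> STATUS = new HashMap<String, String>();

    // In-out style operation: takes a parameter and returns a value.
    public String getStatus(String uid) {
        synchronized (STATUS) {
            String status = STATUS.get(uid);
            return status == null ? "UNKNOWN" : status;
        }
    }

    // In-only style operation: takes parameters and returns nothing.
    public void setStatus(String uid, String status) {
        synchronized (STATUS) {
            STATUS.put(uid, status);
        }
    }
}
```

Nothing in this class depends on Axis2, so it can be compiled and tested on its own before the .class file is copied into the classes directory.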

Monday, March 24, 2008

DB4O corrupts in Tomcat with Axis2

I integrated Db4o into our project to persist the state information of workflows. At first it seemed to work OK: state information was stored in the object database and could be retrieved successfully. However, after I restarted Tomcat, Db4o failed while retrieving data from the database. I tried the different query languages Db4o supports. Unfortunately, none of them worked.

For Native Query, the error in Tomcat log is:

java.lang.IllegalArgumentException: argument type mismatch
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at com.db4o.query.Predicate.appliesTo(Unknown Source)
    at com.db4o.inside.query.PredicateEvaluation.evaluate(Unknown Source)
    at com.db4o.Platform4.evaluationEvaluate(Unknown Source)
    at com.db4o.QConEvaluation.visit(Unknown Source)
    at com.db4o.Tree.traverse(Unknown Source)
    at com.db4o.QCandidates.filter(Unknown Source)
    at com.db4o.QConEvaluation.evaluateEvaluationsExec(Unknown Source)
    at com.db4o.QCon.evaluateEvaluations(Unknown Source)
    at com.db4o.QCandidates.evaluate(Unknown Source)
    at com.db4o.QCandidates.execute(Unknown Source)
    at com.db4o.QQueryBase.executeLocal(Unknown Source)
    at com.db4o.QQueryBase.execute1(Unknown Source)
    at com.db4o.QQueryBase.getQueryResult(Unknown Source)
    at com.db4o.QQueryBase.execute(Unknown Source)
    at com.db4o.inside.query.NativeQueryHandler.execute(Unknown Source)
    at com.db4o.YapStreamBase.query(Unknown Source)
    at com.db4o.YapStreamBase.query(Unknown Source)
    at org.cogkit.cyberaide.axis2ws.StatusDB.getStatusByUID(StatusServiceInterfaceSkeleton.java:1204)
    at org.cogkit.cyberaide.axis2ws.StatusServiceInterfaceSkeleton.getJSONStatusByUID(StatusServiceInterfaceSkeleton.java:184)
    at org.cogkit.cyberaide.axis2ws.StatusServiceInterfaceMessageReceiverInOut.invokeBusinessLogic(StatusServiceInterfaceMessageReceiverInOut.java:80)
    at org.apache.axis2.receivers.AbstractInOutSyncMessageReceiver.invokeBusinessLogic(AbstractInOutSyncMessageReceiver.java:42)
    at org.apache.axis2.receivers.AbstractMessageReceiver.receive(AbstractMessageReceiver.java:96)
    at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:145)
    at org.apache.axis2.transport.http.HTTPTransportUtils.processHTTPPostRequest(HTTPTransportUtils.java:275)
    at org.apache.axis2.transport.http.AxisServlet.doPost(AxisServlet.java:120)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:595)

It seems that the error results from an incompatible type conversion. I am sure I followed the instructions in the official documentation.
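
The top frames of the trace show the exception coming out of Method.invoke(), called from Predicate.appliesTo(). The following minimal sketch produces the same kind of exception with plain Java reflection and no Db4o involved. The Status and Other classes and the match() method are hypothetical, and I am not claiming that this is what actually happens inside Db4o; it only shows why such an error cannot be caught at compile time.

```java
import java.lang.reflect.Method;

// Hypothetical sketch: calling a method reflectively with an argument of the
// wrong type fails only at run time, inside Method.invoke().
class ReflectionMismatchDemo {
    static class Status { }
    static class Other { }

    static class StatusPredicate {
        public boolean match(Status s) {
            return s != null;
        }
    }

    // Invokes match(candidate) reflectively on a new StatusPredicate.
    // Returns true if the call was rejected with IllegalArgumentException.
    static boolean failsWithTypeMismatch(Object candidate) {
        try {
            Method match = StatusPredicate.class.getMethod("match", Status.class);
            match.invoke(new StatusPredicate(), candidate);
            return false;
        } catch (IllegalArgumentException e) {
            // invoke() accepts plain Objects, so the compiler cannot check this.
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(failsWithTypeMismatch(new Status())); // false
        System.out.println(failsWithTypeMismatch(new Other()));  // true
    }
}
```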

Then I tried SODA. The outcome was even weirder. The program did not behave as I expected, and state information could not be retrieved correctly. However, when I looked into the Tomcat log, there was no error report at all! So I had no way to tell what happened under the hood, which was pretty annoying. It seemed that Db4o did retrieve something, but the fields of the retrieved object were invalid.

I read almost all the posts in the Db4o forum/community and tried all the suggested solutions, but it still does not work. I also tried different versions of Db4o; none of them solved my problem.

It has taken me a lot of time. Currently, I am not sure whether I can solve it in the end...

Monday, March 17, 2008

State Information Persistence (Status server)

Previously, the state information of all workflows in the status server was kept in memory. This strategy is not practical once there is too much information to fit into memory. In other words, data persistence is necessary. After careful investigation, I found two solutions.
(1) Cache
    This strategy stores part of the data on hard disk and keeps the recently used data in memory to improve performance. So it is a data-centric mechanism that makes use of locality.
    Ehcache (http://ehcache.sourceforge.net/) is a project based on this strategy.
    In order to put data into the cache, you MUST construct an Element object which contains a key and a value. From Ehcache 1.2 on, keys and values can be either Serializable or plain Objects, which improves flexibility. The only way to query data is by key, so much work must be done to compose a complex query. Ehcache lets users control every aspect of cache behavior.
(2) Database
    This strategy uses a database to store information; more precisely, an object database. The database provides interfaces through which the common database operations (insert/query/update/delete) can be invoked. However, this kind of database is different from a traditional relational database. The unit of manipulation in an object database is an object. In other words, you insert/query/delete objects instead of tuples. In addition to reading field values of a retrieved object, you can also invoke its member functions.
    Db4o (http://www.db4o.com/) is an object database. Here is a simple introduction written by me: http://zhenhua-guo.blogspot.com/2008/03/db4o-introduction.html.

    Actually, these two strategies do not compete with each other. An object database implementation may itself use a cache mechanism. So it is not surprising to hear that some great applications (Hibernate, Spring...) use Ehcache. For application-level programmers, I think an object database should be easier to work with.
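
To make the locality idea behind strategy (1) concrete, here is a minimal sketch of a least-recently-used cache in plain Java. It is built on java.util.LinkedHashMap, not on Ehcache, and it simply drops old entries instead of writing them to disk; the class name, keys and capacity are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an in-memory LRU cache (this is not the Ehcache API).
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: entries are ordered from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // LinkedHashMap calls this after each insertion; returning true evicts
    // the eldest (least recently used) entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<String, String>(2);
        cache.put("wf-1", "RUNNING");
        cache.put("wf-2", "DONE");
        cache.get("wf-1");            // wf-1 becomes the most recently used entry
        cache.put("wf-3", "QUEUED");  // evicts wf-2, the least recently used entry
        System.out.println(cache.keySet()); // [wf-1, wf-3]
    }
}
```

As in Ehcache, the only lookup here is by key, which is exactly the limitation that makes complex queries hard to express with a cache alone.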

    In the end, I decided to use DB4O in our project. One important reason is that it lets programmers write code at a high level: I don't need to care about the details of the database infrastructure. As a result, my original code needed little modification, so the change could be done quickly. Maintenance of the code is also easier, because both the data and the related operations are stored in the database instead of in scattered places. What's more, query support is more powerful in DB4O than in Ehcache. It supports three kinds of query: Query By Example, Native Queries and SODA, and complex queries can be composed easily.

One lesson: when dealing with data retrieved from an object database, you MUST be careful about type conversion so that no incompatible conversion occurs. Because this kind of error does not appear until run time, debugging becomes more difficult, especially for web applications. I only found the cause of the error by carefully reading the lengthy Tomcat log.

Result:
   State information of workflows is now stored on hard disk and cached in memory by DB4O.

DB4O introduction

Db4o is a high-performance object database for Java and .NET.

• Open db

     ObjectContainer db = Db4o.openFile(filename);

 

• Insert

Objects are inserted by using the set() method.

    ClassName obj = new ClassName(parameters);
    db.set(obj);

 

• Retrieve

(1)    Query by Example (QBE)

       Create a prototypical object for db4o to use as an example of what you wish to retrieve. Db4o will return all of the objects which match all non-default field values. The results will be returned as an ObjectSet instance.

    ClassName obj = new ClassName(values…); //prototypical object
    ObjectSet result = db.get( obj );
    listResult( result ); //helper that prints every object in the result set

Db4o supplies a shortcut to retrieve all instances of a class:

ObjectSet result = db.get(ClassName.class);

The following code can be used to iterate over the results:

    while( result.hasNext() ){
        System.out.println( result.next() );
    }

(2)    Native Query(NQ) --- main db4o querying interface.

       Native Queries provide the ability to run one or more lines of code against all instances of a class. Native query expressions return true to mark specific instances as part of the result set.

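    List<ClassName> objs = db.query( new Predicate<ClassName>() {
        // match() must return the primitive boolean, not Boolean
        public boolean match(ClassName obj){
            // use equals() instead of == if the property is an object
            return obj.getProperty() == value;
        }
    });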

       Users must be very careful with side effects --- especially those that might affect persistent objects.

(3)    SODA Query API

 

• Update

       Updating objects is as easy as storing them. You use the same set() method: just call set() again after modifying an object.

    ObjectSet result = db.get(new ClassName(parameters));
    ClassName found = (ClassName)result.next();
    found.methodName(parameters);
    db.set(found);

Note: we query for the object first. If the object is not ‘known’ (that is, it has not been stored or retrieved during the current session), db4o will insert a new object instead of updating the existing one. In that case, db4o thinks that you want to insert a new object that happens to have the same field values.

 

• Delete

       Objects are removed by using the delete() method.

    ObjectSet result = db.get( new ClassName(…));
    ClassName found = (ClassName)result.next();
    db.delete( found );

If you want to tune DB4O to get higher performance, you need to change the default configuration.

Thursday, March 13, 2008

Some workflow related projects

DIET

http://graal.ens-lyon.fr/DIET/

DIET (Distributed Interactive Engineering Toolbox) seems to be similar to Condor. It consists of a set of elements that can be used together to build applications using the Grid-RPC paradigm, and the aim of the project is to develop a set of tools for building computational servers. Clients submit computation requests to a scheduler whose goal is to find a server available on the grid. Scheduling is applied to balance the work among the servers, and a list of available servers is sent back to the client; the client is then able to send the data and the request to one of the suggested servers to solve its problem. The project focuses on the development of scalable middleware, with initial efforts on distributing the scheduling problem across multiple agents.

So the goal of the DIET project is different from that of our project.


Taverna

Recently, I installed and tried Taverna. Then I investigated its functionality in detail.

Its manual is here: http://www.mygrid.org.uk/usermanual1.7.

Taverna was created by the myGrid project and is a tool for designing and executing workflows. “It provides a desktop authoring environment and enactment engine for scientific workflows expressed in SCUFL (Simple Conceptual Unified Flow Language).” SCUFL is proprietary. As a result, I cannot find detailed information about SCUFL.

Services are connected with data links (providing data flow) and control links (coordination of services not connected through data flow).

Several features:

  1. Fault Tolerance

    • Retries for every processor

If a processor fails to execute, it is retried several times. Users can specify the maximum number of retries, the delay between retries and the backoff. Backoff is a factor that determines how much the delay increases for each retry after the first.

    • Alternative processor

Users can specify an alternate processor, or a list of processors, which performs the same task as the primary processor. The alternate is used in place of the main processor if the latter has failed. Note: the alternate has its own definable parameters for ‘Retries’, ‘Delay’ and ‘Backoff’.
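
The retry parameters can be read as the loop below. This is only my interpretation of the description in the manual, written as a plain Java sketch; it is not Taverna code, and all names in it are hypothetical. In particular, the sketch reuses one set of retry settings for all alternates, whereas in Taverna each alternate has its own.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of retries with delay/backoff and alternate processors.
class RetrySketch {
    // Runs the task once, then retries it up to maxRetries times.
    // The wait before the first retry is delayMillis; every later wait is
    // the previous wait multiplied by backoff.
    static <T> T runWithRetries(Supplier<T> task, int maxRetries,
                                long delayMillis, double backoff) {
        double wait = delayMillis;
        RuntimeException last = new IllegalArgumentException("maxRetries must be >= 0");
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (attempt > 0) {
                sleep((long) wait);
                wait *= backoff;
            }
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    // Tries the primary processor first, then each alternate in list order.
    static <T> T runWithAlternates(List<Supplier<T>> processors, int maxRetries,
                                   long delayMillis, double backoff) {
        RuntimeException last = new IllegalArgumentException("no processors given");
        for (Supplier<T> processor : processors) {
            try {
                return runWithRetries(processor, maxRetries, delayMillis, backoff);
            } catch (RuntimeException e) {
                last = e; // this processor is exhausted, move on to the next
            }
        }
        throw last;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

With a delay of 1000 ms and a backoff of 2.0, for example, this sketch waits 1000 ms before the first retry, 2000 ms before the second and 4000 ms before the third.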

  2. Iteration

Taverna supports two kinds of iteration: dot and cross. Cross iteration is an all-against-all iteration, which means it iterates over all combinations of input values. For dot iteration, the first item of one input is paired with the first item of the other input, the second item of one input is paired with the second item of the other input…
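
A small plain Java sketch of the two strategies over two input lists follows. It is not Taverna code; the class and method names are hypothetical, and stopping at the shorter list in the dot case is my own assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of cross and dot iteration over two inputs.
class IterationSketch {
    // Cross: every item of a is combined with every item of b.
    static List<String> cross(List<String> a, List<String> b) {
        List<String> pairs = new ArrayList<String>();
        for (String x : a) {
            for (String y : b) {
                pairs.add(x + "-" + y);
            }
        }
        return pairs;
    }

    // Dot: the i-th item of a is paired with the i-th item of b.
    // Stopping at the shorter list is an assumption of this sketch.
    static List<String> dot(List<String> a, List<String> b) {
        List<String> pairs = new ArrayList<String>();
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            pairs.add(a.get(i) + "-" + b.get(i));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a1", "a2");
        List<String> b = Arrays.asList("b1", "b2");
        System.out.println(cross(a, b)); // [a1-b1, a1-b2, a2-b1, a2-b2]
        System.out.println(dot(a, b));   // [a1-b1, a2-b2]
    }
}
```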

  3. Services

Taverna provides some built-in services. Among them are XML transformation, base64 encoding/decoding, write text file… BeanShell and RShell are supported as well.

It also supports some well-known bioinformatics services including Soaplab, Biomart, Biomoby. I am not familiar with biology related tools. So I am not sure whether these services are based on web services.

Taverna also has a feature called WSDL scavenger. Users specify the address of a WSDL document, and Taverna automatically fetches the document and analyzes its content to extract the supported operations. These operations are then added to the list of available processors so that users can easily use them when composing a workflow.

In addition, Taverna can scavenge an existing workflow and extract its processors.

Summary

Taverna is a tool designed specifically for bioinformatics. However, some of its features may also be useful in more generic applications. These features include the WSDL scavenger, dot/cross iteration, fault tolerance…

It is different from our project in that Taverna is not based on the Grid. Its enactment engine runs on the client-side machine. In our project, the enactment engine is located at the server side and uses the Java CoG Kit to manage the execution of workflows.

Social website myExperiment

This web site supports finding and sharing workflows and has special support for Scufl workflows. Users can download workflows posted on the site. For every workflow written in Scufl, there is a corresponding .svg image, which is easier to understand than the verbose XML Scufl document.

It seems that myExperiment supports almost all the functionality that most Web 2.0 sites support, including user management, group management, workflow management, blog, forum, tagging, rating, and commenting. Moreover, some statistics (number of reviews, number of comments…) are collected.

Moteur

This project is also based on SCUFL.


Karajan Workflow:

  1. Is there a way to invoke some operations described in a WSDL document?