Monday, March 17, 2008

State Information Persistence (Status server)

Previously, the status server kept the state information of all workflows in memory. That strategy stops being practical once the data grows too large to fit in memory, so some form of data persistence is necessary. After some investigation, I found two candidate solutions.
(1) Cache
    This strategy stores data on the hard disk and keeps recently used data in memory to improve performance. It is a data-centric mechanism that exploits locality of reference.
    Ehcache (http://ehcache.sourceforge.net/) is a project based on this strategy.
    To put data into the cache, you must construct an Element object that contains a key and a value. Since Ehcache 1.2, the key and value can be serializable objects, which improves flexibility. The only way to query data is to specify a key object, so composing a complex query takes a lot of extra work. Ehcache also gives users control over many aspects of cache behavior.
(2) Database
    This strategy uses a database to store the information, specifically an object database. The database provides interfaces through which the common operations (insert/query/update/delete) can be invoked. However, it differs from a traditional relational database: the unit of manipulation is an object, so you insert, query, and delete objects instead of tuples. Besides reading the field values of a retrieved object, you can also invoke its member functions.
    Db4o (http://www.db4o.com/) is an object database. Here is a short introduction I wrote: http://zhenhua-guo.blogspot.com/2008/03/db4o-introduction.html.
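To make strategy (1) concrete, here is a minimal sketch of a key/value cache that keeps the most recently used entries in memory and spills the rest to disk. It uses only the Java standard library and is not Ehcache's API: the class `TwoTierCache`, its methods, and the file naming are hypothetical names chosen for illustration, and null values are assumed not to be stored.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-tier cache: hot entries in memory, evicted entries on disk. */
class TwoTierCache<K extends Serializable, V extends Serializable> {
    private final int maxInMemory;
    private final Path dir;
    private final Map<K, V> memory;
    private final Map<K, Path> diskIndex = new HashMap<>(); // key -> file holding its value
    private long nextFileId = 0;

    TwoTierCache(int maxInMemory, Path dir) {
        this.maxInMemory = maxInMemory;
        this.dir = dir;
        // accessOrder = true: the eldest entry is the least recently used one.
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > TwoTierCache.this.maxInMemory) {
                    writeToDisk(eldest.getKey(), eldest.getValue());
                    return true; // drop it from memory, it now lives on disk
                }
                return false;
            }
        };
    }

    /** Convenience factory that stores spilled entries in a fresh temp directory. */
    static <K extends Serializable, V extends Serializable> TwoTierCache<K, V> inTempDir(int maxInMemory) {
        try {
            return new TwoTierCache<>(maxInMemory, Files.createTempDirectory("status-cache"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void put(K key, V value) {
        removeFromDisk(key);    // discard any stale copy on disk
        memory.put(key, value); // may evict the least recently used entry
    }

    /** Lookup is by key only; a disk hit is promoted back into memory. */
    V get(K key) {
        V value = memory.get(key);
        if (value != null) return value;
        Path file = diskIndex.get(key);
        if (file == null) return null;
        value = readFromDisk(file);
        removeFromDisk(key);
        memory.put(key, value);
        return value;
    }

    boolean isInMemory(K key) { return memory.containsKey(key); }

    private void writeToDisk(K key, V value) {
        Path file = dir.resolve("entry-" + (nextFileId++) + ".ser");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        diskIndex.put(key, file);
    }

    @SuppressWarnings("unchecked")
    private V readFromDisk(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (V) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    private void removeFromDisk(K key) {
        Path file = diskIndex.remove(key);
        if (file == null) return;
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that `get` accepts nothing but a key, which mirrors the limitation described in (1): any richer query would have to be built on top of key lookups.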

    Actually, these two strategies do not compete with each other. An object database implementation may itself use a cache internally. So it is not surprising that some well-known projects (Hibernate, Spring...) use Ehcache. For application-level programmers, I think an object database is easier to work with.
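To illustrate what "the unit of manipulation is an object" means in strategy (2), here is a small in-memory sketch. It is not db4o's API and it persists nothing; `ObjectStoreSketch`, `WorkflowState`, and their methods are hypothetical names. It only shows that whole objects are stored, queried with a condition written against the object's own methods, and deleted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical domain object; an object store keeps instances like this as they are. */
class WorkflowState {
    private final String workflowId;
    private String status;

    WorkflowState(String workflowId, String status) {
        this.workflowId = workflowId;
        this.status = status;
    }

    String getWorkflowId() { return workflowId; }
    String getStatus() { return status; }
    void setStatus(String status) { this.status = status; }

    // A member function that a query can call directly on stored objects.
    boolean isFinished() { return "DONE".equals(status) || "FAILED".equals(status); }
}

/** Hypothetical in-memory stand-in for an object database (no persistence). */
class ObjectStoreSketch {
    private final List<Object> objects = new ArrayList<>();

    /** Insert: stores the object itself, not a row of column values. */
    void store(Object obj) {
        for (Object existing : objects) {
            if (existing == obj) return; // already stored; later changes are seen in place
        }
        objects.add(obj);
    }

    /** Query: returns stored objects of the given type that satisfy the condition. */
    <T> List<T> query(Class<T> type, Predicate<? super T> condition) {
        List<T> result = new ArrayList<>();
        for (Object obj : objects) {
            if (type.isInstance(obj)) {
                T candidate = type.cast(obj);
                if (condition.test(candidate)) result.add(candidate);
            }
        }
        return result;
    }

    /** Delete: removes this exact object; returns true if it was stored. */
    boolean delete(Object obj) {
        return objects.removeIf(existing -> existing == obj);
    }
}
```

A call such as `db.query(WorkflowState.class, WorkflowState::isFinished)` expresses the condition in ordinary Java, with no mapping between tuples and objects.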

    In the end, I decided to use db4o in our project. One important reason is that it lets programmers write code at a high level: I don't need to care about the details of the database infrastructure. As a result, my original code needed only small modifications, which could be done quickly. The code is also easier to maintain, because the data and the related operations are stored together in the database instead of in scattered places. In addition, query support is more powerful in db4o than in Ehcache. db4o supports three query mechanisms: Query By Example, Native Queries, and SODA, so complex queries can be composed easily.
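To give a feel for the Query By Example idea, here is a rough sketch written with plain Java reflection. It is not db4o's implementation or API; `QueryByExampleSketch` and `Job` are hypothetical names. The sketch assumes the rule that a field left at its default value (null, 0, false) in the example object is ignored, while every other field must match. As far as I understand, db4o follows a similar rule, but check its documentation for the exact behavior.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical stored object. */
class Job {
    String owner;
    String status;
    int retries;

    Job(String owner, String status, int retries) {
        this.owner = owner;
        this.status = status;
        this.retries = retries;
    }
}

class QueryByExampleSketch {
    /** Returns the stored objects whose fields match every non-default field of the example. */
    static <T> List<T> queryByExample(Collection<?> stored, T example) {
        List<T> matches = new ArrayList<>();
        Class<?> type = example.getClass();
        for (Object candidate : stored) {
            if (type.isInstance(candidate) && matchesExample(candidate, example)) {
                @SuppressWarnings("unchecked")
                T match = (T) candidate; // safe: candidate is an instance of the example's class
                matches.add(match);
            }
        }
        return matches;
    }

    // Compares only the fields declared directly in the example's class (superclasses are skipped).
    private static boolean matchesExample(Object candidate, Object example) {
        for (Field field : example.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) continue;
            try {
                field.setAccessible(true);
                Object wanted = field.get(example);
                if (isUnset(wanted)) continue; // default value acts as a wildcard
                if (!wanted.equals(field.get(candidate))) return false;
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return true;
    }

    private static boolean isUnset(Object value) {
        if (value == null) return true;
        if (value instanceof Number) return ((Number) value).doubleValue() == 0.0;
        if (value instanceof Boolean) return !((Boolean) value);
        if (value instanceof Character) return ((Character) value) == 0;
        return false;
    }
}
```

For example, `queryByExample(stored, new Job("alice", null, 0))` selects every `Job` owned by "alice" regardless of status or retry count. One consequence of the wildcard rule is that this style cannot ask for "retries equal to 0", which is one reason richer query mechanisms exist alongside it.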

One lesson: when dealing with data retrieved from the object database, you must be careful about type conversion so that incompatible casts do not occur. Because this kind of error does not appear until run time, debugging is harder, especially for web applications. I only found the cause of the error by carefully reading a lengthy Tomcat log.
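One way to guard against this kind of error is to check the runtime type before converting, so that a mismatch either gets filtered out or fails with a message that names both types. The helper below is a sketch of that idea using only the Java standard library; `SafeCastSketch` and its methods are hypothetical names and are not part of db4o.

```java
import java.util.ArrayList;
import java.util.List;

class SafeCastSketch {
    /** Keeps only the results that really are of the expected type. */
    static <T> List<T> onlyOfType(Iterable<?> results, Class<T> expected) {
        List<T> typed = new ArrayList<>();
        for (Object result : results) {
            if (expected.isInstance(result)) {
                typed.add(expected.cast(result));
            }
        }
        return typed;
    }

    /** Converts one result, failing immediately with a descriptive message on a mismatch. */
    static <T> T requireType(Object result, Class<T> expected) {
        if (!expected.isInstance(result)) {
            String actual = (result == null) ? "null" : result.getClass().getName();
            throw new IllegalStateException(
                "Expected " + expected.getName() + " but the query returned " + actual);
        }
        return expected.cast(result);
    }
}
```

Compared with a bare cast such as `(String) result`, the error raised by `requireType` states which type was expected and which one actually came back, which is easier to spot in a long server log.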

Result:
   The state information of workflows is now stored on the hard disk and cached in memory using db4o.

2 comments:

Anonymous said...

Hi!
This is Nice Blog!

We offer conversion from various file formats and media to others. With our extensive technical expertise in this area, we are almost certain to cater to any of your complex xml/sgml conversion requirements, be it in any format, file type or media.

German Viscuso said...

Hey, it looks like a cool project! Is it open source? Can you by chance share it with the db4o community in projects.db4o.com?

Have you solved the problem that you mention here? http://zhenhua-guo.blogspot.com/2008/03/db4o-corrupts-in-tomcat-with-axis2.html

Best!

German Viscuso
db4o community manager