
Monday, August 10, 2009

MySQL Connection Timeout Error in Hibernate

Problem

I am using Hibernate and MySQL in our project. After the application starts, it works fine. However, if it sits unused for some time (say, a day), it throws a connection exception the next time it is used. Trace of the exception:
com.mysql.jdbc.CommunicationsException: 
The last packet successfully received from the server was seconds ago.
The last packet sent successfully to the server was 407270 seconds ago, which  is longer than the server configured value of 'wait_timeout'. 
You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.

I googled this problem, and it seems many other people have run into it as well. I found some really nice blog posts about how to solve it.

Following the error message's advice, I set the option autoReconnect (http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-configuration-properties.html) to true, but it still did not work. From what I found on Google, this option works on some platforms but not on others.
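For reference, the option is a Connector/J connection property appended to the JDBC URL (the host and database name here are placeholders):

jdbc:mysql://localhost:3306/mydb?autoReconnect=true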

Solution

Basically, the solution is to 1) use a connection pool and 2) tune the pool's parameters.

C3P0 is a free JDBC connection pool.
Some c3p0 configuration parameters are mapped to Hibernate configuration properties. The mapped parameters are listed at http://www.mchange.com/projects/c3p0/index.html#hibernate-specific.

"Please note that configuration parameters that ARE mapped into hibernate's config file MUST be set within hibernate's configuration, or else they will be overridden by hibernate-specified defaults.
[The hibernate-mapped c3p0 parameters are minPoolSize, maxPoolSize, maxIdleTime, maxStatements, acquireIncrement, testConnectionOnCheckout, and idleConnectionTestPeriod. These map to the following hibernate parameters: hibernate.c3p0.min_size, hibernate.c3p0.max_size, hibernate.c3p0.timeout, hibernate.c3p0.max_statements, hibernate.c3p0.acquire_increment, hibernate.c3p0.validate, and hibernate.c3p0.idle_test_period. DataSources configured in Tomcat should always use c3p0-native parameter names. But pools constructed under the covers by Hibernate must be configured in hibernate's config file with hibernate-defined parameter names where applicable.] "

C3p0 parameters that are NOT mapped should be specified in the file WEB-INF/classes/c3p0.properties.
Note: the c3p0 parameter testConnectionsOnCheckout is mapped to hibernate.c3p0.validate only in Hibernate 2.x!
Also, testConnectionsOnCheckout is really expensive, since it validates the connection on every checkout. Users can use c3p0.maxIdleTime (mapped to hibernate.c3p0.timeout in Hibernate) instead.

My hibernate.cfg.xml includes the following configuration:
        <property name="connection.provider_class">org.hibernate.connection.C3P0ConnectionProvider</property>
        <property name="c3p0.min_size">5</property>
        <property name="c3p0.max_size">30</property>
        <property name="c3p0.timeout">600</property>
        <property name="c3p0.max_statements">0</property>
        <property name="c3p0.acquire_increment">5</property>
        <property name="c3p0.idle_test_period">60</property>

The idle connection test can be made cheaper by specifying the option preferredTestQuery.
I am using MySQL as the underlying RDBMS, and putting c3p0.preferredTestQuery = SELECT 1 into c3p0.properties works fine (this tip comes from a comment at http://www.databasesandlife.com/automatic-reconnect-from-hibernate-to-mysql/#comment-3156).
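Since preferredTestQuery is not one of the hibernate-mapped parameters, it belongs in WEB-INF/classes/c3p0.properties rather than hibernate.cfg.xml. The relevant line of that file is simply:

c3p0.preferredTestQuery=SELECT 1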

Resources

A really good article about how to reconnect from Hibernate to MySQL: http://www.databasesandlife.com/automatic-reconnect-from-hibernate-to-mysql/
C3P0 (Connection manager): http://www.mchange.com/projects/c3p0/index.html
How to configure C3P0 in Hibernate: https://www.hibernate.org/214.html
A forum thread about C3P0 configuration in Hibernate: https://forum.hibernate.org/viewtopic.php?t=934779&highlight=c3p0

Tuesday, June 02, 2009

H2 Exception "Database may be already in use: Locked by another process"

I wrote two Java servlet web apps (an OAuth consumer app and an OAuth server app), both of which rely on H2 as the database engine. I want H2 to run in embedded mode instead of server mode. According to the official documentation, the following declaration should be added to web.xml:

<listener>
    <listener-class>org.h2.server.web.DbStarter</listener-class>
</listener>

This class listens for servlet context events (context initialized, context destroyed).

When I deploy just one of the two web apps to Tomcat, it works fine. When I deploy both of them, the following exception is thrown:

Exception in thread "main" org.h2.jdbc.JdbcSQLException: Database may be already in use: Locked by another process. Possible solutions: close all other connection(s); use the server mode [90020-63]

I am 100% sure that those two apps use different databases. To make myself 200% sure that my code does not create or use additional databases unintentionally, I rechecked my code many times. It looked just fine, so I suspected the H2 database engine itself.
Then I used the command 'lsof' to list the files opened by the Tomcat Java process. Surprisingly, a database named 'test' had been created under my home directory.
I reread the document http://www.h2database.com/html/tutorial.html#web_applications, and found what I had missed before:

For details on how to access the database, see the file DbStarter.java. By default the DbStarter listener opens an embedded connection using the database URL 'jdbc:h2:~/test', user name 'sa', and password 'sa'.

The source of the DbStarter class can be viewed at http://code.google.com/p/h2database/source/browse/trunk/h2/src/main/org/h2/server/web/DbStarter.java.
I am still curious why DbStarter creates a default database in the first place.

Solutions
(1) Remove the listener declaration from web.xml. In other words, remove
    <listener> <listener-class>org.h2.server.web.DbStarter</listener-class> </listener>
(2) Change the URL of the database DbStarter creates, so the two apps do not collide:

  <context-param>
    <param-name>db.url</param-name>
    <param-value>jdbc:h2:your_new_db_url</param-value>
  </context-param>
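For example (the database URLs here are made up for illustration), each app's web.xml gets its own URL so the two embedded databases no longer collide:

  <!-- consumer app -->
  <param-value>jdbc:h2:~/oauth_consumer</param-value>
  <!-- server app -->
  <param-value>jdbc:h2:~/oauth_server</param-value>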

Sunday, September 07, 2008

Hibernate installation (run on MySQL)

Recently, I tried to install Hibernate on the gridfarm machine. The version I used was 3.3.0 SP1.
The configuration document can be found here. However, the document seems to be out of date.
(1) Configuration file hibernate.properties vs. hibernate.cfg.xml.
According to the document, either one can be used; if both are present, settings in hibernate.cfg.xml override those in hibernate.properties. The document says using hibernate.properties is an easy way to configure Hibernate, so I chose it. However, after some trial and error I found that Hibernate 3.3.0 SP1 requires hibernate.cfg.xml. In other words, you MUST have a configuration file named "hibernate.cfg.xml". Of course, you can also configure Hibernate programmatically, which is not covered in this post.
(2) Logger
In the documentation I saw "Hibernate logs various events using Apache commons-logging." However, when I tried to run my Hibernate app, it complained that the logger class could not be found. After some inspection, I figured out that Hibernate 3.3.0 does NOT use Apache commons-logging any more; it uses slf4j instead. But the Hibernate 3.3.0 SP1 distribution includes only the slf4j-api jar, while an implementation is also needed at runtime (if you turn logging off, I don't know whether it still complains). In other words, to use logging you must download an implementation of a logging backend supported by slf4j and put it on the classpath next to slf4j-api. I used slf4j-simple.
(3) JNDI bindings (for session factory and data source).
Because I just wanted to write a simple prototype without deploying to a J2EE server, I chose not to use JNDI bindings for the data source or session factory. After several hours of trial and error, I could not get it to work! I kept getting the error "org.hibernate.impl.SessionFactoryObjectFactory - Could not bind factory to JNDI". It seemed that Hibernate was always trying to use JNDI to bind the SessionFactory. However, the official document says:

"A JNDI bound Hibernate SessionFactory can simplify the lookup of the factory and the creation of new Sessions. Note that this is not related to a JNDI bound Datasource, both simply use the same registry!

If you wish to have the SessionFactory bound to a JNDI namespace, specify a name (eg. java:hibernate/SessionFactory) using the property hibernate.session_factory_name. If this property is omitted, the SessionFactory will not be bound to JNDI. (This is especially useful in environments with a read-only JNDI default implementation, e.g. Tomcat.)

When binding the SessionFactory to JNDI, Hibernate will use the values of hibernate.jndi.url, hibernate.jndi.class to instantiate an initial context. If they are not specified, the default InitialContext will be used.

Hibernate will automatically place the SessionFactory in JNDI after you call cfg.buildSessionFactory(). This means you will at least have this call in some startup code (or utility class) in your application, unless you use JMX deployment with the HibernateService (discussed later).

If you use a JNDI SessionFactory, an EJB or any other class may obtain the SessionFactory using a JNDI lookup.

We recommend that you bind the SessionFactory to JNDI in a managed environment and use a static singleton otherwise. To shield your application code from these details, we also recommend to hide the actual lookup code for a SessionFactory in a helper class, such as HibernateUtil.getSessionFactory(). Note that such a class is also a convenient way to startup Hibernate—see chapter 1. "

It turned out that a property called "hibernate.current_session_context_class" must be set.
My hibernate.cfg.xml looks like:
<!DOCTYPE hibernate-configuration PUBLIC
    "-//Hibernate/Hibernate Configuration DTD 3.0//EN"
    "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">

<hibernate-configuration>
    <session-factory>
        <property name="show_sql">true</property>
        <property name="current_session_context_class">thread</property>
        <property name="connection.username">username</property>
        <property name="connection.password">password</property>
        <property name="connection.url">jdbc:mysql://localhost/hibernate</property>
        <property name="connection.driver_class">com.mysql.jdbc.Driver</property>
        <property name="dialect">org.hibernate.dialect.MySQLDialect</property>
        <mapping resource="User.hbm.xml"/> 
    </session-factory>
</hibernate-configuration>
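The docs quoted above recommend hiding the SessionFactory lookup in a helper class and using a static singleton outside a managed environment. A minimal sketch of such a helper for this non-JNDI setup (standard Hibernate 3.x API; the class name follows the docs' HibernateUtil convention):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateUtil {
    private static final SessionFactory sessionFactory;

    static {
        try {
            // configure() reads hibernate.cfg.xml from the classpath;
            // the factory is built once and reused as a singleton
            sessionFactory = new Configuration().configure().buildSessionFactory();
        } catch (Throwable ex) {
            throw new ExceptionInInitializerError(ex);
        }
    }

    public static SessionFactory getSessionFactory() {
        return sessionFactory;
    }
}

Application code can then call HibernateUtil.getSessionFactory().getCurrentSession() without caring how the factory was created (getCurrentSession() works here because current_session_context_class is set to thread above).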
Then I tried a very simple read-only JNDI service provider, simple-jndi. My simple-jndi configuration file (jndi.properties, which should be put in the root of your classpath) looks like:
#The first (required) parameter, org.osjava.sj.root, is the location 
#of your simple-jndi root, which is the location in which simple-jndi
#looks for values when code asks for them.

org.osjava.sj.root=/home/zhguo/share/simple-JNDI/config
java.naming.factory.initial=org.osjava.sj.SimpleContextFactory
Under the directory pointed to by the org.osjava.sj.root property, I have a JNDI data file called hibernate_jndi.properties:
mysql.type=javax.sql.DataSource
mysql.driver=com.mysql.jdbc.Driver
mysql.url=jdbc:mysql://localhost/hibernate
mysql.user=username
mysql.password=password
So, my new version of hibernate.cfg.xml looks like:
<hibernate-configuration>
    <session-factory>
        <property name="show_sql">true</property>
        <property name="current_session_context_class">thread</property>
        <property name="connection.datasource">hibernate_jndi.mysql</property>
        <property name="dialect">org.hibernate.dialect.MySQLDialect</property>
        <mapping resource="User.hbm.xml"/> 
    </session-factory>
</hibernate-configuration>
Note: in this file, the session factory name is not set. This is because we are using a read-only JNDI provider. (The data source name hibernate_jndi.mysql follows simple-jndi's convention of combining the data file name with the entry prefix.) If you use a read-write JNDI provider, you can set the session factory name in one of the following ways:
(*) In hibernate.properties file:
hibernate.session_factory_name hibernate/session_factory
(*) In hibernate.cfg.xml file:
<session-factory name="your name goes here">

Wednesday, August 27, 2008

Hadoop and HBase port usage

Hadoop port usage
On name node
50090: org.apache.hadoop.dfs.SecondaryNameNode
33220: ditto (this port is not fixed and may change when Hadoop is restarted)

50070: org.apache.hadoop.dfs.NameNode
9000: ditto
46684: ditto (this port is not fixed and may change when Hadoop is restarted)

9001: org.apache.hadoop.mapred.JobTracker
50030: ditto
60502: ditto (this port is not fixed and may change when Hadoop is restarted)

On data node
50075: org.apache.hadoop.dfs.DataNode
50010: ditto
45868: ditto (this port is not fixed and may change when Hadoop is restarted)

50060: org.apache.hadoop.mapred.TaskTracker
55027: ditto (this port is not fixed and may change when Hadoop is restarted)

HBase port usage
On master
60000: org.apache.hadoop.hbase.master.HMaster
60010: ditto

On data node
60030: org.apache.hadoop.hbase.regionserver.HRegionServer
60020: ditto

Monday, August 25, 2008

Insert pubchem data into HBase

HBase shell
HBase provides a shell utility that lets users execute simple commands. The shell is started with:
${HBASE_HOME}/bin/hbase shell
Then type the command help to get documentation describing the supported commands, which can be used to manipulate data stored in HBase. E.g., list shows all tables in HBase, get fetches row or cell contents, and put stores data into a cell.
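For example, a short session might look like the following (the table, row, and column names are made up, and the exact shell syntax varies between HBase versions):

hbase> list
hbase> put 'compound', 'row1', 'iupac_name:', 'benzene'
hbase> get 'compound', 'row1'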

Data insertion
The data source is ftp://ftp.ncbi.nlm.nih.gov/pubchem/. I modified Python scripts and C++ source code given by Rajarshi.
Data retrieval and processing steps:

  1. Download all information about compounds from ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/SDF. I finally got 123 GB.
  2. Decompress those files
  3. Extract information from these .sdf files and write it to .txt files. The openbabel library is needed to compile the C++ extraction code.
  4. Combine those .txt files generated in step 3 into one big .dat file
  5. Write a ruby script to insert all data in the .dat file into HBase.
    Command is like this: ${HBASE_HOME}/bin/hbase org.jruby.Main rubyscript.rb

Why did I write a Ruby script instead of a Java program in step 5?
HBase is written in Java and provides a Java API. However, compiling Java programs is kind of cumbersome -- setting a lengthy CLASSPATH and so on.
So I chose to write scripts that can be executed directly by the HBase shell. I found useful information on this page; there is a section called "scripting" in that post, but the information there is far from complete -- it does not tell readers how to write the scripts. At first, I wrote a script that contained shell commands, one command per line, and fed it to the HBase shell. Unfortunately, it didn't work. After numerous trials, I found that Ruby scripts can be fed to the shell. Ruby scripts cannot use the existing shell commands directly; the Ruby binding of the underlying Java API must be used.

I had not learned Ruby before, so I had to teach myself enough to get by. Ruby is unusual in terms of syntactic flexibility; it supports many shorthands to improve productivity. Anyway, "Ruby is easy to learn, but hard to master". By the way, Ruby documentation seems less abundant than Python's or Perl's.

How to write Ruby scripts for HBase?
This site http://wiki.apache.org/hadoop/Hbase/JRuby contains related information. But I could not run the sample script successfully because of errors in the script! Damn it! I wonder whether the author tested the code before releasing it -- some of the errors are quite obvious.
Once the Ruby script is finished, it can be executed using:
${HBASE_HOME}/bin/hbase org.jruby.Main rubyscript.rb 
Java API:
http://hadoop.apache.org/hbase/docs/current/api/index.html

My ruby script:

#!/usr/bin/ruby -w

include Java
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.HColumnDescriptor
import org.apache.hadoop.hbase.HTableDescriptor
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.io.BatchUpdate
import org.apache.hadoop.io.Text

pubchem_compound_fields = [
    'cid',
    'iupac_openeye_name',
    'iupac_cas_name',
    'iupac_name',
    'iupac_systematic_name',
    'iupac_traditional_name',
    'nist_inchi',
    'cactvs_xlogp',
    'cactvs_exact_mass',
    'openeye_mw',
    'openeye_can_smiles',
    'openeye_iso_smiles',
    'cactvs_tpsa',
    'total_charge',
    'heavy_atom_count']

compound_table_name = 'compound'

numfields = pubchem_compound_fields.length

path = "/home/zhguo/BigTable/BigTable-Pubchem/data/"
filename = "#{path}compound.dat"
file = File.new(filename, 'r')
counter = 0

conf = HBaseConfiguration.new
tablename = compound_table_name
tablename_text = Text.new(tablename)
desc = HTableDescriptor.new(tablename)
coltextarr = Array.new
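# every field except the first (cid, which becomes the row key) is mapped to its own column family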
pubchem_compound_fields.each_with_index do |v, i|
    if (i == 0) then next; end
    desc.addFamily(HColumnDescriptor.new("#{v}:"))
    coltextarr << Text.new("#{v}:")
end

admin = HBaseAdmin.new(conf)
if !admin.tableExists(tablename_text) then
    admin.createTable(desc)
=begin
    puts "deleting table #{tablename_text}"
    admin.disableTable(tablename_text)
    admin.deleteTable(tablename_text)
    puts "deleted table #{tablename_text} successfully"
=end
end

#admin.createTable(desc)
table = HTable.new(conf, tablename_text)

startind = 1641500 # from which line should we start. This
                   # is useful when you don't want to start
                   # from the beginning of the data file.

nlines = `cat #{filename} | wc -l`
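# nlines is the raw string output of wc (e.g. "1234\n"); it is only
# interpolated into progress messages below, so no conversion is needed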

logfilename = 'updatedb.log'
logfile = File.new(logfilename, "a")
while (line = file.gets) #&& (counter < 20)
    counter += 1
    if (counter < startind) then
        next
    end
    msg = "processing line #{counter}/#{nlines}"
    logfile.puts msg
    if counter%100 == 0 then
        print  msg
        STDOUT.flush
        logfile.flush
    end

    arr = line.split("\t")
    len = arr.length
    if (numfields != len) then
        next
    end
    rowindex = 0
    rowname = arr[rowindex]
    arr.delete_at(rowindex)
    row = Text.new(rowname)
    b = BatchUpdate.new(row)

    arr.each_with_index do |v, i|
        str = java.lang.String.new(v)
        b.put(coltextarr[i], str.getBytes("UTF-8"))
    end
    table.commit(b)
end

Sunday, August 24, 2008

Installation and configuration of Hadoop and Hbase

Hadoop

Installation
Hadoop installation instructions: http://hadoop.apache.org/core/docs/current/quickstart.html and http://hadoop.apache.org/core/docs/current/cluster_setup.html.
To set up a Hadoop cluster, generally two configuration files must be modified: hadoop-site.xml and slaves.
(1) My hadoop-site.xml looks like:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>pg3:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>pg3:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Read file hadoop-default.xml for all available options.
(2) My slaves file looks like:
localhost
pg1
pg2

I needed to install Hadoop on three machines, and I use rsync to keep the machines' configurations in sync with each other.
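Something along these lines does the trick (a sketch; ${HADOOP_HOME} and the host names pg1/pg2 follow my setup above):

rsync -az ${HADOOP_HOME}/conf/ pg1:${HADOOP_HOME}/conf/
rsync -az ${HADOOP_HOME}/conf/ pg2:${HADOOP_HOME}/conf/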

Commands
(*) Format a new file system: hadoop namenode -format
(*) Start/stop Hadoop:
start-dfs.sh / stop-dfs.sh -- start/stop the distributed file system (HDFS)
start-mapred.sh / stop-mapred.sh -- start/stop the MapReduce service
start-all.sh / stop-all.sh -- start/stop both HDFS and the MapReduce service

Hadoop reads the slaves file to find the worker nodes and then starts daemons on all of them.

Check status of the services
HDFS: http://domain:50070/
MapReduce: http://domain:50030/

HBase

Installation instructions: http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#overview_description
The configuration file is hbase-site.xml. My hbase-site.xml looks like
<configuration>
  <property>
    <name>hbase.master</name>
    <value>pg3:60000</value>
    <description>The host and port that the HBase master runs at.</description>
  </property>

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://pg3.ucs.indiana.edu:9000/hbase</value>
    <description>The directory shared by region servers.</description>
  </property>
</configuration>

Commands
start-hbase.sh -- start the HBase service
stop-hbase.sh -- stop the HBase service

Note: HBase builds its functionality on top of Hadoop, so sometimes HBase needs to know Hadoop's configuration. The following excerpt from the HBase documentation is, I think, important:

"Of note, if you have made HDFS client configuration on your hadoop cluster, hbase will not see this configuration unless you do one of the following:
  • Add a pointer to your HADOOP_CONF_DIR to CLASSPATH in hbase-env.sh
  • Add a copy of hadoop-site.xml to ${HBASE_HOME}/conf, or
  • If only a small set of HDFS client configurations, add them to hbase-site.xml
An example of such an HDFS client configuration is dfs.replication. If for example, you want to run with a replication factor of 5, hbase will create files with the default of 3 unless you do the above to make the configuration available to hbase. "
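For example, to follow the third option, the dfs.replication setting from the quote can be added directly to hbase-site.xml (a sketch):

  <property>
    <name>dfs.replication</name>
    <value>5</value>
  </property>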

Monday, March 17, 2008

DB4O introduction

Db4o is a high-performance object database for Java and .NET.

(*) Open db

     ObjectContainer db = Db4o.openFile(filename);
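One thing the snippets below do not show: db4o is an embedded database, so the file stays locked until the container is closed. When you are done, call

db.close();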

 

(*) Insert

Objects are inserted using the set() method.

ClassName obj = new ClassName(parameters);

db.set(obj);

 

(*) Retrieve

(1)    Query by Example (QBE)

       Create a prototypical object for db4o to use as an example of what you wish to retrieve. Db4o will return all of the objects which match all non-default field values. The results will be returned as an ObjectSet instance.

ClassName obj = new ClassName(values…); //prototypical object

ObjectSet result = db.get( obj );

listResult( result );

Db4o supplies a shortcut to retrieve all instances of a class:

ObjectSet result = db.get(ClassName.class);

The following code can be used to iterate over the results:

while( result.hasNext() ){
    System.out.println( result.next() );
}

(2)    Native Query (NQ) --- the main db4o querying interface.

       Native Queries provide the ability to run one or more lines of code against all instances of a class. A native query expression returns true to mark a specific instance as part of the result set.

    List<ClassName> objs = db.query( new Predicate<ClassName>() {
           public boolean match(ClassName obj){
                  return obj.getProperty() == value;
           }
    });

       Users must be very careful with side effects --- especially those that might affect persistent objects.

(3)    SODA Query API

 

(*) Update

       Updating objects is as easy as storing them. You use the same set() method to update objects: just call set() again after modifying the object.

ObjectSet result = db.get(new ClassName(parameters));
ClassName found = (ClassName)result.next();
found.methodName(parameters);
db.set(found);

Note: we query for the object first. If the object is not 'known' (i.e., it was not stored or retrieved during the current session), db4o will insert a new object instead of updating the existing one; in that case db4o thinks you want to insert a new object that happens to have the same field values.

 

(*) Delete

       Objects are removed using the delete() method.

ObjectSet result = db.get( new ClassName(…));
ClassName found = (ClassName)result.next();
db.delete( found );

If you want to tune db4o for higher performance, you need to change the default configuration.
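A small sketch of what such tuning looks like with db4o's global configuration API (ClassName and the field name "id" are placeholders; indexing a frequently queried field is a typical first tweak):

Db4o.configure().objectClass(ClassName.class).objectField("id").indexed(true);
ObjectContainer db = Db4o.openFile(filename); // configuration must be set before opening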