db4o is pretty interesting
Monday, December 05, 2005

I have done a lot of work with a number of Java persistence solutions including JDBC, JDO, Hibernate, JSR-220 Persistence and Prevayler (have just tinkered with Prevayler... no real work). Just recently I started investigating yet another persistence solution called db4objects (db4o). db4o has versions for Java, .NET and Mono. I have only investigated the Java version. In the time that I have spent investigating db4o I have found some pretty interesting stuff.

One thing that stands out is that db4o allows you to persist plain ol' Java objects as plain ol' Java objects. POJOs need not extend any magic base class or implement any special interface. POJOs need not have any special id field. POJOs need not have any special constructor. There is no requirement for a no-arg constructor or even a public constructor. db4o doesn't require any object descriptors (XML or otherwise) and doesn't require you to mark persistent classes up with annotations. db4o does not require that persistent fields have JavaBean-compliant getters and setters. db4o pretty much will take your objects as they come.
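
To make "take your objects as they come" concrete, here is a minimal sketch of how a persistence engine could read an object's state with no getters, no no-arg constructor and no public constructor. It uses plain Java reflection. This is an illustration only and an assumption on my part, not db4o's actual implementation; the Guitar class and the snapshot method are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

class FieldSnapshot {
    // Hypothetical POJO: private fields, no getters or setters,
    // no no-arg constructor and no public constructor.
    static final class Guitar {
        private final String maker;
        private final int strings;

        private Guitar(String maker, int strings) {
            this.maker = maker;
            this.strings = strings;
        }
    }

    // Factory so code outside this class can obtain an instance.
    static Guitar newGuitar(String maker, int strings) {
        return new Guitar(maker, strings);
    }

    // Reads every instance field of the object reflectively (name -> value).
    // Assumption: this only illustrates the idea, it is not how db4o works inside.
    static Map<String, Object> snapshot(Object pojo) {
        Map<String, Object> values = new LinkedHashMap<>();
        try {
            for (Field field : pojo.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue; // skip static and compiler-generated fields
                }
                field.setAccessible(true); // private fields are fine
                values.put(field.getName(), field.get(pojo));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        // Prints the field names and values captured from the object.
        System.out.println(snapshot(newGuitar("Gibson", 6)));
    }
}
```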

db4o can run in embedded mode, which means that all of db4o is inside of your application's process. There doesn't need to be a separate db4o process running somewhere. There can be if that suits your deployment needs, but there doesn't need to be.

db4o has some pretty robust schema evolution capabilities that allow fields to be added, removed and renamed. There is support for moving classes to new packages.

db4o is an OO database. It is not an object-to-relational mapping tool; the db itself is an OO db.

One of the issues I initially had concerns about was performance once the database accumulated large numbers of objects. To test some of this, I created a fairly simple object model to represent music cds. The classes look something like this...


public class CDArtist {
    private String name;
    private List<CD> cds = new ArrayList<CD>();
    // constructor and methods snipped...
}


public class CD {
    private String title;
    private List<CDTrack> tracks = new ArrayList<CDTrack>();
    // constructor and methods snipped...
}

public class CDTrack {
    private String title;
    private int trackNumber;
    private CD cd;
    // constructor and methods snipped...
}


I downloaded a gaboodle of data from freedb.org and wrote a simple parser to turn that data into instances of my classes and started dropping those into my db4o database. At present I have about 75,000 artists, 185,000 cds and over 2,000,000 tracks in the database. I realize that for a lot of situations even those 2,000,000+ tracks don't really amount to a lot of data but that is what I am currently working with. It is enough data to exercise some of the things I wanted to look at.
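
For a rough idea of what such a parser involves, here is a minimal sketch. It assumes the freedb (xmcd) text layout in which a DTITLE line holds "Artist / Album" and TTITLEn lines hold zero-based track titles. It is not the parser I actually wrote: it ignores values continued across repeated keys, and the classes are trimmed-down stand-ins for the ones above with assumed constructors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FreedbParserSketch {
    // Trimmed-down stand-ins for the post's classes; constructors are assumed.
    static final class CDArtist {
        final String name;
        final List<CD> cds = new ArrayList<>();
        CDArtist(String name) { this.name = name; }
    }

    static final class CD {
        final String title;
        final List<CDTrack> tracks = new ArrayList<>();
        CD(String title) { this.title = title; }
    }

    static final class CDTrack {
        final CD cd;
        final int trackNumber;
        final String title;
        CDTrack(CD cd, int trackNumber, String title) {
            this.cd = cd;
            this.trackNumber = trackNumber;
            this.title = title;
        }
    }

    // Parses the lines of one disc entry into an artist with a single CD.
    static CDArtist parse(List<String> lines) {
        String artistName = "Unknown";
        String cdTitle = "Unknown";
        Map<Integer, String> titlesByIndex = new TreeMap<>(); // sorted by track index
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (line.startsWith("#") || eq < 0) {
                continue; // comment line or not a key=value pair
            }
            String key = line.substring(0, eq);
            String value = line.substring(eq + 1).trim();
            if (key.equals("DTITLE")) {
                int sep = value.indexOf(" / ");
                artistName = sep >= 0 ? value.substring(0, sep) : artistName;
                cdTitle = sep >= 0 ? value.substring(sep + 3) : value;
            } else if (key.startsWith("TTITLE")) {
                try {
                    int index = Integer.parseInt(key.substring("TTITLE".length()));
                    titlesByIndex.put(index, value);
                } catch (NumberFormatException e) {
                    // malformed track key: skip the line
                }
            }
        }
        CDArtist artist = new CDArtist(artistName);
        CD cd = new CD(cdTitle);
        artist.cds.add(cd);
        for (Map.Entry<Integer, String> entry : titlesByIndex.entrySet()) {
            // freedb track indexes are zero-based; store them one-based
            cd.tracks.add(new CDTrack(cd, entry.getKey() + 1, entry.getValue()));
        }
        return artist;
    }

    public static void main(String[] args) {
        // Hand-written sample lines, not real downloaded data.
        CDArtist artist = parse(Arrays.asList(
                "# xmcd",
                "DTITLE=Iron Maiden / Piece of Mind",
                "TTITLE0=Where Eagles Dare",
                "TTITLE4=The Trooper"));
        CD cd = artist.cds.get(0);
        System.out.println(artist.name + ": " + cd.title + " (" + cd.tracks.size() + " tracks)");
    }
}
```

In the real import, each parsed artist would then be handed to the db4o ObjectContainer for storage.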

db4o supports 3 query techniques: Query By Example (QBE), S.O.D.A. and Native Queries.

I will show a simple example of each of these here and include some performance figures.

Query By Example (QBE)



The following code uses QBE to retrieve all CDs in the database that contain a track with the name "The Trooper".


Db4o.configure().objectClass(CDTrack.class).maximumActivationDepth(0);
// index the title field so lookups by title can use the index
Db4o.configure().objectClass(CDTrack.class).objectField("title").indexed(true);

// my custom factory
ObjectContainer db = ObjectContainerFactory.get().createObjectContainer();

// template object: only the title is set, the other fields keep default values
CDTrack myCandidateTrack = new CDTrack(null, 0, "The Trooper");
List<CDTrack> results = db.get(myCandidateTrack);
for (CDTrack track : results) {
    CD theCD = track.getCd();
    System.out.println(theCD);
}

db.close();



That query completes in 300-400 milliseconds or less. That is querying over 2,000,000 tracks, identifying the ones that match the name "The Trooper" and retrieving the CD that the track belongs to (note the call to track.getCd() inside of the loop).
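
The template in the QBE example passes null and 0 for everything except the title because QBE ignores template fields that hold default values and matches only on the rest. Here is a minimal in-memory sketch of that matching rule using reflection. It illustrates the semantics only and is an assumption about how one might implement it, not db4o's code; Track, matches and queryByExample are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class QbeSketch {
    // Hypothetical stand-in for CDTrack, reduced to two fields.
    static final class Track {
        private final int trackNumber;
        private final String title;

        Track(int trackNumber, String title) {
            this.trackNumber = trackNumber;
            this.title = title;
        }

        int trackNumber() { return trackNumber; }
    }

    // A field value counts as "not set" when it is null, zero or false.
    static boolean isDefault(Object value) {
        if (value == null) return true;
        if (value instanceof Number) return ((Number) value).doubleValue() == 0.0;
        if (value instanceof Boolean) return !((Boolean) value).booleanValue();
        if (value instanceof Character) return ((Character) value).charValue() == '\0';
        return false;
    }

    // True when every non-default template field equals the candidate's field.
    static boolean matches(Object template, Object candidate) {
        if (!template.getClass().isInstance(candidate)) return false;
        try {
            for (Field field : template.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) continue;
                field.setAccessible(true);
                Object wanted = field.get(template);
                if (isDefault(wanted)) continue; // default value acts as a wildcard
                if (!wanted.equals(field.get(candidate))) return false;
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Linear scan over an in-memory list; a real engine would use its indexes.
    static <T> List<T> queryByExample(T template, List<T> all) {
        List<T> hits = new ArrayList<>();
        for (T candidate : all) {
            if (matches(template, candidate)) hits.add(candidate);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Track> all = new ArrayList<>();
        all.add(new Track(1, "Where Eagles Dare"));
        all.add(new Track(5, "The Trooper"));
        // Track number 0 is a default value, so only the title constrains the match.
        List<Track> hits = queryByExample(new Track(0, "The Trooper"), all);
        System.out.println(hits.size() + " match(es)");
    }
}
```

One consequence of this rule is that a template cannot ask for "track number equals 0" or "title is null", which is one reason the other two query styles exist.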

S.O.D.A.




Db4o.configure().objectClass(CDTrack.class).maximumActivationDepth(0);
Db4o.configure().objectClass(CDTrack.class).objectField("title").indexed(true);

// my custom factory
ObjectContainer db = ObjectContainerFactory.get().createObjectContainer();

Query query = db.query();
// restrict the query to CDTrack objects...
query.constrain(CDTrack.class);
// ...whose title field equals "The Trooper"
query.descend("title").constrain("The Trooper");
ObjectSet<CDTrack> results = query.execute();
for (CDTrack track : results) {
    CD theCD = track.getCd();
    System.out.println(theCD);
}

db.close();



That S.O.D.A. query executes in about 100 milliseconds. Again, that is querying over 2,000,000 tracks and retrieving the matching tracks and their containing CDs.

Native Query



This is what a native query might look like in db4o.


Db4o.configure().objectClass(CDTrack.class).maximumActivationDepth(0);
Db4o.configure().objectClass(CDTrack.class).objectField("title").indexed(true);

// my custom factory
ObjectContainer db = ObjectContainerFactory.get().createObjectContainer();

List<CDTrack> results = db.query(new Predicate<CDTrack>() {
    public boolean match(CDTrack candidate) {
        return candidate.getTitle().equals("The Trooper");
    }
});
for (CDTrack track : results) {
    CD theCD = track.getCd();
    System.out.println(theCD);
}

db.close();


This approach has some nice benefits. One is that you get real compile time type safety. The query isn't some arbitrary string that might or might not be legal at runtime. The query is real Java code that gets compiled. That is nice.

However, that Predicate looks a little suspect to me. Judging from the code, it seems that the db4o engine is going to have to create all of my CDTrack objects and pass each of them, one at a time, to my match(CDTrack) method so I can decide which of them match my criteria. Since I have over 2,000,000 tracks, that can't be efficient. Yet the code above executes in about 100-150 milliseconds, and I am still querying those 2,000,000+ tracks and retrieving all the same stuff I was retrieving in the previous examples.

What is going on at runtime is that db4o is doing some slick class loading voodoo and figuring out what my Predicate would do, then optimizing all of that away by turning my Predicate into a S.O.D.A. query. Run the code in a debugger and you will find that my match(CDTrack) method never actually gets called.

There are limits here. The optimizer does a good job of figuring out what you intended to do, but you can do things in your Predicate that the optimizer can't figure out. In that case the Predicate cannot be optimized away, and the engine will have to create all of those CDTracks and pass them to the match method. This is easy enough to sort out at development time if you need to make sure the Predicate will be optimized.

While experimenting with this, don't put a System.out.println call or logging calls in your Predicate to monitor whether it is getting called. Those fall in the category of things that the optimizer can't handle, and a side effect of them being there is that the method will not get optimized away. There is a callback mechanism you can hook up to receive notifications that indicate when a Predicate is optimized and when it isn't.


ObjectContainer db = ObjectContainerFactory.get().createObjectContainer();

((YapStream)db).getNativeQueryHandler().addListener(new Db4oQueryExecutionListener() {
    public void notifyQueryExecuted(Predicate filter, String msg) {
        // inspect msg here to see whether this Predicate was optimized
    }
});


That callback will be notified when db4o first deals with any particular Predicate. The msg argument will be "DYNOPTIMIZED" if the query has been dynamically optimized. msg will be "UNOPTIMIZED" if the query could not be optimized. msg will be "PREOPTIMIZED" if the query had been preoptimized. db4o has some bytecode manipulation tools to preoptimize Predicates but I have not investigated that.
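
To see why the difference between an optimized and an unoptimized Predicate matters, here is a minimal in-memory sketch in plain Java. It is an analogy under my own assumptions, not db4o code: the scan method stands in for the unoptimized path, where every stored object is handed to the predicate, and the lookup method stands in for an indexed query, where the predicate never runs at all. Note that Predicate here is java.util.function.Predicate, not db4o's class of the same name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

class PredicateVsIndex {
    // Unoptimized path: every stored title is passed to the predicate.
    static List<String> scan(List<String> titles, Predicate<String> predicate) {
        List<String> hits = new ArrayList<>();
        for (String title : titles) {
            if (predicate.test(title)) {
                hits.add(title);
            }
        }
        return hits;
    }

    // Built once up front: title -> all stored entries with that title.
    static Map<String, List<String>> buildIndex(List<String> titles) {
        Map<String, List<String>> index = new HashMap<>();
        for (String title : titles) {
            index.computeIfAbsent(title, key -> new ArrayList<>()).add(title);
        }
        return index;
    }

    // Optimized path: one index lookup, no predicate involved.
    static List<String> lookup(Map<String, List<String>> index, String wanted) {
        return index.getOrDefault(wanted, new ArrayList<>());
    }

    public static void main(String[] args) {
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            titles.add("Track " + i);
        }
        titles.add("The Trooper");

        // Count how often the predicate runs during a full scan.
        AtomicInteger calls = new AtomicInteger();
        Predicate<String> isTrooper = title -> {
            calls.incrementAndGet();
            return title.equals("The Trooper");
        };

        System.out.println("scan hits: " + scan(titles, isTrooper).size()
                + ", predicate calls: " + calls.get());
        System.out.println("index hits: " + lookup(buildIndex(titles), "The Trooper").size()
                + ", predicate calls: none");
    }
}
```

The scan does work proportional to the number of stored objects on every query, while the index lookup does not, which is the same gap the timings above hint at.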

The db4o team is working on some Hibernate replication modules that will allow a db4o database to be kept in sync with a relational database via Hibernate. That code is still in development and I haven't looked at any of it.

The discussion forums are pretty active and as far as I can tell there is a lot of momentum behind the effort right now.

db4o is distributed under a couple of different licenses. There is a GPL version available for open source projects, experimenting and internal projects. There is also a commercial license available for commercial, non-open source applications. As far as I know, the GPL version is the same software as the commercial version. The restrictions have to do with distribution, not the software itself.

No reasonable person is going to claim that db4o is the silver bullet of persistence but it is interesting stuff and probably makes a lot of sense for a lot of applications. If nothing else, it is a good thing to be aware of.