db4o Developer Community

db4o open source object database, native to Java and .NET

ObjectContainer.set() is expensive?

  •  01-12-2008, 08:53 AM


    I was doing some performance tests for a timesheet application I am writing, and was simulating the addition of thousands of items through a web service. Ultimately, the logic of the simulation boils down to:
    public class Item
    {
        int id;
        String description;
        Date date;
    }
    ...
    final int NUM_ITEMS = 10000;
    System.out.println(new Date() + " creating " + NUM_ITEMS + " items");
    ArrayList4<Item> items = new ArrayList4<Item>();
    for (int i = 0; i < NUM_ITEMS; i++)
    {
        Item item = new Item();
        item.id = i;
        item.description = "item # " + i;
        item.date = new Date();
        db.set(item);    // store the new item
        items.add(item);
        db.set(items);   // store the list again after every single add
        if (i % 100 == 0)
        {
            System.out.println(new Date() + " set " + i + " items");
        }
    }
    db.commit();
    My output looks like:

    Sat Jan 12 00:11:47 PST 2008 creating 10000 items
    Sat Jan 12 00:11:47 PST 2008 set 0 items
    Sat Jan 12 00:11:48 PST 2008 set 100 items
    ...
    Sat Jan 12 00:11:52 PST 2008 set 700 items
    Sat Jan 12 00:11:53 PST 2008 set 800 items
    Sat Jan 12 00:11:55 PST 2008 set 900 items
    ...
    Sat Jan 12 00:23:48 PST 2008 set 2400 items
    Sat Jan 12 00:23:54 PST 2008 set 2500 items
    Sat Jan 12 00:24:00 PST 2008 set 2600 items
    Sat Jan 12 00:24:06 PST 2008 set 2700 items
    Sat Jan 12 00:24:13 PST 2008 set 2800 items
    Sat Jan 12 00:24:19 PST 2008 set 2900 items
    Sat Jan 12 00:24:27 PST 2008 set 3000 items

    As you can see, it takes longer and longer to set the items. Exploring a bit, I see that the list is being stored afresh on each set(), such that all of the list's items are repeatedly written out to memory.

    This means that doing a set() on a collection is an expensive operation. I would have expected set() to incur its translation cost only on the actual commit. I was expecting that repeated set() calls on the same object would cache the stored object and would only "expand" the list when it actually needs to be committed.

    Now, one could argue that in this case, I should have done the db.set(list) after the for loop. Indeed if I do that the test performs like lightning.
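To put rough numbers on the two placements of db.set(items), here is a minimal sketch in plain Java. It does not touch db4o at all. It only counts element writes under the assumption, taken from the observation above, that each set() on the list rewrites every element the list currently holds. The class and method names (WriteCost, storeListEveryAdd, storeListOnce) are made up for illustration.

```java
/** Counts element writes under an assumed cost model; this is not a db4o measurement. */
class WriteCost {

    /** set(items) after every add: the k-th call is assumed to rewrite k elements. */
    static long storeListEveryAdd(int numItems) {
        long writes = 0;
        for (int size = 1; size <= numItems; size++) {
            writes += size;
        }
        return writes;
    }

    /** set(items) once after the loop: every element is written a single time. */
    static long storeListOnce(int numItems) {
        return numItems;
    }

    public static void main(String[] args) {
        int n = 10000; // NUM_ITEMS from the test above
        System.out.println("set(items) on every add: " + storeListEveryAdd(n)); // 50005000
        System.out.println("set(items) once:         " + storeListOnce(n));     // 10000
    }
}
```

Under that assumption the per-add variant grows quadratically (n(n+1)/2 element writes), which would be consistent with each batch of 100 taking longer than the one before it.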

    However, remember this is a web service simulation. The scenario is that a request comes in, the item is added to the list, and both the item and the list are stored. The loop represents a large set of self-contained orthogonal updates, and each update necessarily includes an item and its list to be stored.

    In a large system, one cannot predict the numbers and sequences of update actions, or whether or not they will be impacting the same shared container collections. For correctness, one will be invoking set() on any change, so as to ensure the changes are persisted.

    I can work around this performance cost by building my own caching layer of changed objects, which then invokes set() on all changed objects immediately before a central commit. The problem is that db4o's promise is ease of use with reliable persistence and OO programming. If ObjectContainer.set() did better caching, then repeated set() calls on the same object could be really cheap, and the advertised promise would be more of a reality. Programmers could simply "throw" changed objects into the db4o bucket via set() and not have to care about it any further -- they would know that things would just work, quickly, reliably, and efficiently.
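The caching-layer workaround described above could look roughly like the following sketch. It is not part of db4o: Store is a hypothetical two-method interface standing in for the set() and commit() calls used in the test, and DeferredStore and CountingStore are illustrative names. A real version would need an adapter that forwards to the ObjectContainer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the two container calls used in the post. */
interface Store {
    void set(Object obj);
    void commit();
}

/** Remembers which objects changed and stores each of them once, right before commit. */
class DeferredStore {
    private final Store store;
    // Identity-based set: marking the same instance repeatedly keeps one entry,
    // and a list can keep growing after it was marked (its hashCode is never used).
    private final Set<Object> dirty =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());

    DeferredStore(Store store) {
        this.store = store;
    }

    /** Cheap replacement for calling set() on every change. */
    void markDirty(Object obj) {
        dirty.add(obj);
    }

    /** Stores every changed object once (in unspecified order), then commits. */
    void commit() {
        for (Object obj : dirty) {
            store.set(obj);
        }
        dirty.clear();
        store.commit();
    }
}

/** Demo backend that only counts calls. */
class CountingStore implements Store {
    int setCalls;
    int commitCalls;

    public void set(Object obj) { setCalls++; }
    public void commit() { commitCalls++; }
}

class DeferredStoreDemo {
    public static void main(String[] args) {
        CountingStore backing = new CountingStore();
        DeferredStore db = new DeferredStore(backing);
        List<Object> items = new ArrayList<Object>();
        for (int i = 0; i < 10000; i++) {
            Object item = new Object();
            items.add(item);
            db.markDirty(item);
            db.markDirty(items); // marked 10000 times, stored once
        }
        db.commit();
        System.out.println("set() calls: " + backing.setCalls); // 10001
    }
}
```

The trade-off is that nothing reaches the store until commit(), so any query issued in between would not see the pending changes unless the layer flushes first.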

