I/O access times play a crucial role in the overall performance of a database.
That is why we introduce two tools to
- measure the actual I/O performance of a system as seen by db4o
- simulate the behaviour of a slower system on a faster one
All the code presented in this article can be found in the db4otools project / com.db4o.bench package.
SVN: https://source.db4o.com/db4o/trunk/db4otools/
The code can be compiled with JDK 1.3 and higher.
IoBenchmark
The main class of the benchmark is com.db4o.bench.IoBenchmark.
Let's have a look at it's run method to see what it does.
private void run(IoBenchmarkArgumentParser argumentParser) throws IOException {
runTargetApplication(argumentParser.objectCount());
prepareDbFile(argumentParser.objectCount());
runBenchmark(argumentParser.objectCount());
}As you can see from this code, the benchmark consists of 3 stages:
- Run a target application and log its I/O access pattern
- Replay the recorded I/O operations once to prepare a database file.
This step is necessary to ensure that during the grouped replay in the next step, none of the accesses will go beyond the currently existing file. - Replay the recorded I/O operations a second time. Operations are grouped by command type (read, write, seek, sync), and the total time executing all operations of a specific command type is measured. Grouping is necessary to avoid micro-benchmarking effects and to get time values above timer resolution.
We divide the numbers collected in stage 3 by the respective number of operations and we calculate the average time a particular command takes on the given system.
But enough of the theory for the moment, let's see how you can run the benchmark.
For this purpose there is the pair of an Ant script and a properties file:
- IoBenchmark.xml: The Ant script
- IoBenchmark.properties: Holding configurations for the Ant script
Both files are located in the root of db4otools.
To be able to run the benchmark from the Ant script, you have to put a db4o JAR file in the lib folder of the db4otools project. Insert the name of the JAR in the db4o.jar property in the property file, e.g.
db4o.jar=db4o-7.1.27.9109-java5.jar
and you are ready to go!
To give it a first try, you can run the run.benchmark.small target of the Ant script, which is also the default target.
You should get output similar to this:
=============================================================
Running db4o IoBenchmark
=============================================================
Running target application ...
Preparing DB file ...
Running benchmark ...
-------------------------------------------------------------
db4o IoBenchmark results with 1000 items
Statistics written to db4o-IoBenchmark-results-1000.log
-------------------------------------------------------------
Results for READ
> operations executed: 14331
> time elapsed: 16 ms
> operations per millisecond: 895.6875
> average duration per operation: 0.001116460819203126 ms
READ 1116 ns
Results for WRITE
> operations executed: 9508
> time elapsed: 16 ms
> operations per millisecond: 594.25
> average duration per operation: 0.0016827934371055953 ms
WRITE 1682 ns
Results for SYNC
> operations executed: 7821
> time elapsed: 921 ms
> operations per millisecond: 8.49185667752443
> average duration per operation: 0.11775987725354814 ms
SYNC 117759 ns
Results for SEEK
> operations executed: 23839
> time elapsed: 16 ms
> operations per millisecond: 1489.9375
> average duration per operation: 6.711690926632829E-4 ms
SEEK 671 ns
As the output indicates, the results of this benchmark run will also be written to a file called db4o-IoBenchmark-results-1000.log. You can find this file in the db4otools directory.
The ns (nanosecond) values are our benchmark standard for the respective operation. Smaller numbers are better.
Note: It may be possible, that you get some zero values for time elapsed, and therefore infinity for operations per ms. This can occur if your machine is fast enough to execute all operations under 1ms. To overcome this you can run the run.benchmark.medium target which operates with more objects and takes longer to complete.
Using Your Own Application to Generate the I/O Pattern
When you execute IoBenchmark, it uses a simple CRUD (create, read, update, delete) application as the target application. This application is located in the com.db4o.bench.crud package.
If you want to use your own application for generating the I/O access patterns, here's what you have to do:
-
Use a LoggingIoAdapter, delegating to your default IoAdapter:
RandomAccessFileAdapter rafAdapter = new RandomAccessFileAdapter();
IoAdapter ioAdapter = new LoggingIoAdapter(rafAdapter, "filename.log");
Configuration config = Db4o.cloneConfiguration();
config.io(ioAdapter);
You'll also find this code in com.db4o.bench.crud.CrudApplication#prepare().
-
Change IoBenchmark to call your application by modifying the runTargetApplication() method.
You also have to exchange the calls to CrudApplication.logFileName(itemCount) in prepareDbFile() and runBenchmark
with the file name of the log containing the I/O access pattern of your application.
Using the code from above, this log file will be called "filename.log".
If you want to generate your log by interacting with your application, rather than having IoBenchmark calling it, do as follows:
- Use a LoggingIoAdapter in your application
- Interact with your application to create the log
- Remove the stage 1 from IoBenchmark and make it start in stage 2 with your log.
If you are using your own application to generate the I/O log file, check out the section "
I/O Log File Statistics" further down.
Delaying I/O Operations to Simulate Slow Behaviour on a Fast Machine
The code for this section is located in the com.db4o.bench.delaying package.
To run delaying, the System.nanoTime() is needed. This method was introduced with Java 5.
If you only have older versions installed, get the latest here:
http://java.sun.com/javase/downloads/You also need a java5 db4o JAR file, otherwise you'll see a NotImplementedException when the benchmark tries to access nanoTime().
Think of the following scenario: You develop software with db4o for a target system, that has much slower I/O than your developer system (e.g. an embedded device).
Wouldn't it sometimes be nice getting a feel for the expected speed your application will work with on the target system without having to deploy to it?
In particular, if you want to profile your system with a profiler like
JProbe, simulating the expected slow I/O on a device will help you identifying the bottlenecks in your application.
This is where the results of IoBenchmark and a DelayingIoAdapter enter the arena. If you run IoBenchmark on both the embedded device and your developer machine you get two results files. Copy the file from the slower device to the db4otools folder on the faster machine and set both filenames in IoBenchmark.properties:
results.file.1=db4o-IoBenchmark-results-30000_faster.log
results.file.2=db4o-IoBenchmark-results-30000_slower.log
It's not necessary that results.file.1 holds the faster log, any order will work.
You are now set to run the benchmark in delayed mode. The expected result of such a run is, that the results of a delayed run on the faster machine should be close to those on the slow device.
To do a delayed run execute one of the run.delayed.benchmark.* targets of the Ant script.
At the beginning of the output - prior to the benchmark results - you'll notice additional information about the delaying:
=============================================================
Running db4o IoBenchmark
=============================================================
Delaying:
> machine1 (db4o-IoBenchmark-results-30000_faster.log) is faster!
> Required delays:
> [delays in nanoseconds] read: 8195 | write: 10669 | seek: 10098 | sync: 215121
> Adjusting delay timer to match required delays...
> Adjusted delays:
> [delays in nanoseconds] read: 4934 | write: 7387 | seek: 6849 | sync: 202203
Running target application ...
Preparing DB file ...
Running benchmark ...
[...]
Let's have a look at what exactly is going on when setting up delaying.
First there is a check for the validity of the two result files for delaying. To pass this check, one of the two supplied benchmark results file must contain the better values for all the 4 operations. This constraint exists because it's not possible to speed things up, only slowing them down.
Once this check is passed, the delays are calculated by simply subtracting the numbers found in the result files. The resulting numbers tell us, how long each I/O operation should be delayed on the faster machine
to get the same behaviour as on the slower one.
The problem is now that just simply waiting for the calculated amount of time will make us wait for too long. This is due to additional setup time for each wait (method calls) and the "at least" semantics of the
wait method itself.
To cope with this limitation there is a delay adjustment logic. It tries to find the actual delay to wait for such that the overall waiting time, including the setup method calls, matches the desired delay time.
However, there's a catch to this adjustment logic: On each machine there's a minimum delay that can be achieved with waiting, and this delay is not equal to zero (e.g. 400ns)! If the performance of the two machines is too close together, it is possible that when trying to adjust a delay, the outcome is below the minimum delay achievable.
In this case you'll see output like this:
>> Smallest achievable delay: 400
>> Required delay setting: 260
>> Using delay(0) to wait as short as possible.
>> Results will not be accurate.
To find out which delay actually was too small, and hence which results won't be accurate, take a look at the adjusted delays:
> Adjusted delays:
> [delays in nanoseconds] read: 0 | write: 7387 | seek: 6849 | sync: 202203
Here the read delay was too small and therefore the results for read are expected to be slower than targeted.
Once the delays are adjusted, they can be fed to the DelayingIoAdapter (as done in IoBenchmark#delayingIoAdapter):
IoAdapter rafFactory = new RandomAccessFileAdapter();
IoAdapter delFactory = new DelayingIoAdapter(rafFactory, _delays);
IoAdapter io = delFactory.open(dbFileName, false, 0, false);If you now configure db4o with the IoAdapter io from above, each I/O operation will be delayed by the respective delay stored in _delays!
The above IoAdapter setup is also exactly what you need in your own application to simulate the slower I/O of your target device on your faster machine.
I/O Log File Statistics
To get statistically meaningful results from the benchmark it is necessary that the I/O log file contains enough operations of each type. To get an overview on how well your I/O log file represents each operation, you can use the class LogStatistics in com.db4o.bench.logging.statistics.
Given the file name of an I/O log file LogStatistics will produce an HTML file that contains a table with statistics:
|
| Count |
| % |
| Bytes |
| % |
| Reads |
| 664'631 |
| 30.64 |
| 52'214'278 |
| 70.83 |
| Writes |
| 360'001 |
| 16.6 |
| 21'508'576 |
| 29.17 |
| Seeks |
| 1'024'632 |
| 47.23 |
|
|
| Syncs |
| 120'005 |
| 5.53 |
|
|
|
| Total |
| 2'169'269 |
|
|
| 73'722'854 |
|
Average byte count per read: 78
Average byte count per write: 59
This is the output of LogStatistics when run with the file generated by CrudApplication with 30k objects, which is the default setting for the Ant target run.benchmark.medium.
Typically sync operations are much rarer than seek operations. If you look at the source of CrudApplication you'll see that extra commit calls when deleting objects. These were inserted to to get
a higher sync count in the I/O log. It's possible that you also have to "tune" your application in a similar way to get good statistics.