InstantDB Project About InstantDB Project Mail Lists Short History Reporting Bugs Screen Shots 3rd Party Examples FAQs Software Downloads Documentation CVS Repositories Roadmap License

About Enhydra.org Who We Are News, Articles & Events Getting Involved Contact Us Community Demos Contributions Resources Case Studies On The Edge! -NEW- Commercial Vendors

Tuning

Column Caches

Changing a Column's Caching Algorithm

Additional Caches

ORDER BY
Speeding up Updates

Handling Large Queries

Index Usage

Performance

InstantDB has been tried out with tables of over one million rows. In these circumstances, setting the tuning parameters correctly can have a dramatic effect on performance.

Column Caches

The main thing that has to be tuned are the column caches. InstantDB doesn't cache "pages". Instead, it holds sections of individual columns in memory.

  Previous Release
     Basic

Release Notes
Installation
Tips
Data Types
     Advanced

Networking
Tuning
Transactions
Readonly Mode
Import & Export
JDBC™
Triggers
Encryption
User-Defined Functions
     Reference

Syntax
Database Properties
Javadoc
Debugging
Known Bugs

Note - in the current release only a single range of rows for any given column can be cached. Future versions will include multiple ranges.

InstantDB supports two algorithms for caching a column: CACHE_ROWS and CACHE_PERCENT. CACHE_ROWS caches a fixed number of rows from a column, whereas CACHE_PERCENT ensures that a fixed percentage is cached (which can in fact be 100%). The default algorithm can be altered using the cacheCondition property in the properties file. The amount to be cached is controlled using the cacheAmount property. Two additional properties, resultsSetCache and resultsSetCacheAmount fulfill a similar roll for results set tables.

InstantDB initially caches from row 1 up to some configurable limit. However, it also tries to find a better location. When a better cache location is found, the current cache is dropped and the new cache is set up. Two properties control when the cache gets reset. missesInCacheStats determines the number of recent lookups which are included in tests for better locations. For example, if this is set to 100, then the last 100 lookups are included in tests for a better cache location. cacheResetPercent controls how much better the alternative cache location has to be. For example, if this is set to 10, then the new cache site has to achieve a 10% better hit rate before a switch will take place. Dropping a cache will obviously lead to a short term drop in peformance. Setting cacheResetPercent appropriately ensures that the long term gain will compensate for this.

In addition to setting global defaults for the caching algorithm, InstantDB provides a non-standard SQL extension which can be applied at table creation time to control the caching of individual columns. Circumstances where this might prove useful would be where it would be impractical to cache all of a table, but where caching the primary key would improve the performance of joins. This additional column condition has the syntax:

CACHE rows {ROWS|PERCENT}

For example:

CREATE TABLE table1 (
   int1 int CACHE 10 PERCENT,
   int2 int CACHE 100 ROWS
)

It is not possible to control the caching of index columns. Indexes are always held entirely in memory and are only written to disk as a result of a clean shutdown. Failure to close the database cleanly, say as a result of an unhandled exception, causes the database to enter a recovery mode when it is next opened. During this recovery any indexes which have been modified are recreated from their target tables.

Changing a Column's Caching Algorithm

It is possible to change a column's caching algorithm after the table has been created. This can be accomplished by updating the $db$cols system table. This table has two columns that control the caching algorithm: CacheCond and CacheAmnt.

CacheCond is set to 1 to cache a fixed number of rows. It is set to 2 to cache a percentage of rows.
For example, in the "sample" database, all of the columns default to caching 512 rows. To change the "ProductID" field of the "import1" table to cache 100% instead, issue the following SQL statement:

UPDATE sample$db$Cols 
        SET cachecond=2,cacheamnt=100 
        WHERE colname="ProductID" 
        AND tableid=(
                SELECT t.tableid 
                FROM sample$db$tables t 
                WHERE tablename="import1")

The database must then be shutdown and restarted for the new caching algorithm to take effect.

Additional Caches

As of InstantDB version 1.4, the Recent Row cache has been removed. Instead, non-cached rows are instantiated directly from the table read-ahead buffer.

InstantDB reads from tables in blocks of rows at a time. The number of rows read in a single IO is governed by the rowCacheSize property. The optimum value of this property will vary depending on the configuration of your system and on the characteristics of your database. However, a value of roughly 128 to 256 often produces good results.

Order By

The disk cache has been designed with the assumption that tables are scanned from the first row to the last. This assumption is untrue where the ORDER BY clause is used. To increase performance in these circumstances, it is highly recommended that all rows in results sets be cached. i.e. Use the following settings in the database properties file:

resultsSetCache=CACHE_PERCENT
resultsSetCacheAmount=100

InstantDB orders results using a binary tree. This gives good results for random data. Although InstantDB performs tree balancig during a sort, partially sorted columns can lead to reduced performance. Developers should be aware of this when using the ORDER BY clause in a SELECT.

Speeding up Updates

InstantDB writes to the database after every add, delete or update gets committed. This can reduce performance in heavily updating applications. However, it ensures that, after a crash, the database is highly likely to be recoverable to a consistent state.

In some situations, recoverability is not important - say because the data can be obtained from another source, and the database can be easily rebuilt. In such cases, setting fastUpdate=1 in the database properties file will suppress must of the sync() calls and will rely instead on the underlying file system's algorithm for synchronising its cache with the disk. On some systems this can lead to a dramatic increase in performance for heavily updating systems.

Handling Large Queries

By default, InstantDB holds results sets entirely in memory. This improves query performance and is adequate for most applications.

However, for very large results sets, especially where those results sets must be retained for a relatively long period of time, this can lead to memory shortages. Setting the property resultsOnDisk=1 causes InstantDB to save its results sets on disk.

Index Usage

In the current release, the only time an index gets used is when it is an index on a single column and the expression involving that column is of the form: column=column, or column=constant. Attempt anything more complex and the index is simply ignored. In practise, this should cover most common requests.

Any use of the keyword OR, will cause all expressions in the tables in the OR clause to be run unoptimised. Trying to optimise OR has turned out to be a major headache which simply lead to a whole succession of bugs.

Performance

Generally speaking, quoting figures for database performance is notoriously error prone. The smallest variations in a system configuration can radically alter the results. The only way to determine conclusively how your application will fair under any database is to try it out.

With that said, the directory doc/performance contains a sample database properties file and a script which can be run to test out various types of lookup. The script creates a test table with five columns:

pk - An Integer column. An index is added to provide fast lookups.
int1 - Another Integer column. This is modified to cache 100% of its data.
int2 - Another Integer column. Uses the database properties default caching (512 rows).
str1 - A Char(20) column. This is modified to cache 100%.
str2 - Another Char(20) column. Uses default caching.

Rows are then auto-generated, 20,000 at a time. After each auto-generate, a simple query is performed against each column. Each query is performed twice to maximise the effect of caching.

The results for Windows NT 4, FAT file system, JDK 1.1.5 with the JIT enabled, using a P166, 96Mb, Intel 430HX chipset, PIO mode 4 IDE drive, are shown in the table below. The disk had nearly 1Gb of free space and had been freshly de-fragmented.

Column	20,000	40,000	60,000	80,000
pk	0.030	0.020	0.020	0.030
int1	0.621	0.651	1.022	0.741
int2	0.601	1.152	1.723	2.283
str1	0.601	0.651	0.701	0.771
str2	0.601	1.172	1.723	2.303

The figures with the JIT disabled are almost twice the above values.

On the same setup, but using the Microsoft Java Virtual Machine™ supplied with SDK 2.0:

Column	20,000	40,000	60,000	80,000
pk	0.020	0.020	0.020	0.040
int1	0.160	0.191	0.240	0.251
int2	0.161	0.310	0.460	0.581
str1	0.170	0.210	0.260	0.300
str2	0.171	0.320	0.460	0.611

What these tables show is the time (in seconds) to lookup between 1 and 4 rows from each column type, using a simple:

SELECT * FROM table WHERE col=value

As expected, indexed lookups are by far the fastest. With 30 to 50 lookups per second easily achievable.

The effects of caching a column are also clearly visible, becoming more and more pronounced as more rows are added. Full table scans are up to three times faster when the lookup column is held entirely in memory.

It's worth noting that, apart from an anomolous timing for int1 when 60,000 rows were present, the non-indexed lookups increase roughly linearly. This indicates that, for this class of queries at least, InstantDB should scale in a predictable fashion.

To run the performance tests on your machine, install InstantDB as described in downloads. Change to the doc/performance directory, and type the command:

java sample perf.txt

We'd be very interested to hear how these figures compare on various machines and configurations. If you feel like reporting your test results, then go through the configuration information given above, and try to include as much of the same information as you can. You can email your results to: webmaster@enhydra.org