Basic Topics | Advanced Topics | Reference Topics | |
Documentation Home Page |
Column Caches
Changing a Column's Caching Algorithm
Additional Caches
ORDER BY
Speeding up Updates
Handling Large Queries
Index Usage
Performance
InstantDB has been tried out with tables of over one million rows. In these circumstances, setting the tuning parameters correctly can have a dramatic effect on performance.
The main thing that has to be tuned are the column caches. InstantDB doesn't cache "pages". Instead, it holds sections of individual columns in memory.
Note - in the current release only a single range of rows for any given column can be cached. Future versions will include multiple ranges.
InstantDB supports two algorithms for caching a column: CACHE_ROWS and CACHE_PERCENT. CACHE_ROWS caches a fixed number of rows from a column, whereas CACHE_PERCENT ensures that a fixed percentage is cached (which can in fact be 100%). The default algorithm can be altered using the cacheCondition property in the properties file. The amount to be cached is controlled using the cacheAmount property. Two additional properties, resultsSetCache and resultsSetCacheAmount fulfill a similar roll for results set tables.
InstantDB initially caches from row 1 up to some configurable limit. However, it also tries to find a better location. When a better cache location is found, the current cache is dropped and the new cache is set up. Two properties control when the cache gets reset. missesInCacheStats determines the number of recent lookups which are included in tests for better locations. For example, if this is set to 100, then the last 100 lookups are included in tests for a better cache location. cacheResetPercent controls how much better the alternative cache location has to be. For example, if this is set to 10, then the new cache site has to achieve a 10% better hit rate before a switch will take place. Dropping a cache will obviously lead to a short term drop in performance. Setting cacheResetPercent appropriately ensures that the long term gain will compensate for this.
In addition to setting global defaults for the caching algorithm, InstantDB provides a non-standard SQL extension which can be applied at table creation time to control the caching of individual columns. Circumstances where this might prove useful would be where it would be impractical to cache all of a table, but where caching the primary key would improve the performance of joins. This additional column condition has the syntax:
CACHE rows {ROWS|PERCENT}
For example:
CREATE TABLE table1 ( int1 int CACHE 10 PERCENT, int2 int CACHE 100 ROWS )
It is not possible to control the caching of index columns. Indexes are always held entirely in memory and are only written to disk as a result of a clean shutdown. Failure to close the database cleanly, say as a result of an unhandled exception, causes the database to enter a recovery mode when it is next opened. During this recovery any indexes which have been modified are recreated from their target tables.
It is possible to change a column's caching algorithm after the table has been created. This can be accomplished by updating the $db$cols system table. This table has two columns that control the caching algorithm: CacheCond and CacheAmnt.
CacheCond is set to 1 to cache a fixed number of rows. It is set to 2 to cache a percentage of rows.
For example, in the "sample" database, all of the columns default to caching 512 rows. To change the "ProductID" field of the "import1" table to cache 100% instead, issue the following SQL statement:
UPDATE sample$db$Cols SET cachecond=2,cacheamnt=100 WHERE colname="ProductID" AND tableid=( SELECT t.tableid FROM sample$db$tables t WHERE tablename="import1")
The database must then be shutdown and restarted for the new caching algorithm to take effect.
As of InstantDB version 1.4, the Recent Row cache has been removed. Instead, non-cached rows are instantiated directly from the table read-ahead buffer.
InstantDB reads from tables in blocks of rows at a time. The number of rows read in a single IO is governed by the rowCacheSize property. The optimum value of this property will vary depending on the configuration of your system and on the characteristics of your database. However, a value of roughly 128 to 256 often produces good results.
The disk cache has been designed with the assumption that tables are scanned from the first row to the last. This assumption is untrue where the ORDER BY clause is used. To increase performance in these circumstances, it is highly recommended that all rows in results sets be cached. i.e., Use the following settings in the database properties file:
resultsSetCache=CACHE_PERCENT resultsSetCacheAmount=100
InstantDB orders results using a binary tree. This gives good results for random data. Although InstantDB performs tree balancing during a sort, partially sorted columns can lead to reduced performance. Developers should be aware of this when using the ORDER BY clause in a SELECT.
InstantDB writes to the database after every add, delete or update gets committed. This can reduce performance in heavily updating applications. However, it ensures that, after a crash, the database is highly likely to be recoverable to a consistent state.
In some situations, recoverability is not important - say because the data can be obtained from another source, and the database can be easily rebuilt. In such cases, setting fastUpdate=1 in the database properties file will suppress must of the sync() calls and will rely instead on the underlying file system's algorithm for synchronizing its cache with the disk. On some systems this can lead to a dramatic increase in performance for heavily updating systems.
By default, InstantDB holds results sets entirely in memory. This improves query performance and is adequate for most applications.
However, for very large results sets, especially where those results sets must be retained for a relatively long period of time, this can lead to memory shortages. Setting the property resultsOnDisk=1 causes InstantDB to save its results sets on disk.
In the current release, the only time an index gets used is when it is an index on a single column and the expression involving that column is of the form: column=column, or column=constant. Attempt anything more complex and the index is simply ignored. In practise, this should cover most common requests.
Any use of the keyword OR, will cause all expressions in the tables in the OR clause to be run unoptimized. Trying to optimize OR has turned out to be a major headache which simply lead to a whole succession of bugs.
Generally speaking, quoting figures for database performance is notoriously error prone. The smallest variations in a system configuration can radically alter the results. The only way to determine conclusively how your application will fair under any database is to try it out.
With that said, the directory doc/performance contains a sample database properties file and a script which can be run to test out various types of lookup. The script creates a test table with five columns:
Rows are then auto-generated, 20,000 at a time. After each auto-generate, a simple query is performed against each column. Each query is performed twice to maximize the effect of caching.
The results for Windows NT 4, FAT file system, JDK 1.1.5 with the JIT enabled, using a P166, 96Mb, Intel 430HX chipset, PIO mode 4 IDE drive, are shown in the table below. The disk had nearly 1Gb of free space and had been freshly defragmented.
Column | 20,000 | 40,000 | 60,000 | 80,000 |
---|---|---|---|---|
pk | 0.030 | 0.020 | 0.020 | 0.030 |
int1 | 0.621 | 0.651 | 1.022 | 0.741 |
int2 | 0.601 | 1.152 | 1.723 | 2.283 |
str1 | 0.601 | 0.651 | 0.701 | 0.771 |
str2 | 0.601 | 1.172 | 1.723 | 2.303 |
The figures with the JIT disabled are almost twice the above values.
On the same setup, but using the Microsoft Java Virtual Machine supplied with SDK 2.0:
Column | 20,000 | 40,000 | 60,000 | 80,000 |
---|---|---|---|---|
pk | 0.020 | 0.020 | 0.020 | 0.040 |
int1 | 0.160 | 0.191 | 0.240 | 0.251 |
int2 | 0.161 | 0.310 | 0.460 | 0.581 |
str1 | 0.170 | 0.210 | 0.260 | 0.300 |
str2 | 0.171 | 0.320 | 0.460 | 0.611 |
What these tables show is the time (in seconds) to lookup between 1 and 4 rows from each column type, using a simple:
SELECT * FROM table WHERE col=value
As expected, indexed lookups are by far the fastest. With 30 to 50 lookups per second easily achievable.
The effects of caching a column are also clearly visible, becoming more and more pronounced as more rows are added. Full table scans are up to three times faster when the lookup column is held entirely in memory.
It's worth noting that, apart from an anomalous timing for int1 when 60,000 rows were present, the non-indexed lookups increase roughly linearly. This indicates that, for this class of queries at least, InstantDB should scale in a predictable fashion.
To run the performance tests on your machine, install InstantDB as described in downloads. Change to the doc/performance directory, and type the command:
java sample perf.txt
We'd be very interested to hear how these figures compare on various machines and configurations. If you feel like reporting your test results, then go through the configuration information given above, and try to include as much of the same information as you can. You can email your results to: webmaster@enhydra.org