Note - in the current release only a single range of rows
for any given column can be cached. Future versions will include
multiple ranges.
InstantDB supports two algorithms for caching a column: CACHE_ROWS
and CACHE_PERCENT. CACHE_ROWS caches a fixed number of rows from
a column, whereas CACHE_PERCENT ensures that a fixed percentage
is cached (which can in fact be 100%). The default algorithm can
be altered using the cacheCondition property in the properties
file. The amount to be cached is controlled using the cacheAmount
property. Two additional properties, resultsSetCache and
resultsSetCacheAmount fulfill a similar roll for results
set tables.
InstantDB initially caches from row 1 up to some configurable
limit. However, it also tries to find a better location. When
a better cache location is found, the current cache is dropped
and the new cache is set up. Two properties control when the cache
gets reset. missesInCacheStats determines the number of
recent lookups which are included in tests for better locations.
For example, if this is set to 100, then the last 100 lookups
are included in tests for a better cache location. cacheResetPercent
controls how much better the alternative cache location has to
be. For example, if this is set to 10, then the new cache site
has to achieve a 10% better hit rate before a switch will take
place. Dropping a cache will obviously lead to a short term drop
in peformance. Setting cacheResetPercent appropriately ensures
that the long term gain will compensate for this.
In addition to setting global defaults for the caching algorithm,
InstantDB provides a non-standard SQL extension which can be applied
at table creation time to control the caching of individual columns.
Circumstances where this might prove useful would be where it
would be impractical to cache all of a table, but where caching
the primary key would improve the performance of joins. This additional
column condition has the syntax:
CACHE rows {ROWS|PERCENT}
For example:
CREATE TABLE table1 ( int1 int CACHE 10 PERCENT, int2 int CACHE 100 ROWS )
It is not possible to control the caching of index columns. Indexes
are always held entirely in memory and are only written to disk
as a result of a clean shutdown. Failure to close the database
cleanly, say as a result of an unhandled exception, causes the
database to enter a recovery mode when it is next opened. During
this recovery any indexes which have been modified are recreated
from their target tables.
Changing
a Column's Caching Algorithm
It is possible to change a column's caching algorithm after the
table has been created. This can be accomplished by updating the
$db$cols system table. This table has two columns that control
the caching algorithm: CacheCond and CacheAmnt.
CacheCond is set to 1 to cache a fixed number of rows. It is
set to 2 to cache a percentage of rows.
For example, in the "sample" database, all of the columns default
to caching 512 rows. To change the "ProductID" field of the "import1"
table to cache 100% instead, issue the following SQL statement:
UPDATE sample$db$Cols
SET cachecond=2,cacheamnt=100
WHERE colname="ProductID"
AND tableid=(
SELECT t.tableid
FROM sample$db$tables t
WHERE tablename="import1")
The database must then be shutdown and restarted for the new
caching algorithm to take effect.
Additional Caches
As of InstantDB version 1.4, the Recent Row cache has been removed.
Instead, non-cached rows are instantiated directly from the table
read-ahead buffer.
InstantDB reads from tables in blocks of rows at a time. The
number of rows read in a single IO is governed by the rowCacheSize
property. The optimum value of this property will vary depending
on the configuration of your system and on the characteristics
of your database. However, a value of roughly 128 to 256 often
produces good results.
Order By
The disk cache has been designed with the assumption
that tables are scanned from the first row to the last. This assumption
is untrue where the ORDER BY clause is used. To increase performance
in these circumstances, it is highly recommended that all rows in
results sets be cached. i.e. Use the following settings in the database
properties file:
resultsSetCache=CACHE_PERCENT
resultsSetCacheAmount=100
InstantDB orders results using a binary tree. This gives good
results for random data. Although InstantDB performs tree balancig
during a sort, partially sorted columns can lead to reduced performance.
Developers should be aware of this when using the ORDER BY clause
in a SELECT.
Speeding
up Updates
InstantDB writes to the database after every add, delete or update
gets committed. This can reduce performance in heavily updating
applications. However, it ensures that, after a crash, the database
is highly likely to be recoverable to a consistent state.
In some situations, recoverability is not important - say because
the data can be obtained from another source, and the database
can be easily rebuilt. In such cases, setting fastUpdate=1
in the database properties file will suppress must of the sync()
calls and will rely instead on the underlying file system's algorithm
for synchronising its cache with the disk. On some systems this
can lead to a dramatic increase in performance for heavily updating
systems.
Handling Large Queries
By default, InstantDB holds results sets entirely in memory.
This improves query performance and is adequate for most applications.
However, for very large results sets, especially where those
results sets must be retained for a relatively long period of
time, this can lead to memory shortages. Setting the property
resultsOnDisk=1 causes InstantDB to save its results sets
on disk.
Index Usage
In the current release, the only time an index gets used is when
it is an index on a single column and the expression involving
that column is of the form: column=column, or column=constant.
Attempt anything more complex and the index is simply ignored.
In practise, this should cover most common requests.
Any use of the keyword OR, will cause all expressions in the
tables in the OR clause to be run unoptimised. Trying to optimise
OR has turned out to be a major headache which simply lead to
a whole succession of bugs.
Performance
Generally speaking, quoting figures for database performance
is notoriously error prone. The smallest variations in a system
configuration can radically alter the results. The only way to
determine conclusively how your application will fair under any
database is to try it out.
With that said, the directory doc/performance contains a sample
database properties file and a script which can be run to test
out various types of lookup. The script creates a test table with
five columns:
- pk - An Integer column. An index is added to provide fast
lookups.
- int1 - Another Integer column. This is modified to cache 100%
of its data.
- int2 - Another Integer column. Uses the database properties
default caching (512 rows).
- str1 - A Char(20) column. This is modified to cache 100%.
- str2 - Another Char(20) column. Uses default caching.
Rows are then auto-generated, 20,000 at a time. After each auto-generate,
a simple query is performed against each column. Each query is
performed twice to maximise the effect of caching.
The results for Windows NT 4, FAT file system, JDK 1.1.5 with
the JIT enabled, using a P166, 96Mb, Intel 430HX chipset, PIO
mode 4 IDE drive, are shown in the table below. The disk had nearly
1Gb of free space and had been freshly de-fragmented.
Column |
20,000 |
40,000 |
60,000 |
80,000 |
pk |
0.030 |
0.020 |
0.020 |
0.030 |
int1 |
0.621 |
0.651 |
1.022 |
0.741 |
int2 |
0.601 |
1.152 |
1.723 |
2.283 |
str1 |
0.601 |
0.651 |
0.701 |
0.771 |
str2 |
0.601 |
1.172 |
1.723 |
2.303 |
The figures with the JIT disabled are almost twice the above
values.
On the same setup, but using the Microsoft Java Virtual Machine™
supplied with SDK 2.0:
Column |
20,000 |
40,000 |
60,000 |
80,000 |
pk |
0.020 |
0.020 |
0.020 |
0.040 |
int1 |
0.160 |
0.191 |
0.240 |
0.251 |
int2 |
0.161 |
0.310 |
0.460 |
0.581 |
str1 |
0.170 |
0.210 |
0.260 |
0.300 |
str2 |
0.171 |
0.320 |
0.460 |
0.611 |
What these tables show is the time (in seconds) to lookup between
1 and 4 rows from each column type, using a simple:
SELECT * FROM table WHERE col=value
As expected, indexed lookups are by far the fastest. With 30
to 50 lookups per second easily achievable.
The effects of caching a column are also clearly visible, becoming
more and more pronounced as more rows are added. Full table scans
are up to three times faster when the lookup column is held entirely
in memory.
It's worth noting that, apart from an anomolous timing for int1
when 60,000 rows were present, the non-indexed lookups increase
roughly linearly. This indicates that, for this class of queries
at least, InstantDB should scale in a predictable fashion.
To run the performance tests on your machine, install InstantDB
as described in downloads.
Change to the doc/performance directory, and type the command:
We'd be very interested to hear how these figures
compare on various machines and configurations. If you feel like
reporting your test results, then go through the configuration information
given above, and try to include as much of the same information
as you can. You can email your results to: webmaster@enhydra.org
|