The optional BUFFER keyword allows the number
of disk writes during an import to be reduced. The number of rows
specified are buffered in memory and only written out when the buffer
fills or the import is complete. Note that if a crash occurs during
a buffered import, the safest action is to drop the table and start
again.
As of InstantDB 1.8, each update to the database is explicitly
sync()'ed after it has been updated. It is extremely important
to add a large BUFFER value. Values of a few thousand should be
considered.
Importing
Data from JDBC™ Data Sources
InstantDB has the facility to import data from any JDBC™ compliant
data source (and therefore, via the JDBC™-ODBC bridge, from any
ODBC data source). An example import command is shown below:
IMPORT catalog FROM URL "jdbc:odbc:Northwind2"
USING "SELECT * FROM catalog" BUFFER 2048
This command imports a table called catalog from the JDBC™
URL jdbc:odbc:Northwind2. Note that, in order to access ODBC
data sources, you must have previously defined an ODBC Data Source
Name (DSN). In the above example, a DSN of Northwind2 is
assumed to have already been set up on the target system (see the
Windows Control Panel ODBC management tool for details). Remember
that you must also have the sun.jdbc.odbc.JdbcOdbcDriver
loaded (or Micrsoft's equivalent: com.ms.jdbc.odbc.JdbcOdbcDriver
if you are running under the Microsoft Java™ virtual machine).
The query shown will be executed against the JDBC™ data source
and the corresponding result set is used to populate the new table.
The JDBC™-ODBC bridge has been succesfully used to import
data from:
- flat text files;
- DBase files;
- Microsoft Excel workbooks;
- Microsoft Access database files.
Use of third party JDBC™/ODBC drivers allows data to be easily
imported from all of the major RDBMS vendors.
The column data types of the result set retrieved are used to determine
the columns of the new table. However, this can be unsatisfactory
in several ways. JDBC™ drivers do not always correctly report the
correct column details for the retrieved columns. Also, there is
no direct JDBC™ analogue of InstantDB's Currency type. Consequently,
InstantDB is forced to make assumptions about the intended data
types.
A better way to create the table, is to execute a CREATE TABLE
command before performing the IMPORT. When InstantDB detects that
the file being imported already exists, then it ignores the received
column data types and adds the retrieved data to the existing table.
Specifying Usernames and Passwords
for JDBC™ Data Sources
Data sources accessed via the JDBC™-ODBC bridge can
specify usernames and passwords in the DSN settings. However, this
is not always appropriate and is of no use to pure Java™ JDBC™
drivers.
Default values can be set for the usernames and passwords used
for imports from JDBC™ data sources using the SQL command:
SET DEFAULT_USERNAME uname [DEFAULT_PASSWORD pword]
These can be overidden for specific tables by including usernames
and passwords in the InstantDB database properties file. Usernames
and passwords supplied this way must be provided on a per table
basis and have the form:
tablename%username=value
tablename$password=value
For example, if you are importing to a table called
"orders" you might have entries similar to:
orders%username=fred
orders$password=xyz123
Note that the username setting uses % and the password setting
uses $ as their respective separators. This is necessary in JDK
1.1 and earlier where at least nine characters in a string must
differ in order to have a chance of generating unique hash values
and therefore unique properties settings.
Why are usernames and passwords table based rather than connection
based?
It would perhaps make more sense for usernames and passwords
to be applied to a connection rather than to individual tables,
especially if a large number of tables are being imported from
a single data source. However, this would either involve extending
the syntax of the IMPORT command to quite a combersome extent,
or it would involve including URLs in the properties file. The
latter option is not possible as java.util.Properties interprets
the ':' character as a special character.
Importing
Data from Text Files
An example command to import data from a text file is shown
below:
IMPORT import1 FROM "Import1.txt" USING "import_schema.txt"
This imports a table called import1 from data contained in
a file called Import1.txt. The format of the data file and
the columns in the table are defined in a schema file called import_schema.txt.
Note that the file extension (txt in the above example) must not correspond
to an InstantDB reserved word (e.g. import1.asc wouldn't work). The
schema file, import_schema.txt, might look something like this:
[Import1.txt]
ColNameHeader=False
CharacterSet=ANSI
Format=CSVDelimited
Col1=ProductID Integer
Col2=ProductName Char Width 40
Col3=SupplierID Integer
Col4=CategoryID Integer
Col5=QuantityPerUnit Char Width 20
Col6=UnitPrice Currency
Col7=UnitsInStock Short
Col8=UnitsOnOrder Short
Col9=ReorderLevel Short
Col10=Discontinued Bit
Notice that the name of the table being imported must match the
name of the file that the data is being imported from in the schema
header. Multiple import file and table definitions may be included
in the same file under their own [filename] sections.
In the above example, there are no column headers in the input
file and the columns are comma separated values. An example of the
start of the data file might look something like this:
1,"Chai",1,1,"10 boxes x 20 bags",$18.00,39,0,10,0
2,"Chang",1,1,"24 - 12 oz bottles",$19.00,17,40,25,0
3,"Aniseed Syrup",1,2,"12 - 550 ml bottles",$10.00,13,70,25,0
The schema file syntax is a subset of the syntax supported by Microsoft's
JET (c) database engine. An example schema file, import_schema.txt,
is held in the Sample sub-directory. The only thing not included
in the sample schema file is the alternative FORMAT directives.
The full syntax of the format line is:
FORMAT={FIXEDLENGTH|DELIMITED(delimiter)|TABDELIMITED|CSVDELIMITED|AUTO}
[STRICT]
Using the above FORMAT line, it is possible to import fixed length
fields, comma separated values (CSV), tab separated fields, or values
separated by a user specified field delimiter. The STRICT keyword
is used to allow column delimiters to take precedence over string
delimiters. By default, if the import encounters either a single
quote ', or a double quote ", this is taken as the start of a string
constant. The string only terminates when a corresponding quote
is found. The import will include any column delimiters in the string
constant and will even read across newlines.
There are two reasons for this default behaviour.
- It allows column delimiters to be included in string constants.
- It allows files exported by Microsoft Access (c) to be imported.
Access sometimes splits string constants across multiple lines
when large strings are exported.
The STRICT keyword ensures that whatever is between the chosen
column delimiters is always strictly included in the column. String
delimiters are then ignored.
The AUTO format indicates that a test table with the indicated
column properties should be created. When the schema specifies an
automatically generated table, then the file from which data is
to be imported should contain a single line containing the number
of rows to be generated.
The IMPORT command checks to see whether the table to be imported
already exists or not. If it does, then the data being imported
will be added to the existing table and the schema file is ignored.
Import files are assumed to be CSV delimited with no headers in
this case. Creating a table before performing the import has the
advantage that it allows primary keys, other indexes, and column
restrictions to be specified in advance of the import.
Exporting Data to Text Files
Data is exported by using the SET EXPORT sql command. All subsequent
results sets are exported to the given file. The full syntax of
the SET EXPORT command is:
SET EXPORT <filename> [CSVDELIMITED|FIXEDLENGTH|DELIMITED(<delimiter>)]
[COLNAMEHEADER] [ROWNUMBERS] [QUOTE <quoteChar>] [[NO]SQL] [CONTROLCOL]
[SUMMARYHEADER] [TRACE <# level> [CONSOLE][TIME]]};
The SET EXPORT command is not designed to produce database reports.
This can probably be better achieved using a JDBC™ based reporting
facility. It is primarily intended to allow data to be exchanged
with other programs which are not JDBC™ aware and therefore cannot
import data directly from InstantDB.
The QUOTE clause allows each field output to be quoted using the
given character. e.g. SET EXPORT "mylog.txt" CSVDELIMITED QUOTE
"\"", would put double quotes around each field as was output.
Setting the filename to NULL switches data export back off.
Note that the SET EXPORT command affects the current thread of
execution only. Each thread must execute its own SET EXPORT command
if it wishes to record SELECT statements.
Speeding
up Imports
If it is known in advance that a single database
session will be devoted to importing large amounts of data, then
a significant performance improvement can often be achieved by setting
the following properties in the .prp file.
- transLevel=0 This switches off transaction recording and writes
to the journal file.
- fastUpdate=1 This prevents InstantDB from synchronising disk
updates every time a table is modified.
Remember to include a large BUFFER value at the end of the IMPORT
statement. Values of 2048-8192 are recommended.
|