Importing and Exporting Data



InstantDB provides the facility to import data from text files or from other JDBC™ data sources. Data can be exported to text files only. Imports are achieved using the SQL statement:

IMPORT table name FROM [URL] data source [USING schema] [BUFFER rows]

The data source defines the file or JDBC™ URL containing the data to be imported. For text imports, the schema is a text file which describes the columns to be added to the new table, and how they are held in the import file. For JDBC™ imports, the schema is a SELECT statement to be executed against the data source.


The optional BUFFER keyword allows the number of disk writes during an import to be reduced. The specified number of rows is buffered in memory and only written out when the buffer fills or the import is complete. Note that if a crash occurs during a buffered import, the safest action is to drop the table and start again.

As of InstantDB 1.8, each update to the database is explicitly sync()'ed to disk. It is therefore extremely important to specify a large BUFFER value; values of a few thousand should be considered.
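
For example, a large text import might buffer several thousand rows at a time (the table, data file, and schema file names here are purely illustrative):

IMPORT orders FROM "orders.txt" USING "orders_schema.txt" BUFFER 4096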

Importing Data from JDBC™ Data Sources

InstantDB has the facility to import data from any JDBC™ compliant data source (and therefore, via the JDBC™-ODBC bridge, from any ODBC data source). An example import command is shown below:

IMPORT catalog FROM URL "jdbc:odbc:Northwind2"
USING "SELECT * FROM catalog" BUFFER 2048

This command imports a table called catalog from the JDBC™ URL jdbc:odbc:Northwind2. Note that, in order to access ODBC data sources, you must have previously defined an ODBC Data Source Name (DSN). In the above example, a DSN of Northwind2 is assumed to have already been set up on the target system (see the Windows Control Panel ODBC management tool for details). Remember that you must also have the sun.jdbc.odbc.JdbcOdbcDriver loaded (or Microsoft's equivalent, com.ms.jdbc.odbc.JdbcOdbcDriver, if you are running under the Microsoft Java™ virtual machine).

The query shown will be executed against the JDBC™ data source and the corresponding result set is used to populate the new table.

The JDBC™-ODBC bridge has been successfully used to import data from:

  1. flat text files;
  2. dBase files;
  3. Microsoft Excel workbooks;
  4. Microsoft Access database files.

Use of third party JDBC™/ODBC drivers allows data to be easily imported from all of the major RDBMS vendors.

The column data types of the retrieved result set are used to determine the columns of the new table. However, this can be unsatisfactory in several ways. JDBC™ drivers do not always report correct column details for the retrieved columns. Also, there is no direct JDBC™ analogue of InstantDB's Currency type. Consequently, InstantDB is forced to make assumptions about the intended data types.

A better way to create the table is to execute a CREATE TABLE command before performing the IMPORT. When InstantDB detects that the table being imported already exists, it ignores the received column data types and adds the retrieved data to the existing table.
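
For instance, the catalog import shown earlier could be preceded by an explicit table definition. The column list below is illustrative (and assumes Currency is accepted as a column type, as the schema syntax later on this page suggests); it should match the columns returned by the SELECT:

CREATE TABLE catalog (ProductID INT, ProductName CHAR(40), UnitPrice CURRENCY)
IMPORT catalog FROM URL "jdbc:odbc:Northwind2"
USING "SELECT ProductID, ProductName, UnitPrice FROM catalog" BUFFER 2048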

Specifying Usernames and Passwords for JDBC™ Data Sources

Data sources accessed via the JDBC™-ODBC bridge can specify usernames and passwords in the DSN settings. However, this is not always appropriate and is of no use to pure Java™ JDBC™ drivers.

Default values can be set for the usernames and passwords used for imports from JDBC™ data sources using the SQL command:

SET DEFAULT_USERNAME uname [DEFAULT_PASSWORD pword]
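
For example, the following sets illustrative default credentials for subsequent JDBC™ imports:

SET DEFAULT_USERNAME scott DEFAULT_PASSWORD tiger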

These can be overridden for specific tables by including usernames and passwords in the InstantDB database properties file. Usernames and passwords supplied this way must be provided on a per table basis and have the form:

tablename%username=value
tablename$password=value

For example, if you are importing to a table called "orders" you might have entries similar to:

orders%username=fred
orders$password=xyz123

Note that the username setting uses % and the password setting uses $ as their respective separators. This is necessary under JDK 1.1 and earlier, where at least nine characters in a string must differ in order to have a chance of generating unique hash values, and therefore unique properties settings.

Why are usernames and passwords table based rather than connection based?

It would perhaps make more sense for usernames and passwords to be applied to a connection rather than to individual tables, especially if a large number of tables are being imported from a single data source. However, this would either involve extending the syntax of the IMPORT command to quite a cumbersome extent, or it would involve including URLs in the properties file. The latter option is not possible as java.util.Properties interprets the ':' character as a special character.

Importing Data from Text Files

An example command to import data from a text file is shown below:

IMPORT import1 FROM "Import1.txt" USING "import_schema.txt"

This imports a table called import1 from data contained in a file called Import1.txt. The format of the data file and the columns in the table are defined in a schema file called import_schema.txt. Note that the file extension (txt in the above example) must not correspond to an InstantDB reserved word (e.g. import1.asc wouldn't work). The schema file, import_schema.txt, might look something like this:

[Import1.txt]
ColNameHeader=False
CharacterSet=ANSI
Format=CSVDelimited
Col1=ProductID Integer
Col2=ProductName Char Width 40
Col3=SupplierID Integer
Col4=CategoryID Integer
Col5=QuantityPerUnit Char Width 20
Col6=UnitPrice Currency
Col7=UnitsInStock Short
Col8=UnitsOnOrder Short
Col9=ReorderLevel Short
Col10=Discontinued Bit

Notice that the name of the table being imported must match the name of the file that the data is being imported from in the schema header. Multiple import file and table definitions may be included in the same file under their own [filename] sections.
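
For example, a single schema file might cover two imports with sections such as the following (the file, table, and column names are illustrative):

[Import1.txt]
ColNameHeader=False
Format=CSVDelimited
Col1=ProductID Integer
Col2=ProductName Char Width 40

[Suppliers.txt]
ColNameHeader=True
Format=TabDelimited
Col1=SupplierID Integer
Col2=CompanyName Char Width 40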

In the import_schema.txt example, there are no column headers in the input file and the columns are comma separated values. An example of the start of the data file might look something like this:

1,"Chai",1,1,"10 boxes x 20 bags",$18.00,39,0,10,0
2,"Chang",1,1,"24 - 12 oz bottles",$19.00,17,40,25,0
3,"Aniseed Syrup",1,2,"12 - 550 ml bottles",$10.00,13,70,25,0

The schema file syntax is a subset of the syntax supported by Microsoft's JET (c) database engine. An example schema file, import_schema.txt, is held in the Sample sub-directory. The only things not included in the sample schema file are the alternative FORMAT directives. The full syntax of the format line is:

FORMAT={FIXEDLENGTH|DELIMITED(delimiter)|TABDELIMITED|CSVDELIMITED|AUTO} [STRICT]

Using the above FORMAT line, it is possible to import fixed length fields, comma separated values (CSV), tab separated fields, or values separated by a user-specified field delimiter. The STRICT keyword is used to allow column delimiters to take precedence over string delimiters. By default, if the import encounters either a single quote ' or a double quote ", this is taken as the start of a string constant. The string only terminates when a corresponding quote is found. The import will include any column delimiters in the string constant and will even read across newlines.

There are two reasons for this default behaviour.

  1. It allows column delimiters to be included in string constants.
  2. It allows files exported by Microsoft Access (c) to be imported. Access sometimes splits string constants across multiple lines when large strings are exported.

The STRICT keyword ensures that whatever is between the chosen column delimiters is always strictly included in the column. String delimiters are then ignored.
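
As an illustration, a schema for semicolon-separated data in which quotes should receive no special treatment might use a format line like this (a sketch; the capitalisation follows the sample schema style above):

Format=Delimited(;) Strict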

The AUTO format indicates that a test table with the indicated column properties should be created. When the schema specifies an automatically generated table, then the file from which data is to be imported should contain a single line containing the number of rows to be generated.
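
Assuming the column syntax shown earlier also applies to generated tables, an AUTO schema section might look like the sketch below, with the named data file containing just the single line 1000 to request 1000 generated rows (the section and column names are illustrative):

[gen_test.txt]
Format=Auto
Col1=id Integer
Col2=name Char Width 20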

The IMPORT command checks to see whether the table to be imported already exists or not. If it does, then the data being imported will be added to the existing table and the schema file is ignored. Import files are assumed to be CSV delimited with no headers in this case. Creating a table before performing the import has the advantage that it allows primary keys, other indexes, and column restrictions to be specified in advance of the import.
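
For example, a keyed table could be created first and the text data then appended to it. This sketch assumes the usual inline PRIMARY KEY syntax; since the table already exists, the schema file can be omitted and Import1.txt is assumed to be CSV delimited with no headers:

CREATE TABLE import1 (ProductID INT PRIMARY KEY, ProductName CHAR(40))
IMPORT import1 FROM "Import1.txt" BUFFER 1024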

Exporting Data to Text Files

Data is exported by using the SET EXPORT SQL command. All subsequent result sets are exported to the given file. The full syntax of the SET EXPORT command is:

SET EXPORT <filename> [CSVDELIMITED|FIXEDLENGTH|DELIMITED(<delimiter>)] [COLNAMEHEADER] [ROWNUMBERS] [QUOTE <quoteChar>] [[NO]SQL] [CONTROLCOL] [SUMMARYHEADER] [TRACE <# level> [CONSOLE][TIME]]

The SET EXPORT command is not designed to produce database reports. This can probably be better achieved using a JDBC™ based reporting facility. It is primarily intended to allow data to be exchanged with other programs which are not JDBC™ aware and therefore cannot import data directly from InstantDB.

The QUOTE clause allows each field output to be quoted using the given character. For example, SET EXPORT "mylog.txt" CSVDELIMITED QUOTE "\"" would put double quotes around each field as it is output.

Setting the filename to NULL switches data export back off.

Note that the SET EXPORT command affects the current thread of execution only. Each thread must execute its own SET EXPORT command if it wishes to record SELECT statements.
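
Putting this together, a typical export session might look like the following (the file name is illustrative):

SET EXPORT "products.txt" CSVDELIMITED COLNAMEHEADER
SELECT * FROM import1
SET EXPORT NULL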

Speeding up Imports

If it is known in advance that a single database session will be devoted to importing large amounts of data, then a significant performance improvement can often be achieved by setting the following properties in the .prp file.
  1. transLevel=0. This switches off transaction recording and writes to the journal file.
  2. fastUpdate=1. This prevents InstantDB from synchronising disk updates every time a table is modified.
Remember to include a large BUFFER value at the end of the IMPORT statement. Values of 2048-8192 are recommended.
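
A bulk-load session might therefore combine both properties with a large buffer, along the following lines (the table and file names are illustrative):

# in the database .prp file
transLevel=0
fastUpdate=1

IMPORT orders FROM "orders.txt" USING "orders_schema.txt" BUFFER 8192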