Bulk Loading of Data

                         Document #: US-124927,EN
------------------------------------------------------------------------------

Major subject: data_mgmt    Minor subjects: access_methods

Keywords: copy

Abstract:
How to load large volumes of data quickly with minimal logging.
Expert Note:
SQL Statements To Load Large Volumes of Data
============================================

Three SQL statements can be used to load large volumes of data 
into INGRES tables: 
	COPY...FROM 
	CREATE...AS SELECT
	INSERT...SELECT. 

The speed and logging efficiency of these statements depends on 
which algorithm is used to store the data in the target table.  

The two available algorithms are bulk load and incremental load:

o bulk load 

  Bulk load is the fastest way to load data into a table and incurs 
  negligible logging (less than 500 bytes).  Bulk load is the faster 
  of the two algorithms because low level routines are used to store 
  the data in 16-page chunks, and because individual rows are not logged. 
  
  Bulk load is used only by the COPY...FROM and CREATE...AS SELECT 
  statements, and only when the table satisfies particular conditions - 
  the most significant of which is the requirement that the table be
  empty.

o incremental load 

  Incremental load is slower than bulk load because each row is 
  individually stored in the target table, and is individually logged. 
  Incremental logging involves a much greater logging volume than bulk 
  load.  Incremental load is always used by the INSERT...SELECT statement.

For optimal performance, you must ensure that the bulk load algorithm is
used.  The details of how to do so are discussed below.

When Using COPY...FROM
----------------------
The bulk load algorithm is used by COPY...FROM if the target table 
satisfies the following conditions:

o  The table is empty (has no rows).

o  The table contains less than 16 allocated pages.  The simplest way to 
   satisfy this condition is by using "modify...to truncated" before 
   beginning the load. Please note that HASH tables may exceed this limit
   because their data pages are preallocated.

o  The table is not journaled.  SET NOJOURNALING ON... to meet this
   condition.

o  The table has no secondary indexes. Use "modify...to truncated" to
   ensure that this condition is met.

o  The table has no system maintained columns.

o  The table is an INGRES table, not a Gateway table.

o  The COPY...FROM has or is able to acquire an exclusive table level
   lock on the target table.  This can be achieved by performing the
   load immediately after a MODIFY...TO TRUNCATED or by using SET
   LOCKMODE...LEVEL=TABLE.  If other users are accessing the target
   table, or SET LOCKMODE...LEVEL=PAGE applies to the target table,
   then COPY...FROM is prevented from acquiring this lock.

For best performance when loading non-HEAP tables, it is also
necessary to perform the MODIFY and LOAD operations in a specific
order.  The "Populating Tables" section in Chapter 6 of the "INGRES 
Database Administrator's Guide," is incorrect in suggesting that 
loading as HEAP results in the best performance in all cases.

If the target table structure is ISAM or BTREE, MODIFY the table to the
required structure before issuing the COPY...FROM for optimum performance.  
LOAD performance is also improved when the data is sorted into primary key 
order before the LOAD.  

If the target table structure is HASH, MODIFY the table to the 
required structure after loading the data as HEAP for optimum performance.

When using CREATE...AS SELECT
-----------------------------
Bulk load is used by CREATE...AS SELECT if the WITH JOURNALING clause is 
not used (including SET JOURNALING). The LOAD and MODIFY operations are 
automatically performed in the correct order.

To check whether the bulk load algorithm is being used, use SET
LOG_TRACE.  If a "Type: LOAD" record is logged then bulk load is in
use.

Note On SET NOLOGGING:
---------------------
SET NOLOGGING is sometimes used to speed up bulk loading and to remove the
threat of reaching the log file FORCE-ABORT limit. SET NOLOGGING should 
only be used if necessary and only with exclusive access to the database.  
In the event of a failure, the database must be recovered from a 
checkpoint.

SET NOLOGGING offers no advantage when the bulk load algorithm is used to 
load the table.

If the incremental load algorithm is in use, SET NOLOGGING saves on writes
to the log file; however, the load is still slower than a bulk load.


Releases affected:  6.4/00(all.all) -   Releases not affected: 
Errors:                                                             
Bugs/SIRS: 48456, 50626, 54980                                         
------------------------------------------------------------------------------
Ingres Database Reference
To William's Home Page

© William Yuan 2000

Email William