Document #: US-124927,EN ------------------------------------------------------------------------------ Major subject: data_mgmt Minor subjects: access_methods Keywords: copy Abstract: How to load large volumes of data quickly with minimal logging. Expert Note: SQL Statements To Load Large Volumes of Data ============================================ Three SQL statements can be used to load large volumes of data into INGRES tables: COPY...FROM CREATE...AS SELECT INSERT...SELECT. The speed and logging efficiency of these statements depends on which algorithm is used to store the data in the target table. The two available algorithms are bulk load and incremental load: o bulk load Bulk load is the fastest way to load data into a table and incurs negligible logging (less than 500 bytes). Bulk load is the faster of the two algorithms because low level routines are used to store the data in 16-page chunks, and because individual rows are not logged. Bulk load is used only by the COPY...FROM and CREATE...AS SELECT statements, and only when the table satisfies particular conditions - the most significant of which is the requirement that the table be empty. o incremental load Incremental load is slower than bulk load because each row is individually stored in the target table, and is individually logged. Incremental logging involves a much greater logging volume than bulk load. Incremental load is always used by the INSERT...SELECT statement. For optimal performance, you must ensure that the bulk load algorithm is used. The details of how to do so are discussed below. When Using COPY...FROM ---------------------- The bulk load algorithm is used by COPY...FROM if the target table satisfies the following conditions: o The table is empty (has no rows). o The table contains less than 16 allocated pages. The simplest way to satisfy this condition is by using "modify...to truncated" before beginning the load. Please note that HASH tables may exceed this limit because their data pages are preallocated. o The table is not journaled. SET NOJOURNALING ON... to meet this condition. o The table has no secondary indexes. Use "modify...to truncated" to ensure that this condition is met. o The table has no system maintained columns. o The table is an INGRES table, not a Gateway table. o The COPY...FROM has or is able to acquire an exclusive table level lock on the target table. This can be achieved by performing the load immediately after a MODIFY...TO TRUNCATED or by using SET LOCKMODE...LEVEL=TABLE. If other users are accessing the target table, or SET LOCKMODE...LEVEL=PAGE applies to the target table, then COPY...FROM is prevented from acquiring this lock. For best performance when loading non-HEAP tables, it is also necessary to perform the MODIFY and LOAD operations in a specific order. The "Populating Tables" section in Chapter 6 of the "INGRES Database Administrator's Guide," is incorrect in suggesting that loading as HEAP results in the best performance in all cases. If the target table structure is ISAM or BTREE, MODIFY the table to the required structure before issuing the COPY...FROM for optimum performance. LOAD performance is also improved when the data is sorted into primary key order before the LOAD. If the target table structure is HASH, MODIFY the table to the required structure after loading the data as HEAP for optimum performance. When using CREATE...AS SELECT ----------------------------- Bulk load is used by CREATE...AS SELECT if the WITH JOURNALING clause is not used (including SET JOURNALING). The LOAD and MODIFY operations are automatically performed in the correct order. To check whether the bulk load algorithm is being used, use SET LOG_TRACE. If a "Type: LOAD" record is logged then bulk load is in use. Note On SET NOLOGGING: --------------------- SET NOLOGGING is sometimes used to speed up bulk loading and to remove the threat of reaching the log file FORCE-ABORT limit. SET NOLOGGING should only be used if necessary and only with exclusive access to the database. In the event of a failure, the database must be recovered from a checkpoint. SET NOLOGGING offers no advantage when the bulk load algorithm is used to load the table. If the incremental load algorithm is in use, SET NOLOGGING saves on writes to the log file; however, the load is still slower than a bulk load. Releases affected: 6.4/00(all.all) - Releases not affected: Errors: Bugs/SIRS: 48456, 50626, 54980 ------------------------------------------------------------------------------
© William Yuan 2000
Email William