Bulk loading

> Does anyone have a definitive answer to whether you get better
> performance during the load and indexing by adding a btree index before
> doing a copy in to a new table or after.  I have heard people claim each
> and don't have a clue myself. In fact, instructors in two different
> classes I took during the last two years gave differing opinions.
> 
> Steve

Hi Steve,

    This may depend on what version of Ingres you are running. This is mainly
    due to the amount of logging that may go on. In 6.4 the only way to do
    this is to the copy first and then do the modify. The reason being that
    the copy into an empty heap table gets you the Bulk Loading logging ie
    very little is logged, this provides a massive performance improvement in
    its own right. If you put a structure on the table then you will get every
    insert into the table logged. This overhead is appalling. Furthermore by
    placing the structure on an empty table and then loading data into it the
    work going on behind the scenes to keep the index in order will be
    horriffic.

    Under OI1.2 I suspect the answer may be different due to the improved
    logging algorithms. Indeed I've noticed that Ingres's copydb routine
    always puts the structure on the table first and then copies into the
    table. I havent tested this recently ona large table to see what happens.
    But I must admit to being a triffle concerned about it.

    Martin Bowes

    PS If you want to see just how bad it can get try playing with a copy into
    an ISAM table.


Steve,

The best way to get a definitive answer is to conduct a test
in your specific environment due to the many number of 
variables (multiple disk locations for data and sort, hardware
performance, versions, etc.).  Within a few minutes you 
should have your answer.

My experience has shown the load into a pre-existing
structure can be significantly faster (sometimes as much
as 20-33%) than loading to a heap table and then modifying
that table to another structure.  Note:  This should NOT be 
confused with the performance of loading into a HEAP vs.
other structures when a modify is NOT required (the heap
will be faster).

The problem is that faster is not always better, and that 
is generally true in this case.  I would prefer to have a
balanced index that took a little longer to create than
a fast one that had overflow.

Chip Nickolett           ChipN@Comp-Soln.com
Comprehensive Consulting Solutions, Inc.   (www.Comp-Soln.com)
Phone:  262-544-9954     Fax:  262-544-1236



Fawthrop, Stephen G., PhD.  wrote:


Up through 6.4/04, you had to copy into an empty non-journalled non-indexed
heap to get a bulk load.  Anything else did row-at-a-time inserts
with logging.

In 6.4/05 and later, you can bulk copy into an empty non-journalled
non-indexed btree too.  It's faster than copying into a heap and then
modifying;  faster by about the time it takes to read and write the
data (once each).  I think you can bulk-load into empty hash tables
too, but there is some funny restriction (like it had to be under
16 pages).

OpenIngres is about the same as 6.4/05.  I think the hash bulk load
limitations were relaxed so that it just looks for an empty table.
I think there also might have been some additional heap cases that
now allow bulk-load, although I can't recall the details.

The DBA Guide and/or the SQL Reference should tell you when bulk-load
is allowable.  Bulk-load is what you want, if you're looking for speed.


Karl R. Schendel, Jr.
K/B Computer Associates   wiz@kbcomputer.com
Ingres and Unix Expertise
Ingres Q & A
Back to William's Home Page

© William Yuan 2000

Email William