Btree Implementation in Ingres

                         Document #: US-39363,EN
------------------------------------------------------------------------------

Major subject: analysis    Minor subjects: tech_notes

Keywords: dba_guide

Abstract:
"BTREE" is a section of the INGRES Database Administrator's 
Guide chapter "Choosing Storage Structures and Secondary 
Indexes". It discusses BTREE implementation in INGRES and 
is equivalent to the Release 6 Technical Note #7 or the 
Release 5 Technical Note # 59.


Expert note:
Btree
=====

o	Structure of a Btree Table
o	Associated Data Pages
o	Index Growth
o	Locking and Btree Tables
o	Sorted Order in a Btree Table
o	Deleted Rows: Data Pages
o	When to Use Btree

'Btree' is INGRES's most versatile storage structure. The
btree structure allows for keyed access and supports range
searches and pattern matching. The btree index is dynamic,
growing as the table grows. This eliminates the overflow
problems that static structures like isam and hash present
as they grow.

INGRES's btree's implementation not only provides a dynamic
index but also allows for maximum concurrent use of the
table. The design incorporates a sparse index that points to
pages in a leaf level. The leaf level is a dense index,
containing key and tid pairs pointing to the rows on the
data pages in the table. (For a discussion of tids, please
see the section "Tids" in the INGRES Database
Administrator's Guide.) The benefit of this indexing
approach is that it minimizes splitting cost: when splitting
does occur, the actual data rows need not move. Only the
leaf and index levels require reorganization.


Structure of a Btree Table
--------------------------
INGRES btrees can be viewed as four separate parts. There
is:

   o A free list header page, which is used to keep track of
     allocated pages that are not currently being used

   o One or more index pages

   o One or more leaf pages

   o One or more data pages, where the user data is actually
     stored

Index pages contain the key and a pointer (known as a tid)
to either lower-level index pages or a leaf page where the
key-tid pair will be found. Data pages contain the data rows
and do not contain pointers to other pages. The smallest
btree containing data will have four pages, one of each
type.

If a secondary index is modified to btree, it will not
contain data pages. Instead, the leaf pages of the secondary
index will contain tids referencing the main table's data
pages.

The number of index pages is dependent on the width of the
key and the number of leaf pages, because eventually the
index pages point to a particular leaf page. Usually the
index level is small, since it needs only to point to the
leaf pages. The index level is similar to the isam index,
except that the isam index points to data pages whereas the
btree index level points to leaf pages.

The leaf page level is a dense index. It is considered a
dense index because it holds a key and tid pair for every
row in the table. In dense indexes, rows on data pages do
not move during a split (which would cause their tids to
change). The index level is considered a sparse index,
because it contains only a key value and a pointer to a
page.

The figure below, a portrait of a btree, aids in
understanding the three btree levels: index, leaf and data
page.

The btree index level is similar to isam's index, except
that the index points to index or leaf pages instead of to
data pages. To look for "Kay", the search starts from the
root node and traverses down the left side of the index for
values preceding "McShane". The index holds leaf page
numbers and the range of key values to expect on the leaf
page. The index in this case shows that leaf page 2 is the
appropriate page to look at.

On leaf page 2, a scan is made through the key-tid pairs,
with Kay's record identified as being on page 4, row 0 (page
4 row 0 is actually a "tid"). Going directly to page 4, row
0, Kay's record is located.

Note: This diagram is good pictorially, but it may not be
realistic. In actuality, if the key width "name" were only
30 characters, the row width were 500 bytes, and there were
only 31 employees, this btree would have only a free list
header page, one index page, one leaf page, and 8 data pages
(instead of 4 leaf pages and 3 index pages).

          +-----------------------------------+
          |                 ROOT              |
          |                                   |
          |           <=   McShane            |
INDEX     +-----------------------------------+
LEVEL            /                        \
                /                          \
+---------------------------+ +---------------------------+
| Key             Leaf Page | | Key             Leaf Page |
|                           | |                           |
| <= Giller              1  | | > McShane <= Shigio    3  |
| > Giller <= McShane    2  | | > Shigio               4  |
+---------------------------+ +---------------------------+

LEAF PAGE LEVEL

Leaf Page 1    Leaf Page 2    Leaf Page 3    Leaf Page 4
Aitken    1,0  Gordon    3,0  McTigue   5,0  Smith      7,0
Blumberg  1,1  Green     3,3  Ming      5,1  Stannich   7,1
Brodie    1,3  Gregori   3,2  Ramos     5,2  Stein      7,2
Cameron   1,2  Huber     3,1  Robinson  5,3  Stover     7,3
Clark     2,0  Kay       4,0  Ross      6,0  Sullivan   8,0
Curan     2,1  Kreseski  4,1  Sabel     6,1  Verducci   8,1
Curry     2,2  Mandic    4,2  Saxena    6,2  Zimmerman  8,2
Giller    2,3  McShane   4,3  Shigio    6,3

DATA PAGE LEVEL
          +-------------------------------------+
Page 1  0 |Aitken    |       1|    49| 50000.000|
        1 |Blumberg  |       2|    33| 32000.000|
        2 |Cameron   |       4|    37| 35000.000|
        3 |Brodie    |       3|    42| 40000.000|
          |-------------------------------------|
Page 2  0 |Clark     |       5|    43| 40000.000| Associated
        1 |Curan     |       6|    30| 30000.000| Data Page
        2 |Curry     |       7|    34| 32000.000| for Leaf
        3 |Giller    |       8|    47| 46000.000| Page 1
          |-------------------------------------|
Page 3  0 |Gordon    |       9|    28| 27000.000|
        1 |Huber     |      12|    35| 32000.000|
        2 |Gregori   |      11|    32| 31000.000|
        3 |Green     |      10|    27| 26000.000|
          |-------------------------------------|
Page 4  0 |Kay       |      13|    41| 38000.000| Associated
        1 |Kreseski  |      14|    25| 24000.000| Data Page
        2 |Mandic    |      15|    46| 43000.000| for Leaf
        3 |McShane   |      16|    22| 22000.000| Page 2
          |-------------------------------------|
Page 5  0 +McTigue   |      17|    44| 41000.000+

                FIGURE:  Portrait of a Btree


Associated Data Pages
----------------------
Every leaf page has an associated data page.  This is where 
new rows are added.  When an associated data page fills up, a 
new associated data page is attached to the leaf page. If you 
delete rows that exist on the current associated data page, 
the deleted space is reused.

Having one associated data page per leaf page provides a
good chance for rows with similar key ranges to exist on the
same data page, thereby increasing the likelihood that data
references will occur on the same data page.


Index Growth
------------
The major difference between isam and btree is that the
btree index grows as the table grows. Consider what would
happen if you added these five new employees to the isam
"employee" table, keyed on name: Zanadu, Zentura, Zilla,
Zorro, Zumu. These names would all be put on the last page
of the isam table, and since they would not all fit on the
last page, they would be put onto an overflow page attached
to the last page.

If you added these five new employees to the btree table,
you would add the new names to the appropriate leaf page
(page 4, in this case) and their records would go on the
associated data page for leaf page #4. Since the associated
data page would fill up, a new associated data page would be
assigned to page #4. If the leaf page were full, and could
not hold all five names, the leaf page would split into two
leaf pages, and a reference to the new leaf page in the
index would be added. If the index page could no longer hold
a reference to another leaf page, the index would be split
as well.

Splitting occurs fairly frequently while the table is small
and growing. As the table gets larger, splitting occurs less
frequently (unless a sequential key is used) and usually
only in the leaf or lowest index level.


Locking and Btree Tables
------------------------
During normal btree traversal, leaf and data pages are
logically locked until the end of the transaction. Btree
index pages are only temporarily locked during query
execution. The index page lock is released after the page
has been searched.

When searching the btree index, ladder locking is used: a
lock is taken on the first index page, which points to
another index page. The next index page is locked and, once
it is locked, the first lock is dropped; and so on down the
index to the leaf level.

INGRES always locks the leaf and data pages when accessing
btree tables. Because of this, locking pages in a btree
table requires twice as many locks as locking an isam or
hash table. It may be wise to set the maxlocks escalation
factor higher than the default when using the btree storage
structure. For details refer to the 'set lockmode' statement
in your query language reference manual.


Sorted Order in a Btree Table
-----------------------------
Looking back to the figure, note that the rows for "Huber"
and "Green" in the btree diagram are not in sorted order on
the data page. This would happen if Huber's record was
appended before Green's. They both end up on the same data
page, but slightly out of order. This would happen in isam
as well. However, if you tried the following:

     select * from employee
          where employee.name like 'G%';

you would retrieve the rows in sorted order if the employee
table was a btree, since the leaf pages are used to point to
the data rows, and the leaf pages maintain a sorted sequence
table of all the key/tid pairs on the leaf page. The data on
the data pages is not guaranteed to be sorted, but the
access, which is always through the leaf pages, guarantees
that the data is retrieved in sorted order. (This is not
true for isam.)

Although btree data is retrieved in sorted order, currently
the 'max' aggregate does not automatically go to the maximum
key in the btree. The 'max' aggregate scans the entire btree
table.


Deleted Rows: Data Pages
-----------------------
If rows are deleted on the associated data page, the space
is re-used the next time a row is appended to that page. If
rows are deleted from a data page that is no longer
associated, the space is not re-used. If all the rows on a
non-associated data page are deleted, the 'modify to merge'
statement places the page on the free list, and when a new
associated data page is needed, the page is reused. The
'modify to btree' statement is the only statement that
completely frees up unused data pages and disk space.

The reason that deleted space on a non-associated data page
is not automatically reused is to speed the append
operation. Appending to one particular page (the "associated
data page") is faster than tracking and checking all the
available free space on non-associated data pages; appending
to the associated data page also provides better key
clustering when data addition occurs in sorted key order.
Since appends generally occur more frequently than do
deletes, preserving the performance of the append operation
seems wiser than reusing deleted space from non-associated
data pages.


When to Use Btree
-----------------
Btree is the most versatile storage structure, as it
supports both exact match and range retrievals and includes
a dynamic index, so that frequent remodification may not be
necessary.

Btree is a good storage structure to use in any of these
cases:

   o The table is growing at a rapid rate.

   o You use pattern matching.

   o You retrieve ranges of key values.

   o You retrieve using only the leftmost part of a multi-
     column key.

Btree is a poor storage structure to use if:

   o The table is relatively static.

   o The table is small, static and access is heavily
     concurrent.

The following are problems encountered with the btree
storage structure, and their solutions.

                Btree Problems and Solutions

Problem:  You tried to use pattern matching, but did not
          specify the leftmost character.
Solution: Specify the leftmost part of the key; *F* does not
          use the btree index, but F* does. If you cannot
          modify the search condition, the entire table must
          be scanned.

Problem:  You tried to use just part of a multi-column key,
          but did not specify the leftmost column.
Solution: Specify the leftmost column of the multi-column
          key. If you cannot modify the search condition,
          create a secondary index with only the columns on
          which you are searching.

Problem:  You are deleting frequently, as well as adding
          data.
Solution: Use 'modify to merge' or 'modify' periodically to
          reclaim space.



Releases affected:  6.0/00(all.all) -   Releases not affected: 
Errors:                                                             
Bugs/SIRS:                                                             
------------------------------------------------------------------------------
Ingres Database Reference
To William's Home Page
Email William