Document #: US-39363,EN ------------------------------------------------------------------------------ Major subject: analysis Minor subjects: tech_notes Keywords: dba_guide Abstract: "BTREE" is a section of the INGRES Database Administrator's Guide chapter "Choosing Storage Structures and Secondary Indexes". It discusses BTREE implementation in INGRES and is equivalent to the Release 6 Technical Note #7 or the Release 5 Technical Note # 59. Expert note: Btree ===== o Structure of a Btree Table o Associated Data Pages o Index Growth o Locking and Btree Tables o Sorted Order in a Btree Table o Deleted Rows: Data Pages o When to Use Btree 'Btree' is INGRES's most versatile storage structure. The btree structure allows for keyed access and supports range searches and pattern matching. The btree index is dynamic, growing as the table grows. This eliminates the overflow problems that static structures like isam and hash present as they grow. INGRES's btree's implementation not only provides a dynamic index but also allows for maximum concurrent use of the table. The design incorporates a sparse index that points to pages in a leaf level. The leaf level is a dense index, containing key and tid pairs pointing to the rows on the data pages in the table. (For a discussion of tids, please see the section "Tids" in the INGRES Database Administrator's Guide.) The benefit of this indexing approach is that it minimizes splitting cost: when splitting does occur, the actual data rows need not move. Only the leaf and index levels require reorganization. Structure of a Btree Table -------------------------- INGRES btrees can be viewed as four separate parts. There is: o A free list header page, which is used to keep track of allocated pages that are not currently being used o One or more index pages o One or more leaf pages o One or more data pages, where the user data is actually stored Index pages contain the key and a pointer (known as a tid) to either lower-level index pages or a leaf page where the key-tid pair will be found. Data pages contain the data rows and do not contain pointers to other pages. The smallest btree containing data will have four pages, one of each type. If a secondary index is modified to btree, it will not contain data pages. Instead, the leaf pages of the secondary index will contain tids referencing the main table's data pages. The number of index pages is dependent on the width of the key and the number of leaf pages, because eventually the index pages point to a particular leaf page. Usually the index level is small, since it needs only to point to the leaf pages. The index level is similar to the isam index, except that the isam index points to data pages whereas the btree index level points to leaf pages. The leaf page level is a dense index. It is considered a dense index because it holds a key and tid pair for every row in the table. In dense indexes, rows on data pages do not move during a split (which would cause their tids to change). The index level is considered a sparse index, because it contains only a key value and a pointer to a page. The figure below, a portrait of a btree, aids in understanding the three btree levels: index, leaf and data page. The btree index level is similar to isam's index, except that the index points to index or leaf pages instead of to data pages. To look for "Kay", the search starts from the root node and traverses down the left side of the index for values preceding "McShane". The index holds leaf page numbers and the range of key values to expect on the leaf page. The index in this case shows that leaf page 2 is the appropriate page to look at. On leaf page 2, a scan is made through the key-tid pairs, with Kay's record identified as being on page 4, row 0 (page 4 row 0 is actually a "tid"). Going directly to page 4, row 0, Kay's record is located. Note: This diagram is good pictorially, but it may not be realistic. In actuality, if the key width "name" were only 30 characters, the row width were 500 bytes, and there were only 31 employees, this btree would have only a free list header page, one index page, one leaf page, and 8 data pages (instead of 4 leaf pages and 3 index pages). +-----------------------------------+ | ROOT | | | | <= McShane | INDEX +-----------------------------------+ LEVEL / \ / \ +---------------------------+ +---------------------------+ | Key Leaf Page | | Key Leaf Page | | | | | | <= Giller 1 | | > McShane <= Shigio 3 | | > Giller <= McShane 2 | | > Shigio 4 | +---------------------------+ +---------------------------+ LEAF PAGE LEVEL Leaf Page 1 Leaf Page 2 Leaf Page 3 Leaf Page 4 Aitken 1,0 Gordon 3,0 McTigue 5,0 Smith 7,0 Blumberg 1,1 Green 3,3 Ming 5,1 Stannich 7,1 Brodie 1,3 Gregori 3,2 Ramos 5,2 Stein 7,2 Cameron 1,2 Huber 3,1 Robinson 5,3 Stover 7,3 Clark 2,0 Kay 4,0 Ross 6,0 Sullivan 8,0 Curan 2,1 Kreseski 4,1 Sabel 6,1 Verducci 8,1 Curry 2,2 Mandic 4,2 Saxena 6,2 Zimmerman 8,2 Giller 2,3 McShane 4,3 Shigio 6,3 DATA PAGE LEVEL +-------------------------------------+ Page 1 0 |Aitken | 1| 49| 50000.000| 1 |Blumberg | 2| 33| 32000.000| 2 |Cameron | 4| 37| 35000.000| 3 |Brodie | 3| 42| 40000.000| |-------------------------------------| Page 2 0 |Clark | 5| 43| 40000.000| Associated 1 |Curan | 6| 30| 30000.000| Data Page 2 |Curry | 7| 34| 32000.000| for Leaf 3 |Giller | 8| 47| 46000.000| Page 1 |-------------------------------------| Page 3 0 |Gordon | 9| 28| 27000.000| 1 |Huber | 12| 35| 32000.000| 2 |Gregori | 11| 32| 31000.000| 3 |Green | 10| 27| 26000.000| |-------------------------------------| Page 4 0 |Kay | 13| 41| 38000.000| Associated 1 |Kreseski | 14| 25| 24000.000| Data Page 2 |Mandic | 15| 46| 43000.000| for Leaf 3 |McShane | 16| 22| 22000.000| Page 2 |-------------------------------------| Page 5 0 +McTigue | 17| 44| 41000.000+ FIGURE: Portrait of a Btree Associated Data Pages ---------------------- Every leaf page has an associated data page. This is where new rows are added. When an associated data page fills up, a new associated data page is attached to the leaf page. If you delete rows that exist on the current associated data page, the deleted space is reused. Having one associated data page per leaf page provides a good chance for rows with similar key ranges to exist on the same data page, thereby increasing the likelihood that data references will occur on the same data page. Index Growth ------------ The major difference between isam and btree is that the btree index grows as the table grows. Consider what would happen if you added these five new employees to the isam "employee" table, keyed on name: Zanadu, Zentura, Zilla, Zorro, Zumu. These names would all be put on the last page of the isam table, and since they would not all fit on the last page, they would be put onto an overflow page attached to the last page. If you added these five new employees to the btree table, you would add the new names to the appropriate leaf page (page 4, in this case) and their records would go on the associated data page for leaf page #4. Since the associated data page would fill up, a new associated data page would be assigned to page #4. If the leaf page were full, and could not hold all five names, the leaf page would split into two leaf pages, and a reference to the new leaf page in the index would be added. If the index page could no longer hold a reference to another leaf page, the index would be split as well. Splitting occurs fairly frequently while the table is small and growing. As the table gets larger, splitting occurs less frequently (unless a sequential key is used) and usually only in the leaf or lowest index level. Locking and Btree Tables ------------------------ During normal btree traversal, leaf and data pages are logically locked until the end of the transaction. Btree index pages are only temporarily locked during query execution. The index page lock is released after the page has been searched. When searching the btree index, ladder locking is used: a lock is taken on the first index page, which points to another index page. The next index page is locked and, once it is locked, the first lock is dropped; and so on down the index to the leaf level. INGRES always locks the leaf and data pages when accessing btree tables. Because of this, locking pages in a btree table requires twice as many locks as locking an isam or hash table. It may be wise to set the maxlocks escalation factor higher than the default when using the btree storage structure. For details refer to the 'set lockmode' statement in your query language reference manual. Sorted Order in a Btree Table ----------------------------- Looking back to the figure, note that the rows for "Huber" and "Green" in the btree diagram are not in sorted order on the data page. This would happen if Huber's record was appended before Green's. They both end up on the same data page, but slightly out of order. This would happen in isam as well. However, if you tried the following: select * from employee where employee.name like 'G%'; you would retrieve the rows in sorted order if the employee table was a btree, since the leaf pages are used to point to the data rows, and the leaf pages maintain a sorted sequence table of all the key/tid pairs on the leaf page. The data on the data pages is not guaranteed to be sorted, but the access, which is always through the leaf pages, guarantees that the data is retrieved in sorted order. (This is not true for isam.) Although btree data is retrieved in sorted order, currently the 'max' aggregate does not automatically go to the maximum key in the btree. The 'max' aggregate scans the entire btree table. Deleted Rows: Data Pages ----------------------- If rows are deleted on the associated data page, the space is re-used the next time a row is appended to that page. If rows are deleted from a data page that is no longer associated, the space is not re-used. If all the rows on a non-associated data page are deleted, the 'modify to merge' statement places the page on the free list, and when a new associated data page is needed, the page is reused. The 'modify to btree' statement is the only statement that completely frees up unused data pages and disk space. The reason that deleted space on a non-associated data page is not automatically reused is to speed the append operation. Appending to one particular page (the "associated data page") is faster than tracking and checking all the available free space on non-associated data pages; appending to the associated data page also provides better key clustering when data addition occurs in sorted key order. Since appends generally occur more frequently than do deletes, preserving the performance of the append operation seems wiser than reusing deleted space from non-associated data pages. When to Use Btree ----------------- Btree is the most versatile storage structure, as it supports both exact match and range retrievals and includes a dynamic index, so that frequent remodification may not be necessary. Btree is a good storage structure to use in any of these cases: o The table is growing at a rapid rate. o You use pattern matching. o You retrieve ranges of key values. o You retrieve using only the leftmost part of a multi- column key. Btree is a poor storage structure to use if: o The table is relatively static. o The table is small, static and access is heavily concurrent. The following are problems encountered with the btree storage structure, and their solutions. Btree Problems and Solutions Problem: You tried to use pattern matching, but did not specify the leftmost character. Solution: Specify the leftmost part of the key; *F* does not use the btree index, but F* does. If you cannot modify the search condition, the entire table must be scanned. Problem: You tried to use just part of a multi-column key, but did not specify the leftmost column. Solution: Specify the leftmost column of the multi-column key. If you cannot modify the search condition, create a secondary index with only the columns on which you are searching. Problem: You are deleting frequently, as well as adding data. Solution: Use 'modify to merge' or 'modify' periodically to reclaim space. Releases affected: 6.0/00(all.all) - Releases not affected: Errors: Bugs/SIRS: ------------------------------------------------------------------------------
© William Yuan 2000
Email William