Maxpages parameter

On Aug 5, 14:19, Beedle,Paul(STP) wrote:
> Subject: RE: MAXPAGES
> [re: why does MAXPAGES exist?]
>
> I have used:
>
>    with MAXPAGES = 1
>
> for tables that contain sequential key values for use in other tables.   
> I want to make sure that the table contains only one row and that it will
> never grow, so I pad  the rest of the row with a column of 1024 bytes.

I'm not sure what benefit you are getting here.  MAXPAGES does not limit
your tablesize permanently -- it only limits the number of primary hash
pages allocated when you modify the table.  I would suggest that you not
bother with either the padding column or the HASH structure, and simply
use a HEAP with one row and one column.  (or, you can use   
ii_sequence_values, which is used by the SEQUENCE_VALUE 4GL builtin procedure).
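
Just as a sketch of what I mean (the table and column names here are
invented, and the syntax is from memory, so check it against the manual):

    CREATE TABLE next_key (next_value INTEGER NOT NULL);
    INSERT INTO next_key VALUES (1);

A freshly created table is a heap already, so no MODIFY is needed; a
single UPDATE then bumps next_value in place, with nothing to pad and
nothing to overflow.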

 --
Karl Schendel            Phone: (412) 963-8844
Telesis Computer Corp      Fax: (412) 963-1373
wiz@telesis.com
I wrote:


> I suppose you could use MAXPAGES to allow some space for new rows in the
> main pages, while limiting the table size by forcing existing rows into
> overflow pages. ...  So in other words, fooling
> around with MAXPAGES might be a way to get two different fillfactors,
> allowing you to bias the modified table favorably towards inserts.

Well, my curiosity got the best of me, so I tried some experiments.
I modified a table to hash unique on a key column (which just happened
to be a TABLE_KEY WITH SYSTEM_MAINTAINED, ugh) and got a 512 page table.
I then modified it to hash on the same key with MAXPAGES=64, FILLFACTOR=25.
Guess what?  The fillfactor was completely ignored!  It would appear
that Ingres decided to fill up all the primary pages 100%, and then filled
the overflow pages to whatever extent was necessary.
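
For the record, the two modifies looked roughly like the following; the
table and column names are changed, so treat this as a sketch rather than
a transcript:

    MODIFY mytable TO HASH UNIQUE ON objkey;
    /* came out at 512 primary pages */

    MODIFY mytable TO HASH UNIQUE ON objkey
        WITH MAXPAGES = 64, FILLFACTOR = 25;
    /* the 64 primaries were filled to 100%; everything else spilled to overflow */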


So my clever and useless idea is in fact incorrect in the existing
implementation.  (6.4/05 was what I tested with.)  My current thinking
is that MAXPAGES is a strange synonym for "Ignore that there fillfactor
setting..."

Actually I think Roy hit it right on with:
> ... Or was it just an irrational loathing of the
> asymmetry of having only a MINPAGES parameter without a matching MAXPAGES?

By the way, it looks to me like the fillfactor in a HASH modify is used
only to compute a guess at the required number of primary pages.  It seems
to ignore fillfactor when actually filling the pages.  So if you fool
with the primary page count by specifying MINPAGES or MAXPAGES, you're
probably wasting your time specifying FILLFACTOR too.
This sort of seems like a misfeature to me, perhaps even (dare I say it)
a bug.


-- 
Karl Schendel            Phone: (412) 963-8844
Telesis Computer Corp      Fax: (412) 963-1373
wiz@telesis.com


Karl,

From what I've learned, HASH tables reclaim deleted row space, while HEAP
continues to grow, leaving deleted row space behind.  I use HASH for tables
that are delete-heavy and maintain a relatively stable size.  For sequential
keys, I use update statements rather than delete-insert, and I originally
used HEAP to avoid optimizer overhead.  I switched to HASH because of my
mistaken notion that MAXPAGES would prevent table growth, and because the
overhead associated with so small a table is minuscule compared to the
rest of my database transactions.

Regarding MAXPAGES not preventing overflow, I was unaware.  Thanks for   
the info Karl and Roy.

Paul Beedle

Roy Hann wrote:
> 
> On Mon, 5 Aug 1996, Beedle,Paul(STP) wrote:
> 
> > I have used:
> >
> >    with MAXPAGES = 1
> >
> > for tables that contain sequential key values for use in other tables.  I
> > want to make sure that the table contains only one row and that it will
> > never grow, so I pad  the rest of the row with a column of 1024 bytes.
> 
> An interesting notion, but I don't think it limits growth at all.  All it
> does is limit you to one primary page.  Growth would still be permitted
> but it would be along an overflow chain.  A 1 page hash table is an
> interesting concept also.
> 
> No, sorry, good try. :-)  I'm still not satisfied that this is what the
> designers had in mind when they included the MAXPAGES option.  Is there
> anyone else who can convince me it is not just an aberration due to a surfeit
> of lobster-and-gorgonzola pizza one late night?
> 
> ========================================================================
> Roy Hann                                   rhann@tnc.com
> BKB Engineering Ltd.,                      NAIUA Member-at-Large
> 11211-76th Avenue,
> Edmonton, Alberta                          http://www.tnc.com/bkb
> T6G 0K2                                    Tel: (403)497-2409
> Canada                                     FAX: (403)436-6055
> ========================================================================

Hi Roy,

Because the number of main pages in a hash table is always rounded to
the _nearest_ power of two (it works best with the hashing algorithm), 
hash tables double in size when they pass a threshold such that a 
higher power of two is nearest to the number of pages required. If
you have very large hash tables then in order to manage your disk 
space, you may want to inhibit this doubling of size by using
maxpages. 

Sample Calculation
------------------

	100000 rows at 240 bytes per row

	Number of rows per page = (2010 / (240 + 2)) = 8

	Number of pages required = 100000/8
				 = 12500

	50% fillfactor for hash  = 25000 allocated.

	BUT 25000 is not a power of two. The powers of two which
	lie either side of the 25000 are 16384, and 32768. Of the
	two, 32768 is nearer to 25000 so that is the number of
	main pages that would be allocated. 

	If the number of pages required had worked out at 24575
	rather than 25000, then we would have allocated 16384
	main pages instead. 

So at some point as my table grows, it passes the 24575 barrier and the
number of main pages doubles from 16384 to 32768.  That's where maxpages
comes in handy: you may want to prevent this doubling until either you
have more space or you can eliminate some old data.
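
In terms of the MODIFY statement, that would be something along these
lines (the table and column names are only placeholders):

	MODIFY big_table TO HASH ON acct_no
		WITH MAXPAGES = 16384;

which pins the table at 16384 main pages and lets the extra rows go to
overflow until you have the disk space to allow the doubling.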

Hope this satisfies the requirements for it to be useful :) !

Tony

------------------------------------------------------------------------
Tony Smith
tsmith@ingres.com
------------------------------------------------------------------------



Well folks, I got this rather intriguing justification for MAXPAGES from
someone else who's shy and doesn't want his name used:

> ... Consider a situation where for a key
> choice you have a finite set of values (Some look-up table, 50 states, etc)
> You also have gobs of data.  Not using maxpages for your set of data
> will produce an excessive number of hash buckets, but you really only
> need 50 distinct ones.  You want all the data for each value to be in the
> overflow chain.  If you set maxpages, you will limit the number of "wasted"
> main pages.

Assuming the above is accurate, and I have no reason to disbelieve it,
what is happening is that MODIFY does not know anything about the
hash key distribution before it chooses the number of primary pages.
I suppose that's a realistic implementation of MODIFY.
If you have decided to hash on a sufficiently non-unique key, Ingres
will by default think that it needs lots of primary pages, and then will
leave most of them empty because rows will hash to only a few buckets.
As soon as those primary pages fill, overflow pages will occur, but the
rest of the primary pages will remain empty and useless.
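
Concretely, and purely as an illustration with made-up names, the idea
would be something like:

    MODIFY customers TO HASH ON state_code
        WITH MAXPAGES = 64;

With only 50 or so distinct key values there is no point letting MODIFY
allocate thousands of main pages; 64 buckets covers the key range, and the
bulk of each chain sits in overflow regardless.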

I suppose with reasonably wide rows, you could get into the above situation
even without massive key duplication  (ie you don't necessarily need a
stupid key choice to run into the page wastage).  So it would seem that
MAXPAGES solves a real world problem after all.
Kudos to Mr Anonymous who thought that one up.

(Now we need this knowledge inserted into the DBA Guide in place of the
bogosity that's there now.)

-- 
Karl Schendel            Phone: (412) 963-8844
Telesis Computer Corp      Fax: (412) 963-1373
wiz@telesis.com



> Roy Hann   wrote:
> >Another explanation is offered by the DBA Guide BTW, but it strikes me as
> >pretty lame too.  According to the DBA Guide we can use MAXPAGES to
> >encourage a compressed hash table to be smaller than it might otherwise
> >be.  It is not clear why that might be the case though.  It doesn't 
> >strike me as being real important either...
> 
> That's exactly what I'm using it for - you can't have a FILLFACTOR of
> over 100(%) - or at least that's the way it's documented.  My compressed
> rows are typically much smaller than their uncompressed size and I would
> get a very large, very sparse hash table without using MAXPAGES.  There
> may be a better way to do it - for instance if fillfactor could be
> greater than 100, but this works for me - not lame at all.  Using
> MAXPAGES means I have to occasionally check the size of my table to make
> sure my effective fillfactor doesn't get too large.
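
For concreteness, I take that to mean a modify along these lines, where
CHASH is the compressed hash structure and the table name and page count
are invented for the example (syntax from memory, so check the manual):

    MODIFY prices TO CHASH ON item_no
        WITH MAXPAGES = 2000;

i.e. the page count is chosen from the compressed row size instead of
letting MODIFY size the table from the uncompressed width.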

When I said it was lame, I was thinking about why the feature was ever
included in the first place.  I was trying to think what obvious and
important problem the designers were hoping to solve when they included
the parameter.  I am quite prepared to believe that it solves the problem
you (and the manual) describe, I am just not quick to believe that it is
THE problem the designers had in mind.  The compressed hash table 
problem just seems too obscure and esoteric somehow.

The explanation that Karl Schendel passed on, that MAXPAGES really means
"MAXBUCKETS", does seem to me (for no good reason) more likely to be what
they were thinking of. 

But whatever, I now concede that MAXPAGES is not a mere decoration, and 
that it can do a number of useful things.  Thank you to everyone for setting 
me straight on that.  

========================================================================
Roy Hann                                   rhann@tnc.com
BKB Engineering Ltd.,                      NAIUA Member-at-Large
11211-76th Avenue,                 
Edmonton, Alberta                          http://www.tnc.com/bkb
T6G 0K2                                    Tel: (403)497-2409
Canada                                     FAX: (403)436-6055
========================================================================



>On Oct 10, 15:27, Roy Hann wrote:
>> Subject: Fillfactors
>> I got such a wonderful dividend of speculation (and even a good answer)
>> from the modest investment I made in asking a single question about the
>> purpose of MAXPAGES a few months ago that I am encouraged to ask another
>> simple question.  Are you ready?
>>
>> Is there any good reason why the default fillfactor for a non-compressed
>> B-tree data page is 80% (and not say, 100%)?
>>
>> (S.M.: as you can tell, this is still keeping me awake nights :-)

Hi Roy.

I must have missed your original posting about maxpages so I apologise
if somebody has already answered the same as this:

The only good reason I can find for maxpages is to prevent the
standard sizing algorithm from creating HUGE tables.  By default the
number of pages required for a hash table is  (number of rows/rows per
page) rounded up to the nearest power of 2.  Therefore in some cases a
1Gb table can jump to 2Gb just by the addition of a few rows followed   
by a modify.  Maxpages can be used to control this.
 
 
With regard to your second question, I too have thought about this and
can only assume that the default fillfactor is not 100% to improve
concurrency on the data pages....but why 80% - who knows!

Perhaps it was too difficult to code a different fillfactor for data
pages as opposed to the ADPs!
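
Of course, if you do want the data pages packed full you can always say
so explicitly; a made-up example:

    MODIFY history TO BTREE ON txn_date
        WITH FILLFACTOR = 100;

at the cost of more page splitting if rows are inserted later.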

Cheers

Nick Wadge
Seer Systems Ltd