DMF Cache Stats

>Subject: Re: dmf.cache_size on Ingres 6.4
>From: schendel@fyi.net (Karl Schendel)
>
>> ....
>> I think this newsgroup could do with a refresher on how to 
>> interpret the output from dm420, in particular where there are
>> several servers. Where there are several servers, there is some
>> confusion on how you collate the results. Care to oblige?
>
>I can start this off, although a full description will have to wait
>until Monday when I can refresh my memory as to what all the numbers are.
>
>There are two sections to the DM420 output.  The stuff on the various buffer
>sizes and counts applies to the cache itself, so you'll get the same
>answer from all servers in a shared cache setup.
>On the other hand, the various fix, hit, and I/O counts are for
>THAT SERVER.  You have to do DM420's for all the servers sharing the
>cache, and add up the numbers to get the total picture.
>
>More details Monday, unless someone would like to beat me to it (Chip..?)
>
>Karl

As Karl mentioned, although several DBMS servers may share a DMF cache
(the other caches, or pools, are not shared), each server keeps its own
statistics.  There is a KB Document (US-17267 EN) which explains this
in pretty good detail.
  
There are slight differences between Ing v6 and OpIng 1.x.  In version 6
both the Fast Commit and Write Behind threads worked together during a
consistency point to empty the DMF cache.  If the FC flush and WB flush
counts were the same, you knew that no real write-behind activity was
taking place.  In OpIng 1.x there are Consistency Point threads (which are
activated at CP time) and WB threads.  These do not work together as they
did in v6, so you need to view their numbers separately.

In a multi-server environment you would want to have II_DBMS_LOG (see, I
got it right this time!) set to a different file for each DBMS server.  If
this is too much of a hassle, just use the "set trace output 'filename'"
statement.

[William: Here is a sample dm420 output from our Student Admin system]

!  Buffer Status: GROUP,WBWAIT,SHARED
!  Buffer Manager Id: 30  Connected servers 3
!  Buffer count:   64000  Bucket count:   65535 Group count:    200 Size:   160
!  Free count:    31632 Limit:    1000 Modify count:     367 Limit:   24000
!  Free group count:     200 Modify group count:      0
!  Fixed count:       1 Group fixed count:      0
!  Write Behind start limit:     750, Write Behind end limit:     375
!  CP count: 17458  CP index :     0  CP check :     0
!  Database cache size:    20  Table cache size:    40
! 
Statistics------------------------------------------------------------------------------
!       FIX CALLS      HITS     CHECK   REFRESH      READ
!      14355474581401522463  10144172         0  16380945
!     UNFIX CALLS     DIRTY     FORCE     WRITE     IOWAIT
!       824006083  26837223  12531652  12758703    981958
!          GREADS   GWRITES  FREEWAIT MUTEXWAIT     FCWAIT  RECLAIM
!         5633040      5377         0       220        41         0
!     CONSISTENCY POINT FLUSHES        WRITE BEHIND FLUSHES
!           17457                         67819
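If you are collecting dm420 output from several servers, it helps to pull the Statistics section into named counters rather than eyeballing it. The Python below is a minimal sketch, assuming the layout of the sample above holds in general: each value appears to end under the last character of its column heading, so the parser slices each value line at those columns instead of splitting on spaces. That matters because wide counters can run together; in the sample, FIX CALLS (1435547458) and HITS (1401522463) print as one 20-digit string. The function names and the list of multi-word headings are assumptions taken from this one sample.

```python
import re

# Column headings in the Statistics section that contain spaces (taken from
# the sample output; an assumption that other releases use the same names).
MULTI_WORD = {"FIX CALLS", "UNFIX CALLS",
              "CONSISTENCY POINT FLUSHES", "WRITE BEHIND FLUSHES"}


def header_columns(header: str) -> list[tuple[str, int]]:
    """Return (column name, end column) pairs for one dm420 header line."""
    words = [(m.group(), m.end()) for m in re.finditer(r"[A-Z]+", header)]
    cols, i = [], 0
    while i < len(words):
        # Try the longest known multi-word heading first, then a single word.
        for n in (3, 2, 1):
            name = " ".join(w for w, _ in words[i:i + n])
            if i + n <= len(words) and (n == 1 or name in MULTI_WORD):
                cols.append((name, words[i + n - 1][1]))
                i += n
                break
    return cols


def parse_stats(text: str) -> dict[str, int]:
    """Parse alternating header/value lines from the Statistics section only.

    Each value line is sliced at the header end columns, so counters that
    have grown wide enough to touch are still separated.
    """
    lines = [ln for ln in text.splitlines() if ln.strip(" !")]
    stats: dict[str, int] = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        cols = header_columns(header)
        start = 0
        for k, (name, end) in enumerate(cols):
            # The last column takes everything to the end of the line.
            stop = end if k < len(cols) - 1 else len(values)
            stats[name] = int(values[start:stop].strip(" !"))
            start = end
    return stats


SAMPLE = """\
!       FIX CALLS      HITS     CHECK   REFRESH      READ
!      14355474581401522463  10144172         0  16380945
!     UNFIX CALLS     DIRTY     FORCE     WRITE     IOWAIT
!       824006083  26837223  12531652  12758703    981958
!          GREADS   GWRITES  FREEWAIT MUTEXWAIT     FCWAIT  RECLAIM
!         5633040      5377         0       220        41         0
!     CONSISTENCY POINT FLUSHES        WRITE BEHIND FLUSHES
!           17457                         67819
"""

stats = parse_stats(SAMPLE)
print(stats["FIX CALLS"], stats["HITS"])
```

If a future release changes the column layout, the slicing will quietly misread values, so it is worth spot-checking a parsed snapshot against the raw output once.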


The free and modify counts (in the top section of the dm420 output) refer
to the single page cache.  Group buffers are invalidated after use.  Things
that I looked at included the ratio of reads (2k single page reads) to
greads (group buffer reads, 16k by default).  In an OLTP environment you
would expect it to be highly skewed towards single page reads.  In a
DSS/OLAP type environment you would expect just the opposite.  The key is
to look at the ratio and ask whether it seems reasonable for your specific
environment.  If the greads seem high, then there may be a problem with
table scanning (due to incorrect or missing indexes, query design, etc.).
I also look at IOWAIT, which should ideally be very close to zero.  Also,
look at your cache hit ratio (hits / fix calls).  Ideally this should be
in the 90% range.
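The checks above are simple arithmetic on the Statistics counters. Here is a minimal Python sketch using the values from the sample output (one server only; in a shared-cache setup you would sum all servers first). The dictionary keys mirror the dm420 column names; the function name is hypothetical.

```python
# Counters transcribed from the sample dm420 output above, with FIX CALLS
# and HITS split apart where the fixed-width columns ran together.
SAMPLE = {
    "FIX CALLS": 1435547458,
    "HITS": 1401522463,
    "READ": 16380945,
    "GREADS": 5633040,
    "IOWAIT": 981958,
}


def cache_ratios(stats: dict[str, int]) -> dict[str, float]:
    """At-a-glance DMF cache figures discussed above."""
    greads = stats["GREADS"]
    return {
        # hits / fix calls; ideally in the 90% range
        "hit_ratio": stats["HITS"] / stats["FIX CALLS"],
        # single page reads per group read: high for OLTP, low for DSS/OLAP
        "reads_per_gread": stats["READ"] / greads if greads else float("inf"),
        # raw count; ideally very close to zero
        "iowait": float(stats["IOWAIT"]),
    }


ratios = cache_ratios(SAMPLE)
print(f"hit ratio {ratios['hit_ratio']:.1%}")
print(f"reads per gread {ratios['reads_per_gread']:.1f}")
```

For the sample server this gives a hit ratio of about 97.6% and roughly 2.9 single page reads per group read; whether that read mix is reasonable depends on whether the system is mostly OLTP or DSS.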

Some of my DMF tuning rules of thumb (note:  this won't work for everyone,
but it is usually a good starting point for tuning):

- CP size of 20-25 MB (should not exceed 25% of the log file size for a
    small tx log file)
- CP completion time within 10-12 seconds (look at the ACP log)
- DMF single page cache size between 5,000 and 15,000 pages
- DMF group buffer count at (1.1 * connected_sessions)
- leave DMF group size at 8 (the optimizer can do strange things with
    larger caches)
- WBStart between 2,000 and 2,500
- WBEnd between 1,000 and 1,250
- 2-3 WB threads per DBMS server.  On Unix make sure that you have at
    least this number plus 2 iislaves (e.g., 3 WB threads, 5 or more slaves)
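The rules of thumb can be turned into a quick sanity check. This is a sketch only: the dictionary keys are hypothetical labels, not Ingres configuration parameter names, and the CP check assumes the 25% limit is measured against the size of the transaction log file. CP completion time is left out because it comes from the ACP log rather than the configuration.

```python
# Rules of thumb from the list above as (low, high) ranges. The keys are
# hypothetical names for this sketch, not real Ingres parameter names.
RANGES = {
    "cp_size_mb": (20, 25),
    "single_page_buffers": (5_000, 15_000),
    "wb_start": (2_000, 2_500),
    "wb_end": (1_000, 1_250),
    "wb_threads": (2, 3),
}


def group_buffer_target(connected_sessions: int) -> int:
    """1.1 * connected sessions, rounded up (integer math avoids float error)."""
    return (11 * connected_sessions + 9) // 10


def check_dmf_config(cfg: dict[str, int], connected_sessions: int,
                     log_file_mb: int, unix: bool = True) -> list[str]:
    """Return departures from the rules of thumb (an empty list means none)."""
    notes = []
    for key, (low, high) in RANGES.items():
        if not low <= cfg[key] <= high:
            notes.append(f"{key}={cfg[key]} not in {low}-{high}")
    # Assumed reading of the CP rule: CP size at most 25% of the log file.
    if cfg["cp_size_mb"] * 4 > log_file_mb:
        notes.append("cp_size_mb exceeds 25% of the log file")
    target = group_buffer_target(connected_sessions)
    if cfg["group_buffers"] < target:
        notes.append(f"group_buffers={cfg['group_buffers']} below {target}")
    if cfg["group_size"] != 8:
        notes.append(f"group_size={cfg['group_size']} is not 8")
    # On Unix, want at least the number of WB threads plus 2 iislaves.
    if unix and cfg["slaves"] < cfg["wb_threads"] + 2:
        notes.append("fewer than wb_threads + 2 slaves")
    return notes


# Hypothetical configuration for 100 connected sessions and a 200 MB log.
cfg = {"cp_size_mb": 20, "single_page_buffers": 10_000, "wb_start": 2_000,
       "wb_end": 1_000, "wb_threads": 3, "group_buffers": 110,
       "group_size": 8, "slaves": 5}
print(check_dmf_config(cfg, connected_sessions=100, log_file_mb=200))
```

An empty list only means the configuration sits inside every range above; it says nothing about whether those ranges suit your RAM, I/O bandwidth, or data distribution.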

There are a lot of variables (e.g., available RAM, I/O bandwidth, data
distribution, etc.), but generally this configuration will smoke the
default configuration.  More importantly, it gives you a better starting
point for fine tuning of the cache. 

OpenIngres 2.0 has multiple DMF caches (at least one per page size used)
and somehow manages distributed shared caches (which should be very cool).
I do not know how dm420 works in that environment, or whether it works at
all (there may be a replacement trace point), but I am anxious to find out.
So far I have been very impressed with 2.0 (FWIW).

Chip Nickolett           ChipN@Comp-Soln.com
Comprehensive Consulting Solutions, Inc.   (www.Comp-Soln.com)
Phone:  262-544-9954     Fax:  262-544-1236
Formerly:  ChipN@ingres.com

Following up on the DM420 thread:

Here's what I have on the DM420 trace point output.
As was pointed out earlier, the first section of the output is
buffer cache specific, and lists various buffer counts, sizes, and
limits.  It also lists how many buffers are marked as Modified
and Free (i.e., not In Use).

The second part of the output is specific to each iidbms, and you
have to add up all the numbers from each iidbms to get the correct
total in a shared-cache environment.
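To collate the per-server numbers, add each counter across all servers sharing the cache and compute ratios from the totals; averaging each server's hit ratio would weight a quiet server the same as a busy one. The top-section figures (buffer counts, limits) describe the shared cache itself and should not be summed. A minimal Python sketch; the two servers' figures are made-up numbers for illustration.

```python
from collections import Counter


def combine(per_server: list[dict[str, int]]) -> Counter:
    """Sum the Statistics counters from one dm420 run on each server.

    Pass only the per-server Statistics section; the buffer counts and
    limits in the top section describe the shared cache and are the same
    on every server, so summing them would be wrong.
    """
    total: Counter = Counter()
    for stats in per_server:
        total.update(stats)  # Counter.update adds counts key by key
    return total


# Hypothetical figures for two servers sharing one cache.
server_a = {"FIX CALLS": 1000, "HITS": 990}
server_b = {"FIX CALLS": 100, "HITS": 50}

total = combine([server_a, server_b])
pooled = total["HITS"] / total["FIX CALLS"]  # 1040 / 1100, about 94.5%
naive = (990 / 1000 + 50 / 100) / 2          # 74.5%, misleading
print(f"pooled {pooled:.1%}, averaged {naive:.1%}")
```

The pooled figure reflects what the shared cache actually delivered; the averaged one is dragged down by a server that barely used it.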

FIX CALLS - Number of times a buffer fix (i.e., read-data) call was made.
HITS      - Number of times the request was satisfied from the buffer cache
            itself.  The cache hit ratio is HITS / FIX CALLS * 100%.
CHECK     - If cache locks are taken (private buffer caches with multiple
            servers, DMCM with multiple servers, or iidbdb tables, which are
            always cache-locked), the number of times cache locks are
            checked for consistency.
REFRESH   - Goes with CHECK.  If the cache-lock check determines that the
            locks are stale, the page is reread.
READ      - Number of physical page reads from disk.
WRITE     - Number of physical page writes to disk.
TOSS      - Pages thrown out of the buffer cache (LRU algorithm in the
            buffer manager) to make room for new pages.  (OpenIngres 2.0)
GREADS    - Group buffer reads.
GWRITES   - Writes from group buffers.
FREEWAIT  - Number of waits because no free buffer was available.
MUTEXWAIT - Number of waits because the (cache control) mutex was held.

UNFIX, DIRTY, FORCE, IOWAIT, and FCWAIT are all semi-obvious, but I don't
have a *precise* definition for them (e.g., why is UNFIX typically much less
than FIX?)  Precise definitions, and/or corrections to the above, are
invited and welcome.

By the way, one of my favorite at-a-glance cache statistics is a plot of
hit percent vs attempt rate (fixes-per-second).  Low hit percents are OK
at low attempt rates.  If the hit percent doesn't tighten up as the
attempt rate goes up you need to look at what's going on.
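Each point on that plot needs two dm420 snapshots, because the counters appear to be cumulative: the change in FIX CALLS over the elapsed time gives the attempt rate, and the change in HITS over the change in FIX CALLS gives the hit percent for that interval. A minimal Python sketch, assuming you record when each snapshot was taken and have already summed the counters across servers; the `Snapshot` class and the thresholds in `busy_misses` are hypothetical placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    seconds: float   # when dm420 was run, e.g. from time.time()
    fix_calls: int   # FIX CALLS, summed over servers sharing the cache
    hits: int        # HITS, summed the same way


def plot_points(snaps: list[Snapshot]) -> list[tuple[float, float]]:
    """(fixes per second, hit percent) for each interval between snapshots."""
    points = []
    for before, after in zip(snaps, snaps[1:]):
        fixes = after.fix_calls - before.fix_calls
        hits = after.hits - before.hits
        rate = fixes / (after.seconds - before.seconds)
        # No fix calls in the interval: there is no hit percent to report.
        pct = 100.0 * hits / fixes if fixes else float("nan")
        points.append((rate, pct))
    return points


def busy_misses(points, busy_rate=500.0, min_hit_pct=90.0):
    """Intervals with a high attempt rate but a low hit percent.

    The default thresholds are placeholders; pick values for your system.
    """
    return [(r, p) for r, p in points if r >= busy_rate and p < min_hit_pct]


# Hypothetical snapshots taken one minute apart.
snaps = [Snapshot(0, 0, 0), Snapshot(60, 600, 300),
         Snapshot(120, 60_600, 59_900), Snapshot(180, 120_600, 89_900)]
for rate, pct in plot_points(snaps):
    print(f"{rate:8.1f} fixes/s  {pct:5.1f}% hits")
print(busy_misses(plot_points(snaps)))
```

The first interval (50% hits at 10 fixes/s) is the kind of low-rate miss Karl says is OK; only the last one, 50% hits at 1000 fixes/s, is flagged.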

-- 
Karl Schendel            Phone: (412) 963-8844
Telesis Computer Corp      Fax: (412) 963-1373
wiz@telesismfg.com