>Subject: Re: dmf.cache_size on Ingres 6.4
>From: schendel@fyi.net (Karl Schendel)
>
>> ....
>> I think this newsgroup could do with a refresher on how to
>> interpret the output from dm420, in particular where there are
>> several servers. Where there are several servers, there is some
>> confusion on how you collate the results. Care to oblige ?
>
>I can start this off, although a full description will have to wait
>until Monday when I can refresh my memory as to what all the numbers are.
>
>There are two sections to the DM420 output. The stuff on the various buffer
>sizes and counts applies to the cache itself, so you'll get the same
>answer from all servers in a shared cache setup.
>On the other hand, the various fix, hit, and I/O counts are for
>THAT SERVER. You have to do DM420's for all the servers sharing the
>cache, and add up the numbers to get the total picture.
>
>More details Monday, unless someone would like to beat me to it (Chip..?)
>
>Karl

As Karl mentioned, although several DBMS servers may share a DMF cache (the other caches, or pools, are not shared), each server keeps its own statistics. There is a KB document (US-17267 EN) which explains this in pretty good detail.

There are slight differences between Ingres v6 and OpenIngres 1.x. In version 6, the Fast Commit and Write Behind threads worked together during a consistency point to empty the DMF cache. If the FC flush and WB flush counts were the same, you knew that there really was not any write-behind activity taking place. In OpenIngres 1.x there are Consistency Point threads (activated at CP time) and WB threads. These do not work together as they did in v6, so you need to view the numbers separately.

In a multi-server environment you would want to have II_DBMS_LOG (see, I got it right this time!) set to a different file for each DBMS server. If this is too much of a hassle, then just use the "set trace output 'filename'" statement.
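Karl's point that the fix, hit, and I/O counters are per server can be illustrated with a minimal Python sketch. It assumes the counters have already been copied out of each server's DM420 output into one dict per server; the function name `total_stats` and that input format are hypothetical, not anything Ingres provides.

```python
# Statistics counters from the second section of DM420. These are kept per
# DBMS server, so they must be summed across every server sharing the cache.
# First-section values (buffer counts, sizes, limits) describe the shared
# cache itself and are identical on every server, so they are not summed.
PER_SERVER_FIELDS = (
    "FIX CALLS", "HITS", "CHECK", "REFRESH", "READ",
    "WRITE", "IOWAIT", "GREADS", "GWRITES", "FREEWAIT", "MUTEXWAIT",
)


def total_stats(per_server):
    """Sum DM420 statistics over all servers sharing one DMF cache.

    per_server: list of dicts, one per DBMS server, mapping a counter name
    (as printed by DM420) to its value. This input format is hypothetical;
    the numbers are assumed to have been taken from each server's trace
    output beforehand. Missing counters count as zero.
    """
    totals = {field: 0 for field in PER_SERVER_FIELDS}
    for stats in per_server:
        for field in PER_SERVER_FIELDS:
            totals[field] += stats.get(field, 0)
    return totals


# Made-up figures for two servers, for illustration only.
server_a = {"FIX CALLS": 1000, "HITS": 950, "READ": 50}
server_b = {"FIX CALLS": 3000, "HITS": 2700, "READ": 300}
totals = total_stats([server_a, server_b])

# The hit ratio for the whole cache comes from the summed counters.
print(totals["HITS"] / totals["FIX CALLS"])  # 0.9125
```

Averaging the two per-server hit ratios (0.95 and 0.90) would give 0.925 instead, which overweights the quieter server; summing the raw counters first avoids that.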
[William: Here is a sample dm420 output from our Student Admin system]

! Buffer Status: GROUP,WBWAIT,SHARED
! Buffer Manager Id: 30   Connected servers 3
! Buffer count: 64000   Bucket count: 65535   Group count: 200   Size: 160
! Free count: 31632   Limit: 1000   Modify count: 367   Limit: 24000
! Free group count: 200   Modify group count: 0
! Fixed count: 1   Group fixed count: 0
! Write Behind start limit: 750, Write Behind end limit: 375
! CP count: 17458   CP index : 0   CP check : 0
! Database cache size: 20   Table cache size: 40
! Statistics------------------------------------------------------------------------
!   FIX CALLS        HITS       CHECK     REFRESH        READ
!  1435547458  1401522463    10144172           0    16380945
! UNFIX CALLS       DIRTY       FORCE       WRITE      IOWAIT
!   824006083    26837223    12531652    12758703      981958
!      GREADS     GWRITES    FREEWAIT   MUTEXWAIT      FCWAIT     RECLAIM
!     5633040        5377           0         220          41           0
!   CONSISTENCY POINT FLUSHES   WRITE BEHIND FLUSHES
!                       17457                  67819

The free and modify counts (in the top section of the dm420 output) refer to the single-page cache. Group buffers are invalidated after use.

Things that I looked at included the ratio of reads (2k single-page reads) to greads (group buffer reads, 16k by default). In an OLTP environment you would expect it to be highly skewed towards single-page reads; in a DSS/OLAP environment you would expect just the opposite. The key is to look at the ratio and ask whether it seems reasonable for your specific environment. If the greads seem high, then there may be a problem with table scanning (due to incorrect or missing indexes, query design, etc.).

I also look at IOWAIT, which should ideally be very close to zero. Also look at your cache hit ratio (hits / fix calls); ideally this should be in the 90% range.
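The two ratios described above can be computed directly from a DM420 listing. This is a minimal Python sketch; `dm420_ratios` is a hypothetical helper, and it assumes the FIX CALLS and HITS values in the sample split as 1435547458 and 1401522463 (the two columns appear fused together as 14355474581401522463 in the raw listing).

```python
def dm420_ratios(fix_calls, hits, reads, greads):
    """Return (hit_ratio, reads_per_gread) from DM420 statistics.

    hit_ratio       : HITS / FIX CALLS; ideally in the 90% range.
    reads_per_gread : single-page READs per group read (GREADS). A high
                      value suggests OLTP-style access; a low value suggests
                      DSS-style access or heavy table scanning.
    """
    hit_ratio = hits / fix_calls if fix_calls else 0.0
    reads_per_gread = reads / greads if greads else float("inf")
    return hit_ratio, reads_per_gread


# Figures from the sample listing (one server; with "Connected servers 3"
# they would first need to be summed across all three servers).
# ASSUMPTION: FIX CALLS = 1435547458 and HITS = 1401522463, since those two
# columns are fused together in the original listing.
hit_ratio, reads_per_gread = dm420_ratios(
    fix_calls=1435547458, hits=1401522463, reads=16380945, greads=5633040
)
print(f"hit ratio {hit_ratio:.1%}, {reads_per_gread:.1f} reads per group read")
# prints: hit ratio 97.6%, 2.9 reads per group read
```

Whether roughly three single-page reads per group read is "reasonable" depends on the workload, which is exactly the judgement call described above; the sketch only does the arithmetic.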
Some of my DMF tuning rules of thumb (note: these won't work for everyone, but they are usually a good starting point for tuning):

- CP size of 20-25 MB (it should not exceed 25% of the log file if the transaction log file is small)
- CP completion time within 10-12 seconds (look at the ACP log)
- DMF single-page cache size between 5,000 and 15,000 pages
- DMF group buffer count at (1.1 * connected_sessions)
- leave DMF group size at 8 (the optimizer can do strange things with larger caches)
- WBStart between 2,000 and 2,500
- WBEnd between 1,000 and 1,250
- 2-3 WB threads per DBMS server. On Unix make sure that you have at least this number plus 2 iislaves (e.g., 3 WB threads, 5 or more slaves)

There are a lot of variables (e.g., available RAM, I/O bandwidth, data distribution, etc.), but generally this configuration will smoke the default configuration. More importantly, it gives you a better starting point for fine-tuning the cache.

OpenIngres 2.0 has multiple DMF caches (at least one per page size used), and somehow manages distributed shared caches (which should be very cool). I do not know how dm420 works in that environment (or whether it even works; there may be a replacement trace point), but I am anxious to find out. So far I have been very impressed with 2.0 (FWIW).

Chip Nickolett    ChipN@Comp-Soln.com
Comprehensive Consulting Solutions, Inc. (www.Comp-Soln.com)
Phone: 262-544-9954    Fax: 262-544-1236
Formerly: ChipN@ingres.com

Following up on the DM420 thread:

Here's what I have on the DM420 trace point output. As was pointed out earlier, the first section of the output is buffer-cache specific, and lists various buffer counts, sizes, and limits. It also lists how many buffers are marked as Modified and Free (i.e., not in use).

The second part of the output is specific to each iidbms, and you have to add up the numbers from each iidbms to get the correct total in a shared-cache environment.

FIX CALLS - Number of times a buffer fix (i.e., read-data) call was made.
HITS - Number of times the request was satisfied from the buffer cache itself. The cache hit ratio is HITS / FIX CALLS * 100%.
CHECK - If cache locks are taken (private buffer caches with multiple servers, private caches using DMCM with multiple servers, or iidbdb tables, which are always cache-locked), the number of times cache locks are checked for consistency.
REFRESH - Goes with CHECK. If the cache-lock check determines that the locks are stale, the page is reread.
READ - Number of physical page reads from disk.
WRITE - Number of physical page writes to disk.
TOSS - Pages thrown out of the buffer cache (by the buffer manager's LRU algorithm) to make room for new pages. (OpenIngres 2.0)
GREADS - Group buffer reads.
GWRITES - Writes from group buffers.
FREEWAIT - Number of waits because no free buffer was available.
MUTEXWAIT - Number of waits because the (cache control) mutex was held.

UNFIX, DIRTY, FORCE, IOWAIT, and FCWAIT are all semi-obvious, but I don't have a *precise* definition for them (e.g., why is UNFIX typically much less than FIX?). Precise definitions, and/or corrections to the above, are invited and welcome.

By the way, one of my favorite at-a-glance cache statistics is a plot of hit percent vs. attempt rate (fixes per second). Low hit percents are OK at low attempt rates. If the hit percent doesn't tighten up as the attempt rate goes up, you need to look at what's going on.

--
Karl Schendel            Phone: (412) 963-8844
Telesis Computer Corp    Fax:   (412) 963-1373
wiz@telesismfg.com
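Karl's hit-percent vs. attempt-rate view needs rates rather than running totals. A minimal Python sketch, assuming the DM420 counters are cumulative since server start, so each interval's activity is the difference between two readings taken a known time apart. `Snapshot` and `interval_points` are hypothetical helpers, and the readings below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One DM420 reading (hypothetical structure).

    fix_calls and hits are the cumulative FIX CALLS and HITS counters,
    already summed over every server sharing the cache.
    """
    seconds: float  # time the reading was taken, in seconds
    fix_calls: int
    hits: int


def interval_points(snapshots):
    """Turn consecutive snapshots into (fixes_per_second, hit_percent) pairs.

    ASSUMPTION: the counters only ever grow between readings, so each
    interval's activity is the difference between successive snapshots.
    """
    points = []
    for before, after in zip(snapshots, snapshots[1:]):
        elapsed = after.seconds - before.seconds
        fixes = after.fix_calls - before.fix_calls
        hits = after.hits - before.hits
        if elapsed <= 0 or fixes <= 0:
            continue  # skip idle intervals or out-of-order readings
        points.append((fixes / elapsed, 100.0 * hits / fixes))
    return points


# Made-up readings taken one minute apart, for illustration only.
snaps = [
    Snapshot(seconds=0, fix_calls=0, hits=0),
    Snapshot(seconds=60, fix_calls=600, hits=480),       # quiet minute
    Snapshot(seconds=120, fix_calls=60600, hits=59400),  # busy minute
]
for rate, pct in interval_points(snaps):
    print(f"{rate:8.1f} fixes/s  {pct:5.1f}% hits")
```

The resulting pairs can be handed to any plotting tool. Following Karl's rule of thumb, the 80% hit rate in the quiet interval is not alarming by itself; what matters is whether the busy intervals cluster at a high hit percent.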
© William Yuan 2000