Seven Tips for Maximizing CA-Ingres Performance

by Neil Pitman and Tom Turchioe

Attaining optimum performance for a CA-Ingres based system is a multifaceted effort. Ensuring that a system performs well requires careful management of a wide variety of technical issues during design, deployment, and maintenance.

Performance issues should be addressed from the beginning of a system's design; it is not enough to attempt to fix a deficient system after most or all of the coding is done. The further along in the development lifecycle a system is (designing to coding to testing), the more effort it takes to address performance issues.

Performance not only means the speed at which the system runs, but also how easy screens are to navigate, fill in, and use. A fast response time is of little value if the interface is difficult to use.

This article addresses seven techniques for maximizing CA-Ingres performance. They are:

- Performance testing techniques
- Understanding the CA-Ingres query optimizer
- Connectivity and interoperability
- Tuning the production environment
- Physical database design
- Transaction, query, and concurrency design
- Application performance issues

This article also addresses the new features of CA-OpenIngres 1.1.


Performance Testing Techniques

Performance must be part of the system's functionality. All phases of design and implementation have a performance metric. The metric states the performance implications of that particular phase on the overall system. For instance, an order entry function has an impact on the usability of the system. The performance metric helps determine what happens if the response times take longer than expected.

The baseline for performance goals should be based on business transaction response times; the time requirements for logical operations should drive the response times for the system. If order entry must run in two seconds from the time the user hits the Add key, then the group of SQL queries performing that operation must be finished in two seconds.

Once you establish performance targets early in the design stage, choose an evaluation methodology and stay with it. Whether it is a CASE methodology, a database design methodology, or a testing methodology, pick one for each area and do not change it. Even a poorly chosen methodology, applied consistently, does less damage to performance and functionality targets (and to schedule and budget) than switching methodologies or abandoning them mid-project.

Stress & Load Testing Techniques

In order to adequately test any system, realistic patterns of use must be represented in the test data. Graphing the number of users expected against time and the different transaction types against time will yield important stress and load statistics. This data can then be used to predict performance during peak system usage.

Real, representative amounts of test data are also necessary in predicting performance. Without realistic amounts of data, the optimizer will not generate Query Execution Plans (QEPs) that match what will be seen in production. Any system will appear to run fast if only 50 rows of data are being tested. Data converted from a legacy system can often provide the test data needed.

It is possible to "cook" the statistics by using statdump, modifying them, and using optimizedb to read the modified statistics back into the database. This allows you to obtain realistic QEPs but does nothing to test response time and concurrency. Having real data is preferable.

You should look at third-party tools, like Empower, which allow you to stress test the system in an accurate and realistic way.

In any event, you should test performance and concurrency at the routine (where appropriate), subsystem, and system levels.

Unit Testing Techniques

CA-Ingres provides trace facilities for unit testing performance. Setting II_EMBED_SET = "printqry; printtrace" and ING_SET = "set qep; set trace point qe90; set lock_trace" generates plenty of information. Printqry writes the queries generated by the module to a file called "iiprtqry.log", allowing you to examine exactly what is being sent to the DBMS server; you can create a script that parses this file for long-running queries. The QEP shows the query plan the optimizer intends to use, while the QE90 trace point shows what the DBMS actually did to execute the query. Compare the QE90 output against the QEP to determine whether the optimizer selected a suboptimal plan. Setting lock_trace shows which locks are acquired. Since most unit testing is single-user, this trace point allows early detection of concurrency problems: for example, if lock_trace shows that a shared or exclusive table-level lock is being requested, this could indicate a future concurrency problem.
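As a sketch, the same trace settings carried in ING_SET can also be issued interactively in a terminal monitor session (the table and query shown are hypothetical):

```sql
set qep;                    /* display the query execution plan */
set trace point qe90;      /* report actual I/O and CPU after each query runs */
set lock_trace;             /* report each lock as it is acquired */

select o.order_no, o.order_total
from   orders o
where  o.cust_no = 1042;    /* compare the QEP estimate with the QE90 actuals */
```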

Always check the error log (in the "files" directory of the CA-Ingres system tree). Look for errors such as maximum locks per transaction exceeded, or log file force abort. Always find out the causes for errors like these before attempting any additional testing.

System Testing Techniques

System performance testing is an important final step in ensuring a successful system. When coupled with a stress test tool, system testing will help verify the system configuration and concurrency design. During system testing, production procedures for monitoring the disk, CPU, memory, network, and CA-Ingres should be tested.

Since system level testing is a multi-user test, it is important to check the error log for deadlock and lock timeout. Lock modes are listed as integers (for example, mode = 1), with 1 = IS, 2 = IX, 3 = S, and 5 = X. Lock timeout can be useful in identifying problematic concurrency areas: by setting timeout to one second, concurrency problems surface quickly instead of hiding behind long waits. Viewing the Lock List Screen in IPM is also very useful in spotting concurrency problems.
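The short timeout can be set per session or per table with set lockmode; a minimal sketch (the table name is illustrative):

```sql
/* fail fast instead of queueing silently behind other sessions' locks */
set lockmode session where timeout = 1;

/* or restrict the short timeout to one critical table */
set lockmode on orders where timeout = 1;
```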


The CA-Ingres Query Optimizer

The importance of checking the QEP and QE90 output for key queries cannot be overemphasized. If the QEP estimates 90 DIOs to read data from a table and the QE90 output indicates it actually took 900 DIOs, there is a problem that needs to be resolved.

Other optimization issues should also be considered. The types of joins should be examined. Are sort merge or lookup joins being utilized? If two large tables are being joined, then a sort merge is appropriate. Are the proper storage structures being used in light of the critical queries? The primary storage structure should benefit the queries most often run and those with the most stringent performance requirements. Also, it is possible to carry non-key columns in a secondary index. This can allow the optimizer to bypass reading the base table and result in substantial performance gains. Remember that an update to a column that is carried in a secondary index results in more I/O. This is because CA-Ingres has to update the column in both the base table and the secondary index.
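For example, a secondary index can carry a frequently read non-key column so that qualifying queries never touch the base table (table and column names are hypothetical):

```sql
/* cust_name is carried in the index purely to cover the query below */
create index cust_name_ix on customers (cust_no, cust_name)
with structure = btree;

/* this lookup can now be satisfied entirely from the secondary index,
   at the cost of extra I/O whenever cust_name is updated */
select cust_name from customers where cust_no = 1042;
```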

A number of trace points allow viewing of the QEPs generated as the optimizer enumerates candidate plans until it finds the most efficient one; trace point op188 displays each plan considered. Use set joinop [no]timeout to instruct the optimizer to find the best plan no matter how long it takes.

The optimizer selectively flattens queries: nested queries are typically reworked into equivalent flat queries using equijoins. Trace point op132 turns flattening off completely; trace point op133 turns it back on. In some cases it may be desirable to turn off flattening; if a nested query seems to be performing slowly, try turning off flattening using set trace point op132.
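A sketch of these optimizer controls in a terminal monitor session (the nested query is illustrative):

```sql
set joinop notimeout;       /* let the optimizer enumerate every plan */
set trace point op188;      /* display each candidate QEP considered */

set trace point op132;      /* turn query flattening off */
select o.order_no
from   orders o
where  o.cust_no in (select c.cust_no
                     from   customers c
                     where  c.region = 'WEST');
set trace point op133;      /* turn flattening back on */
```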


Connectivity and Interoperability

A client/server system requires more rigorous testing than a non-distributed system. Typically, system testing may involve local testing on the same machine as the DBMS server. It is important to test across the network as well, because network delays can cause transactions to be held open longer, which increases response times and decreases concurrency. Properly testing all pieces of the production system may require multiple tests: that is, running batch jobs on the server machine, running interactive tests over the network, then running both at the same time.

The key to good client/server performance is to minimize the movement of data through the use of database procedures and set-oriented queries.
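As an illustration, a database procedure executes entirely inside the server, so one call replaces several network round trips (procedure, table, and column names are hypothetical):

```sql
create procedure add_order_line (
    p_order_no integer,
    p_part_no  integer,
    p_qty      integer
) as
begin
    insert into order_lines (order_no, part_no, qty)
        values (:p_order_no, :p_part_no, :p_qty);
    update order_header
        set    line_count = line_count + 1
        where  order_no = :p_order_no;
end;

/* one round trip to the server instead of two separate statements */
execute procedure add_order_line (p_order_no = 100, p_part_no = 7, p_qty = 3);
```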

The increasing use of ODBC compliant applications can unfortunately prevent the use of CA-Ingres specific performance features like repeated queries. However, some ODBC drivers, such as the Intersolv ODBC driver, incorporate the special functionality required to take advantage of repeated queries.

If you are going to use CA-Ingres/Enterprise Access products (Gateways), then you should use OpenSQL. OpenSQL is designed to work against all Gateways supported by CA-Ingres.


The Production Environment

You can estimate memory, disk, and CPU needs by performing unit and subsystem testing coupled with an analysis of the system requirements.

Make sure that the machine you select can be expanded by adding more processors. Nothing is more expensive than having to buy completely new hardware when system demand grows.

A thorough system test should be performed once the environment is completely set up. A third-party testing tool can be used to drive the system to catch last minute problems.

You can perform a good first pass configuration of CA-Ingres while the production environment is being set up. The number of DBMS servers is determined by either or both of the following factors:

Locking and logging parameters should be based on the number of users and the transaction types. Longer transactions require more locks held longer, and more log file space in use at any one time. Configure the logging system with at least eight log buffers of at least 8 KB each. Try to reduce the number of log split and free waits to zero.

DBMS server parameters are based on criteria such as the number of repeat queries, and the number and size of database procedures (QSF pool). Other criteria include database size, and transaction types (RDF pool size, DMF cache size, DMF Group cache parameters). If complex queries comprise part of the transaction mix, you may need to pay some attention to OPF parameters like OPF Memory and OPF number of users.

Monitoring the Production Environment

A comprehensive monitoring procedure should be in place in the production environment. Things to monitor include:

Preventative Medicine

Before a single user logs in, the following procedures should be in place and tested:

Pay me now or pay me (much more) later

Make sure a backup and recovery plan is well designed and in place. Large databases need multiple backup devices and should use online checkpoint. Make sure that online checkpoint backs up the dump files; without them, the backup is useless. One more thing: verify that a recovery from backup has actually been performed and works before you need it.


Physical Database Design

Many factors determine the optimal physical database design. Some important factors are controlled redundancy, choosing the appropriate storage structures, and transaction, query, and concurrency design.

A technique that can be critical to attaining a good level of performance is controlled redundancy. Data redundancy taken to its extreme is undesirable; logically inconsistent databases become more likely in this scenario. The extra work of excessive joins in the highly normalized database is exchanged for extra work in keeping redundant data in sync. Controlled redundancy is the tradeoff between the two. Examples of controlled redundancy include calculated values like order totals, part counts, and so on, stored in a header table.
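As an illustration, an order total can be stored redundantly in the header table and refreshed with a single statement whenever the lines change (table and column names are hypothetical):

```sql
/* order_total duplicates information derivable from order_lines;
   the application must keep it in sync on every line change */
update order_header h
set    order_total = (select sum(l.qty * l.unit_price)
                      from   order_lines l
                      where  l.order_no = h.order_no)
where  h.order_no = 100;
```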

There are a number of CA-Ingres specific techniques that are available to aid in implementing controlled redundancy. For example, it is possible to add extra non-key columns to secondary indices. As discussed earlier, this allows the optimizer to build query plans that bypass base tables in some instances. This can greatly increase performance of key queries. Note, however, that CA-Ingres now has to keep this redundant data in sync, resulting in an extra I/O.

Storage Structure Tradeoffs

CA-Ingres offers four major types of storage structures: heap, hash, ISAM, and BTREE. The choice of structure depends upon the queries themselves: exact match favors hash, range matches favor BTREE.

Even data volatility influences storage structure selection: static tables can safely be stored as ISAM, while dynamic tables stored as ISAM develop overflow chains and might need to be stored as BTREE instead. Hash is acceptable if additions to the table are evenly distributed across the key range, avoiding overflow pages between modifies. (Storage structures are discussed in greater detail in Choosing CA-Ingres Storage Structures for Your Database.)
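A sketch of matching structure to access pattern with the modify statement (table and key names are hypothetical):

```sql
/* exact-match lookups on a stable key: hash */
modify customers to hash unique on cust_no;

/* range scans on a volatile table: btree avoids overflow chains */
modify orders to btree unique on order_no;

/* small static reference table: isam is cheap to traverse */
modify states to isam on state_code;
```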

In some cases, it may even make sense to create secondary indices for the benefit of critical queries; for example, if a primary key has an uneven distribution. If the table itself had a storage structure with that column as the key, the length of overflow chains (say in a hash structure) might be too long; it might be better to create a secondary index on that column. Though some parts of the hash secondary may have overflow, the length of the overflow chains would be far less (assuming the key length is short in comparison to the row size). This costs an extra I/O to the base table, but performance would be better. If concurrency is negatively affected because the key values do not hash well, fillfactors and minpages can be used to compensate.

Concurrency concerns also drive the choice of index: high concurrency tables can benefit from the hash index structure - this spreads out the users across more pages. The downside is more I/O because the data is spread over more pages. The hash table has a default fillfactor of 50%. Never attempt to modify a hash table to a fillfactor of 100%.
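Fillfactor and minpages are supplied on the modify statement; a minimal sketch (the values are illustrative):

```sql
/* pre-allocate pages and leave free space so poorly hashing
   key values do not immediately create overflow pages */
modify orders to hash on cust_no
with fillfactor = 50, minpages = 128;
```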

Finally, CA-Ingres offers the ability to spread tables out over multiple locations. This eliminates I/O bottlenecks in multi-user situations. Each location should reside on a separate physical hard disk. The log file, dump files, and work files should be on separate disks as well.
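Spreading a table across locations is specified at create or modify time; a sketch assuming locations loc1 and loc2 have already been defined on separate physical disks:

```sql
create table order_history (
    order_no   integer,
    cust_no    integer,
    ship_date  date
) with location = (loc1, loc2);

/* an existing table can be spread across locations as well */
modify order_history to reorganize with location = (loc1, loc2);
```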


Transaction, Query, and Concurrency Design

An obvious, but sometimes ignored, fact is that a transaction should begin only after all user input required for it has been collected. Likewise, any calculation or preparation that does not depend on the transaction's own queries should be completed before the first query of the transaction is issued.

Before transaction design takes place, there should be some kind of modeling tool in place that yields an ER diagram or its equivalent. An ER diagram is useful for tracing the "path" of a query through a database design based on its where clause. By tracing the path of key queries through the database design, potential bottlenecks can be detected and addressed before - and not during - a crisis. It also helps to know the "usage over time" and "query mix over time" profile when performing this analysis.

A key part of obtaining CA-Ingres based systems that perform well is the use of set-oriented queries. However, this calls for increased diligence in preventing lock escalation caused by the larger number of page locks a set-oriented query acquires. More than nine page locks per query per table leads to escalation to table-level locking, which results in substantial concurrency degradation.
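The escalation threshold can be raised for a table or session with set lockmode; a sketch (the table name and value are illustrative):

```sql
/* allow up to 50 page locks per query on orders before
   escalating to a table-level lock */
set lockmode on orders where maxlocks = 50;
```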

Good transaction design becomes even more important in the distributed database environment; failure to take advantage of set queries is more detrimental to performance. Communication time becomes a larger percentage of transaction time in a distributed system. Row-at-a-time (RAT) processing invokes the most communication overhead possible.

If RAT processing is unavoidable, you may want to pull the dataset into memory, process it, then put it all back into the database. Although this brings up data integrity and concurrency issues, it reduces communication overhead.

If data is held in the client for a long period of time (anything over a few seconds could be considered a long period of time), alternative locking schemes should be employed. A counter or time stamp can be compared at save time, for instance.
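A minimal sketch of the timestamp comparison, assuming a row_version counter column on the table (names are hypothetical):

```sql
/* read, remembering the version the client saw */
select order_no, status, row_version
from   orders where order_no = 100;

/* at save time, update only if no one else changed the row meanwhile */
update orders
set    status = 'SHIPPED', row_version = row_version + 1
where  order_no = 100
and    row_version = 7;   /* the version read earlier */

/* if zero rows are updated, another user changed the row:
   re-read and retry instead of silently overwriting their work */
```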


Application Performance Issues

Application development environments like CA-Ingres/Vision and CA-OpenROAD offer great productivity. They are also designed to be efficient. This does not mean all operations are quick; some are expensive. It is important to know what they are so good judgment can be employed during the coding phase. Some things to be aware of include:


New CA-OpenIngres® 1.1 Features

CA-OpenIngres 1.1 includes many new features which afford performance benefits or offer new opportunities for improved performance management.

Some of the new features are available just by upgrading to CA-OpenIngres 1.1 (referred to as intrinsic changes). Some of the new features require programmatic changes but offer tangible gains in performance and/or functionality.

Some of the intrinsic features include:

Some of the features which require programmatic changes include:

IMA Offers Comprehensive Monitoring Capabilities

IMA is the Ingres Information Management Architecture. IMA allows monitoring of DBMS server information including DMF cache, QSF cache and thread information. Information can be obtained on the GCN (name) and GCC (communication) servers. For instance, it is now possible to use SQL to select the list of all users currently connected to the server. Since IMA uses SQL, it will work over CA-OpenIngres/Net. This allows comprehensive monitoring of a distributed system containing many CA-OpenIngres installations.
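For example, connected sessions can be listed with a plain SQL query against the IMA tables (registered in the imadb database; the exact table and column names here are assumptions and vary by release):

```sql
/* list the sessions currently connected to the DBMS server */
select session_id, effective_user, db_name
from   ima_server_sessions;
```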


© William Yuan 2000
