Data and System recovery strategies

Problem:
    Devise a relatively simple and reliable way to provide data (and server)
"high availability."  Ideally, we'd like dedicated primary and backup server
machines, where the backup machine can be brought into service quickly (e.g.
within 10 minutes) after a hardware or software failure on the primary machine.
A small amount of data loss may be acceptable (e.g. up to 15 minutes' worth of
updates).
    Note that true fault tolerance isn't necessary: a deferred-copy
arrangement or something similar would be fine.  Indeed, a deferred copy may
help prevent error propagation, since there is a window in which a bad update
can be caught before it reaches the backup.  (On the other hand, I'd
appreciate hearing about fault-tolerant solutions.)
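    To make the deferred-copy idea concrete, here is a minimal Python sketch
of a delayed apply queue.  It assumes changes are captured on the primary as
individual records; the `Change` format, the `DeferredApplier` class, and the
`apply` callable are hypothetical, and nothing here is an Ingres interface.
Each change is held for `delay` seconds (e.g. 15 minutes, to match the window
above) before being replayed on the backup, so a bad update spotted within that
window can be discarded before it propagates.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Change:
    """One committed update captured on the primary (hypothetical format)."""
    seq: int             # increasing sequence number assigned on the primary
    committed_at: float  # commit time on the primary, in seconds
    stmt: str            # statement to replay on the backup


class DeferredApplier:
    """Hold changes for `delay` seconds before applying them to the backup."""

    def __init__(self, delay, apply):
        self.delay = delay
        self.apply = apply       # hypothetical: replays one Change on the backup
        self.pending = deque()   # received but not yet applied, in seq order

    def receive(self, change):
        self.pending.append(change)

    def tick(self, now):
        """Apply every pending change at least `delay` seconds old; return count."""
        applied = 0
        while self.pending and now - self.pending[0].committed_at >= self.delay:
            self.apply(self.pending.popleft())
            applied += 1
        return applied

    def discard_from(self, seq):
        """Drop change `seq` and all later pending changes; return how many."""
        kept = deque(c for c in self.pending if c.seq < seq)
        dropped = len(self.pending) - len(kept)
        self.pending = kept
        return dropped


if __name__ == "__main__":
    backup = []
    d = DeferredApplier(delay=15 * 60, apply=backup.append)
    d.receive(Change(1, 0.0, "UPDATE parts SET qty = 5 WHERE id = 1"))
    d.receive(Change(2, 600.0, "DELETE FROM parts"))  # a mistake
    print(d.tick(now=900.0))   # 1: only change 1 is 15 minutes old
    print(d.discard_from(2))   # 1: the bad DELETE never reaches the backup
```

If the pending queue is kept on the backup machine (an assumption of this
sketch), the held changes can still be applied at failover time, so the delay
need not widen the data-loss window.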


Background:
    The two server machines are DEC 5000/240s running Ultrix, with Ingres 6.3
(or 6.4).  There are 3 databases, with 350+ MB and 500+ tables in total.
    I have worked with technical support on this for a while now.  Although
they have been helpful, their bottom line is essentially: 'the suggested
approaches will strain Ingres and use it in ways for which it was not
designed; try talking with Ingres consulting.'
    So it's time to turn to the net.  Below I've outlined some approaches
in broad terms.  If anyone has thoughtful suggestions or, better still, known
solutions (!), I would appreciate hearing about them.  Thanks in advance.


Approach 1:
    Use (Ultrix) disk shadowing.
Problems:
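    As far as I have been able to ascertain, only intra-machine shadowing is
available, i.e. the shadow disks must be attached to the same machine, so a
failure of that machine takes the shadow down with it.  Does anyone know
otherwise?  I assume custom solutions are available.  Do I want to hear about
these ($$$)?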


Approach 2:
    Use Ingres/STAR to automagically provide replication.
Problems:
    Although I recall reading "direction" papers some time ago that hinted
at replication, relation fragmentation, etc., my understanding is that none of
these will be available in the near future.


Approach 3:
    Variations on selective copy schemes, e.g. using copydb, copying the
underlying database files, etc.  If a database can be partitioned into static
and non-static tables, only the non-static tables need to be copied
regularly, which avoids the overhead of a full checkpoint copy.
Problems:
    Consistency issues, both within and across tables.  Downtime.  Excessive
data transfer time.
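    A minimal Python sketch of the file-copy variant, assuming each table
lives in its own file under a data directory and that the names of the files
holding static tables are known.  The file names, directory layout, and the
helpers `files_to_copy` and `copy_changed` are hypothetical; Ingres's actual
on-disk layout is not modeled.  After the initial full copy, only non-static
files modified since the last copy are transferred.  It does nothing about the
consistency problems above: the database must be quiesced (hence the downtime)
while the files are copied.

```python
import os
import shutil


def files_to_copy(data_dir, static_files, last_sync):
    """Return sorted names of files in data_dir that need copying.

    static_files: names of files assumed to hold static tables.
    last_sync: POSIX time of the previous copy; 0 means "first run",
    in which case every file, static or not, is copied.
    """
    names = []
    for entry in os.scandir(data_dir):
        if not entry.is_file():
            continue
        if last_sync > 0 and entry.name in static_files:
            continue  # static tables were already copied in the first run
        if entry.stat().st_mtime > last_sync:
            names.append(entry.name)
    return sorted(names)


def copy_changed(data_dir, backup_dir, static_files, last_sync):
    """Copy the selected files to backup_dir; return the names copied.

    Assumes the database is quiesced for the whole copy; copying the
    files of a live database can produce an inconsistent backup.
    """
    names = files_to_copy(data_dir, static_files, last_sync)
    for name in names:
        shutil.copy2(os.path.join(data_dir, name),
                     os.path.join(backup_dir, name))
    return names
```

Note that modification times only tell you which files changed, not which
rows did, so a single update to a large table still costs a full copy of that
table's file.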


Approach 4:
    Variations on incremental copying schemes, e.g. using the journaling
system to update the shadow node, writing routines to process the log file,
using "triggers" to update the shadow node, etc.
Problems:
    Some of these solutions may get complicated, and they would require tight
administration of the databases.  However, this category looks the most
promising at the moment.
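    For the log-processing variant, the core of a shadow-node updater might
look like the following Python sketch.  It assumes the primary's changes can
be exported as text records of the form `seq<TAB>statement`; that format, the
`ShadowNode` class, and the `execute` callable are all hypothetical and are
not the Ingres journal format.  The shadow remembers the last sequence number
it applied, so re-reading an overlapping log is harmless, and a gap in the
sequence stops the update rather than silently skipping changes.

```python
def parse_log(lines):
    """Yield (seq, statement) from hypothetical 'seq<TAB>statement' records."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        seq_text, stmt = line.split("\t", 1)
        yield int(seq_text), stmt


class ShadowNode:
    def __init__(self, execute):
        self.execute = execute  # hypothetical: runs one statement on the shadow
        self.last_applied = 0   # highest sequence number already applied

    def catch_up(self, lines):
        """Apply records newer than last_applied, in order; return the count.

        Records at or below last_applied are skipped, so a log that was
        partly applied before (e.g. after a crash) can be re-read safely.
        """
        applied = 0
        for seq, stmt in parse_log(lines):
            if seq <= self.last_applied:
                continue
            if seq != self.last_applied + 1:
                raise ValueError(
                    f"gap in log: expected {self.last_applied + 1}, got {seq}")
            self.execute(stmt)
            self.last_applied = seq
            applied += 1
        return applied
```

In a real setup, `last_applied` would have to be stored in the shadow
database in the same transaction as each applied statement; otherwise a crash
between the two could cause a record to be applied twice.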


Approach 5:
    Application- and/or library-level shadowing: rewrite all applications to
post transactions to both nodes, or write a cover library over libingres.a
that provides replication by posting each transaction to both nodes.
Problems:
    Not reliable: if a transaction commits on one node but fails on the
other, the two databases diverge.  Potentially not simple, either.
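    The reliability problem is easiest to see in code.  This Python cover
layer is hypothetical (not an existing library); `primary` and `backup` stand
in for whatever calls would actually run a transaction on each node.  It posts
each transaction to the primary and then to the backup.  When the backup
posting fails, the transaction is queued, and later ones are queued behind it
so the backup sees the same order; `resync` replays the queue.

```python
class DualPoster:
    """Hypothetical cover layer that posts every transaction to both nodes."""

    def __init__(self, primary, backup):
        self.primary = primary   # callable: runs one transaction on the primary
        self.backup = backup     # callable: runs the same transaction on the backup
        self.unsynced = []       # committed on the primary, not yet on the backup

    def post(self, txn):
        self.primary(txn)        # if this raises, the backup is not touched
        if self.unsynced:
            # Earlier transactions are still queued; queue this one behind
            # them so the backup applies transactions in primary order.
            self.unsynced.append(txn)
            return
        try:
            self.backup(txn)
        except Exception:
            # The primary has committed but the backup has not: the nodes
            # now differ until resync() succeeds.
            self.unsynced.append(txn)

    def resync(self):
        """Replay queued transactions in order; return how many remain queued."""
        while self.unsynced:
            try:
                self.backup(self.unsynced[0])
            except Exception:
                break            # backup still unavailable; try again later
            self.unsynced.pop(0)
        return len(self.unsynced)
```

Even so, it is not reliable: the queue is lost if the process dies between
the two postings, a backup failure reported after the commit actually
succeeded would cause a transaction to be applied twice, and concurrent
applications could interleave their postings in different orders on the two
nodes.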


Email or post as seems appropriate; if there is interest, I'll post a summary.

Paul Turner, turner@kadsma.kodak.com