> I am very interested in rolling the database forward to a remote site, to
> try to recover as much of the database as you can.
>
> Can you tell me in more detail how you did it? Do you do a checkpoint
> once a day and turn on journaling, then copy the journal files to the
> remote node during the day, so you can roll the database forward on the
> other machine if anything happens?
>
> Because I have been thinking about doing this recently, but have not
> figured out a really good way to do it. Do you mind sending me a mail
> back with some info to help? I have talked to CA about the duplicate
> server. We are running OpenIngres 1.2/01 on OpenVMS. But CA said the
> replicator server's use of the event log will generate a lot of locking
> issues. It did not sound like a good idea.
>
> Thanks . . . .
>
> tyyang

To recover a database to a remote host with a checkpoint taken on your current host is actually quite straightforward. However, it is totally unsupported by Ingres.

The main problem to overcome is that to use a checkpoint/dump and journals to recover your database you need a configuration file associated with that database that correctly states the checkpoint requirements. Remember that a configuration file contains many hardwired paths, eg the checkpoint, journal, dump and database area(s) for the database. Furthermore, it also contains pointers to your current host's log file that will have no meaning to your remote host. This last point is worth remembering, as the attempt to use this configuration file may screw the log on your remote host! But we'll go into that later.

Note that you CANNOT edit a configuration file. Hence it is necessary to fool it using symbolic links! I can do this on UNIX, but as you are using VMS I'm not sure what the equivalent is. Something like a logical name comes to mind, but I'm no expert in VMS.

Using a symbolic link is not too difficult. However, experience has shown that you are better off putting your symbolic links at the highest level in the path, not the lowest. This can become critical when attempting to recover several databases from several hosts onto a single host.

A real example to clarify the matter. I wish to recover a database from my host avalon to my host bucket. The critical paths on avalon are all prefixed /avalon/prod. The equivalent paths on bucket are all prefixed /bucket/drp. As a complication, there are /bucket/prod directories used by a separate installation that I don't wish to get confused with. Hence on bucket I make the directory /avalon (with suitable permissions). In this directory I place a symbolic link prod -> /bucket/drp. Now any reference to, say, /avalon/prod/ingres/ingres/data/default/merlin will actually traverse the path /bucket/drp/ingres/ingres/data/default/merlin. Having set up the symbolic links once, they can be kept in place. After all, why reinvent the wheel?

Now, the fun bit! On your remote host:

1. Create a database named after the database you wish to recover. It must be owned by the same username.

2. Spread the database to all extended locations that the lost database used.

3. Checkpoint this database, turning journaling on. By turning journaling on, all the journal and dump locations are created for you.
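To make those steps concrete, here is a minimal UNIX sketch using the avalon/bucket paths from the example, a hypothetical database merlin owned by a hypothetical user proddba, and made-up location names. The command flags (createdb -u, extenddb -l) vary between releases, so check them against your own manuals rather than taking this as gospel:

    # On the backup host (bucket), done once: make the hardwired
    # /avalon/prod/... paths resolve to the /bucket/drp/... areas.
    mkdir /avalon
    ln -s /bucket/drp /avalon/prod

    # Create an empty shell of the database, owned by the same user
    # as the lost database (run as a privileged user to impersonate).
    createdb merlin -uproddba

    # Extend it to every extended location the lost database used --
    # the location names come from your Basics file (see below).
    # On older releases this is done through accessdb instead.
    extenddb -lprod_loc2 merlin
    extenddb -lprod_loc3 merlin

    # Checkpoint with journaling on, so that the journal and dump
    # locations get created for you.
    ckpdb +j merlin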
NOTE: To do all of this you must make sure that, at every checkpoint on your lost host, you prepare a file (we call it a Basics file) with sufficient detail to be able to do it. Having created the Basics file, copy it to your remote host. When you need it, it may as well be there waiting for you!

I suggest the following should be included in your Basics file:

1. From iidbdb, a snapshot of the tables:

       select * from iiuser;
       select * from iilocations;
       select * from iidbaccess;
       select * from iiusergroup;
       select * from iiextend;
       select * from iidatabase;
       select * from iirole;

2. From each of your important databases:

       select * from iifile_info;

   plus the copy.in prepared by copydb -c dbname.

3. As you're running OI, I also suggest putting your config.dat into the Basics file.

4. There may be many other site dependent files that you will need on your remote host to get your recovered database back to the point where your users can really use it.

You now have an empty database of the same name/owner as the lost database. We now just have to get the data back into it. To do this:

1. Take a copy of the configuration files of the newly created database. These can be very handy. Remember there are two: one in the database area and one in the dump area of the database.

2. Take the configuration file of the lost database and copy it into the database area and dump area of the newly created database.

If you now perform an infodb -udbowner dbname you should see that the database is described as if it were from the lost host! Magic.

We now simply have to copy all the journals, dumps and saved configuration files from the lost host into the appropriate areas of the new database. If you are using a disk checkpoint then copy this in too.

NOTE: You are correct in surmising that the journals get copied to my remote host. Ditto the configuration file, and the dump area comprising both the dump files themselves and the saved configuration files.

Now CHECK what we have got. You should use the hardcoded paths in the infodb output (ie the lost host paths that should be symlinked to new host equivalents) to check that all data locations exist, and that only the default location has any data files in it (these are your system catalogs). Furthermore, check that all required journals for the recovery from the checkpoint exist and are uncompressed. Ditto the dumps and saved configuration files.

Now you can do your rollforwarddb. Remember I said that the configuration file contains pointers to the log files that may cause grief. This is where you're going to find out whether they will or not. I have found that nine times out of ten there is no problem. However, every now and then the log file corrupts hopelessly, at which point you will have to do an rcpconfig -force_init. HENCE, DURING THIS RECOVERY MAKE DAMN SURE NO ONE ELSE IS USING THE INSTALLATION. SET SERVER CLOSED is a damn useful option! Hence I suggest you practice on an installation you are quite prepared to lose for an extended period while you reformat the log file.

Normally, all is okay and your database will now simply roll forward and recover quite happily.
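Pulling those steps together, a minimal sketch of the recovery sequence on the backup host, again with the hypothetical merlin/proddba database and a made-up staging directory for the transferred files. The aaaaaaaa.cnf file name and the exact paths should be checked against your own installation and infodb output before you rely on them:

    # Hardwired lost-host paths, reached via the symbolic links.
    DATA=/avalon/prod/ingres/ingres/data/default/merlin
    DMP=/avalon/prod/ingres/ingres/dmp/default/merlin
    JNL=/avalon/prod/ingres/ingres/jnl/default/merlin
    CKP=/avalon/prod/ingres/ingres/ckp/default/merlin
    STAGE=/bucket/drp/recovery/avalon/merlin   # wherever the transfers land

    # 1. Keep a copy of the new database's own configuration files.
    cp $DATA/aaaaaaaa.cnf $DATA/aaaaaaaa.cnf.orig
    cp $DMP/aaaaaaaa.cnf  $DMP/aaaaaaaa.cnf.orig

    # 2. Drop in the configuration file transferred from the lost host,
    #    then confirm the database now describes itself as avalon's.
    cp $STAGE/aaaaaaaa.cnf $DATA/
    cp $STAGE/aaaaaaaa.cnf $DMP/
    infodb -uproddba merlin

    # 3. Copy in the journals, dumps, saved configuration files and
    #    (if used) the disk checkpoint, then check they are all there
    #    and uncompressed.
    cp $STAGE/jnl/* $JNL/
    cp $STAGE/dmp/* $DMP/
    cp $STAGE/ckp/* $CKP/

    # 4. With no one else using the installation, roll forward from
    #    the checkpoint and apply the journals (run as, or impersonate,
    #    the database owner).
    rollforwarddb merlin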
This simply recovers your database. It ain't necessarily usable! What about your user access rights, application code, any specific files your users require to access the system, any daemon processes that need to be running, etc? A lot of this is site dependent and I'll let you fill in the gaps. BUT the critical thing is to test it! Get used to the procedure so that it becomes second nature to you. Try a real test, ie recover a production database to some host and then ask your users to test that database and see if they can access it and use it as if it were on the real host.

The GOLDEN RULE in Disaster Recovery Planning is: if you need to access information from your lost host, you FAIL!

One thing I forgot to mention was that the process that transfers your journals, dumps, and current and saved configuration files from your production host(s) to your backup host should be run from cron (or whatever the VMS equivalent is) at a frequency that gives acceptable data loss. I run my transfer script at 15 minute intervals. A further complication is that your Archiver process must be firing up more regularly than this interval. I set my Archiver to wake up at 1% of the log file. This normally gives me pretty good journalling. The frequency you choose is a site dependent issue. My people were happy with a 15 minute data loss. I could go more frequent, as the transfer process normally takes very little time to run. Furthermore, the script should execute on your backup host and fire off remote jobs on your production host to accomplish its job. I also have a script that monitors the progress of this transfer script, and it will send me warnings if the journals or current configuration file for any database are over two hours old.

I suggest that you transfer the data to equivalent areas on your backup host. This is for two reasons. One is that these areas will probably be set up with sufficient disk space to handle the job. The second is that it makes sense to store like with like, as you'll probably remember where to find it and it's easier to explain to others. ie:

    Journals                         --> II_JOURNAL/recovery/hostname/dbname
    Dumps/Saved Configuration files  --> II_DUMP/recovery/hostname/dbname
    Current Config File              --> II_DATABASE/recovery/hostname/dbname

eg On my Disaster Recovery Host (DRH) named bucket I store data from my three production hosts (avalon, electra and cascade). Hence the journal area on bucket has directories:

    II_JOURNAL/recovery/avalon
    II_JOURNAL/recovery/electra
    II_JOURNAL/recovery/cascade

Under each of these is a separate directory for each of my production databases.

PS. One other thing. It is essential that your Disaster Recovery Plan (DRP) be documented, and that the document be stored offsite. It should be written in such a manner that any Ingres DBA could walk into the smouldering remains of your office, sit down and commence the recovery of your databases. I don't know if I've achieved that last bit yet, but I am working on it!

Martin Bowes.

Hi Tyyang,

> 1. Where did you set up your archiver to wake up every 15 mins? Is it
>    in the configuration files?

You set the archiver wakeup by percentage of log file, not by time interval. This is a configurable parameter in your logging system. It defaults to 5%. To see its current value run 'showrcp' if you use 6.4, or go into CBF Logging Parameters and check the appropriate value. If you wish to change it, then you really have to shut down the installation and then reconfigure your logging system. You can get a feel for what timeframe X% of the log file translates to by monitoring the II_ACP.LOG file.

> 2. How did you decide when to copy which journal file to the remote
>    machine every 15 mins? Can it happen that the journal file is being
>    written by the archiver at that point? Or do you just copy all the
>    journal files every 15 mins, no matter whether they have been copied
>    before?

My process checks its current set of journals/configuration files first, determines what the last successful transfer time was, and then accesses the production hosts looking for anything that has changed since then.
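For illustration, a heavily stripped-down sketch of that sort of pull-style transfer job on UNIX, showing only the journal leg; dumps and configuration files follow the same pattern into their own recovery areas. Host names, paths and the marker file are placeholders, rsh/rcp assume the usual trust between the hosts, and on VMS you would need the equivalent batch and network-copy machinery:

    #!/bin/sh
    # pull_journals.sh <production_host> <dbname>
    # Runs on the backup host (bucket) from cron, eg every 15 minutes:
    #   0,15,30,45 * * * * /usr/local/drp/pull_journals.sh avalon merlin
    # It fires off remote jobs on the production host to find anything new.
    PROD=$1
    DB=$2
    SRC=/$PROD/prod/ingres/ingres/jnl/default/$DB
    DEST=/bucket/drp/ingres/ingres/jnl/recovery/$PROD/$DB
    MARK=$SRC/.last_transfer      # touch this by hand before the first run

    # Copy across every journal file newer than the marker.
    for f in `rsh $PROD "find $SRC -type f -newer $MARK -print"`
    do
        rcp $PROD:$f $DEST/ || exit 1
    done

    # Only advance the marker once everything has copied cleanly.
    rsh $PROD "touch $MARK"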
Since I wrote that transfer script a few years ago, some kind soul has beaten me around the ears and told me of this wonderful utility called 'rdist', which would do the job a lot more easily! I haven't had a chance to recode it yet. I don't know of a VMS equivalent to rdist.

Journal files are not locked by the archiver and may be examined at will. If the archiver happens to be busy writing to one end of a journal, this is not a problem, as whatever you miss this time will be picked up next time. If the machine goes down in the meantime, at least the journal you've got is still usable and heaps better than no journal at all.

> 3. Did you do an on-line checkpoint ($ckpdb dbname) or an off-line
>    checkpoint ($ckpdb -l dbname) every night? To turn on journaling on
>    the database you use $ckpdb +j dbname. It will do the off-line
>    checkpoint and start the journal file. After that, do I need to do
>    $ckpdb +j dbname every night, or do I just need to do $ckpdb dbname?

All my checkpoints are online. The only time I do an offline checkpoint is when a dire emergency forces me to do so! Having turned journaling on with ckpdb +j, there is no need to do it again. All subsequent checkpoints may be with just ckpdb dbname. After all, turning journaling off/on requires an offline checkpoint, and the loss of access to the database for your users may piss them off!

One further thing you could consider for your checkpoints is checkpointing directly to your backup host from your production host. This can be very useful and save considerable time in a recovery, both by having instant access to the checkpoint file(s) on your backup host and by removing the chance of not correctly positioning a tape during the recovery phase. To do this I export a disk from my backup host and mount it on my production hosts. I then replace the checkpoint directory of a production database with a symbolic link to an appropriately named directory on this disk. (A rough sketch of this setup appears at the end of this post.) I still use tape checkpoints of all my production databases; furthermore, these tape checkpoints occur at a much higher frequency than my disk checkpoints. But the disk checkpoints are always my first resort when attempting to recover a database. The extra time required to process more journals from an older checkpoint is not enough to worry me.

Martin Bowes

Hi Con,

My ex-manager has actually met people whose recovery strategy is based on the statement 'Oh, we do a checkpoint about once every fortnight, isn't that enough?'. Aye Carrumba, someone's dogs are barking tonight!

I absolutely agree with both points you have raised, namely:

1. Having completed your checkpoint to tape, you MUST back up the II_DUMP, II_JOURNAL and II_SYSTEM areas to tape, preferably the same tape! Other areas could also be backed up, eg the home areas of all users, particularly the ingres user. At my site we do a full system dump of everything except II_DATABASE after the checkpoint.

2. The first part of testing a new release should be an attempt to recover a database. Any decent database for such a test must be multilocation and have many tables of all structures, some of which should be multilocation and some of which should have secondary indices. I like to test that I can recover the correct number of tables, rows in each table, users, grants, usergroups, roles, indices, etc (a minimal sketch of such a check follows below). A good test should also include the recovery of dumping and journaling activity. I think we should all remember the famous multilocation BTREE recovery failure of a few years ago. Ouch!

Martin Bowes
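Picking up the testing point in item 2 above, here is a minimal sketch of the sort of post-recovery cross-check implied there, to be compared against the counts recorded in the Basics file at checkpoint time. The database name is a placeholder and iitables.num_rows is only an estimate maintained by Ingres, so treat the figures as a sanity check rather than gospel (the \g sends the buffered statements in the terminal monitor):

    # Tables, approximate row counts and secondary indexes in the
    # recovered database -- compare against the Basics file snapshot.
    sql merlin <<'EOF'
    select table_name, table_owner, num_rows from iitables
    order by table_owner, table_name;
    select base_name, index_name from iiindexes;
    \g
    EOF

    # Users, groups, roles and database access rights, compared
    # against the iidbdb snapshot taken in the Basics file.
    sql iidbdb <<'EOF'
    select * from iiuser;
    select * from iiusergroup;
    select * from iirole;
    select * from iidbaccess;
    \g
    EOF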
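Finally, going back to the disk-checkpoint suggestion earlier, this is roughly what that setup looks like on a UNIX production host, assuming the backup host exports a suitable disk over NFS. The mount command, directory names and database are placeholders, and on VMS the mount and logical-name mechanics would differ:

    # On the production host (avalon): mount the disk exported by the
    # backup host (bucket). Exact mount options depend on your UNIX.
    mkdir -p /drp_ckp
    mount bucket:/bucket/drp/ckp /drp_ckp

    # Replace the database's checkpoint directory with a symbolic link
    # to an appropriately named directory on that disk.
    CKP=/avalon/prod/ingres/ingres/ckp/default/merlin
    mkdir -p /drp_ckp/avalon/merlin
    mv $CKP $CKP.local            # keep the original directory just in case
    ln -s /drp_ckp/avalon/merlin $CKP

    # Subsequent online disk checkpoints now land directly on the
    # backup host's disk.
    ckpdb merlin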