> From: Roger Hill
> Subject: Re: Replicating without replicator
>
> On Wed, 3 Mar 1999, Mark Crowder wrote:
>
> > The powers that be in my organisation have decreed that they want to
> > replicate an OI 1.2 database between DEC Unix machines. Some of the
> > table names are too long, and the chance of getting these changed is
> > virtually nil. I have a recollection of other people doing clever
> > things to achieve similar effects on systems, e.g. copying journal
> > files. Does anyone have any suggestions?
>
> Mark, what I am doing at the moment is something like this:
>
> Run a checkpoint.
> Back up the dump, journal, and checkpoint areas (I actually just
> create a tar file on disk).
> FTP the tar file to machine #2, which has an IDENTICAL structure in
> terms of directories, locations etc. for Ingres.
> Delete all files from the dump, journal and checkpoint areas on
> machine #2, after taking Ingres down.
> Untar, bring up Ingres.
> Rollforward each DB.
>
> This gives me a hot backup, albeit up to about 24 hours old, that I
> can use for recovery if the main machine crashes. I use it for running
> test reports, and for development as well.
>
> Works for me.
>
> (Thanks to Mike Leo for the original suggestion.)
>
> Roger
> --
> Roger Hill, McEnearney Alstons (Barbados) Ltd    E-mail: rhill@mcalbds.com
> Tel: 246-426-5764/230-9596    Fax: 246-228-5796

Hi,

I use this method in my Disaster Recovery Plan and have done so for some
considerable time, with great success. In short, I can recover any of our
administrative databases, from a checkpoint taken on any of their hosts, to a
single site. Databases here are of the order of 5 GB; a recovery typically
takes about three hours.

It is very easy to automate the transfer of data from your production hosts
to their designated Disaster Recovery Host. I run a transfer of journals,
dumps and configuration files from each database of interest on a 15 minute
cycle, although at special times of year I increase this to a 5 minute cycle.
I also have the checkpoints of these databases going, via a symlink, to an
NFS-mounted disk exported from the Disaster Recovery Host. Hence, at all
times, my Disaster Recovery Host has access to the most recent disk
checkpoint (tape checkpoints will do just as well), and, to within 15
minutes, the current configuration file, journals, dumps and saved
configuration files (c*.dmp).

Even this alone is not quite enough to guarantee a recovery. You also need to
know the details of each database you wish to recover: for instance the
database owner, what locations it uses, who is permitted to access the
database, and so on. It is to help in this area that I prepare what I refer
to as a 'Basics File'. It simply contains basic details of the databases and
installation, and is simply the result of the following queries run against
the production host's iidbdb:

    select * from iiuser;
    select * from iilocations;
    select * from iidbaccess;
    select * from iidbpriv;
    select * from iiusergroup;
    select * from iiextend;
    select * from iidatabase;
    select * from iirole;

For completeness, I also include the full iifile_info listing from each
database in the installation, as well as the copy.in (from unloaddb) for each
database. Also throw in the config.dat, symbol.tbl, termcaps and maps (it
won't hurt).
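Purely as an illustration, a minimal sketch of how such a Basics File might
be gathered on a UNIX production host follows. The script name, output path
and database names are assumptions, not part of the procedure described
above, and II_SYSTEM is assumed to be set in the ingres user's environment:

    #!/bin/sh
    # make_basics.sh -- hypothetical sketch: gather the 'Basics File' details
    # described above into one file for shipping to the Disaster Recovery Host.
    BASICS=/tmp/basics.`hostname`.`date +%Y%m%d`     # illustrative location

    # Snapshot of the iidbdb catalogues, via the SQL terminal monitor.
    printf '%s\n' '
    select * from iiuser;      select * from iilocations;
    select * from iidbaccess;  select * from iidbpriv;
    select * from iiusergroup; select * from iiextend;
    select * from iidatabase;  select * from iirole;
    \g' | sql iidbdb > $BASICS

    # Per-database details: the file/table mapping and a reload script.
    for db in merlin payroll                      # illustrative database names
    do
        printf '%s\n' 'select * from iifile_info; \g' | sql $db >> $BASICS
        copydb -c $db                             # leaves copy.in in the cwd
        cat copy.in >> $BASICS
    done

    # Installation files that will not hurt to keep copies of alongside.
    cp $II_SYSTEM/ingres/files/config.dat `dirname $BASICS`/
    cp $II_SYSTEM/ingres/files/symbol.tbl `dirname $BASICS`/

The same script can then hand the results to whatever mechanism transfers the
checkpoint, as described next.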
The Basics File is prepared at every checkpoint and transferred automatically
by the checkpoint script to the Disaster Recovery Host.

You do have to be careful with the values of II_DATABASE, II_CHECKPOINT,
II_JOURNAL, II_DUMP and II_WORK on each of the separate hosts. If you have a
semi-decent naming strategy you will find that symlinks can be installed so
that any path on your admin host can be appropriately replicated on your
Disaster Recovery Host.

Finally, to recover the database:

1. Use the information on database ownership, default locations and access
   type from iidatabase in the Basics File to create the database on the DRH.
2. Checkpoint this database, turning journaling on (this creates the default
   journal and dump directories).
3. Save the new database's config files (both the dump and database copies).
   These will be VERY useful if something really screws up.
4. Copy the config file transferred from the production database into the
   database and dump areas of the new database. You should now be able to
   issue the command 'infodb dbname' on the Disaster Recovery Host and see an
   infodb listing that is taken purely from the production host, i.e.
   production host path names and the checkpoint, journal and dump history
   from the production database.
5. Now copy in the appropriate journals for your recovery from the storage
   area on the Disaster Recovery Host; ditto the dumps and saved config files.
6. If using a disk checkpoint, simply copy the checkpoint tar files to the
   appropriate directory. If using a tape, simply mount and position the tape.
7. Check that everything is in the correct spot, i.e. use the infodb listing
   of paths to confirm you have your symlinks right! Remember we are using
   symlinks to fool the config file. If these aren't right, the recovery will
   spew bile at you!
8. Roll forward from the checkpoint exactly as you would normally.

This method will work on 6.4 and OI. It is actually better on OI, which has a
fixed alterdb command that can be useful. I say that because of the
shortcoming in this method: remember that we are using the config file from
another installation. It contains many pointers to the log file of its own
installation. These have no relevance on this installation and MAY cause the
installation to believe it has a corrupted log file. That isn't fun. To
guarantee that this won't occur:

  On OI:  alterdb -disable_journaling dbname, followed by ckpdb +j dbname.
  On 6.4: you have to scrub the log file with an iistartup -init.
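To make the sequence concrete, here is a rough, hypothetical sketch of steps
1 to 8 as shell commands for a single database. The database name, owner and
staging directory are illustrative assumptions, the standard Ingres directory
layout under the II_* areas is assumed (set in the environment, e.g. via
ingprenv), and disk rather than tape checkpoints are assumed:

    #!/bin/sh
    # Hypothetical recovery of database "merlin" on the DRH.  Run as the
    # production database's owner.  RECV is an illustrative staging area
    # holding the files transferred from the production host.
    DB=merlin
    RECV=/bucket/drp/recovery/avalon/$DB

    # 1-2. Create the database and checkpoint it with journaling on.
    createdb $DB
    ckpdb +j $DB

    # 3. Save the newly created config files (database and dump copies).
    cp $II_DATABASE/ingres/data/default/$DB/aaaaaaaa.cnf /tmp/$DB.cnf.data
    cp $II_DUMP/ingres/dmp/default/$DB/aaaaaaaa.cnf      /tmp/$DB.cnf.dmp

    # 4. Drop in the config file copied across from the production host.
    cp $RECV/aaaaaaaa.cnf $II_DATABASE/ingres/data/default/$DB/
    cp $RECV/aaaaaaaa.cnf $II_DUMP/ingres/dmp/default/$DB/
    infodb $DB    # should now show the production host's paths and history

    # 5-6. Journals, dumps, saved config files and checkpoint files into place.
    cp $RECV/jnl/* $II_JOURNAL/ingres/jnl/default/$DB/
    cp $RECV/dmp/* $II_DUMP/ingres/dmp/default/$DB/
    cp $RECV/ckp/* $II_CHECKPOINT/ingres/ckp/default/$DB/

    # 7. Check the symlinks against the paths infodb reports before going on.

    # 8. Roll forward as normal.
    rollforwarddb $DB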
Martin Bowes

PS. Of course, recovering a database is one thing. Now we have to recover the
user details, allowing for some users having access to multiple hosts and
being in wildly different UNIX groups. You have to be particularly careful
here, as you must construct a methodology that will allow you to restore the
users back from the Disaster Recovery Host to the original hosts without
scrambling their files. They get really pissed off if you don't get this
right!

PPS. Don't forget printers as well.

PPPS. The obvious expansion from this method is clustering. I'm currently
investigating this option and I must say it looks very promising.


> I am very interested in rolling forward a database at a remote site to
> recover as much of the database as possible.
>
> Can you tell me in more detail how you did it? Do you do a checkpoint
> once a day with journaling turned on, then copy the journal files to
> the remote node during the day, so that you can roll the database
> forward on the other machine if anything happens?
>
> I have been thinking about doing this recently, but have not figured
> out a really good way to do it. Would you mind sending me a mail with
> some information to help? I have talked to CA about the replicator
> server. We run OpenIngres 1.2/01 on OpenVMS, but CA said that the
> replicator server's use of event logs will generate a lot of locking
> issues. It did not sound like a good idea.
>
> Thanks . . . .
>
> tyyang

Hi tyyang,

Recovering a database to a remote host from a checkpoint taken on your
current host is actually quite straightforward. However, it is totally
unsupported by Ingres.

The main problem to overcome is that, to use a checkpoint/dump and journals
to recover your database, you need a configuration file associated with that
database that correctly states the checkpoint requirements. Remember that a
configuration file contains many hardwired paths, e.g. the checkpoint,
journal, dump and database area(s) for the database. Furthermore, it also
contains pointers to your current host's log file that will have no meaning
to your remote host. This last point is worth remembering, as the attempt to
use this configuration file may screw the log on your remote host! But we'll
go into that later.

Note that you CANNOT edit a configuration file. Hence it is necessary to fool
it using symbolic links! I can do this on UNIX, but as you are using VMS I'm
not sure what the equivalent is. Something like a logical name comes to mind,
but I'm no expert in VMS.

Using a symbolic link is not too difficult. However, experience has shown
that you are better off putting your symbolic links at the highest level in
the path, not the lowest. This can become critical when attempting to recover
several databases from several hosts onto a single host.

A real example to clarify the matter. I wish to recover a database from my
host avalon to my host bucket. The critical paths on avalon are all prefixed
/avalon/prod. The equivalent paths on bucket are all prefixed /bucket/drp. As
a complication, there are /bucket/prod directories used by a separate
installation that I don't wish to get confused with. Hence, on bucket I make
the directory /avalon. In this directory I place a symbolic link
prod -> /bucket/drp. Now any reference to, say,
/avalon/prod/ingres/ingres/data/default/merlin will actually traverse the
path /bucket/drp/ingres/ingres/data/default/merlin. Having set up the
symbolic links once, they can be kept in place. After all, why reinvent the
wheel?
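In shell terms, the arrangement just described amounts to something like the
following, run once on bucket (the paths are the ones from the example above;
adjust to your own naming):

    # On bucket: recreate avalon's path prefix once, then leave it in place.
    mkdir /avalon
    ln -s /bucket/drp /avalon/prod

    # Sanity check: a production-host path should now resolve on bucket.
    ls -ld /avalon/prod/ingres/ingres/data/default/merlin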
Now, the fun bit! On your remote host:

1. Create a database named after the database you wish to recover. It must be
   owned by the same username.
2. Spread the database to all extended locations that the lost database used.
3. Checkpoint this database, turning journaling on. By turning journaling on,
   all the journal and dump locations are created for you.

NOTE: To do this you must make sure that, at any checkpoint on your lost
hosts, you prepare a file (we call it a Basics File) of sufficient detail to
be able to do this. Having created the Basics File, copy it to your remote
host. When you need it, it may as well be there waiting for you! I suggest
the following should be included in your Basics File:

1. From iidbdb, a snapshot of the tables:

       select * from iiuser;
       select * from iilocations;
       select * from iidbaccess;
       select * from iiusergroup;
       select * from iiextend;
       select * from iidatabase;
       select * from iirole;

2. From each of your important databases:

       select * from iifile_info;

   and the copy.in prepared by copydb -c dbname.

3. As you're running OI, I also suggest putting your config.dat into the
   Basics File.

4. There may be many other site-dependent files that you will need on your
   remote host to get your recovered database back to the point where your
   users can really use it.

You now have an empty database of the same name/owner as the lost database.
We now just have to get the data back into it. To do this:

1. Take a copy of the configuration files of the newly created database.
   These can be very handy. Remember there are two: one in the database area
   and one in the dump area of the database.
2. Take the configuration file of the lost database and copy it into the
   database area and dump area of the newly created database.

If you now perform an infodb -udbowner dbname, you should see that the
database is described as if it were from the lost host! Magic.

We now simply have to copy all the journals, dumps and saved configuration
files from the lost host into the appropriate areas of the new database. If
you are using a disk checkpoint, then copy this in too.

NOTE: You are correct in surmising that the journals get copied to my remote
host. Ditto the configuration file, and the dump area, comprising both the
dump files themselves and the saved configuration files.

Now CHECK what we have got. You should use the hardcoded paths in the infodb
output (i.e. the lost host's paths, which should be symlinked to their new
host equivalents) to check that all data locations exist, and that only the
default location has any data files in it (these are your system catalogs).
Furthermore, check that all the journals required for the recovery from the
checkpoint exist and are uncompressed. Ditto the dumps and saved
configuration files.
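As a purely illustrative aid, a few shell checks along those lines, assuming
the avalon/bucket symlink example above and a database named merlin (take the
real path prefix and locations from your own infodb listing; extended data
locations need the same check):

    #!/bin/sh
    # Hypothetical pre-rollforward checks.  PRODROOT is the production path
    # prefix as reported by infodb, reached via the symlink set up earlier.
    DB=merlin
    PRODROOT=/avalon/prod/ingres/ingres

    # Every area infodb names must resolve on this host.
    for d in data/default/$DB jnl/default/$DB dmp/default/$DB ckp/default/$DB
    do
        if [ -d $PRODROOT/$d ]
        then
            echo "OK:      $PRODROOT/$d"
        else
            echo "MISSING: $PRODROOT/$d"
        fi
    done

    # Compressed journals or dumps cannot be rolled forward; uncompress first.
    ls $PRODROOT/jnl/default/$DB $PRODROOT/dmp/default/$DB | grep '\.Z$' \
        && echo "WARNING: compressed files found"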
Now you can do your rollforwarddb. Remember I said that the configuration
file contains pointers to the log files that may cause grief? This is where
you are going to find out whether they will or not. I have found that nine
times out of ten there is no problem. However, every now and then the log
file corrupts hopelessly, at which point you will have to do an
rcpconfig -force_init. HENCE, DURING THIS RECOVERY MAKE DAMN SURE NO ONE ELSE
IS USING THE INSTALLATION. SET SERVER CLOSED is a damn useful option!
Normally all is okay, and your database will now simply roll forward and
recover quite happily.

This simply recovers your database. It ain't necessarily usable! What about
your user access rights, application code, any specific files your users
require to access the system, any daemon processes that need to be running,
and so on? A lot of this is site dependent and I'll let you fill in the gaps.

BUT, the critical thing is to test it! Get used to the procedure so that it
becomes second nature to you. Try a real test, i.e. recover a production
database to some host and then ask your users to test that database and see
if they can access it and use it as if it were on the real host.

The GOLDEN RULE in Disaster Recovery Planning is: if you need to access
information from your lost host, you FAIL!

Martin Bowes


Hi tyyang,

Hope you have success! One thing I forgot to mention was that the process
that transfers your journals, dumps, and current and saved configuration
files from your production host(s) to your backup host should be run from
cron (or whatever the VMS equivalent is) at a frequency that gives acceptable
data loss. I run my transfer script at 15 minute intervals. A further
complication is that your Archiver process must be firing up more regularly
than this interval. I set my Archiver to wake up at 1% of the log file. This
normally gives me pretty good journalling.

The frequency you choose is a site-dependent issue. My people were happy with
a 15 minute data loss. I could go more frequent, as the transfer process
normally takes very little time to run. Furthermore, the script should
execute on your backup host and fire off remote jobs on your production host
to accomplish its work. I also have a script that monitors the progress of
this transfer script, and it will send me warnings if the journals or current
configuration file for any database are over two hours old.

I suggest that you transfer the data to equivalent areas on your backup host.
This is for two reasons. One is that these areas will probably be set up with
sufficient disk space to handle the job. The second is that it makes sense to
store like with like, as you'll probably remember where to find it and it is
easier to explain to others. i.e.

    Journals                        --> II_JOURNAL/recovery/hostname/dbname
    Dumps/saved configuration files --> II_DUMP/recovery/hostname/dbname
    Current config file             --> II_DATABASE/recovery/hostname/dbname

e.g. on my Disaster Recovery Host (DRH), named bucket, I store data from my
three production hosts (avalon, electra and cascade). Hence the journal area
on bucket has the directories:

    II_JOURNAL/recovery/avalon
    II_JOURNAL/recovery/electra
    II_JOURNAL/recovery/cascade

Under each of these is a separate directory for each of my production
databases.

Martin Bowes

PS. One other thing. It is essential that your Disaster Recovery Plan (DRP)
be documented, and that the document be stored offsite. It should be written
in such a manner that any Ingres DBA could walk into the smouldering remains
of your office, sit down and commence the recovery of your databases. I don't
know if I've achieved that last bit yet, but I am working on it!


Hi tyyang,

> 1. Where did you set up your archiver to wake up every 15 mins? Is it in
>    the configuration files?

You set the archiver wakeup by percentage of the log file, not by time
interval. This is a configurable parameter in your logging system. It
defaults to 5%. To see its current value, run 'showrcp' if you use 6.4, or go
into CBF's Logging Parameters and check the appropriate value. If you wish to
change it, then you really have to shut down the installation and then
reconfigure your logging system. You can get a feel for what timeframe X% of
the log file translates to by monitoring the II_ACP.LOG file.

> 2. How did you decide when to copy which journal file to the remote machine
>    every 15 mins? Can it happen that the journal file is being written by
>    the archiver at that point? Or do you just copy all the journal files
>    every 15 mins, no matter whether they have been copied before?

My process checks its current set of journals/configuration files first,
determines what the last successful transfer time was, and then accesses the
production hosts looking for anything that has changed since then. Since I
wrote this script a few years ago, some kind soul has beaten me around the
ears and told me of a wonderful utility called 'rdist' which would do the job
a lot more easily! I haven't had a chance to recode it yet. I don't know of a
VMS equivalent to rdist.

Journal files are not flock()ed by the archiver and may be examined at will.
If the archiver happens to be busy writing to one end of a journal, this is
not a problem, as whatever you miss this time will be picked up next time. If
the machine goes down in the meantime, at least the journal you've got is
still usable, and heaps better than no journal at all.
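For what it's worth, a minimal sketch of that kind of periodic pull on UNIX
follows. The host, database and path names are the illustrative ones from
this thread; the recovery directories are assumed to exist already, II_* are
assumed to be set in the environment, and the ingres account on the backup
host is assumed to be able to rcp from the production host. It is a sketch
only, not a substitute for a change-aware script or an rdist-based setup:

    #!/bin/sh
    # drp_transfer.sh -- hypothetical sketch: pull journals, dumps and the
    # current config file for one database from the production host into the
    # recovery areas on the backup host.  Run from cron, e.g.:
    #   0,15,30,45 * * * * /usr/local/ingres/bin/drp_transfer.sh
    PRODHOST=avalon
    DB=merlin
    PRODROOT=/avalon/prod/ingres/ingres   # paths as named on the production host

    rcp -p "$PRODHOST:$PRODROOT/jnl/default/$DB/*" \
        $II_JOURNAL/recovery/$PRODHOST/$DB/
    rcp -p "$PRODHOST:$PRODROOT/dmp/default/$DB/*" \
        $II_DUMP/recovery/$PRODHOST/$DB/
    rcp -p "$PRODHOST:$PRODROOT/data/default/$DB/aaaaaaaa.cnf" \
        $II_DATABASE/recovery/$PRODHOST/$DB/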
> 3. Did you do an on-line checkpoint ($ ckpdb dbname) or an off-line
>    checkpoint ($ ckpdb -l dbname) every night? Turning journaling on for a
>    database is done with $ ckpdb +j dbname; it will do the off-line
>    checkpoint and start the journal file. After that, do I need to do
>    $ ckpdb +j dbname every night, or just $ ckpdb dbname?

All my checkpoints are online. The only time I do an offline checkpoint is
when a dire emergency forces me to! Having turned journaling on with
ckpdb +j, there is no need to do it again; all subsequent checkpoints may be
done with just ckpdb dbname. After all, turning journalling off/on requires
an offline checkpoint, and the loss of access to the database may piss your
users off!

One further thing you could consider for your checkpoints is checkpointing
directly to your backup host from your production host. This can be very
useful and can save considerable time in a recovery, both by giving instant
access to the checkpoint file(s) on your backup host and by removing the
chance of not correctly positioning a tape during the recovery phase. To do
this, I export a disk from my backup host and mount it on my production
hosts. I then replace the checkpoint directory of a production database with
a symbolic link to an appropriately named directory on this disk.

I still use tape checkpoints of all my production databases; furthermore,
these tape checkpoints occur at a much higher frequency than my disk
checkpoints. But the disk checkpoints are always my first resort when
attempting to recover a database. The extra time required to process more
journals from an older checkpoint is not enough to worry me.

Martin Bowes
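As a closing aside, a sketch of how that disk-checkpoint arrangement might
look on UNIX. The export name, mount point and database are illustrative
assumptions, and the exact NFS mount syntax varies by operating system:

    # On the backup host (bucket): export a filesystem, e.g. /bucket/ckpdisk.
    # On the production host (avalon), as root:
    mkdir /drpckp
    mount bucket:/bucket/ckpdisk /drpckp    # NFS mount; options vary by OS

    # Swap the database's checkpoint directory for a symlink onto that disk.
    cd /avalon/prod/ingres/ingres/ckp/default
    mv merlin merlin.local                  # keep the old directory around
    mkdir -p /drpckp/avalon/merlin
    ln -s /drpckp/avalon/merlin merlin

    # Subsequent online checkpoints (run as the DBA) now land on the backup
    # host's disk.
    ckpdb merlin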