From: emvy@platon.com (Maarten Veerman) Subject: Attention DBAs: Checkpoint/Journalling has SERIOUS flaws I found this out the hard way. Needless to say, I'm very choked, but maybe this will save you some hairs... Scenario: UNIX system with 6.4/04 Ingres Disk1: UNIX, page space Disk2: II_SYSTEM Disk3: II_SYSTEM/ingres/jnl II_SYSTEM/ingres/ckp II_SYSTEM/ingres/dmp Every night, the system backs up all the above, then takes a checkpoint. One day (late morning), Disk2 crashes. No problem, you replace the disk, then restore your backup of II_SYSTEM, being careful not to overwrite the jnl|ckp|dmp directories. Everyone is concerned about lost work, but you tell them its no problem, Ingres has this terrific journalling... So you do the above, then issue the rollforwarddb command to recover work from the checkpoint & journal files, expecting it to work... WRONG WRONG WRONG! What happens is that Ingres looks at the aaaaaaaa.cnf file in the data directory (not the dmp copy), sees that its different from the one in the dmp directory, aborts the rollforward, and -without warning, mind you- OVERWRITES the file in the dmp directory! STUPID STUPID STUPID! Now, the checkpoint & journal you have on Disk3 are useless! You might as well delete them, because there is NO WAY you can use them. The aaaaaaaa.cnf file in the data directory is the one the system uses, and in this case it was restored from the backup. The checkpoint and journal files can only be accessed if you have the CURRENT copy of the aaaaaaaa.cnf file, which existed on the dmp directory, but was overwritten by the abortive rollforward. To make matters worse, the aaaaaaaa.cnf file which is "tar"ed into the latest checkpoint (and can be extracted) is useless, since it indicates that the checkpoint it is a part of is INVALID - of course, since the checkpoint it is a part of has not yet completed when it is "tar"ed... What I learned: 1) Copy the aaaaaaaa.cnf from the dmp directory to the data directory before doing rollforward - Ingres must be down while this is done 2) There is no Ingres tool to correct the aaaaaaaa.cnf file if it is in error; Ingres should be taken to task for this. 3) Make backup copies of your journal & aaaaaaaa.cnf files before trying a rollforward; if the roll fails it will trash these files and anything else that's handy. 4) The documentation makes no mention of any of this, and probably a lot of other *very* important things as well. Thank you. I feel much better. Can I have my bullets back now, please? -- MV ----------------------------------------------------------------------------- -MV (Maarten Veerman) Platon Consulting Limited, New Westminster, B.C. emvy@platon.com AIX-UNIX, Ingres, & Forest Industry Applications ----------------------------------------------------------------------------- From: mal@winternet.com (Michael Leo) Subject: Re: Attention DBAs: Checkpoint/Journalling has SERIOUS flaws In article <4d6uhm$s5m@grid.direct.ca> emvy@platon.com (Maarten Veerman) writes: Maarten, Some of this is covered in my paper on the NAIUA site under the Ingres World 93 area. But the point about backing up the dmp and jnl directories is not there. Nor is the destruction of the .cnf file. Things I overlooked. I would recommend you do not recover the data on the II_SYSTEM disk. If there is NO .cnf file there, Ingres will use the dmp copy. This should OBVIOUSLY be documented. I will include this in the paper I am writing for Ingres World '96. Good points! Hope you feel better soon, |---------------------------------------------------| | Michael Leo York & Associates, Inc. | | mal@winternet.com (612) 921-8083 (voice) | | Minneapolis, MN, USA | |---------------------------------------------------| | NAIUA = North American Ingres Users Association | | http://www.naiua.org/ingres/ | | Ingres FAQ is at ftp.naiua.org in /pub/ingres | |---------------------------------------------------|
© William Yuan 2000
Email William