From: William Yuan Subject: A NOLOGGING problem Hello Ingres folks, We're running OI1.2 Patch 6013 on a DEC Alpha (DUNIX v4.0d). --------------------------------------------------------------------------- Brief history: Developer used a "set nologging" session (to create a dummy table as select - thankfully that's all he did) on our production finance database at Thurs 5:15pm - a really BAD move! Problem detected and Ingres shutdown by DBA at 6pm. --------------------------------------------------------------------------- To summarise what we've done: We first tried rollforwarddb to Thurs 6pm (which back then was when we shutdown Ingres after nologging), we did a verify (-oxtables) and found 3 corrupt tables (also did a rollforward to Thurs 6:30am on a separate machine and it also had the same corrupt tables!). We restored to the Thurs 3am checkpoint which we knew to be valid (after a verifydb again), shut ingres down and took copies of the underlying binary files for the 3 tables concerned. Then we startup ingres, rollforwarddb to Thurs 6pm, shutdown ingres and replace the bad table files with the 3am versions. After a full verifydb, we then ran a script to apply an audit trail to the 3 tables to bring those tables to 6pm, recreated an affected table index, ran full verify again, did some other checks (audit trail check before and after "setnologging", check that finance team were happy with the data) and finally a ckpdb +j! As a final precaution, we shutdown ingres and copied the ckp, jnl, dmp files for that new checkpoint to another backup location and restarted ingres. If the developer had done an insert, delete or update on a "real" production table then we would have only rollforwarded to just before the nologging. We're looking fine now after two full days of recovery. I guess the big test will come on Monday when we let the users loose (it's now 11pm Saturday), but I've got a modify script running at the moment (and it looks good). Whew! All I can say is, I'm glad I took an immediate backup of the journal, dump files and aaaaaaaa.cnf and put them in a totally separate location. I also took an immediate auditdb -inconsist, infodb, logstat, lockstat for later diagnosis. Also having a spare (fast) machine to do trials recoveries, keep periodic backups of jnls, dmp, ckp files and to perform trial rollforwards is very useful! Anything I missed ? P.S. It's good to have 2 DBAs who are available for the whole recovery process (the cross checking can help avoid quite a few mistakes). Overall, it was a delicate, but worthwhile experience. Now to regain some sanity and sleep! ;-> -- Regards, William From: Karl & Betty SchendelSubject: Re: Fwd: A NOLOGGING problem >... But we have invalid tables according to verifydb -mreport -oxtables > - at least two, we're looking for more. Ugh! Does verifydb fix the tables, or just piss and moan about them? Once verifydb is done complaining, can you copy out the table data? If so, I'd be inclined to run verifydb, copy out, drop the table, recreate, and copy back in. Your other alternative is as you suggest, use the audit trail. There is one other possibility that very occasionally works: set trace point dm801 tells DMF to skip bad rows. Depending on the kind of table damage, this may allow you to copy all or nearly all of the table out, drop it, and recreate it. verifydb is usually better but sometimes I can get more of the table data with DM801 than I can with verifydb. Karl R. Schendel, Jr. K/B Computer Associates schendel@kbcomputer.com Ingres and Unix Expertise Past President, North American Ingres Users Association past_president@naiua.org From schendel@kbcomputer.com Sat Nov 6 14:00:24 1999 From: Karl & Betty Schendel Subject: Re: Fwd: A NOLOGGING problem > One other question, if we pick an rollfoward end time which is > (accidentally) in the middle of a transaction, it seems to me that > Ingres will try to rollback from subsequent journals (after the > end time), looking for an undo operation for that transaction id > eg >.... > Is my prognosis correct ? > Well, as the messages state, Ingres applies everything it sees in the journals. Anything left uncommitted at the end of processing has to be rolled back. What is a little puzzling to me is why it would have to go back before the start of the checkpoint; an online checkpoint is supposed to stall until any active updating transactions complete or rollback, to prevent this exact scenario. As Ingres rolls back the transactions, it should have everything it needs in the journals since the checkpoint. It was unclear from your email whether this is causing you fatal difficulties or not. As a last resort you might try the -norollback option (or perhaps it's #norollback), which tells rollforward to just stop. Of course, it's then up to you to fix any inconsistencies in the database. Karl R. Schendel, Jr. K/B Computer Associates schendel@kbcomputer.com Ingres and Unix Expertise Past President, North American Ingres Users Association past_president@naiua.org
© William Yuan 2000
Email William