Nologging Recovery

From: William Yuan
Subject: A NOLOGGING problem

Hello Ingres folks,

We're running OI1.2 Patch 6013 on a DEC Alpha (DUNIX v4.0d).

---------------------------------------------------------------------------

Brief history:

Developer used a "set nologging" session (to create a dummy table as select -
thankfully that's all he did) on our production finance database at 
Thurs 5:15pm - a really BAD move! 
Problem detected and Ingres shutdown by DBA at 6pm.

---------------------------------------------------------------------------

To summarise what we've done:

We first tried rollforwarddb to Thurs 6pm (which back then was when we shutdown
Ingres after nologging), we did a verify (-oxtables) and found 3 corrupt tables
(also did a rollforward to Thurs 6:30am on a separate machine and it also had 
the same corrupt tables!). We restored to the Thurs 3am checkpoint which we 
knew to be valid (after a verifydb again), shut ingres down and took copies 
of the underlying binary files for the 3 tables concerned. Then we startup 
ingres, rollforwarddb to Thurs 6pm, shutdown ingres and replace the bad table 
files with the 3am versions. After a full verifydb, we then ran a script to 
apply an audit trail to the 3 tables to bring those tables to 6pm, recreated an 
affected table index, ran full verify again, did some other checks (audit trail 
check before and after "setnologging", check that finance team were happy with 
the data) and finally a ckpdb +j! As a final precaution, we shutdown ingres
and copied the ckp, jnl, dmp files for that new checkpoint to another backup 
location and restarted ingres.

If the developer had done an insert, delete or update on a "real" production
table then we would have only rollforwarded to just before the nologging.

We're looking fine now after two full days of recovery. I guess the big test
will come on Monday when we let the users loose (it's now 11pm Saturday), but
I've got a modify script running at the moment (and it looks good). Whew!

All I can say is, I'm glad I took an immediate backup of the journal, dump
files and aaaaaaaa.cnf and put them in a totally separate location. I also
took an immediate auditdb -inconsist, infodb, logstat, lockstat for later
diagnosis. Also having a spare (fast) machine to do trials recoveries, keep 
periodic backups of jnls, dmp, ckp files and to perform trial rollforwards is
very useful! Anything I missed ?

P.S. It's good to have 2 DBAs who are available for the whole recovery process
(the cross checking can help avoid quite a few mistakes).

Overall, it was a delicate, but worthwhile experience.

Now to regain some sanity and sleep! ;->

-- 
Regards,
		William


From: Karl & Betty Schendel 
Subject: Re: Fwd: A NOLOGGING problem

>...	But we have invalid tables according to verifydb -mreport -oxtables 
>	- at least two, we're looking for more.

Ugh!  Does verifydb fix the tables, or just piss and moan about them?
Once verifydb is done complaining, can you copy out the table data?
If so, I'd be inclined to run verifydb, copy out, drop the table, recreate,
and copy back in.  Your other alternative is as you suggest, use the
audit trail.

There is one other possibility that very occasionally works:
set trace point dm801
tells DMF to skip bad rows.  Depending on the kind of table damage,
this may allow you to copy all or nearly all of the table out, drop it,
and recreate it.  verifydb is usually better but sometimes I can get
more of the table data with DM801 than I can with verifydb.

Karl R. Schendel, Jr.
K/B Computer Associates   schendel@kbcomputer.com
Ingres and Unix Expertise

Past President, North American Ingres Users Association
past_president@naiua.org




From schendel@kbcomputer.com  Sat Nov  6 14:00:24 1999
From: Karl & Betty Schendel 
Subject: Re: Fwd: A NOLOGGING problem

>	One other question, if we pick an rollfoward end time which is 
>	(accidentally) in the middle of a transaction, it seems to me that
>	Ingres will try to rollback from subsequent journals (after the 
>	end time), looking for an undo operation for that transaction id
>	eg
>....
>	Is my prognosis correct ?
>

Well, as the messages state, Ingres applies everything it sees in the
journals.  Anything left uncommitted at the end of processing has to be
rolled back.  What is a little puzzling to me is why it would have to
go back before the start of the checkpoint;  an online checkpoint is supposed
to stall until any active updating transactions complete or rollback,
to prevent this exact scenario.  As Ingres rolls back the transactions,
it should have everything it needs in the journals since the checkpoint.

It was unclear from your email whether this is causing you fatal difficulties
or not.  As a last resort you might try the -norollback option (or perhaps
it's #norollback), which tells rollforward to just stop.  Of course, it's
then up to you to fix any inconsistencies in the database. 

Karl R. Schendel, Jr.
K/B Computer Associates   schendel@kbcomputer.com
Ingres and Unix Expertise

Past President, North American Ingres Users Association
past_president@naiua.org
Ingres Q & A
Back to William's Home Page

© William Yuan 2000

Email William