Recovery Failure

> Subject: Re: EMERGENCY! Cannot apply journal during rollforward +j

> Hello,
>
> Today our OpenIngres installation crashed. After starting it we had 19
> of our databases inconsistent, with REC_OPEN_FAILURE as the cause of
> inconsistency. I've started to roll them forward, and the second one
> refused to process the last journal file. This is our main database. Please
> help.
>
> Nick
>
> P.S. This is an extract from errlog.log with this error:
>
>  ::[II_JSP, 00AFA040]: Thu Jul1 19:22:59 1999
> E_DM93A7_BAD_FILE_PAGE_ADDR Page 21598 in table tran_main_log, owner:
> ingres, database: maindb, has an incorrect page number: 0. Other page
> fields: page_stat 0x00000000, page_log_address (0x00000000,0x00000000),
> page_tran_id (0x00000000, 0x00000000). Corrupted page cannot be read into
> the server cache.
>  ::[II_JSP, 00AFA040]: Thu Jul1 19:22:59 1999
> E_DM920C_BM_BAD_FAULT_PAGE Error faulting a page.
>  ::[II_JSP, 00AFA040]: Thu Jul1 19:22:59 1999
> E_DM9C83_DM0P_CACHEFIX_PAGE An error occurred while fixing a page in the
> buffer manager.
>  ::[II_JSP, 00AFA040]: Thu Jul1 19:22:59 1999
> E_DM9206_BM_BAD_PAGE_NUMBER Page number on page doesn't match its
> location.
>  ::[II_JSP, 00AFA040]: Thu Jul1 19:22:59 1999 E_DM960D_DMVE_PUT  Error
> recovering PUT operation.
>  ::[II_JSP, 00AFA040]: Thu Jul1 19:22:59 1999 E_DM1306_RFP_APPLY_RECORD
> Error occurred attempting to apply rollforward record.
>  ::[II_JSP, 00AFA040]: Thu Jul1 19:22:59 1999 E_DM1301_RFP_ROLL_FORWARD
> An error occurred that prevents further processing.


Martin Bowes wrote:
Hi Nick,
>
    You're Screwed big time!
>
    Here's a thought. I've had to do this once on a (thank the lord) trial and
    it seemed to work. I had a situation where the database recovery was 100%
    okay according to its output and to the errlog. However, the subsequent
    processing I did on the database revealed that one table was corrupted
    with BAD_PAGE errors like the ones you've seen. I tried several recoveries
    and they all screwed the same table at the same point. I still don't know
    why. It was only recently, and CA & I are still working on it!
>
    However, what I found was that with a checkpoint-only recovery (i.e. no
    journals) the table was 100% okay (as was the rest of the database).
    Clearly you don't want to lose all the journals. In my case (and I suspect
    in yours) the table in question is vital to the integrity of the
    database.
>
    Hence, if the table (and all the others) is okay at the checkpoint only,
    EITHER,
    1. Recover the database as far as you can get with a stock standard
       rollforwarddb. If the database is inconsistent due to the RCP error,
       then try forcing it consistent with verifydb. If that doesn't work, then
       use the time stamp on your second-last journal as an end point to a
       rollforwarddb. Either way, you need the database to be accessible
       before continuing.
>
    2. Work out the name(s) and location(s) of the underlying UNIX file(s)
       for your table tran_main_log, i.e. quiz iifile_info:
       select * from iifile_info where
       table_name='tran_main_log' and owner_name='ingres'
>
    3. Extract the necessary files from the checkpoint tar file(s) and insert
       them into the correct location(s).
>
       Are your spurs jingling yet, cowboy?
>
    You now have a database with everything recovered as far as you can, with
    the exception of tran_main_log, which is now recovered only up to the time
    of the checkpoint.
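Steps 2 and 3 could be scripted along these lines. This is a minimal Python sketch, not a tested Ingres procedure: it assumes the checkpoint is a plain tar archive that Python's tarfile module can read, that the file names were taken from iifile_info by hand, and that nothing is using the database while the files are replaced. The function name, the example file name and the paths are hypothetical.

```python
import os
import shutil
import tarfile


def restore_table_files(checkpoint_tar, wanted_names, dest_dir):
    """Copy only the named files out of a tar archive into dest_dir.

    wanted_names holds bare file names (hypothetical example:
    "aaaaaabc.t00"); directory prefixes inside the archive are ignored.
    Returns (restored, missing) so the caller can see what was not found.
    """
    wanted = set(wanted_names)
    restored = []
    with tarfile.open(checkpoint_tar) as tar:
        for member in tar.getmembers():
            base = os.path.basename(member.name)
            if member.isfile() and base in wanted:
                src = tar.extractfile(member)
                # Overwrites any existing file of the same name in dest_dir.
                with src, open(os.path.join(dest_dir, base), "wb") as out:
                    shutil.copyfileobj(src, out)
                restored.append(base)
    missing = sorted(wanted - set(restored))
    return restored, missing
```

A call such as restore_table_files("/path/to/checkpoint.tar", ["aaaaaabc.t00"], "/path/to/data/location") returns the files it copied and the names it could not find. A non-empty second list suggests the table also has files in another checkpoint archive or location.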
>
    4. Use the audit trails for tran_main_log to construct the necessary
       insert/update/delete operations to recover it to the same point as the
       rest of the database. Hmmm, didn't I make that sound simple!
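As a rough illustration of step 4, the sketch below turns audit records into SQL statements. It assumes the audit trail for tran_main_log has already been parsed into Python dictionaries. The record layout ("op", "old", "new"), the column names and the function names are hypothetical, and this is not the audit trail's own output format. The statements would need to be replayed in their original order.

```python
def sql_literal(value):
    """Render a Python value as an SQL literal (NULL, number or string)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Double any single quotes so the string stays one literal.
    return "'" + str(value).replace("'", "''") + "'"


def where_clause(row, key_columns):
    """Identify one row by its key columns (assumed to be non-NULL)."""
    return " AND ".join(f"{col} = {sql_literal(row[col])}" for col in key_columns)


def audit_to_sql(table, key_columns, records):
    """Translate parsed audit records into SQL, preserving their order."""
    statements = []
    for rec in records:
        op = rec["op"]
        if op == "insert":
            cols = ", ".join(rec["new"])
            vals = ", ".join(sql_literal(v) for v in rec["new"].values())
            statements.append(f"INSERT INTO {table} ({cols}) VALUES ({vals})")
        elif op == "update":
            sets = ", ".join(f"{c} = {sql_literal(v)}" for c, v in rec["new"].items())
            where = where_clause(rec["old"], key_columns)
            statements.append(f"UPDATE {table} SET {sets} WHERE {where}")
        elif op == "delete":
            where = where_clause(rec["old"], key_columns)
            statements.append(f"DELETE FROM {table} WHERE {where}")
        else:
            raise ValueError(f"unknown audit operation: {op!r}")
    return statements
```

Generating the statements into a file first, rather than running them directly, leaves room to check them by eye before anything touches the recovered table.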
>
    OR,
    1. If you don't already have a program to recover an entire database from
       an audit trail, then I suggest you go to Mike Leo at Caribou Lake (check
       their web site) and buy his.
>
    I think the thing to focus on here is that you are in a very bad position
    as it is. Furthermore, some database is better than none. You have to
    balance a few things with a recovery technique like this. How important
    to you is this database? If the checkpoint-only recovery was okay, how
    much would you lose if that's all you got back? Is that loss acceptable?
    If not, do the users of the database have the means to simply re-enter all
    the work they've done on it? If they did so, would 'critical numbers' come
    out the same? E.g. if this is an accounting system, you may find invoice
    numbers are auto-generated by the system depending upon the order of
    keying, in which case the users could never hope to get the same numbers.
    Also, how long is this all going to take?
>
    Best of luck,
>
    Martin Bowes
>
--
Random Earthworm Jim Quote #21:
PsyCrow - Maximum Suckage!


From: Rue Pham 

Hi Martin,
I agree with Martin. I had the same experience a couple of months ago. I
rolled forward to the checkpoint, then asked people to re-enter their transactions.
We wasted a lot of time trying to apply the journals.
I haven't tried copying the files. Since a transaction could span more than one
physical file, I don't feel comfortable with this step.

Rue