Server shutdown

 paul@pafpaf.demon.co.uk (Paul Farrow) writes:
 >Please let me know exactly how often, you out there 
 >in industry do actually shut the server down.

We try to do it daily, and at least once a week.

This is a very conservative approach, but it has solved some problems for
us. The one hassle is that the server sometimes says:
'Error limit for server reached' and then spontaneously shuts down. We
have quite a lot of Lock-related 'errors' like escalating to table level
locks and deadlocks. The problem here is that we work with a package,
so we do not have the source code to change.

We also have users that like disconnecting without first logging out (i.e.
switching off the terminal/PC without first exiting from the application).
We have been advised that too much of this causes the server behaviour to
become 'unpredictable'. (!) The only 'cleanup' we know of is to start a
fresh server.

We have also had some problems with 'old' servers getting threads going
into a CS_MUTEX state. (Note that this does not happen if we shut down
regularly... so unless my logic is wrong it has to point to an internal
problem)

I must also mention that before we went to daily shutdowns there were
weeks when we ran without problems, but to avoid the occasional
'unexplained behaviour' we are doing regular shutdowns.

Some of our servers run on shared cache, so we don't have to interrupt the
service in order to start a fresh server. We just do it in the background
without the users knowing, and then shut the old server down once all the
connections have gone.
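
The "wait until all the connections have gone" step can be sketched as a simple polling loop. This is only an illustration: count_connections is a hypothetical callable that you would have to supply yourself (for example by parsing whatever session listing your installation provides), and the poll interval is an arbitrary assumption.

```python
import time


def wait_until_drained(count_connections, poll_seconds=60.0,
                       max_polls=None, sleep=time.sleep):
    """Poll until count_connections() reports zero sessions.

    count_connections is a hypothetical, site-supplied callable that
    returns the number of sessions still attached to the old server.
    Returns True once it reaches zero, or False if max_polls checks
    have been made while sessions were still attached.
    """
    polls = 0
    while count_connections() > 0:
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
        sleep(poll_seconds)
    return True
```

Only when this returns True would the old server be shut down; a False result means users are still connected and the old server should be left alone.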

Regards

--
  Cor

The Data Base Approach cc       _______     Cor Winckler
PO Box 5165, Helderberg        |\      |    Internet: cor@itu2.sun.ac.za
7135 Somerset West             | >=====|    Tel: 0(+27) 21 785 1171
South Africa                   |/______|    Fax: 0(+27) 21 881 3318



"Anne.L.F.Zorner"  wrote:
>mal@winternet.com (Michael Leo) wrote:
>>In article <482da9$p5v@citecuf.citec.qld.gov.au> sgccjkb@citecub (John Babbidge) writes:
>>>1. lock out the users somehow (ie. via /etc/profile or /etc/nologin)
>what if it's only the Ingres system that needs to be shut down at this stage?
>>>2. kill all frontends (abf progs and so on - not any servers)
>seems reasonable but is it always possible to spot them all?
>>>5. if this fails then "iishutdown -force" is the next option. This step is a 
>>>   little more brutal than "-s" but is supported by CA.
>but in the manual it says that force will make your databases inconsistent

We have used iishutdown -force (Ingres 6.4, SunOS 5.{3,4}) for some time now.
We used to "remove " in iimonitor, but Ingres recommended not doing
this (DB inconsistent probs!). Iishutdown -force has not caused us any problems
(25+ installations); the iidbms server(s) are halted by an iimonitor "stop
server" (rather than the gentle "set server shut", yawn). The Recovery process
gets rcpconfig -shutdown, does its stuff and then shuts down normally after
the iidbms server dies, so everything is hunky dory. See the iishutdown script.

Richard.
~~~~~~~~

P.S. Who knows how OpenINGRES handles this?

John Babbidge (sgccjkb@citecub) wrote:
> Hey Kev,

> We shutdown Ingres automatically on most machines at least once a night for
> various reasons. The basic activities we do are as follows:

> 1. lock out the users somehow (ie. via /etc/profile or /etc/nologin)
> 2. kill all frontends (abf progs and so on - not any servers)
> 3. disable ingres/net (stops incoming network sessions)
> --- now no Ingres user activity should exist so a shutdown should be smooth ---
> 4. iishutdown -s
> 5. if this fails then "iishutdown -force" is the next option. This step is a 
>    little more brutal than "-s" but is supported by CA.
> 6. If this fails you have serious problems that need case by case attention.

> Hope this helps. I assumed that you were UNIX based. If I'm wrong most of the
> above should be applicable to VMS I would think.

> John

Kev,

I might add that killing frontends is only supported in Ingres 6.4/05, not in 
6.4/04. A 6.4/04 bug could mean potential corruption when killing front-ends
(still better than using the hack and slash method). If you are running 6.4/04
or a lower 6.4 then I suggest you upgrade as a lot of critical bug fixes have
been made.

Sorry to hear you are the meat in the sandwich. It's sys admins like your senior
sysadm that give the rest of us the cowboy label...

Bye,

John

------------------------------------------------------------------------------
John Babbidge                     _____!_____ email: sgccjkb@citec.qld.gov.au
Centre for Information Technology  _\_(_)_/_
and Communications                   ./ \.
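
The escalation in steps 4 to 6 of the list above can be sketched as a small wrapper. This is a minimal sketch, not a supported procedure: the two command lines are the ones quoted in the post, but the assumption that iishutdown signals failure through a non-zero exit status is mine, and steps 1 to 3 (locking out users, killing frontends, disabling Ingres/Net) are site-specific and left out.

```python
import subprocess

# Command lines as quoted in the post (steps 4 and 5), tried in this order.
SHUTDOWN_STEPS = [
    ["iishutdown", "-s"],
    ["iishutdown", "-force"],
]


def run_command(argv):
    """Run one command; True if it exited with status 0.

    Assumption: a failed shutdown is reported as a non-zero exit status.
    A command that cannot be started at all also counts as a failure.
    """
    try:
        return subprocess.run(argv).returncode == 0
    except OSError:
        return False


def shutdown_with_escalation(steps=SHUTDOWN_STEPS, run=run_command):
    """Try each step in order and stop at the first that succeeds.

    Returns the argv that worked, or None if every step failed
    (step 6: the problem needs case by case attention).
    """
    for argv in steps:
        if run(argv):
            return argv
    return None
```

A None result should page a human rather than trigger anything more drastic. Sites that do not trust "-force" can pass their own shorter list as steps.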

In article <489saj$2sdn@unixfe.rl.ac.uk> "Anne.L.F.Zorner"  writes:

[snip]
>
>>I agree.  What is crucial is that you must wait for dmfrcp to shut down.
>
>aha
>
>>
>>Kev, your system manager has a screw loose.  What good is shutting down
>>the system fast if it takes forever to come back up?
>
>but it shouldn't take forever to take it down either!
>

Re-read my statement: "What good is shutting down the system fast if it 
takes forever to come back up?"

If Ingres has recovery work to do, it has recovery work to do.  If you 
don't like that, use flat files.  EVERY DATABASE SYSTEM DOES THIS  IN
SOME DISGUISED FORM OR ANOTHER.

>>
>>If the dmfrcp (the recovery process) has work to do, it will do that 
>>work NO MATTER HOW MANY TIMES YOU REBOOT.  And it will start over each
>>time.
>
>is there any way to tell whether this is still trying to do stuff, or we 
>just have a duff user process stuck somewhere?
>

If the dmfrcp (UNIX) or II_RCP (VMS) is using cpu and i/o resource, it 
is doing something.  There is some limited logging in II_RCP.LOG.  I
wish Ingres would print out some kind of progress messages more often.

I have been able to accurately predict when recovery will be complete
using "truss" and from experience reading II_RCP.LOG.  
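
One rough way to apply the "is it using cpu?" test is to sample the process's accumulated CPU time twice and see whether it grew. The sketch below assumes a ps that accepts the POSIX options "-o time=" and "-p", and it looks at CPU only, so a recovery process that is purely waiting on disk I/O could look idle here even though it is working. The 30 second interval is an arbitrary choice.

```python
import subprocess
import time


def parse_cpu_time(text):
    """Convert ps CPU time ('[dd-]hh:mm:ss' or 'mm:ss') to seconds."""
    text = text.strip()
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return days * 86400 + seconds


def cpu_seconds(pid):
    """Ask ps for the accumulated CPU time of one process."""
    out = subprocess.run(
        ["ps", "-o", "time=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_cpu_time(out)


def is_busy(pid, interval=30.0, sample=cpu_seconds, sleep=time.sleep):
    """True if the process consumed CPU between two samples."""
    before = sample(pid)
    sleep(interval)
    return sample(pid) > before
```

Run is_busy against the pid of dmfrcp a few times before concluding anything; one quiet interval does not prove the process is stuck.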

>>
>>Anybody who says other databases don't have this problem is insane.  ALL
>>transaction management systems currently deployed MUST recover aborted or
>>failed transactions to maintain the integrity of the database.  Period.
>
>Ah but some don't do recovery until restart, after all if you have a power
>failure or something equally catastrophic what would you rather be on...
>a system that can clean up perfectly on restart or a system that needs to
>cleanup on shutdown?

Re-read my statement: "What good is shutting down the system fast if it 
takes forever to come back up?"

I NEVER said Ingres could not do the same.  But why start the recovery
process over after it already has performed some work?  Why take chances
that some strange disk problem could corrupt the log file?

Personally, I think people think magic happens when you reboot.  If
they were the system managers, the power key would be ornamented with
a rabbit's foot.   99% of all reboots are useless, except on Microsoft
operating systems, where a reboot gives the computer a chance to 
momentarily not run the hideous OS within.

>
>Which mainframe databases have you used?
>SQL/DS on VM may have had some problems but for 9 years it has always come back
>up after all types of system crash or FORCE that were used. 
>
>Anne
>

Hmm.  Most mainframers claim that mainframes NEVER crash and are
highly stable and dependable.  Yet every mainframer I know can describe
SQL/DS or DB2 error recovery like they wrote it themselves.  Hmmm....

One last item.  I didn't write Ingres.  I would have done a few things
differently if I had.  I won't defend the quality of the Ingres 
transaction system in the 6.x architecture.  I've found that 
patience and understanding work better than rebooting, wishful thinking,
and voodoo.

Cheers,

|--------------------------------------------------------------------------|
| Michael Leo            | The Ingres FAQ is at ftp.adc.com, /pub/ingres.  |
| York & Associates, Inc.| Also check out /pub/ingres/utilities/NAIUA for  |
| Minneapolis, MN, USA   | the NAIUA Tool Kit.  Lastly, access all this via|
| (612) 921-8083 (voice) | WWW at http://www.adc.com/ingres/ing-top.html.  |
| mal@winternet.com      | All constructive suggestions/criticism welcome. |
|--------------------------------------------------------------------------|

John Babbidge (sgccjkb@citecub) wrote:
: John Babbidge (sgccjkb@citecub) wrote:
: > Hey Kev,

: > We shutdown Ingres automatically on most machines at least once a night for
: > various reasons. The basic activities we do are as follows:

: > 1. lock out the users somehow (ie. via /etc/profile or /etc/nologin)
: > 2. kill all frontends (abf progs and so on - not any servers)
: > 3. disable ingres/net (stops incoming network sessions)
: > --- now no Ingres user activity should exist so a shutdown should be smooth ---
: > 4. iishutdown -s
: > 5. if this fails then "iishutdown -force" is the next option. This step is a 
: >    little more brutal than "-s" but is supported by CA.
: > 6. If this fails you have serious problems that need case by case attention.

: > Hope this helps. I assumed that you were UNIX based. If I'm wrong most of the
: > above should be applicable to VMS I would think.

: > John

: Kev,

: I might add that killing frontends is only supported in Ingres 6.4/05, not in 
: 6.4/04. A 6.4/04 bug could mean potential corruption when killing front-ends
: (still better than using the hack and slash method). If you are running 6.4/04
: or a lower 6.4 then I suggest you upgrade as a lot of critical bug fixes have
: been made.

If you have a version of 6.4/05 where the iimonitor command remove session
actually works, instead of producing a success message and little else,
then you can use a script (ingwho does a wonderful job of this) to go
through, and remove all users BEFORE running iishutdown -s.

If however you are running a version where remove session merely
produces a cheery message of success and little else, you are faced with
either removing all the frontend processes (anyone who relies on a buggy
version of Ingres that has the potential for corruption when the front-end
process dies is asking for trouble), or using unsupported and probably
catastrophic methods of shutting down (i.e. iishutdown -force).

Actually that's not quite true. It's probably safer to actually switch the
power off to the machine, and then bring it back up. If you run a
journalling filesystem that is. If you don't, I guess you get some risk of
file loss, but hey... That's what the $$ every fortnight are for right? :)

: Sorry to hear you are the meat in the sandwich. It's sys admins like your senior
: sysadm that give the rest of us the cowboy label...

Yeah. Sysadmins should be sent on an Ingres DBA course as a matter of
rote. All ours are (like me), and it gives a wonderful perspective of
exactly what Ingres can and can't do, and what's safe to do and what's not.

: Bye,

: John

: ------------------------------------------------------------------------------
: John Babbidge                     _____!_____ email: sgccjkb@citec.qld.gov.au
: Centre for Information Technology  _\_(_)_/_
: and Communications                   ./ \.

--
======================================================================
|  Hamish Marson                                                     |
|  Systems Programmer & News Manager          news@news.waikato.ac.nz|
|  Computer Services               | INTERNET h.marson@waikato.ac.nz |
|  University of Waikato           | PHONE    +64 7 8562889 xt 8181  |
|  New Zealand                     | FAX      +64 7 8384066          |