Batch and Online Processing

To add to the good advice given already:

I assume that your on-line transactions are already as small as possible, and
that you are not retaining shared locks (or, God forbid, exclusive locks) over 
user input. Best to use NOLOCK for reads (if you can) on the on-line side, as
long as data integrity isn't compromised by the batch job. It reduces the locks
that the batch update must wait for.
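
For example, a minimal sketch of the on-line read side in Ingres SQL (the table
and key here are hypothetical):

    SET LOCKMODE SESSION WHERE READLOCK = NOLOCK;
    -- or, if only some tables are safe to read dirty:
    SET LOCKMODE ON flight_load WHERE READLOCK = NOLOCK;

    SELECT flight_no, pieces
    FROM   flight_load
    WHERE  flight_no = 'BA123';
    COMMIT;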

The key is the design of the batch job. It must let the locks go quickly to 
let the on-line user update. If not, the users get locked out while your 
batch job wanders over your database. OK, so you code it so that it loops
and commits on a 'row-at-a-time' basis. You issue a commit every loop, to 
release locks and let in the users. Trouble is, you have now written a
row-at-a-time batch job, and it performs like a dog. So what do you do?
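
At the SQL level, each trip round that loop looks something like this (the
table and columns are hypothetical, and the loop itself lives in the host
4GL/ESQL program):

    -- one input record...
    UPDATE stock_level
    SET    qty_on_hand = qty_on_hand + 1
    WHERE  barcode = '5012345678900';
    COMMIT;   -- release the locks, let the users in

    -- ...then round the loop again for the next record.  One statement and
    -- one commit per row is exactly why it crawls.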

As Roy comments, the best thing is not to update your database from
batch and on-line at the same time. If you must update from batch, consider
working on copies of the tables and running an overnight batch process to
bring them into line. Sometimes, though, there are real business reasons why
you have to hit the live tables from both at once.
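
As a sketch of the 'copies of tables' idea (all the table names here are
hypothetical): the batch feed loads into its own copy all day, and an overnight
job folds the copy into the live table in one hit:

    -- overnight: bring the live table into line, then empty the copy
    INSERT INTO stock_level (barcode, qty_on_hand)
    SELECT barcode, qty_on_hand
    FROM   stock_level_batch;

    MODIFY stock_level_batch TO TRUNCATED;   -- empties it (leaves it a heap)
    COMMIT;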

One site I know had this problem. They had major data input from batch
(barcodes loaded via LDT terminals as a flat file) loading into their
database during the day. They HAD to be processed, as the data was
required to tell the business what would be on later flights on the
same day. At the same time, a team of data entry clerks banged in lots
of lovely data of a similar nature, from a separate manual source.
This manually entered data was required on the database right away
(and so couldn't be stuck in a file and uplifted like the LDT data).
All this stuff hit the database at once. 

Let me share the pain they went through. The on-line side was OK.
NOLOCK for read, and a brief set of update/inserts at the end of
entry. The batch uplift was very complex, with lots and lots of
integrity checking. 
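
That end-of-entry pattern, sketched with hypothetical tables: nothing is
written (and no write lock taken) until the clerk has finished the whole
screen:

    INSERT INTO consignment (consignment_no, flight_no, pieces)
    VALUES ('C00042', 'BA123', 3);

    UPDATE flight_load
    SET    pieces = pieces + 3
    WHERE  flight_no = 'BA123';
    COMMIT;   -- the write transaction lasts milliseconds, not the whole entry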

The batch side was a problem. Written in a non-Ingres 4GL, it
performed Row-at-a-time processing all in a single transaction. It
therefore took a long time, escalated to table locks and knackered the
on-line users. 

Commits were then added to let the on-line users in, but this extended the
batch job's run times so much that it could no longer keep up.

The whole batch program was then re-written in ESQL to do normal Chunk-At-a-Time
(CAT) processing. Run times dropped from hours to minutes (as you would expect).
But the CAT processing depended on one large transaction: every so often, users
got locked out for three minutes while it did its stuff.
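
Chunk-At-a-Time here just means set-based SQL: one statement per target table,
driven from a staging table, instead of one statement per row. A sketch with
hypothetical tables (barcode_load holds the staged batch input):

    -- apply all the increments in one go...
    UPDATE stock_level
    SET    qty_on_hand = qty_on_hand +
           (SELECT sum(b.qty)
            FROM   barcode_load b
            WHERE  b.barcode = stock_level.barcode)
    WHERE  barcode IN (SELECT barcode FROM barcode_load);

    -- ...and pick up any barcodes not seen before
    INSERT INTO stock_level (barcode, qty_on_hand)
    SELECT b.barcode, sum(b.qty)
    FROM   barcode_load b
    WHERE  NOT EXISTS (SELECT 1 FROM stock_level s
                       WHERE  s.barcode = b.barcode)
    GROUP BY b.barcode;
    COMMIT;

Fast, but everything it touches stays locked until that single commit - which
is the three-minute lock-out described above.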

Flat files and copies were considered. But wait - COPY stuffs
READLOCK=NOLOCK. Copy into a table and kiss READLOCK=NOLOCK goodbye.

Finally, the solution was this: read all the validation data into memory
quickly and commit; perform all the validation into temporary tables; and mark
the rows in the temporary tables into sets for the updates/inserts. The size of
the sets depends on a value held in the database (so it can be varied).
Considerable effort went into making the program re-runnable. The
updates/inserts are then done in a final phase, looping through the sets and
committing each one.
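
Pulling that together, a sketch of the final shape (tables, columns and the
control value are all hypothetical; the set numbering and the looping are done
by the host program):

    -- 1. Validate into a session temporary table, committing as you go, so no
    --    locks are held on the real tables while validation grinds away.
    DECLARE GLOBAL TEMPORARY TABLE session.load_work
        (barcode CHAR(13), qty INTEGER, set_no INTEGER)
        ON COMMIT PRESERVE ROWS WITH NORECOVERY;
    COMMIT;

    -- 2. Read the tunable set size from the database.
    SELECT set_size FROM batch_control WHERE job_name = 'barcode_load';
    -- (the host program uses this value to number load_work.set_no)

    -- 3. For each set_no in turn, one short transaction against the live data:
    UPDATE stock_level
    SET    qty_on_hand = qty_on_hand +
           (SELECT sum(w.qty)
            FROM   session.load_work w
            WHERE  w.barcode = stock_level.barcode
            AND    w.set_no  = 1)
    WHERE  barcode IN (SELECT barcode FROM session.load_work WHERE set_no = 1);
    COMMIT;   -- locks gone; the host loop moves on to set_no = 2, 3, ...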

If the 'no of records' value was set high, locking concurrency was crap but the
batch program was dead fast. If set to 1, it was bloody slow but concurrency
was good. They played around with it and left it at 100 (nice round number).

Performance is acceptable. But what a nightmare. Only do it if you
have to. If the batch volumes are too high there is NO SOLUTION.

Enjoy.   
-- 
--Mike Prollins
--Apertus Systems Limited             Tel      : (44) 113 256 6000
--Leeds                               Email    : mjp@apertus.demon.co.uk