Ingres Data sorts

> Quick question : what type of Sort(s) does Ingres use to order its data for
> output or joining etc. ?

There are two data sorters in the Ingres server.  Firstly, the query execution
facility (QEF) has its own sorter, which uses a binary insertion sort.  A
server startup parameter (qef.sort_size) controls the size of the buffer
used by this sorter.  The default is 8 pages (16Kbytes).

Secondly, if the number of tuples being sorted doesn't fit in the QEF sort
buffer, then a separate sorter in the data manipulation facility (DMF) is
invoked.  This is a n-way merge sort using replacement selection through a
heap.  A single sort output work file is used to gather all the runs, which
are then merged into a second output work file.  This process continues,
alternating between reading and writing the two sort work files until the
sort is complete.

The DMF sorter is not really a standard tape merge sort, because the sort
work files are not accessed sequentially, and must be on disk or other random
access device.

In 6.4, it is possible to "stripe" the sort work files across multiple
locations using the 'II_WORKDIR_n' environment variables.  If these are
not defined, then the sort work files are placed in the default location
for the database.

If you want more details of the sorting algorithms, then a standard text
such as Knuth's "Sorting and Searching" will explain much better than I
can!

-Mike Glendinning, Sequent Computer Systems Ltd.




Michael Leo wrote:
> 
> Gajendra Jain  writes:
> 
> >
> > Hi!
> >
> > How does Ingres uses sorting alrgorithm. For Hp -UNIX system does it use
> > default sorting provided in the kernel or Ingres has its on sorting mechanism.
> > Our plan is to replace the sorting used by kernel with a fast sorting
> > software Syncsort. This will help us gain 200-300% on sorting.
> >
> > Need a response from Ingres Gurus.
> >
> > Thanks,
> > Gajendra
> 
> I am reasonably certain that Ingres uses its own internal sorting
> for both in-memory and disk sorts.  I believe this to be true because
> of sort bugs I've run in to on certain platforms in the past.
> 
> The in-memory sort is a bubble sort.  That should amuse the folks
> at Syncsort.
> 
> The disk sorts depend upon the query plan and are highly integrated
> with the Ingres DMF facility.
> 
> In conclusion, I don't believe Syncsort's product will help your
> Ingres performance at all.  I think it is a good product, though.
> 
> Cheers,
> 
> Michael Leo            mleo@cariboulake.com        mal@visi.com
> Caribou Lake Software  http://www.cariboulake.com  Java/RDBMS/Ingres Solutions
> 

It is true that Ingres has its own sort techniques and that external
sort packages cannot be hooked into it. All sorts done as part of query
plan execution start off as QEF in-memory sorts. If the number of rows
being sorted causes memory requirements to exceed the "qef_sort_mem"
config parameter, QEF defers to a DMF disk sort. The QEF sort used to be
an insertion sort (not a bubble sort) which used a binary search to
determine the location of each added row. The only virtue of this
technique was that duplicates were detected as they entered the sort
(since the rows were being ordered as they were read) and could be
discarded before they occupied any sort memory (obviously only for
duplicates removal sorts). It was decided prior to 1.2 that the rather
large performance penalty paid by all sorts (not just duplicates
elimination sorts) for this luxury wasn't worth it. So starting with
1.2, the QEF sort uses the same algorithm as the DMF sort (only without
the disk support). And that algorithm is a heap sort, which I'm told is
one of the more reasonable algorithms kicking around (according to my
data structures and algorithms text, it has n log n performance, as does
Quicksort).

And to confirm earlier responses to the original question, there is
indeed no way to substitute a commercial sort package for any of the
sorting done by Ingres.

Doug.



On Sep 15, 13:53, Doug Inkster wrote:
> ..... So in 1.2 and 2.0, the
> QEF sort is an in-memory heap sort (one of the more reasonable sort
> algorithms, or so I'm told), the same as DMF's algorithm (except DMF
> will offload to disk, if necessary).


Well, that's certainly interesting.  I would have thought it would be
a shell-sort, or a quicksort.  Quicksort is generally considered the
fastest, as long as you are careful to avoid its n^2 behaviour on
already-sorted input.  Heapsort is noted for being an N log N
algorithm in the worst case as well as the average case,  although
with a high constant factor.  So it tends to be not ever especially
fast and not ever particularly slow.

I guess this means that if you've got the memory, large qef_sort_mem
values are now more plausible than they were in the old days.

-- 
Karl Schendel            Phone: (412) 963-8844
Telesis Computer Corp      Fax: (412) 963-1373
wiz@telesismfg.com
Ingres Q & A
To William's Home Page

© William Yuan 2000

Email William