Join Backs

                         Document #: US-13271,EN
------------------------------------------------------------------------------

Major subject: analysis    Minor subjects: tech_notes

Keywords: 

Abstract:
Join Backs - Equivalent to Release 6 Technical Note #31.


Expert note:
Join Backs 
==========

Overview
--------
Because relational theory relies heavily on relating tables based on
commonly shared attributes, most people think of joining a table to
another table in a database.  Often, however, there are useful reasons
for joining a table to itself.  This document presents several examples
where this technique is necessary to produce the desired result.

o	Background
o	Comparing Two Rows in a Table
o	The Disjoint Query
o	Summary

Background
----------
INGRES provides a way in both SQL and QUEL for using an alternative name
for a table in a query. In SQL, the alternative name is called a "corre-
lation name". In the QUEL language it is called a "range  variable".  
Typically these are used to make typing the query a little easier:

SQL:  SELECT r.name,r.office,i.grantno,i.title
      FROM researcher r,institution i
      WHERE r.grantno=i.grantno

QUEL: RANGE OF r IS researcher
      RANGE OF i IS institution
      RETRIEVE (r.name,r.office,i.grantno,i.title)
      WHERE r.grantno=i.grantno

Two correlation names or range variables can be used for the same  table
within a single query.  This provides a second "look" at the table while
remaining within the query.  Alternately, the second range variable or
correlation name can be thought of as a second copy of the original
table.  A query constructed this way is said to be "joining a table to
itself" and is sometimes called a "join back" or "self join".

NOTE: It is good style to use two correlation names (or range variables)
rather than using the table name and a correlation name (or range vari-
able).  Two correlation names make it clear that two "copies" of the
table are being used. In addition, two correlation names better indicate
where each copy of the table is being used in the query.  Following this
rule  will  also  make  "disjoint  queries",  discussed later, much less
likely. The examples given in this document follow this stylistic rule.


Comparing Two Rows in a Table
-----------------------------
Although there are more rare and unusual  queries  that  employ  a  join
back,  perhaps  the  most frequent need that requires a join back is the
need to compare two rows within a single table. There  are  many  situa-
tions that involve comparing a row to another row. Some typical examples
are presented below.

1. The employee-manager problem

There are many tables in which one of the key values is also  an  attri-
bute  of another row of the table. For example, a table listing chemical
compounds and their ingredients might have a compound that  is  also  an
ingredient  of  a different compound. The most popular case of this type
of table is the employee-manager table and query. This involves  a  per-
sonnel  table  listing  every employee, their salary, and their manager.
However, managers are also included as employees, with  their  own  data
about  them.  The query attempts to find out which employees have higher
salaries than their managers:

SQL: SELECT p.name,p.salary,p.manager
     FROM personnel p,personnel p2
     WHERE p.manager=p2.name AND p.salary>p2.salary

QUEL: RANGE OF P IS personnel
      RANGE OF p2 IS personnel
      RETRIEVE (p.name,p.salary,p.manager) WHERE p.manager=p2.name
      AND p.salary>p2.salary

2. Ranking

Producing a list of ranked records is another example of  comparing  one
row  to  another.  If  you  want  to rank the lowest value '1', the next
lowest rank '2' and so on, then the strategy is to count how  many  rows
are  less  than  or equal to the value being ranked. If you want to rank
the highest value '1', and so on, then  you  count  how  many  rows  are
greater than or equal to the value being ranked. For example,

SQL: SELECT t1.prog,t1.job,x=COUNT(t2.job)
     FROM programs t1,programs t2
     WHERE t2.prog=t1.prog AND t2.job < =t1.job GROUP BY t1.prog,t1.job

QUEL: RANGE OF t1 IS programs
      RANGE OF t2 IS programs
      RETRIEVE (t1.prog,t1.job,x=COUNT(t2.job BY t1.prog,t1.job
      WHERE t2.prog=t1.prog AND t2.job < =t1.job))

These queries rank the 'jobs' within each 'prog', numbering them 1,2,3...
and so on.

3. Word Searches

If we can use join backs to create sequential  numbering,  then  we  can
also  use them to qualify a query from a table where the numbers already
exist. This technique is especially useful  in  tables  containing  text
elements.   For  example,  suppose  we  have  a  table with the document
number, sequence number, and the word. Each word in each document is  on
a  separate  row  of  the table.  We want to find all occurrences of the
words "join" and "table" that are in the same document and are  no  more
than ten words apart:

SQL: SELECT DISTINCT w1.docno
     FROM words w1,words w2
     WHERE w1.word='table' AND w2.word='join' AND w1.docno=w2.docno
     AND w1.seqno-w2.seqno < =10

QUEL: RANGE OF w1 IS words
      RANGE OF w2 IS words
      RETRIEVE UNIQUE (w1.docno)
      WHERE w1.word="table" AND w2.word="join"
      AND w1.docno=w2.docno AND w1.seqno=w2.seqno < =10


The Disjoint Query
------------------
A special case of joining a table to itself is the disjoint query.  This
is  a  trap  that  many new users of QUEL fall into. The query creates a
range variable, but the retrieve command uses both the  table  name  and
the  range  variable. As a result, INGRES joins the table to itself. The
consequences can be drastic.

For example, consider the following retrieval  from  the  "words"  table

described above:

QUEL: RANGE OF w IS words                        /* Disjoint query!  */
      RETRIEVE (w.word) WHERE words.word="t*"   /* Do not do this!  */

It is easy to make the mistaken conclusion that this query will retrieve
all  words  beginning  with "t". In fact, all words in the table will be
retrieved. Because "words" and "w" represent two separate looks  at  the
table,  the  expression  words.word="t*"  will  always be true for every
w.word retrieved, as long as there is at least one  word  in  the  table
beginning with a "t".

A disjoint query that retrieves is an annoyance. But  a  disjoint  query
that  deletes  or  updates  can  be disastrous!  For example, suppose you
wanted to delete a word from the words table, and  wanted  to  update  a
misspelled word.

QUEL: RANGE OF w IS words                                /* Disjoint  */
      DELETE words WHERE w.word="every"                 /*  QUEL statements */
      REPLACE words (word="QUEL") WHERE w.word="QULE"  /* Do not do this!  */

The result of the delete is that all rows are deleted if there is a sin-
gle  occurrence  of  the word "every". We can think of INGRES as working
with two tables, one called "words" and one called "w". As  INGRES  goes
through  the  "words"  table,  it  deletes  the row if it finds the word
"every" in the "w" table. A single instance of "every" will delete every
row.  Similarly,  the  replace  will  replace every word with "QUEL", if
there is a single instance of the misspelled word "QULE".

In SQL, if a single correlation name is created, that  correlation  name
must  be  used  in the query. Using the table name will produce an error
message. However, if two correlation names  are  created  for  the  same
table,  improper  mixing of the two correlation names in a select state-
ment can produce the same results as those described above.


Summary
-------
The "join" is the fundamental tool in relational -----------------------------------------
Ingres Database Reference
To William's Home Page

© William Yuan 2000

Email William