Join Backs

Document #: US-13271,EN
------------------------------------------------------------------------------

Major subject: analysis Minor subjects: tech_notes

Keywords:

Abstract:
Join Backs - Equivalent to Release 6 Technical Note #31.

Expert note:
Join Backs
==========

Overview
--------
Because relational theory relies heavily on relating tables based on
commonly shared attributes, most people think of joining a table to
another table in a database. Often, however, there are useful reasons
for joining a table to itself. This document presents several examples
where this technique is necessary to produce the desired result.

o Background
o Comparing Two Rows in a Table
o The Disjoint Query
o Summary

Background
----------
INGRES provides a way in both SQL and QUEL for using an alternative name
for a table in a query. In SQL, the alternative name is called a "corre-
lation name". In the QUEL language it is called a "range variable".
Typically these are used to make typing the query a little easier:

SQL: SELECT r.name,r.office,i.grantno,i.title
FROM researcher r,institution i
WHERE r.grantno=i.grantno

QUEL: RANGE OF r IS researcher
RANGE OF i IS institution
RETRIEVE (r.name,r.office,i.grantno,i.title)
WHERE r.grantno=i.grantno

Two correlation names or range variables can be used for the same table
within a single query. This provides a second "look" at the table while
remaining within the query. Alternately, the second range variable or
correlation name can be thought of as a second copy of the original
table. A query constructed this way is said to be "joining a table to
itself" and is sometimes called a "join back" or "self join".

NOTE: It is good style to use two correlation names (or range variables)
rather than using the table name and a correlation name (or range vari-
able). Two correlation names make it clear that two "copies" of the
table are being used. In addition, two correlation names better indicate
where each copy of the table is being used in the query. Following this
rule will also make "disjoint queries", discussed later, much less
likely. The examples given in this document follow this stylistic rule.

Comparing Two Rows in a Table
-----------------------------
Although there are more rare and unusual queries that employ a join
back, perhaps the most frequent need that requires a join back is the
need to compare two rows within a single table. There are many situa-
tions that involve comparing a row to another row. Some typical examples
are presented below.

1. The employee-manager problem

There are many tables in which one of the key values is also an attri-
bute of another row of the table. For example, a table listing chemical
compounds and their ingredients might have a compound that is also an
ingredient of a different compound. The most popular case of this type
of table is the employee-manager table and query. This involves a per-
sonnel table listing every employee, their salary, and their manager.
However, managers are also included as employees, with their own data
about them. The query attempts to find out which employees have higher
salaries than their managers:

SQL: SELECT p.name,p.salary,p.manager
FROM personnel p,personnel p2
WHERE p.manager=p2.name AND p.salary>p2.salary

QUEL: RANGE OF P IS personnel
RANGE OF p2 IS personnel
RETRIEVE (p.name,p.salary,p.manager) WHERE p.manager=p2.name
AND p.salary>p2.salary

2. Ranking

Producing a list of ranked records is another example of comparing one
row to another. If you want to rank the lowest value '1', the next
lowest rank '2' and so on, then the strategy is to count how many rows
are less than or equal to the value being ranked. If you want to rank
the highest value '1', and so on, then you count how many rows are
greater than or equal to the value being ranked. For example,

SQL: SELECT t1.prog,t1.job,x=COUNT(t2.job)
FROM programs t1,programs t2
WHERE t2.prog=t1.prog AND t2.job < =t1.job GROUP BY t1.prog,t1.job

QUEL: RANGE OF t1 IS programs
RANGE OF t2 IS programs
RETRIEVE (t1.prog,t1.job,x=COUNT(t2.job BY t1.prog,t1.job
WHERE t2.prog=t1.prog AND t2.job < =t1.job))

These queries rank the 'jobs' within each 'prog', numbering them 1,2,3...
and so on.

3. Word Searches

If we can use join backs to create sequential numbering, then we can
also use them to qualify a query from a table where the numbers already
exist. This technique is especially useful in tables containing text
elements. For example, suppose we have a table with the document
number, sequence number, and the word. Each word in each document is on
a separate row of the table. We want to find all occurrences of the
words "join" and "table" that are in the same document and are no more
than ten words apart:

SQL: SELECT DISTINCT w1.docno
FROM words w1,words w2
WHERE w1.word='table' AND w2.word='join' AND w1.docno=w2.docno
AND w1.seqno-w2.seqno < =10

QUEL: RANGE OF w1 IS words
RANGE OF w2 IS words
RETRIEVE UNIQUE (w1.docno)
WHERE w1.word="table" AND w2.word="join"
AND w1.docno=w2.docno AND w1.seqno=w2.seqno < =10

The Disjoint Query
------------------
A special case of joining a table to itself is the disjoint query. This
is a trap that many new users of QUEL fall into. The query creates a
range variable, but the retrieve command uses both the table name and
the range variable. As a result, INGRES joins the table to itself. The
consequences can be drastic.

For example, consider the following retrieval from the "words" table

described above:

QUEL: RANGE OF w IS words /* Disjoint query! */
RETRIEVE (w.word) WHERE words.word="t*" /* Do not do this! */

It is easy to make the mistaken conclusion that this query will retrieve
all words beginning with "t". In fact, all words in the table will be
retrieved. Because "words" and "w" represent two separate looks at the
table, the expression words.word="t*" will always be true for every
w.word retrieved, as long as there is at least one word in the table
beginning with a "t".

A disjoint query that retrieves is an annoyance. But a disjoint query
that deletes or updates can be disastrous! For example, suppose you
wanted to delete a word from the words table, and wanted to update a
misspelled word.

QUEL: RANGE OF w IS words /* Disjoint */
DELETE words WHERE w.word="every" /* QUEL statements */
REPLACE words (word="QUEL") WHERE w.word="QULE" /* Do not do this! */

The result of the delete is that all rows are deleted if there is a sin-
gle occurrence of the word "every". We can think of INGRES as working
with two tables, one called "words" and one called "w". As INGRES goes
through the "words" table, it deletes the row if it finds the word
"every" in the "w" table. A single instance of "every" will delete every
row. Similarly, the replace will replace every word with "QUEL", if
there is a single instance of the misspelled word "QULE".

In SQL, if a single correlation name is created, that correlation name
must be used in the query. Using the table name will produce an error
message. However, if two correlation names are created for the same
table, improper mixing of the two correlation names in a select state-
ment can produce the same results as those described above.

Summary
-------
The "join" is the fundamental tool in relational -----------------------------------------

Ingres Database Reference

To William's Home Page

Email William