Document #: US-13271,EN ------------------------------------------------------------------------------ Major subject: analysis Minor subjects: tech_notes Keywords: Abstract: Join Backs - Equivalent to Release 6 Technical Note #31. Expert note: Join Backs ========== Overview -------- Because relational theory relies heavily on relating tables based on commonly shared attributes, most people think of joining a table to another table in a database. Often, however, there are useful reasons for joining a table to itself. This document presents several examples where this technique is necessary to produce the desired result. o Background o Comparing Two Rows in a Table o The Disjoint Query o Summary Background ---------- INGRES provides a way in both SQL and QUEL for using an alternative name for a table in a query. In SQL, the alternative name is called a "corre- lation name". In the QUEL language it is called a "range variable". Typically these are used to make typing the query a little easier: SQL: SELECT r.name,r.office,i.grantno,i.title FROM researcher r,institution i WHERE r.grantno=i.grantno QUEL: RANGE OF r IS researcher RANGE OF i IS institution RETRIEVE (r.name,r.office,i.grantno,i.title) WHERE r.grantno=i.grantno Two correlation names or range variables can be used for the same table within a single query. This provides a second "look" at the table while remaining within the query. Alternately, the second range variable or correlation name can be thought of as a second copy of the original table. A query constructed this way is said to be "joining a table to itself" and is sometimes called a "join back" or "self join". NOTE: It is good style to use two correlation names (or range variables) rather than using the table name and a correlation name (or range vari- able). Two correlation names make it clear that two "copies" of the table are being used. In addition, two correlation names better indicate where each copy of the table is being used in the query. Following this rule will also make "disjoint queries", discussed later, much less likely. The examples given in this document follow this stylistic rule. Comparing Two Rows in a Table ----------------------------- Although there are more rare and unusual queries that employ a join back, perhaps the most frequent need that requires a join back is the need to compare two rows within a single table. There are many situa- tions that involve comparing a row to another row. Some typical examples are presented below. 1. The employee-manager problem There are many tables in which one of the key values is also an attri- bute of another row of the table. For example, a table listing chemical compounds and their ingredients might have a compound that is also an ingredient of a different compound. The most popular case of this type of table is the employee-manager table and query. This involves a per- sonnel table listing every employee, their salary, and their manager. However, managers are also included as employees, with their own data about them. The query attempts to find out which employees have higher salaries than their managers: SQL: SELECT p.name,p.salary,p.manager FROM personnel p,personnel p2 WHERE p.manager=p2.name AND p.salary>p2.salary QUEL: RANGE OF P IS personnel RANGE OF p2 IS personnel RETRIEVE (p.name,p.salary,p.manager) WHERE p.manager=p2.name AND p.salary>p2.salary 2. Ranking Producing a list of ranked records is another example of comparing one row to another. If you want to rank the lowest value '1', the next lowest rank '2' and so on, then the strategy is to count how many rows are less than or equal to the value being ranked. If you want to rank the highest value '1', and so on, then you count how many rows are greater than or equal to the value being ranked. For example, SQL: SELECT t1.prog,t1.job,x=COUNT(t2.job) FROM programs t1,programs t2 WHERE t2.prog=t1.prog AND t2.job < =t1.job GROUP BY t1.prog,t1.job QUEL: RANGE OF t1 IS programs RANGE OF t2 IS programs RETRIEVE (t1.prog,t1.job,x=COUNT(t2.job BY t1.prog,t1.job WHERE t2.prog=t1.prog AND t2.job < =t1.job)) These queries rank the 'jobs' within each 'prog', numbering them 1,2,3... and so on. 3. Word Searches If we can use join backs to create sequential numbering, then we can also use them to qualify a query from a table where the numbers already exist. This technique is especially useful in tables containing text elements. For example, suppose we have a table with the document number, sequence number, and the word. Each word in each document is on a separate row of the table. We want to find all occurrences of the words "join" and "table" that are in the same document and are no more than ten words apart: SQL: SELECT DISTINCT w1.docno FROM words w1,words w2 WHERE w1.word='table' AND w2.word='join' AND w1.docno=w2.docno AND w1.seqno-w2.seqno < =10 QUEL: RANGE OF w1 IS words RANGE OF w2 IS words RETRIEVE UNIQUE (w1.docno) WHERE w1.word="table" AND w2.word="join" AND w1.docno=w2.docno AND w1.seqno=w2.seqno < =10 The Disjoint Query ------------------ A special case of joining a table to itself is the disjoint query. This is a trap that many new users of QUEL fall into. The query creates a range variable, but the retrieve command uses both the table name and the range variable. As a result, INGRES joins the table to itself. The consequences can be drastic. For example, consider the following retrieval from the "words" table described above: QUEL: RANGE OF w IS words /* Disjoint query! */ RETRIEVE (w.word) WHERE words.word="t*" /* Do not do this! */ It is easy to make the mistaken conclusion that this query will retrieve all words beginning with "t". In fact, all words in the table will be retrieved. Because "words" and "w" represent two separate looks at the table, the expression words.word="t*" will always be true for every w.word retrieved, as long as there is at least one word in the table beginning with a "t". A disjoint query that retrieves is an annoyance. But a disjoint query that deletes or updates can be disastrous! For example, suppose you wanted to delete a word from the words table, and wanted to update a misspelled word. QUEL: RANGE OF w IS words /* Disjoint */ DELETE words WHERE w.word="every" /* QUEL statements */ REPLACE words (word="QUEL") WHERE w.word="QULE" /* Do not do this! */ The result of the delete is that all rows are deleted if there is a sin- gle occurrence of the word "every". We can think of INGRES as working with two tables, one called "words" and one called "w". As INGRES goes through the "words" table, it deletes the row if it finds the word "every" in the "w" table. A single instance of "every" will delete every row. Similarly, the replace will replace every word with "QUEL", if there is a single instance of the misspelled word "QULE". In SQL, if a single correlation name is created, that correlation name must be used in the query. Using the table name will produce an error message. However, if two correlation names are created for the same table, improper mixing of the two correlation names in a select state- ment can produce the same results as those described above. Summary ------- The "join" is the fundamental tool in relational -----------------------------------------
© William Yuan 2000
Email William