|
(U03) |
www.btinternet.com/~adrian.larner/database/pcl15 |
|
PLATOCLAST Lecture XV |
||
|
|
||
|
|
I hope you all enjoyed the last lecture, and profited by it. What a long way the relational theorists have gone, haven’t they? Backwards. Dr Codd had this pure vision, you see. DBMSs based on a simple, sound theory. But now they clamour for Outer Joins: they haven’t defined the problems that Outer Joins are supposed to solve; they don’t have a solid theoretical understanding of Outer Joins. But they don’t care: why shouldn’t we make whatever informal, ad hoc adjustments we want? We’re all practical people aren’t we? I wonder sometimes whether they have learned anything at all from Dr Codd. Mind you, we saw just the same approach with Nulls: there even Dr Codd hasn’t learned anything from Dr Codd. |
|
|
|
||
|
|
Combination as a Primitive Operator |
|
|
|
I left a question or two hanging at the end of Lecture XIII: how were we to understand and handle Unions? But, looking back at our last lecture, when we dealt with Outer Joins, I think those questions got more or less answered. Now that we understand Combination the great conjunction operation let’s take it as primitive (Natural Join lovers might take Conjoin as primitive). Given the Combination, r, of p and q remember it will have columns p.X, p.Y, q.Y, and q.Z we can easily get the Cartesian product of p and q. It is the strict (p.X, p.Y, q.Y, q.Z) Projection of r, where a strict projection is one in which all columns in the list are mandatory. I’ll use “;” for Combination, so we get:
The Inner Join of p and q is obtained by applying the restriction “p.Y = q.Y” (or “p.Y º q.Y” if we want the join on the null value as well) and then projecting away one of the Y columns. The Union of p and q can also be defined, in two ways. Firstly, we could project r using (p.X, p.Y) | (q. Y, q.Z), i.e. two strict projections. Secondly we could apply the restriction:
|
|
|
|
||
|
|
Non-Uniform Tables and Language Issues |
|
|
|
But how far have we got? Most importantly, perhaps, we’ve seen that even the relational bigots want tables containing rows of different formats. But their solution is wrong in three ways:
As you may have gathered, there are two schools of thought about which join Natural Join or Cartesian Product should be taken as primitive. On the one side we have, we might say, the “Wardenians”, who favour Natural Join: Mr Date is now, I believe, of this persuasion; and so is Dr Codd in spades.[1] On the other side we have the SQL reactionaries, including myself, who favour Cartesian Product. What I’ld really like to see is a user interface that accepted Natural Joins and translated into Cartesian Product and most likely Restriction, which the user would then accept or modify.[2] However, this strikes me as odd: there are now proposals to implement Outer Joins in SQL. But this involves a radical change to that language to make it in effect Natural Join based. Yet, when you think it through as we have done, it seems that handling the sort of non-rectangular tables that are to be supported (padded to look like relations) is best achieved on a base of Combination. And that is much nearer to the current SQL Cartesian product approach than are Conjoins or Outer Joins. But then, they haven’t thought it through, have they? And we haven’t finished thinking it through, for that matter. We still have to get a way to do relational unions, and we have a little problem with projection. You will recall that by using Conjoins or Combination we enabled restrictions to be applied either before or after such a join with no effect on the result (which wasn’t the case with Outer Joins). But we would like something similar to apply to Projection. Remember that our ideal was a query language in which we:
|
|
|
|
||
|
|
Relational Union |
|
|
|
We need therefore to investigate how to move any projections to, so to speak, the end of the query (just as we can move all restrictions to the second stage); but let’s tackle relational union first. We’ll go back to an old friend:
Of course, I’ve cheated a little on you. Being strict about Combination would have given us two PERSON columns: FATHERHOOD.PERSON and MOTHERHOOD.PERSON. But we would then have used the same Extension and Projection: create PERSON as the OR-extension of FATHERHOOD.PERSON and MOTHERHOOD.PERSON, and project those two columns away. Now notice two things:
Perhaps rather wickedly I might point out another option: start with FTHMTHHOOD and simply rename the FATHER and MOTHER columns to PARENT (yes, both of them):
But the important thing is: we can still get relational union when we want it. Now let’s turn to projection. |
|
|
|
||
|
|
Quantification |
|
|
|
Youll remember that at the end of our last lecture we had projected just the NAME column of ASSIGNMENT to give ONE_ASSGNT, effected a Left Conjoin of STUDENT and ONE_ASSGNT, and fiddled about to get:
What’s new about the first problem? Its solution depends, unlike an extension or a restriction, on a number of rows. But there is more than that: any projection in a sense depends on a number of rows. Remember our CI interpretation of Projection:
|
|
|
|
||
|
|
Accumulations and Collection |
|
|
|
And as it happens we can surely see coming the need to do such things as count how many assignments have been handed in on time. I hope you can see that coming. And things could get more compiex: we might start to record a mark for each assignment and then find the average mark over the term for each student. (Only kidding!) Surely all this is pushing us towards some way of aggregating ... Aggregating what? Records (rows) or merely values in a column? “Merely values in a column”, you reply, shuddering as I do at the thought of such extra structures as aggregates of rows, and bearing in mind that we might still have recourse to polynormality. Very well, let us give ourselves a new operation, collection (dangerous this any other alternative welcomed). Given a relation, q, with columns including KI, Kl, ... Kn, C, the expression:
Of course, we might, instead of merely projecting DILIGENT, first extend using a function like “PRESENT”, so PRESENT (DILIGENT) for Judy would come out as Yes, and for Douglas as No: and we could project that answer. If we had used WEEK instead of ON_TIME, getting the many-valued column WEEKS say, then we would need to apply a function like “PRESENT” to get a column for projection: Yes if some value was to be found in WEEKS, and No if WEEKS had an absent value. We can also imagine other functions, such as HOWMANY (WEEKS), returning 0 for Douglas and 4 for Judy. I ought to warn you: seeing whether anything is there, mere existential quantification (such as PRESENT), is expressible within the FOPC. Counting how many is not, so we’re stretching things a bit, but our users do want to count, add up, average, and so on. Later we’ll talk about how we can consign all that to the unsavoury limbo of “data type processing”. Bear with me for now. Notice how polynormality is now starting to emerge from the shadows: we are seeing polynormal records transiently. The sort of expression we are using is:
Thus what was the question of polynormality should we allow polynormal records at all? has changed. Now we want to ask: how conspicuous should polynormality be? Remember that we started with two problems: one we can now solve by the “#” operation collection within a scope. The other was that of specifying the level of detail of the result, which turned out very easy in our example: we simply used projection. But then we got to the end of the query: we didn’t have any more collection to do. So we’ll need to look at more complex queries to let this problem emerge properly.[3] But not today. |
|
|
|
||
|
|
Interpretation |
|
|
|
I want to go back to that very simple table, I’ll call it “S”:
It doesn’t follow from the information in my files that Douglas is a slacker. I’ll prove that to you. Suppose we say that the information in S constitutes a theory, which I’ll also call S. And suppose, for simplicity, that S is not the result of a query but the only file I keep. When a student registers for the course, I insert an appropriate short (NAME) row for that student; and when a student hands in their first on-time assignment I add a long (NAME and DONE) row. Now let me add a long row for Douglas (just for the sake of argument). I get a new theory, say T. And S is a subthoery of T; so T asserts (has as a theorem) everything that S asserts. But T asserts that Douglas is no slacker but is diligent. So S can’t assert that Douglas is a slacker: if it did then T would assert that Douglas was a slacker and was not a slacker; it would be inconsistent. Clearly T is not inconsistent (merely false). So S doesn’t assert that Douglas is a slacker; it doesn’t deny that Douglas is diligent. |
|
|
|
||
|
In concluding that Douglas was a slacker, my friends, you went beyond the evidence. Of course, your conclusion was correct; but it wasn’t warranted by the data you derived it from. |
|
|
|
|
|
|
|
||
|
Copyright © 1993, 2001 Adrian Larner. The author asserts all moral rights. |
||