|
(U03) |
www.btinternet.com/~adrian.larner/database/pcl10 |
|
PLATOCLAST Lecture X |
||
|
|
||
|
|
I’m going to begin today by saying a few words about quantification. You may have thought it odd that although I insist on relative identities, always saying “the same such-and-such” rather than merely “the same”, I still quantify with “There is something” (FOR SOME x) or “Everything is such that” (FOR EACH x). Don’t we really want to say “There is some person” or “Every employee is such that”? |
|
|
|
||
|
|
Typed and Untyped Quantification |
|
|
|
If you do want to say that sort of thing, then it’s OK. We can translate it back into indefinite quantification.
Now normally we would say that either Fa or ¬Fa, but not both. However, this depends on “a” being a proper name (under our systemic criterion of identity), and likewise we would say that FOR EACH x, NOT (Fx AND ¬Fx), but that depends on “x” ranging over all distinguishable entities of our system, and therefore being replaceable (in any particular case) by a proper name. My goodness! Don’t look so worried. You’re not worried that “Something is an inscription in the book” and “Something is not an inscription in the book” both come out true. Why shouldn’t they? No reason at all. Notice that, whatever “a” is, we always have:
If it weren’t for all that delicate placement of English definite and indefinite articles, if you spoke Latin say (much hopes ), you would happily affirm “Dog doesn’t wag tail” and deny “It is false that dog wags tail”. Of course dog wags tail and dog doesn’t wag tail. Thank you Douglas For those who didn’t hear, Douglas offers the explanation: different dogs. Quite so.[1] |
|
|
|
||
|
|
Polynormal Records |
|
|
|
Now, that little introduction was another Platoclast “two birds with one stone” attempt: to explain why I’m not mug enough to abandon indefinite quantification, but also to show that when we get what looks like a name and here I’m thinking particularly of a column name and it names several things, or perhaps nothing, then we have to be a bit careful and introduce some clearly stated quantification to avoid confusion between propositional and predicate negation. And here’s a case in point. You’ll remember that in order to get minimal structure, i.e. just one way to hold fields together, we treated a record as comprising its fields (and nothing else no pointers, indexes, etc.) Such a record was, however, somewhat more complex than a normal record: it could have multiple values of the same field-type. We used the example:
We can look as we have looked at inordinate length at the interpretation of records. The Entity interpretation gives us no assistance: but that interpretation, in any event, gives us no clue how to interpret operations like projection or join. Presumably any join of, say, two person records, represents an entity: but what sort of entity? And how can we tell which one of that sort? The Classical interpretation looked more promising, indeed it seems entirely satisfying under self-interpretation. And it is the Classical interpretation that makes the proposal of normality so persuasive. A normal record type represents a predicate, and a normal record of that type represents a proposition formed by inserting a name in each place of that predicate. This close fit between the FOPC and normal records is the most powerful argument for accepting Dr Codd’s proposal. Yet we have, except perhaps for self-interpretation, rejected the Classical interpretation. The names (values) found in records are no longer the proper names demanded by the FOPC: we still interpret in terms of the FOPC, but not in the intimate, one-for-one way that provided an obvious argument for normality. So the argument for polynormality looks a little stronger; that for normality a little weaker. And, of course, when we consider the TT-DD, which is I think, anyway a pretty natural formalisation, our records are just polynormal records. And there is no more reason to impose normality than to impose value-normality. |
|
|
|
||
|
|
The Postulate of Two-Dimensionality |
|
|
|
Incidentally, you may well be wondering why I call it a Theory of Two-Dimensional Data, when its records don’t exactly fit neatly in a table. It’s because of another postulate I’m going to make, and what follows from it.
The format of a record is, in effect, the projection (in the mathematical sense) of such a matrix onto its vertical axis; the value is the projection onto its horizontal axis. A normal record has just one tick in each row, a value-normal record has one tick in each column. We’re certainly not tempted to enforce value-normality, so it seems a shame (don’t you think?) to destroy such an elegant symmetry. I’m sure Dr Codd wouldn’t approve: he invented normal records to avoid “repeating groups”, and now we’re thinking of allowing well, not repeating groups repeating fields. But this is important; it’s the very soul and essence of formatted data: a field comprises a type and value, a question and answer. Fl is the same field as F2 when Fl is the same answer to the same question as F2. And all we do is add the definition that a record comprises its fields, and we get polynormality. It’s that simple. |
|
|
|
||
|
|
Restriction Conditions and the Quantifying Operators |
|
|
|
Let’s think for a moment about what complexities might arise if we used polynormal records. I’m not at all certain, by the way, that we should use polynormal records, but I have another (ulterior) motive for this exercise as it happens. I think you can see that we’ll have no problems with projection or Cartesian product. But restriction conditions are going to get a little complex. In a restriction on a normal record we can just use a column name, say C, to pick out a value, and then compare it I’ll use “@” as a schematic operator, it might be “=” or “<” or whatever. We could compare the column to a literal, or to another column, say D. So the condition would look like:
Consider this case first, that for each value in D, there is a value in C that is “@” to it; we write this (I pronounce it “C left at D”, though my “left” is a right brancket!):
means we have one C value (one row), and two D values (two columns), and the C value is “@” to just one of the D values. So that makes:
And here’s my diagram:
In each quarter, the upper (top left) matrix is for when “C[@]D” is false, and the lower (bottom right) matrix is for when “C[@]D” is true. Notice the empty matrices: the vertical when we have no D values (so “C]@D” is trivially true), the horizontal when we have no C values (so “C@[D” is trivially true), and the origin alone, when we have neither C nor D values. Remember the combined definitions:
|
|
|
|
||
|
|
Restrictions on Present and Absent Fields |
|
|
|
So let’s suppose we didn’t allow polynormal records, just normal records, but we allowed that there might be no value in some column of a record. It would mean: the record didn’t contain that type of field, not that it had by some foundation shattering magic an “empty” field, a question without an answer: we wouldn’t countenance that. So we’re going to exploit the “]” and “[” operators to handle restriction conditions of the form “C@D”, where “C” or “D” might denote either one value or no value. This gives us a somewhat simpler diagram than we had before.
Naturally enough, all the minimal matrices with a side greater than 1 have been removed, giving us only five matrices. Consequently, the lower matrix in the bottom right corner gives us what are now equivalent expressions (we would naturally use the first):
Notice also that “C]@D” holds true when either C@D or D is not present, and “C@[D” when either C@D or C is not present. So let’s consider again the special case, when “@” is “=”. |
|
|
|
||
|
|
Completed Identities and Cryptic and Esoteric Data |
|
|
|
What on earth is old Platoclast up to? That’s what you ask yourselves isn’t it? Not satisfied with pursuing interpretation of normal records to the bitter end, I indulge in sundry digressions and animadversions. I step aside from the path to examine exotic flowers: remember cryptic values that identify (show sameness) without being usable to pick out records; remember taking those column criteria of identity and defining their completions, just in case a such-and-such column might contain a value that wasn’t the same such-and-such as itself; and now we’ve considered comparing values that might be multiple, or even absent. Believe me, it’s all in a good cause: several good causes actually. But I’ve just about done with collecting these exotics: a few more words and they’ll be ready to come into their own. Firstly, you will have appreciated that completed identities, where for column values C and D we say “CºD” rather than “C=D”, can be quite easily and obviously combined with our “]” and “[” operators. The ideas they express are orthogonal, even especially when we consider only zero or one value. We use the “]” and “[” to talk about possibly absent values; but “º” isn’t concerned with absent values, rather with present values that aren’t the same such-and-such as anything. Secondly, you may be wondering where those record identifiers, present only at the form level, fit in when we get to the concept level. Are they cryptic? or absent? or what? Well, they’re not cryptic (though they might be cryptic at the form level). At the concept level they don’t appear at all, and they can’t be referenced by their column name, because that’s been projected away. In implementation and I stress that this is an implementation, and invisible to the user we might keep them attached to their records. And then I would call them esoteric: they are manifested as record type names (like “EMPLOYEE”), but in no other way. Notice that keeping esoteric record identifiers would greatly assist update of the data base: when records (in a view) are updated, it rather helps to know which kept records need to be modified. Cryptic data is quite different: it occupies a column, and it can be compared (for equality at least, and possibly for ordering as well). It just can’t be clearly seen. |
|
|
|
||
|
|
SQL Introduced |
|
|
|
As we have a few minutes left, let’s look at an example of a relational query language: the most popular of them all, SQL. I want to look at this now because we shall be using it for examples before long, and because it has an oddity: it allows duplicate records in relations, and this question of duplicate records pertains directly to that of esoteric columns (columns projected away, but retained in implementation, for implementation reasons). I’m going to look only at a tiny subset of SQL, a part of its query statement, the SQL SELECT. It looks like this:
If you think about it, you’ll find that this is not a good idea: we may wish to keep esoteric keys in implementation, but we certainly don’t want to do anything to modify our criterion of identity of records at any level. At least, the only thing we would allow is at the concept level the record type name (which is, admittedly, derived from the now esoteric key); but that means that the only duplicates we would allow are records with different type names (but meeting our criterion of identity). That is to say: records in different tables as perceived by a user at the concept level. (And even the Relational bigots allow those: records similar except that they occur in different relations. This is what demonstrates the essentiality of relations.) But that still doesn’t allow duplicate records in the same table. Obviously, if we want rows that are different but look the same to a user, what we need are cryptic values. A cryptic value occupies a column: but the user can’t tell not just by looking whether or not a cryptic field in one record has the same value as that in another, though they could be compared in a restriction condition (an SQL WHERE, if SQL allowed cryptic values, which it doesn’t). But I don’t know: perhaps we ought to say that SQL does provide some cryptic value support, and that’s why we seem to have duplicate rows. We would have to say that the “ALL” option in SQL, the default for a SELECT, meant: keep the cryptic key. And the “DISTINCT” option, which removes duplicates, would mean: project away the cryptic key value. So we ought to be very careful what we say, especially what we say about modifying SQL: should we (as the relational bigots say) get rid of these duplicate rows? or should we extend the cryptic value support? Do notice the elegance of SQL from the users’ viewpoint: they don’t have to work out some series of restrictions, projections, and joins, to express a query . They just get all the queried data together in a Cartesian product (FROM), then apply all the restrictions (WHERE), and then project (SELECT) the required columns. At least, I think that’s what the SQL keywords mean; but perhaps FROM indicates the projection of the preceding columns (the ones after SELECT), and SELECT doesn’t mean anything. Anyway, that makes no difference to what the whole thing means: the projection of the restriction of the Cartesian product, which is very neat. Any query comprising restrictions, projections, and joins can be expressed in that way. But, of course, it’s not necessarily performed in that sequence: it’s performed in a much more efficient sequence because these relational operations can be shuffled around each other quite harmlessly. For instance, a restriction on one of two joined tables can be applied before or after the join; any columns not mentioned anywhere in the query can be projected away at any point. |
|
|
|
||
|
More important than all that: the user has absolutely no control over the sequence of operations. The user just can’t say “Do this restriction before that projection”, or “Do this join before that join”. And remember: what the user can’t say, the user can’t get wrong. |
|
|
|
|
|
|
|
||
|
Copyright © 1993, 2001 Adrian Larner. The author asserts all moral rights. |
||