
www.btinternet.com/~adrian.larner/database/pcl08

PLATOCLAST
ON DATA

Lecture VIII
New Interpretations

 

 

At the end of Lecture VI – we were labouring mightily then in our investigations of identity – we discovered an interesting association between a column, PERSONNO in REGISTRATION, and a criterion of identity, “is the same person as”. You will recall that this criterion of identity was peculiarly stable: it persisted even across a change in the primary key of its relation. But there is one other thing to say about it: in stating this criterion of identity we gave, with considerable precision and clarity, the meaning of the column.

 

 

Column Definitions and Criteria of Identity

 

Think of any column in a relation, say COLOUR in a CAR relation. One of the things the data analyst, the data base designer, has to do is to explain the meaning of such a column. And, of course, the choice of an appropriate name is half the battle. You would be amazed (though the FOPC wouldn’t twitch a whisker) if I told you that the COLOUR column held the number of cylinders. But, putting aside such a crude error (not a formal error, I stress – the FOPC doesn’t understand English), you would be right to be annoyed if you found that “Shocking Pink and Royal Purple” was an acceptable value of COLOUR. Wouldn’t it be better to have called the column “COLOURS” or “COLOUR COMBINATION”? At least, you would be right unless the actual users – however unadvisedly – really did say that “Shocking Pink and Royal Purple” constituted just one colour.

Now this sort of discussion is, as you’ll appreciate, about the criterion of application of “colour”: what counts as a colour? Suppose that we could get this criterion agreed, and we said, for instance, that Scarlet was a colour and that Primary Red was a colour. Well we wouldn’t quite have finished, would we? I want to ask – and I’m sure you do now: but is Scarlet the same colour as Primary Red? That is, do the users regard them as the same colour?

So you can see that we haven’t really given the interpretation of a column until we have associated with it a criterion of identity. But we now know, because it’s one of the few rays of light that have gleamed in the dark valley that we’ve passed through, that, once we’ve specified this criterion of identity, we have really pinned down the meaning of the column in a very stable way.

Now I ask you to think back to the Classical interpretation and to consider the question: under that interpretation, what are the entities (sense 2) of the system – the things said by the records to exist? They are the things named by the values in fields. Thus, in a CAR record, there would be, say, a registration number (assume that’s the “name” of a car), a colour (the name of a colour), a make and model (the names of a make and of a model), and so on.

The record is a proposition that states:

Such-and-such car has this colour and is of this model etc.
So our theory has lots and lots of different sorts of entities, and very strange entities they are: the number four, the colour green, and – in other relations – vacancies, and “intentional objects” like the widget gauger we would like to employ (but who may not exist at all). But think now what we can say. Take the car record: it says that there is something, say x, and x has a registration number, say n. What entity does n name? It names x, under the criterion of identity “is the same car (instance) as” (I ignore the re-use of registration numbers). x – what the record asserts to exist – is the same car as anything that has the registration number n. Well it shouldn’t surprise us that n names a car.

But take the MAKE, say m. What entity does that name? It names x, just as n does, but under the criterion “is the same make as”. x is the same make as anything that has the make m.

So we don’t need an extra entity for m to name. But take the colour, c. Surely that names a colour, an extra entity. Not at all. There’s that charming story about the little lost girl who is asked “Do you know what sort of car your mummy has?” “A green one,” she replies. And quite right too. Any way of classifying things gives sorts, kinds, or types. It’s just prejudice that makes us think colour doesn’t count, whereas model does.[1]

For the doubtful, I define “is the same colour-car as”:

x is the same colour-car as y =df x is a car, and y is a car, and x is the same colour as y.

And you can work out what a colour-car is:

x is a colour-car =df x is the same colour-car as x. I.e. x is a car and has a colour.

So we can say that the colour value, c, doesn’t name some funny “thing” (or even “attribute”): it names a colour-car. And a colour-car, as its definition shows, is a car. We simply do not need to suppose that there are all sorts of entities: columns (or field-types) are simply different ways of classifying the entities we already have. And that is why these column definitions, specified as criteria of identity, are so stable, so resilient to change. When we add a column we merely allow ourselves another way of classifying things. It may be a broader classification than we have already, like adding “height in inches” to our person REGISTRATION record, or a narrower one, like adding the new primary key, registration number. But it doesn’t – it can’t – invalidate all the other classifications that we have.
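As a minimal sketch of the idea that columns classify the one sort of entity rather than naming extra entities, the criteria of identity for a car can be written as predicates over the same objects. The `Thing` class, its field names, and the sample values are assumptions made purely for illustration; they do not come from the lecture.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Thing:
    kind: str                      # "car" or anything else (hypothetical tag)
    regno: Optional[str] = None
    make: Optional[str] = None
    colour: Optional[str] = None

def is_car(x):
    return x.kind == "car"

def same_car(x, y):
    """x is the same car (instance) as y."""
    return is_car(x) and is_car(y) and x.regno is not None and x.regno == y.regno

def same_make(x, y):
    """x is the same make as y."""
    return is_car(x) and is_car(y) and x.make is not None and x.make == y.make

def same_colour_car(x, y):
    """x is a car, and y is a car, and x is the same colour as y."""
    return is_car(x) and is_car(y) and x.colour is not None and x.colour == y.colour

def is_colour_car(x):
    """x is a colour-car =df x is the same colour-car as x."""
    return same_colour_car(x, x)

# Illustrative data only.
a = Thing("car", regno="123ABC", make="Ford", colour="green")
b = Thing("car", regno="456DEF", make="Rover", colour="green")
w = Thing("warehouse")

print(same_car(a, b))         # different cars
print(same_colour_car(a, b))  # but the same colour-car
print(is_colour_car(w))       # a warehouse is not a colour-car
```

Each predicate ranges over the same things; only the fineness of the classification differs.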

 

 

Narrowness and Completion of Criteria of Identity

 

As I’ve mentioned broader and narrower classifications, let me say that a relative identity, xIy, is no broader than another, xJy, when FOR EACH x, FOR EACH y, IF BOTH xIy AND xJx THEN xJy. The systemic identity of a theory is its narrowest identity (its finest classification). Thus, if x=y is the systemic identity of a theory, and xIy is an identity of that theory, then FOR EACH x, FOR EACH y, IF BOTH x=y AND xIx THEN xIy. It is, incidentally, well known and easily provable that any theory has only one systemic identity, which is what justifies us in talking of the systemic identity.[2]

We can simplify these “narrowness” definitions by introducing what we might call the completion of an identity. Suppose we have an identity, xIy. As we know, we do not in general demand that FOR EACH x, xIx (total reflexivity). However we can define another, totally reflexive identity on xIy, thus:

xI*y =df xIy OR (NOT xIx AND NOT yIy)
(Here “xI*y” stands for the completion of xIy.) Thus the completion of “x is the same person as y” is “x is the same person as y or neither x nor y is a person”. Thus while “x is the same person as x” (i.e. x is a person) holds true only of persons, the equivalent form of the completion amounts to “x is the same person as x or x is not a person”, which holds true of everything. Well it would, wouldn’t it? It means: either x is a person or not. I know it’s trivial: that’s logic for you.

But now we can say that one identity is no broader than another if the one implies the completion of the other: xIy is no broader than xJy, when FOR EACH x, FOR EACH y, IF xIy THEN xJ*y. And the systemic identity, “=”, is such that FOR EACH x, FOR EACH y, IF x=y THEN xI*y, where xI*y is the completion of xIy.
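The two ways of stating “no broader than” can be checked against each other on a small finite domain. The following is only a sketch: the domain, the tuple layout, and the function names are hypothetical, and the check is exhaustive over this domain rather than a proof.

```python
def completion(ident):
    """Completion of a relative identity: ident(x, y) OR neither x nor y is reflexive."""
    def completed(x, y):
        return ident(x, y) or (not ident(x, x) and not ident(y, y))
    return completed

def no_broader(i, j, domain):
    """FOR EACH x, FOR EACH y, IF BOTH xIy AND xJx THEN xJy."""
    return all(j(x, y) for x in domain for y in domain if i(x, y) and j(x, x))

def no_broader_via_completion(i, j, domain):
    """FOR EACH x, FOR EACH y, IF xIy THEN x is completion-of-J related to y."""
    jc = completion(j)
    return all(jc(x, y) for x in domain for y in domain if i(x, y))

# Hypothetical domain: (kind, person number, surname).
domain = [
    ("person", 1, "Brown"),
    ("person", 2, "Brown"),
    ("person", 3, "Smith"),
    ("car", None, None),
]

def same_person(x, y):
    return x[0] == "person" and y[0] == "person" and x[1] == y[1]

def same_surperson(x, y):
    return x[0] == "person" and y[0] == "person" and x[2] == y[2]

def systemic(x, y):
    # Stand-in for the systemic identity of this toy theory.
    return x == y

for i, j in [(same_person, same_surperson), (same_surperson, same_person),
             (systemic, same_person)]:
    print(i.__name__, "no broader than", j.__name__, ":",
          no_broader(i, j, domain), no_broader_via_completion(i, j, domain))
```

On this domain both formulations agree in every case, and the completion of “is the same person as” holds of the car with itself even though the uncompleted identity does not.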

 

 

Proper and Common Names

 

In our last lecture I gave the definition of “proper” name as: a name given under the systemic identity. This made the notion of proper name system-relative. I should add that there are two other possible definitions. First, “Geoffrey” is the proper name of a person; that is, a name may be said to be proper with respect to a certain criterion of identity (in this case, “is the same person as”). Any name is therefore proper with respect to the criterion of identity under which it was given. Incidentally, the use of “Geoffrey” as the proper name of different persons is just a non-systematic ambiguity: to treat it as a common name, and talk of “Geoffreys”, is a howler (like saying, “There are four banks in our town, two on either side the river, and two financial houses on the main street.”) Second, those who believe in absolute identity would regard a proper name – an absolutely proper name – as a name given under the absolute identity. Naturally, as there is no absolute identity, no name is in that sense proper.

In discussing a natural language we use the term “proper” name, as far as I can see, for a name given under a criterion of identity, when we do not normally use (in the language) any narrower criterion of identity. Logic is, however, blind to what we “normally” do, and the distinction between proper and common names in natural language is blurred indeed. Remember Mr Man in Brer Rabbit, Tommy Atkins, or every Tom, Dick, and Harry. We once Christened the redbreast, the daw, and the pie with “proper” names: “Robin”, “Jack”, and “Mag”. Now, in their turn, “robin”, “jackdaw”, and “magpie” have become common names. And think of the quislings, the little hitlers, and the mute inglorious miltons. Or consider: were “Sun” and “Moon” once proper names? I guess so. And “occupational” surnames – Smith, Fletcher, or Engineer – started as common names, became proper perhaps – applied to the sole smith or fletcher or engineer in the village – and then became “family” names (a special sort of common name).[3]

 

 

Existential Interpretation

 

Now we can approach our new interpretation, and we shall see that it has features of the Classical interpretation, in that each record is a proposition; but it also has features of the Entity interpretation. Each record will be a proposition to the effect that something exists (or, as we shall see, that some things exist). So, if you wish, you may say that the record “represents” or “corresponds to” the thing (or a relationship between the things) so proposed to exist. In some cases, the thing proposed may be understood as the record itself; in others not. So self-interpretation is neither required nor excluded.

I’ll give you the interpretation in its most general form. A record of type P with field-types (columns) C1, C2, ...Cn, and values respectively V1, V2, ... Vn, will be interpreted as a proposition of the form:

EI
FOR SOME x1, FOR SOME x2, ... FOR SOME xp, P(x1, x2, ... xp) AND y1 =C1 V1 AND y2 =C2 V2 ... AND yn =Cn Vn
Notice that “EI” is just a label for this “Existential Interpretation”, because I’ll be referring back to it fairly often; P(x1, x2, ... xp) is a predicate associated with the record type; the expressions of the form “=Cv” represent the column criteria of identity; and each variable of the form “yv” is the same variable as x1 or as x2 ... or as xp.

Suppose we had a person-and-car record with fields PERSONNO, SURNAME, and CARREGNO.

It might be interpreted as:

FOR SOME x, FOR SOME y, x owns y AND x is the same person as (PERSONNO) 12345 AND x is the same surperson as (SURNAME) Brown AND y is the same car as (CARREGNO) 123ABC
“PERSONNO”, “SURNAME”, and “CARREGNO” are parenthesised just to indicate that the value following them is the value in that field. You will, perhaps, wonder what a surperson is: x is the same surperson as y when x is a person, and y is a person, and x has the same surname as y. And you can now define “is a surperson” for yourself.[4]

In the above context, “x is the same surperson as Brown” may therefore be replaced by “x has the surname ‘Brown’” (because the previous clause tells us that x is a person).
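To make the existential reading concrete, here is a small sketch that evaluates the person-and-car record against a toy “world”. The world, the `ownership` set, and all function names are assumptions introduced for illustration; the record is true just when some pair of things satisfies the predicate and the column criteria of identity.

```python
from itertools import product

# Hypothetical world of things the records may be about.
world = [
    {"kind": "person", "personno": 12345, "surname": "Brown"},
    {"kind": "person", "personno": 67890, "surname": "Green"},
    {"kind": "car", "regno": "123ABC"},
    {"kind": "car", "regno": "999XYZ"},
]
ownership = {(12345, "123ABC")}  # (person number, car registration) pairs

def owns(x, y):
    return (x.get("personno"), y.get("regno")) in ownership

def same_person_as(x, personno):
    return x["kind"] == "person" and x["personno"] == personno

def same_surperson_as(x, surname):
    return x["kind"] == "person" and x["surname"] == surname

def same_car_as(y, regno):
    return y["kind"] == "car" and y["regno"] == regno

def record_holds(record, things):
    """FOR SOME x, FOR SOME y, x owns y AND the three column equivalences."""
    return any(
        owns(x, y)
        and same_person_as(x, record["PERSONNO"])
        and same_surperson_as(x, record["SURNAME"])
        and same_car_as(y, record["CARREGNO"])
        for x, y in product(things, repeat=2)
    )

record = {"PERSONNO": 12345, "SURNAME": "Brown", "CARREGNO": "123ABC"}
print(record_holds(record, world))
print(record_holds({**record, "CARREGNO": "999XYZ"}, world))
```

Note that the values 12345, Brown, and 123ABC never appear as things in the quantification; they only fix which person and which car are meant.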

 

 

Special Cases

 

There are some special cases of the above general interpretation:

SI
FOR SOME x, P(x) AND x =C1 V1 AND x =C2 V2 ... AND x =Cn Vn
 
This is, we may say, the Singulary Interpretation: we use just one variable.
TI
FOR SOME x1, FOR SOME x2, ... FOR SOME xp, y1 =C1 V1 AND y2 =C2 V2 ... AND yn =Cn Vn
 
In this case, the “Truth Interpretation”, the proposition “P(x1, x2, ... xp)” amounts merely to “true” (i.e. all information is conveyed by the expressions “yv =Cv Vv”).
II
FOR SOME x, x =C1 V1 AND x =C2 V2 ... AND x =Cn Vn
 
This, the “Independent Interpretation”, is simply the Singulary Truth Interpretation.[5]
Just for completeness and comparison here’s the Classical interpretation:
CI
P(V1, V2, ... Vn)
Applied to the example:
(PERSONNO) 12345 has the surname (SURNAME) “Brown” and owns the car (CARREGNO) 123ABC
And, for those that hate going to extremes, here’s a hybrid between the Classical and the Existential. It’s the Classical modified to take account of column criteria of identity:
HI
FOR SOME x1, FOR SOME x2, ... FOR SOME xn, P(x1, x2, ... xn) AND x1 =C1 V1 AND x2 =C2 V2 ... AND xn =Cn Vn
And, applied to the example:
FOR SOME x, FOR SOME z, FOR SOME y, x is called z AND x owns y AND x is the same person as (PERSONNO) 12345 AND z is the same surname as (SURNAME) “Brown” AND y is the same car as (CARREGNO) 123ABC
Of course, HI is a special case of EI, when p=n and each “yv” is the same variable as “xv”.

And that’s that! We have our new interpretation: EI with optional variants SI, TI, II, and HI. Silence.

 

 

Restriction

 

Of course that’s not that! That’s only structure reinterpreted. What about data manipulation? We interpreted restrict, project, and join on CI. But now we have to reinterpret them for EI. Very well, recall EI:

FOR SOME x1, FOR SOME x2, ... FOR SOME xp, P(x1, x2, ... xp) AND y1 =C1 V1 AND y2 =C2 V2 ... AND yn =Cn Vn
A restriction, say Q(K1, K2, ... Kq), where each Kw is one of the Cv (one of the columns of the record), when applied to the above, results in:
FOR SOME x1, FOR SOME x2, ... FOR SOME xp, P(x1, x2, ... xp) AND Q(z1, z2, ... zq) AND y1 =C1 V1 AND y2 =C2 V2 ... AND yn =Cn Vn
In this expression, each zj is the yv such that Kj is the column Cv. (Recall that each yv is one of the xi.)

Notice that the form of the interpretation remains the same:

A number of existential quantifications (“FOR SOME xi”)
 
A predicate (here “P(x1, x2, ... xp) AND Q(z1, z2, ... zq)”)
 
A conjunction (ANDing) of column equivalences (“yv =Cv Vv”)
Notice also that SI and HI when restricted result in interpretations of unchanged form: still SI or HI respectively. But a restriction of TI or II – because the restriction predicate, Q, is introduced – becomes of the form EI or SI respectively.

 

 

Joins

 

Consider now the Cartesian Product of EI, as shown, with a record interpreted as:

FOR SOME z1, FOR SOME z2, ... FOR SOME zq, Q(z1, z2, ... zq) AND w1 =K1 U1 AND w2 =K2 U2 ... AND wm =Km Um
Of course, this is just a relettering of EI: Q(z1, z2, ... zq) is the predicate associated with the record type; the expressions of the form “=Ku” represent the column criteria of identity; and each variable of the form “wu” is the same variable as “z1” or as “z2” or ... or as “zq”.

The result comes out as:

FOR SOME x1, FOR SOME x2, ... FOR SOME xp, FOR SOME z1, FOR SOME z2, ... FOR SOME zq, P(x1, x2, ... xp) AND Q(z1, z2, ... zq) AND y1 =C1 V1 AND y2 =C2 V2 ... AND yn =Cn Vn AND w1 =K1 U1 AND w2 =K2 U2 ... AND wm =Km Um
It is, of course, just the two interpretations conjoined, but then reordered to give the standard EI form: quantifications, predicate, and column equivalences. Notice that the product of two HI interpretations gives an HI, and that of two TI interpretations gives a TI. But two II interpretations give a TI and two SI interpretations give an EI.

To get a Natural Join, we can start from the Cartesian Product, as above, and then take each join column, say Cv is the same column as Ku. We then have two equivalences: “yv =Cv Vv” and “wu =Ku Uu”. We remove one of them (say the latter) and add “AND yv =Cv wu” to the predicate, making it: “P(x1, x2, ... xp) AND Q(z1, z2, ... zq) AND yv =Cv wu”.

 

 

Projection

 

Projection is pretty straightforward: we merely drop the equivalences of the columns that are projected away. It can be seen that the natural join amounted to a Cartesian Product, a Restriction (adding the expression, “yv =Cv wu” to the predicate), and a Projection (removing the equivalences, “wu =Ku Uu”). It is also obvious, I think, that our primitive operations, Cartesian Product, Restriction, and Projection, are all logical implications; the first two are of the form: from propositions “p” and “q” to derive “p AND q”; no change here from CI. Projection, however, was an existential quantification in CI, i.e. from “Fa” to derive “FOR SOME x, Fx”; this, you will recall, was what required “a” to be a name. Under EI Projection is now an implication of the form: from “p AND q” to derive “p”: still a good implication, of course, though a radical change in interpretation.

But do remember that this inverse of conjunction happens within one or more existential quantifications. Let’s assume just one. We go from:

FOR SOME x, F(x) AND G(x)
to
FOR SOME x, F(x)
And that’s fine: “Someone both drinks and smokes” does imply “Someone drinks”.

But notice what happens when we try to reverse this (on a join). We have:

FOR SOME x, F(x)
FOR SOME y, G(y)
To avoid confusion, I’ve used different variables. The join gives us:
FOR SOME x, FOR SOME y, F(x) AND G(y)
But not:
FOR SOME x, F(x) AND G(x)
And that’s correct: from “Someone drinks” and “Someone smokes” we can’t conclude that someone both drinks and smokes (though they tell me that there are people with both those vices).[6]
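A two-person countermodel shows why the reverse step fails. This is only an illustrative sketch; the names “Ann” and “Bob” and the sets below are invented for the purpose.

```python
# Hypothetical world: one drinker, one smoker, nobody who does both.
people = {"Ann", "Bob"}
drinks = {"Ann"}
smokes = {"Bob"}

def F(x):
    return x in drinks

def G(x):
    return x in smokes

someone_drinks = any(F(x) for x in people)                      # FOR SOME x, F(x)
someone_smokes = any(G(y) for y in people)                      # FOR SOME y, G(y)
joined = any(F(x) and G(y) for x in people for y in people)     # FOR SOME x, FOR SOME y, F(x) AND G(y)
same_one = any(F(x) and G(x) for x in people)                   # FOR SOME x, F(x) AND G(x)

print(someone_drinks, someone_smokes, joined, same_one)
```

Both premisses and the join come out true in this world, while the single-variable conclusion comes out false, so the join cannot license it.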

Well, I’m sorry about the extra complexity in interpretation: but we were driven to it. It does, however, have some interesting features. One of them is that we can – rather surprisingly – now distinguish (if we choose) between records representing Entities, namely those records whose interpretation is of the form SI – or II of course – the records with a single assertion of existence; and other records that represent Relationships. I’m not inclined to make much of this myself, indeed I’m slightly annoyed that I might have given comfort to the EAR fans by providing a decent underpinning for their theory. Moral: we should follow the argument where it leads, for though none of us is unbiassed, reason is.[7]

Let’s think about what our primitive operators do:

None of them removes an existential quantification, and none of them removes any part of the predicate.
 
Restriction adds to the predicate.
 
Cartesian Product includes everything pertaining to both of its operands.
 
Projection removes equivalences.

So very clearly we can see that Restrictions add information, Cartesian Product combines (but neither adds nor removes) information, and Projection loses information. And this is just as it should be.
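The bookkeeping described above can be sketched by treating an interpretation as three parts: quantified variables, predicate conjuncts, and column equivalences. Everything below is an assumption made for illustration: the `Interp` structure, the string encoding of conjuncts, and in particular the check in `natural_join` that the two records agree on the shared column value, which the lecture leaves implicit. The `form` function does not single out HI, which is a special case of EI.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interp:
    variables: tuple     # existentially quantified variables, e.g. ("x", "y")
    predicate: tuple     # predicate conjuncts as strings; () stands for "true"
    equivalences: tuple  # (variable, column, value) triples: "variable =column value"

def form(i):
    """Classify an interpretation as II, SI, TI or EI."""
    single = len(i.variables) == 1
    trivial = len(i.predicate) == 0
    if single:
        return "II" if trivial else "SI"
    return "TI" if trivial else "EI"

def restrict(i, conjunct):
    """Restriction adds a conjunct to the predicate."""
    return Interp(i.variables, i.predicate + (conjunct,), i.equivalences)

def cartesian_product(a, b):
    """Conjoin two interpretations; assumes their variable names are disjoint."""
    return Interp(a.variables + b.variables,
                  a.predicate + b.predicate,
                  a.equivalences + b.equivalences)

def project(i, columns):
    """Projection drops the equivalences of the columns projected away."""
    kept = tuple(e for e in i.equivalences if e[1] in columns)
    return Interp(i.variables, i.predicate, kept)

def natural_join(a, b):
    """Product, then one restriction and one dropped equivalence per shared column.
    Returns None if the records disagree on a shared column (an added assumption)."""
    a_side = {col: (var, val) for var, col, val in a.equivalences}
    result = cartesian_product(a, b)
    for var, col, val in b.equivalences:
        if col in a_side:
            a_var, a_val = a_side[col]
            if a_val != val:
                return None
            result = restrict(result, f"{a_var} ={col} {var}")
            remaining = tuple(e for e in result.equivalences if e != (var, col, val))
            result = Interp(result.variables, result.predicate, remaining)
    return result

person = Interp(("x",), (), (("x", "PERSONNO", 12345), ("x", "SURNAME", "Brown")))
owner = Interp(("z", "y"), ("z owns y",),
               (("z", "PERSONNO", 12345), ("y", "CARREGNO", "123ABC")))
joined = natural_join(person, owner)
print(form(person), form(owner), form(joined))
print(joined.predicate)
print(joined.equivalences)
```

In this sketch no operation ever shortens `variables` or `predicate`: restriction lengthens the predicate, the product concatenates everything, and projection touches only `equivalences`.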

 

 

Cryptic Data

 

As we have a few moments left, let’s look again at our new standard interpretation, (EI):

FOR SOME x1, FOR SOME x2, ... FOR SOME xp, P(x1, x2, ... xp) AND y1 =C1 V1 AND y2 =C2 V2 ... AND yn =Cn Vn
I was tempted – still am – to replace the expressions “yv =Cv Vv” with the more explicit:
yv =Cv yv viz Vv
To say, for example, not “x is the same surperson as Brown”, but “x is the same surperson as x, namely Brown”. Remember that we were obliged to distinguish two senses of “identification”: being the same such-and-such, and picking out. And obviously these values somehow serve both these purposes: the name “Brown” serves to pick out a surperson; and x having that name and y having that name serves to identify x and y as the same surperson.

Well sometimes we might just want to do one of these without the other. Today we’ll consider just identification proper: marking x and y as the same such-and-such, without – so to speak – picking out just what such-and-such they are. It’s a bit like this; suppose you were looking at data about persons’ coats of arms, and you found that two persons had the same colour ground on their shields, say “Vert”. Well, you might not know what “Vert” meant (what colour it was) but you would know about a similarity – an identity – between the shields.

As it happens, it is often helpful to have this sort of data in a data base: values that the system can compare for equality, but which are not shown to users. Dr Codd calls such values (in a limited context) “surrogates”:[8] I will say that they are of a cryptic data type. I won’t go into why we want cryptic data at the moment; we may come across it from time to time in the future. But just remember the idea: it drives a neat wedge between the two senses of “identification”.

 

 

Proper Values

 

And while we’re on the subject of column equivalences, or, as we can properly say, “column criteria of identity”, what does it mean, given a column, C, with criterion of identity, “is the same C as”, to say of some value, V, that:

V is the same C as itself?
It means, we might be tempted to say: V is in the domain of C. I won’t be that tempted. I’ll say: V is a proper value of C; or V is proper to C.

You will remember that trick we used to go from an ordinary relative identity, merely reflexive, to a totally reflexive identity: “completion” we called it. Let’s define the completion of our arbitrary column criterion. Taking “=C” to mean “is the same C as”, we can define its completion:

x ≡C y =df x =C y OR (NOT x =C x AND NOT y =C y)
Using this completed identity instead of “=C” we could now admit one value in column C that is not proper to C. Actually, we could admit any number of values, but each value not proper to C is the same as each other such value under the criterion of identity, “≡C”. If you want some way to say “≡C” in English, how about “is the same C-wise as”?

Then we could say that V is not proper to C when V is the same C-wise as something but not the same C as anything. If we did allow a value in a column that was not proper to that column, we would have to modify EI slightly; it would become:

FOR SOME x1, FOR SOME x2, ... FOR SOME xp, P(x1, x2, ... xp) AND y1 ≡C1 V1 AND y2 ≡C2 V2 ... AND yn ≡Cn Vn
You may rightly wonder why we should want to make such an odd change, but I’m coming to that.
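For a single column, the distinction between proper and improper values might be sketched as follows. The `CANON` table is a hypothetical stand-in for whatever the users agree about sameness of colour (here it is simply assumed that they count Scarlet and Primary Red as the same colour), and “N/A” and “UNKNOWN” are invented examples of values not proper to the column.

```python
# Hypothetical agreement among users about which colour each value picks out.
CANON = {"Scarlet": "red", "Primary Red": "red", "Green": "green"}

def same_colour(u, v):
    """u =C v: both values are colours, and the same colour."""
    return u in CANON and v in CANON and CANON[u] == CANON[v]

def is_proper(v):
    """V is proper to C when V is the same C as itself."""
    return same_colour(v, v)

def same_colourwise(u, v):
    """Completion of =C: the same colour, or neither value is proper to the column."""
    return same_colour(u, v) or (not is_proper(u) and not is_proper(v))

print(is_proper("Scarlet"), is_proper("N/A"))
print(same_colourwise("Scarlet", "Primary Red"))
print(same_colourwise("N/A", "UNKNOWN"))
print(same_colourwise("N/A", "Green"))
```

Under the completed criterion all the improper values collapse into one class, which is why admitting any number of them amounts to admitting one.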

 

 

Are Records Assertions?

 

 

(Professor Platoclast was pressed on the question whether records ought to be interpreted as assertions.)

You are right. We speak loosely when we say that inserting a record amounts to asserting a proposition. What, for instance, do we wish to say about a record representing a merely planned warehouse? Certainly not that there exists something that is that warehouse.

“Proposition”, you know, doesn’t mean, “something that is asserted”; it means, as you might guess, “something that is proposed”. It’s proposed for our consideration. And that means: so that we can work out what follows from its truth, which is precisely what we want in a planning data base. Of course, a proposition can be asserted: when we propose it we can also indicate, explicitly or implicitly, that we do assert it.

But I’m quite pleased you raised the question. If you think about this new method of interpretation, it consists of a conjunction within one or more existential quantifications. But the manipulations operate only on the conjunction: they don’t affect the prefixed quantifications. However, it’s the quantifications that bear any assertion we intend: they say that there exist this or that. So, if we wanted “planning records” we could give them an interpretation beginning not simply “FOR SOME x” or “There is something such that”, but “We plan that there will be something such that”.

 

And likewise for things like “fictional persons” or “fabulous beasts”: we don’t have to assert that there is someone who is Mr Pickwick, or something that is a cockatrice; we could have a prefix saying that “It is said that there is someone” or “It is said that there is something”. Our manipulations simply leave such prefixes in place.


Copyright © 1993, 2001 Adrian Larner. The author asserts all moral rights.