|
www.btinternet.com/~adrian.larner/database/newint2 |
|
A New Interpretation of Data (continued) A database paper by Adrian Larner |
||
|
|
||
IDENTITIES |
|
|
|
|
We need to distinguish
this sort of relative identity from other identities:
We would therefore have completely to rethink our concepts, perhaps construing individual dogs as sets of timeslices of dogs (earlier timeslices being puppies, and later being non-puppies). Again, we would have to decide whether a breed or a species was to remain a set of individual dogs (a set of sets of timeslices of dogs), or was to become a set of timeslices of dogs (and have individual dogs as subsets rather than as members). Notice that it is not time alone that forces such distinctions on us. Any distinction finer than any previously formalised will have the same effect: it might be location, role, or many other features that we might wish to record. We will see some other examples, below. |
|
|
|
||
Counting and Naming |
|
|
|
|
We now show, following [Geach],
how relative identities can be used in counting and naming.
Suppose that we have a collection of inscriptions of symbols: a certain page of printing, for example.
The term “alphabetic letter” is now defined on the full criterion of identity,
“is the same alphabetic letter as”, which we represent by “I”.
Intuitively, x is the same alphabetic letter as y
when x is an A and y is an A,
or x is a B and y is a B, or ...
x is a Z and y is a Z.
In a very obvious sense: there are exactly 26 alphabetic letters, and therefore no more than 26 on the chosen page.
(Remember that any two inscriptions of “A” are the same alphabetic letter,
and that “a” is the same alphabetic letter as “A”.)
The definition of “is an alphabetic letter” is:
We count the alphabetic letters on the page
by the following algorithm:
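Counting under a relative identity can be sketched in code. What follows is only an illustration, not a reproduction of the paper's own algorithm. It assumes that "x is an alphabetic letter" may be read as "x is the same alphabetic letter as itself", and the function names (`same_alphabetic_letter`, `same_letter_form`, `count_under`) are hypothetical. An item is counted only if it is not the same, under the chosen criterion, as anything already counted.

```python
import string
from typing import Callable, Iterable, List

LETTERS = set(string.ascii_uppercase)


def same_alphabetic_letter(x: str, y: str) -> bool:
    """Assumed stand-in for the relation "I": "a" and "A" are the same letter."""
    return x.upper() in LETTERS and x.upper() == y.upper()


def same_letter_form(x: str, y: str) -> bool:
    """A finer, hypothetical criterion that distinguishes upper and lower case."""
    return x.upper() in LETTERS and x == y


def count_under(identity: Callable[[str, str], bool], items: Iterable[str]) -> int:
    """Count items, treating any two that satisfy `identity` as one."""
    counted: List[str] = []
    for item in items:
        # Skip anything that does not fall under the count noun at all
        # (assumption: it must at least be identical with itself).
        if not identity(item, item):
            continue
        # Count the item only if nothing already counted is the same as it.
        if not any(identity(item, seen) for seen in counted):
            counted.append(item)
    return len(counted)


page = list("A cab, a Cab!")
print(count_under(same_alphabetic_letter, page))
print(count_under(same_letter_form, page))
```

On this sample page the first count is 3 and the second is 5: the same inscriptions yield different counts under different criteria of identity.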
To illustrate the use of relative identities in naming, consider a small child given the name “Richard” (perhaps, but not necessarily, at a formal naming ceremony). What now entitles this man to that name? The physical material of the child, at the point of receiving the name, is now a widely scattered object; but why is it this man, rather than that scattering of atoms, that is called “Richard”? The reason is that the name was given under a particular criterion of identity, “is the same person as”. The name “Richard” was given to the baby, and to each thing that (tenseless) is the same person as it. Notice that the use of the identity works even backwards in time. We may refer to the newborn Richard, even though the name was not given until some weeks later (permissibly years later, even posthumously). |
|
|
|
||
Systemic and Absolute Identities |
|
|
|
|
A systemic identity, apart from its special role in its theory, is merely a relative identity. In a theory of inscriptions whose sole primitive predicate was “is the same alphabetic letter as”, “I”, the systemic identity would be “I”. But this is a perfectly ordinary relative identity. Let us call the theory “Q”. If we now added to Q a new primitive, say “is lower case”, we would get a new theory, say F, having Q as a subtheory (so that every truth of Q became a truth of F). The vocabulary of F would contain “I”, in exactly the same sense, but “I” would not be the systemic predicate of F (because, in F, we could distinguish upper and lower case letters, say a Q and a q, that were I-identical).

This shows, we might reckon, that the systemic identities of theories are not generally to be interpreted as absolute identity. Indeed, the systemic identity of Q, when used as a subtheory of F, simply cannot be interpreted as absolute identity; and it would be peculiarly perverse so to interpret it for Q when not used as a subtheory of F. That would make “Q on its own” and “Q as a subtheory of F” different theories, and show that, in general, it was not possible to make one theory a subtheory of another (because it would become a different theory, not that “one” at all).

Obviously, if absolute identity is to be formalised in a theory, it will have to serve as the systemic identity of that theory. We may read an absolute identity as “is the same as”, with neither explicit nor implicit understanding of a count noun after “same”. Not “the same such-and-such as” for any such-and-such. Not “the same person”; not “the same dog”. “The same thing”, perhaps: we have no criterion of identity for things (except in the special sense of maximal coherent objects, the sort of “thing” one might throw individually across a room). Whether we should admit the concept of absolute identity
is a difficult question. But we should at least observe some of its problems:
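The relation between the theories Q and F described earlier in this section can be sketched in code. This is a minimal illustration under stated assumptions: a theory is represented only by its primitive predicates, the systemic identity is taken to be indiscernibility by every one of those primitives over a small universe, and all names (`I`, `is_lower_case`, `systemic_identity`, `universe`) are hypothetical.

```python
from typing import Callable, Sequence

Unary = Callable[[str], bool]
Binary = Callable[[str, str], bool]


def I(x: str, y: str) -> bool:
    """Stand-in for "is the same alphabetic letter as"."""
    return x.upper() == y.upper()


def is_lower_case(x: str) -> bool:
    """The new primitive added to Q to obtain F."""
    return x.islower()


def systemic_identity(x: str, y: str, unary: Sequence[Unary],
                      binary: Sequence[Binary], universe: Sequence[str]) -> bool:
    """True when no primitive predicate of the theory tells x and y apart."""
    if any(p(x) != p(y) for p in unary):
        return False
    for r in binary:
        for z in universe:
            if r(x, z) != r(y, z) or r(z, x) != r(z, y):
                return False
    return True


universe = ["Q", "q", "A"]

# Theory Q: the sole primitive is I.
in_Q = systemic_identity("Q", "q", [], [I], universe)

# Theory F: Q plus the primitive "is lower case".
in_F = systemic_identity("Q", "q", [is_lower_case], [I], universe)

print(in_Q, in_F, I("Q", "q"))
```

In this sketch "Q" and "q" are indiscernible in Q but not in F, while `I("Q", "q")` holds in both: the predicate keeps its sense, but loses its systemic role.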
|
|
|
|
||
THE CLASSICAL INTERPRETATION |
|
|
|
|
It is true that [Codd1970]
did not explicitly propose a modelling from, or interpretation of, the records that were themselves modelled by tuples.
However, there was such an interpretation implicit in Codd’s approach,
hidden alas by the popular interpretation of tuples as “entities”:
witness even Codd’s early use of expressions like “entity integrity”.
But notice the following remark from [Codd1990]:
On its own, this remark of Codd’s
would be scant evidence that such an interpretation had ever been intended.
However, we should also bear in mind that:
Here then is what we have asked for:
a language interpretation of records; moreover,
one that has a formal basis, the FOPC.
And this interpretation does cover the data manipulation operators. |
|
|
|
||
Data Manipulations Interpreted |
|
|
|
|
The FATHERHOOD relation is interpreted as:
Or, in a word: Isaac is paternal grandfather of Dinah.
If we wanted an intelligent front end to interpret paternal grandfather
we would have to give it the definition:
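The composition behind "paternal grandfather" can be sketched as a Cartesian product, a restriction, and a projection. This is an illustration only: the two FATHERHOOD tuples are assumed (the text names only Isaac and Dinah; Jacob is supplied as the intermediate person), and the function name is hypothetical. The projection discards the middle person, which corresponds to quantifying existentially over him: x is paternal grandfather of z when there is some y such that x is father of y and y is father of z.

```python
from itertools import product
from typing import Set, Tuple

Pair = Tuple[str, str]

# Assumed sample records: (father, child).
FATHERHOOD: Set[Pair] = {("Isaac", "Jacob"), ("Jacob", "Dinah")}


def paternal_grandfathers(fatherhood: Set[Pair]) -> Set[Pair]:
    """Compose FATHERHOOD with itself."""
    # Cartesian product of the relation with itself.
    pairs = product(fatherhood, fatherhood)
    # Restriction: the child in the first tuple equals the father in the second.
    restricted = [
        (f1, c1, f2, c2) for (f1, c1), (f2, c2) in pairs if c1 == f2
    ]
    # Projection on the outer columns; the middle person is quantified away.
    return {(f1, c2) for (f1, _c1, _f2, c2) in restricted}


print(paternal_grandfathers(FATHERHOOD))
```

Note that the restriction uses plain value equality on names; whether that equality may be read as "is the same person as" is exactly the question the paper goes on to raise.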
|
|
|
|
||
Foundations of the Classical Interpretation |
|
|
|
|
In the design of a database, either directly or indirectly, we attempt to define kept records (tuples in base relations) in one of the higher normal forms (second through fifth), which means in essence that we decompose records, using the projection operation, in such a way that we can recompose them using natural joins. These two operations are also of major importance in the formulation of other views and of user queries. We need therefore to understand their interpretations in terms of the FOPC, and to understand the constraints that arise from those interpretations.

Let us use “a”, “b”, “c”, etc. for names, and (as usual) “x”, “y”, “z”, etc. for variables of the FOPC. A predicate (a relation type, or record type) is written as P(x), Q(x,y), R(x,y,z), etc., and accordingly a record (tuple) instance as P(a), Q(a,b), R(a,b,c), etc. The values in such a tuple are, it should be noted, interpreted as names. A projection is interpreted
as the existential quantification of the columns that are not projected.
Thus, the projection of the second and third columns of R(x,y,z) is: ∃x R(x,y,z).
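As a rough illustration of this reading of projection, the sketch below projects the second and third columns of a small relation and checks that each resulting pair (y, z) is one for which some x satisfies R(x,y,z). The sample tuples and the helper names (`project`, `exists_x`) are hypothetical.

```python
from typing import Sequence, Set, Tuple

Row = Tuple[str, ...]

# Hypothetical instance of a three-column relation R(x, y, z).
R: Set[Row] = {("a", "b", "c"), ("d", "b", "c"), ("a", "e", "f")}


def project(relation: Set[Row], columns: Sequence[int]) -> Set[Row]:
    """Keep only the given column positions, discarding duplicates."""
    return {tuple(row[i] for i in columns) for row in relation}


def exists_x(relation: Set[Row], y: str, z: str) -> bool:
    """Is there some x such that R(x, y, z)?"""
    return any(row[1] == y and row[2] == z for row in relation)


projected = project(R, (1, 2))
print(projected)

# Every projected pair is witnessed by at least one x in R.
print(all(exists_x(R, y, z) for (y, z) in projected))
```

The projected relation contains exactly the pairs that the existential formula is true of; which x bore witness is no longer recorded.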
There is one simple way to state
the constraint we need: “a” is an acceptable name,
as far as making projection safe, if all instances of the following schema hold true:
The FOPC does not admit intentional predicates. Of course not: “P(a) → ∃x P(x)” is a theorem of the FOPC. Notice that “... worshipped Moloch” is a perfectly acceptable predicate. Strictly, it is places, not predicates, that are intentional.

Intentionality is not a problem restricted to theological databases. Do we have a vacancy record showing that we wish to employ an expert on French, Algerian, and Sudanese law? It does not follow that there is an expert on French, Algerian, and Sudanese law that we wish to employ (or, indeed, at all). Have we signed a contract to deliver a ruggedised processor for use in the Antarctic? It does not follow that there is such a ruggedised processor that we have signed a contract to deliver. On the face of it, it looks as if quite a lot of the values in our databases might be pseudo-names in the intentional places of predicates, and that an incautious projection might lead our users from truth to falsity. Human users, with application domain knowledge, may avoid being misled (despite our worst efforts); intelligent front ends face a somewhat bigger challenge.

In projection, therefore, values give us problems if they are supposedly names, yet fail to designate anything at all. We now turn to natural joins, where, we shall find, values give us problems if they designate more than one thing.

If, as illustrated above, we join by Cartesian product and restriction (as we do in some data manipulation languages, including SQL, but not for the purposes of database design, that is, higher normalisation), we might hope that the equalities used would be encapsulated: their meanings given by the data types (the domains) of the joined columns. But implementation of encapsulated types is, alas, rare in current relational systems. It is obvious that we would wish the equijoin of FATHERHOOD to have its equality, “=”, interpreted as “is the same person as” (and applying over a domain of persons).
But apparently (it is, it seems, nowhere clearly stated) the equality used in equijoins and other restriction conditions, and implicit in natural join, is intended to be, at least, the systemic identity of the system, and probably absolute identity. But what is “the system” of which it is the systemic identity? It is the theory that comprises (at least) the FOPC itself and the propositions represented by each record kept in the database. As we have seen, if it is the systemic identity that is intended, there may be radical re-interpretation in store for us as we incrementally design the database. If we keep data about species and breeds, our systemic identity may well hold between any two animals of the same breed. Introducing records about individual animals will require a new systemic identity, and may therefore require reformulation of every query and view definition that uses “=” to mean the old systemic identity (“is the same breed as”). If “=” is intended invariably to mean absolute identity, our database of breeds is in error (perhaps not perceptible) even before we add information on individual animals. Two animals of the same breed are not absolutely identical. And then adding the extra information gives us the same problem, in practical terms, as it did with merely systemic “=”. But it leaves us with more niggling doubts, for even two things that are one and the same individual animal are not absolutely identical. Again, we need to consider whether this is a practical problem.
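The practical risk can be sketched with a hypothetical example; it is not the paper's own, and every name and record in it is an assumption made for illustration. Suppose the kept records use a breed name where the classical interpretation expects a proper name of one animal. A natural join on that value then licenses a conclusion about owners that the facts about individual animals do not support.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]

# Hypothetical facts about individual animals.
BREED_OF = {"Lassie": "collie", "Shep": "collie"}
OWNS_INDIVIDUAL: Set[Pair] = {("Ann", "Lassie"), ("Bob", "Shep")}
WON_INDIVIDUAL: Set[Pair] = {("Shep", "Best in Show")}

# Kept records: the animal column holds a breed name, which designates
# more than one thing under "is the same individual animal as".
OWNS: Set[Pair] = {(owner, BREED_OF[a]) for owner, a in OWNS_INDIVIDUAL}
WON: Set[Pair] = {(BREED_OF[a], prize) for a, prize in WON_INDIVIDUAL}


def natural_join(left: Set[Pair], right: Set[Pair]) -> Set[Tuple[str, str, str]]:
    """Join on the shared middle column using plain value equality."""
    return {(a, b, c) for a, b in left for b2, c in right if b == b2}


def owner_prizes(owns: Set[Pair], won: Set[Pair]) -> Set[Pair]:
    """Project the join onto owner and prize."""
    return {(owner, prize) for owner, _animal, prize in natural_join(owns, won)}


derived = owner_prizes(OWNS, WON)
actual = owner_prizes(OWNS_INDIVIDUAL, WON_INDIVIDUAL)

# Conclusions the kept records yield that the individual facts do not.
print(derived - actual)
```

Here the join over breed values pairs Ann with "Best in Show", although the prize-winning animal is Bob's. The join itself is carried out correctly; the fault lies in reading "collie" as a name of exactly one thing.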
Suppose that we have records interpreted as:
It should be stressed that this join trap, and many others, are not fallacies; they do not involve human mistakes in informal logic. The argument (the interpretation of the composition) is flawless. The mistake is the use as a value of something that is not a proper name (does not name one and only one thing) under the systemic identity. The mistake is in the interpretation of the kept records, not in the interpretation of the query.

What follows from this? Perhaps we can ensure that each value in our database is such a proper name, and on any modification to the database ensure that each value is still a proper name, and make appropriate adjustments to all affected views and queries. But, if we cannot do this (unless we are happy to allow our users to derive falsehoods from truths), the classical interpretation is ruled entirely out of court. We cannot, in practice, perform this massive policing job on our databases, neither initially nor intermittently thereafter. Think of all the values we keep. What will it take to ensure that “green”, “2.5 kilograms”, “Wednesday”, and so on are invariably interpreted and used as proper names?

So we have to abandon the classical interpretation, and along with it the interpretations of the relational algebra, the domain calculus, Datalog, and so on. But we can at least say this for the classical interpretation: it was so well specified that it could bring about its own definitive destruction. Contrast the entity interpretation, which is too malleable and ductile to stand any argumentative strain. Decisive refutations are to be welcomed; robust reconstructions even more so, and to these we will shortly turn. |
|
|
|
||
Self-Interpretation |
|
|
|
|
We may be convinced that, because often their values are not proper names, our databases have serious problems of interpretation, including join traps. But the refutation above seems to show that we have overwhelmingly serious problems; that our databases do not work at all. But we know they do. We have surprisingly few problems of interpretation. Why? Suppose we wished to keep records
showing which persons worshipped which false gods.
We would (let us assume, ignorant of intentionality) have a two column relation,
WORSHIP, with columns, WORSHIPPER and DEITY.
We can give the records in this table a perfectly respectable interpretation:
We are, so to speak, processing forms; without necessarily giving a second thought to what those forms tell us about “the real world”, the world outside the record-keeping facility. And yet users of such databases do derive from them information about the world. Of course, human users know that an employee record tells us that there is such and such a person employed by the company, that a delivery note gives information about some delivered goods, and so on. But these interpretations are entirely informal. When a user is misled by a join trap in such a system,
strictly speaking, the analyst is innocent (merely technically innocent, we might judge).
All the analyst was committed to was:
And this is why our databases are not the disasters that we might expect. They are magnetic stores of forms: and our users know how to interpret, and even how to manipulate, their own forms. Our databases need to capture no external interpretation at all; their human users can still get by. For intelligent front ends, and even human users on a bad day, it is a different story. We should perhaps understand the tuple calculus (including SQL) in this way. Its variables range over (so the entities of its theory are) tuples in relations. And we can understand the domain calculus and the relational algebra as pertaining to values in records, and not to anything designated by those values. Notice that for the “inherently self-representing” records, like invoices, this is a perfectly acceptable interpretation: far from such records being problematical, they are the only records with an unproblematical interpretation. Some idea of human users’ subtlety, and how much mistaken (or merely absent) interpretation it covers, can be appreciated from the following:
|
|
|
|
||
Copyright © 1994, 2001 Adrian Larner. The author asserts all moral rights. |
||