(U03)

www.btinternet.com/~adrian.larner/database/pcl24

PLATOCLAST
ON DATA

Lecture XXIV
Keys

 

 

Primary Keys

 

One of the major shocks and disappointments I received when struggling to understand relational theory was the notion of primary key. Firstly, we had what appeared to be a very complex system, based on set theory; and yet this extra primitive notion had been added. Moreover, the choice of primary key for a relation seemed purely arbitrary, in theory. Naturally, in practice, the choice of primary key was usually obvious (though, I suspect, usually obvious and wrong as well).

Secondly, the concept appeared to be used to solve too many problems. It was to be part of the mechanism for uniquely identifying – picking out – an instance of a value (by giving the relation, the primary key value, and the column name). It was also supposed to serve for some sort of major – I hesitate to say “privileged” – connection between records. It was also intended to represent the self-identity (sameness) of the so-called entities supposedly represented by records.

As we’ve already seen, we can pick out records (should we wish to do that) by record identifiers. We have seen that the rule covering these major connections – the Referential Integrity rule – is actually trivial. And we have said, firstly, that there is no such relationship as “being the same entity” – absolute identity; and secondly, that every column is associated with some criterion of relative identity. I conclude, therefore, that – thank goodness – the notion of “primary key” was just a horrible mistake.

Now I’m pretty certain that the relational bigots would pick on my record identifiers and claim them as primary keys. They certainly do serve to identify records (in the sense of sameness); they do serve to indicate a certain sort of identity (in the sense of sameness), namely identity of records. And they might serve, I guess, to make some connections between records. For instance, where the connection really is a matter of making an association with a form, they could be so used. But users might not want to associate, say, a person form and a house form; they might want to associate a person with a house, so should use a Person Id and a House Id. I just don’t see why they should always have to worry about how many forms it takes us to keep information on a person or a house, and which of those forms should be associated.

But, as we’ve seen, this whole area is confused in relational theory, because of the unthinking acceptance of absolute identity, and the unfounded assumption of one-to-one relationships between records and entities. All this talk about one-to-one relationships, unique identifiers, and the like has to be sharpened up: one what to one what? unique under what criterion of identity? And when we do sharpen it up, the notion of primary key vanishes away like the dew in the morn.

 

 

Identifiers and Keys

 

Let us, however, turn our minds to the notion of key.

We haven’t had a TT-DD schema for some while, so here’s one:

S3 x identifies F =df
FOR EACH y, if F(y) then FOR EACH z, if F(z) then if x\y is the same record as x\z then y is the same record as z.
 
x identifies F records when, if y is an F record and z is an F record and y and z have the same x projection, then y is the same record as z.
We can substitute some record type for “F”, for instance “EMPLOYEE” so that F(y) means “y is an EMPLOYEE record”.. Then “x identifies F” means that x (or, of course, any record of the same format) is an EMPLOYEE identifier. And we can define a minimal identifier:
S4 x keys F =df
x identifies F and FOR EACH y, if y is a proper projection of x then not y identifies F.
 
Taking F as a record type (effectively a relation) this captures the idea of x being a key of that record type (or of the relation containing all and only the records of that type).
It should also be said that we might take F as holding true of a certain projection of a type of record, and then “x identifies F” (or “x keys F”) would amount to x being a functional determinant (or a minimal functional determinant) of the F columns.

And that’s about as far as I can get in the TT-DD towards formalising the concepts of “key” and “functional determination”. I used to be worried that the relational people seemed to get much further, until it dawned that they weren’t doing it formally, merely informally in notation. I warned you about that, didn’t I?

 

 

Candidate Keys

 

Let’s suppose you find relational theorists saying things like “EMPNO is a candidate key of the EMPLOYEE relation” or “EMPNO functionally determines EMPNAME”. What does it mean? Well, one thing it means is that EMPNO is a key – in that TT-DD sense – of the EMPLOYEE record type, or is a key of the (EMPNO, EMPNAME) projection of it. But it means a bit more than that, which we can’t formalise in the TT-DD (nor they in the FOPC or set theory): that EMPNO is asserted to be such a key.

This is easily seen. Suppose EMPNAMEs were unique: every employee had a different name. Then we would find that EMPNAME was a key of EMPLOYEE, or of its (EMPNO, EMPNAME) projection. By chance, as it were. But it wouldn’t be a candidate key because it wouldn’t be asserted to be a key. Notice that “It is asserted that” is a propositional operator, like NOT (“It is not the case that”). But it’s not a truth-functional operator because we can’t tell whether “It is asserted that p” is true or false merely by knowing whether “p” is true or false. We can do that for “NOT”; if “p” is true then “NOT p” is false, and vice versa.

However, the only propositional operators in the FOPC, or set theory for that matter, are truth-functional: NOT, AND, OR, and the like. So “candidate key” and “functionally determines” are, in the senses used, not formalisable.

 

 

Keys in Optimisation

 

Now, I’m going to have a look in a bit at where these functional determinacies come from, but I want to say now that there’s certainly one area where some people think they’re important; but they’re not important at all, though keys in the TT-DD sense are. Actually, I suspect that this holds in most areas, but the area I’m particularly referring to is query optimisation.

I won’t go into it in detail – ask the good Father Warden if you want to understand it – but think about when you’ve done a projection and you need to check whether you’ve got any duplicate records, which you often have. Now it costs you to check for duplicates, but it may cost you not to check early enough if you do have duplicates. And one thing that helps is: if there is at least one key in the projection, that is, one collection of columns that keys all the columns, then you won’t have any duplicates (if you didn’t start with any!)

But – and here’s the rub – it doesn’t matter whether the key is a candidate key or not: it doesn’t matter whether it’s asserted that it’s a key. If it is asserted, in your data base definition, then it’s just fine, because the DBMS will have ensured that it’s true. But you may have found out by other methods that it’s true. And that’s what counts: being a key, not being asserted to be a key.

Although it’s unwise, you see, to drink from a bottle asserted to contain poison, that’s not what causes the trouble. It’s drinking from a bottle that actually contains poison, however you may have discovered – or even not – that it does contain poison. And so it is with optimisation.

 

 

Keys in Data Base Design

 

Now let’s turn our attention to another use of keys. People use them in data base design. We’ve touched on this already.

Suppose we want to hold this sort of data:

CAR:    REGNO  COLOUR  MAKE  BODYFRAME
It means, roughly:
Car REGNO has colour COLOUR, make MAKE, and bodyframe material BODYFRAME.
I’m sure you can put that into EI. We’ll soon find out. Now these data base designers will tap you on the shoulder and say: surely every car of any given make has the same bodyframe material. Let’s assume they’re right. Naturally you reply, “So what?” Or words to that effect. They say, “Have two relations”:
CARINST:    REGNO  COLOUR  MAKE
CARMAKE:    MAKE  BODYFRAME
Then, when you want CAR you can simply do their natural join on MAKE (a key join, as MAKE is the key of CARMAKE). And – well, we’ve discussed this before – they have all sorts of spurious-sounding arguments about avoiding redundancy and inconsistency, and in the end you find that they’re really talking about physical design.

But notice that if you really do want to impose the constraint that MAKE keys (MAKE, BODYFRAME) then you do. You will assert it in the data base schema, and you will want your DBMS to keep it true. And one possible physical implementation that will keep it true is to have store level records like those shown. But does that have anything at all to do with logical data base design? I don’t know, but it’s not obvious, is it?

Nor, as it happens, does it matter even at the store level that MAKE should be a candidate key of (MAKE, BODYFRAME). Suppose this were true: we thought, to the best of our knowledge, that MAKE did key (MAKE, BODYFRAME), but we didn’t care, and we weren’t certain. Couldn’t we say this, at the store level:

Assume, as a defeasible functional determinacy that MAKE does indeed key (MAKE, BODYFRAME). So hold the tables CARINST and CARMAKE.
Then we could have CAR at the form or concept level (of course it would have a record id at the form level, but ignore that for the moment). And we would join CARINST and CARMAKE to get CAR, and everything would go swimmingly, perhaps for ever. But what would happen on the ill-omened day when we found a MAKE with two different BODYFRAMEs?

Well, at the logical level, no change to the form or concept level. But we’ld send a message down to the physical level to say:

Remember that defeasible functional determinacy of MAKE: forget it. So join your CARINST and CARMAKE for. the last time; store the result – call it “CAR”; and throw away CARINST and CARMAKE.
Then we’ld update CAR, and that would be the end of a key. You see: even here it’s the truth, not the assertion, that matters. The only value of the assertion is that what it conveys is true.

Of all my proposals, I suspect that this relegation of relational data base design to the physical level will annoy the relational bigots most intensely. Good. But I have a second, even better reason for it.

 

 

The Join Trap

 

Let me assume that MAKE really is a candidate key of CARMAKE; “for all time” as they say. Now I’m going to add another column to CARMAKE, so we have:

CARINST:    REGNO  COLOUR  MAKE
CARMAKE:    MAKE  BODYFRAME  TESTMETHOD
And we have the natural join:
CAR:    REGNO  COLOUR  MAKE  BODYFRAME  TESTMETHOD
Let’s try our old friend CI to interpret these, using some specimen values:
CARINST: Car r1 has colour c1 and make m1.
 
CARMAKE: Car make m1 has bodyframe material b1 and has been tested by method t1.
For your information, I’ll tell you that “t1” is testing to destruction by a dummy. And now we can do the join:
CAR: Car r1 has colour c1, make m1, bodyframe material b1, and has been tested by method t1.
Oh dear! We appear to have gone from truth to falsity, because I drive car r1, and I can tell you that although it does have a bodyframe of b1 (steel, actually), it certainly has not been tested to destruction by a dummy; not to destruction. A subtle version of the join trap. But we know how to escape it: go to EI.
CARINST: FOR SOME x, x is the same car as r1, the same colour car as c1, and the same car-make as m1.
 
CARMAKE: FOR SOME y, y is the same car-make as m1, and is the same bodyframe-material-car as b1, and is the same method-tested-car as t1.
That’s better, isn’t it? Now let’s do the join:
CAR: FOR SOME x, FOR SOME y, x is the same car-make as y, and x is the same car as r1, the same colour car as c1, and the same car-make as m1, and y is the same bodyframe-material-car as b1, and is the same method-tested-car as t1.
And now we haven’t said that my dear r1 has been tested to destruction by a dummy, merely that it is the same make as some car that has been tested to destruction by a dummy. Unfortunately we haven’t said that r1 is steel-framed, which it is. We have merely said that it is the same make as some car that is steel-framed.

You see, it didn’t cost us nothing to avoid the join trap: we avoided some invalid, unreliable implications. But some invalid implications lead us to the truth. Remember: all that we ask from implication is that it doesn’t lead from truth to falsity. How do we sort this out? Like this:

CARINST:    REGNO  COLOUR  MAKE  BODYFRAME
CARMAKE:    MAKE  BODYFRAME  TESTMETHOD
And we interpret CARINST as:
FOR SOME x, x is the same car as r1, the same colour car as c1, the same car-make as m1, and the same bodyframe-material-car as b1
And then, leaving the interpretation of CARMAKE unchanged, we get the interpretation of CAR as:
FOR SOME x, FOR SOME y, x is the same car-make as y, and the same bodyframe-material-car as y, and x is the same car as r1, the same colour car as c1, and the same car-make as m1, and y is the same bodyframe-material-car as b1, and is the same method-tested-car as t1.
And that’s OK. But doesn’t it give rise to redundancy, inconsistency, etc. etc? Not at all. Because although the new CARINST and CARMAKE are kept, logically basic, what’s stored are just the old CARINST (without BODYFRAME) and CARMAKE.

Look at it this way: there are lots of reasons why one collection of columns might key another. When we think purely in terms of records – at the store level – these reasons really don’t matter. So decomposing records by such relationships, knowing that they can be restored by joins on keys, is just fine. You see, at the store level, all we have to do is to keep data: that’s what a store is for. You put things in, and you want to go back later and take them out. And that’s all.

But at the logical level our records have interpretations, and the manipulations we perform on them are implications, so we can’t arbitrarily rip them apart and use the manipulations to put them together. In terms specific to EI, when we split a record into two projections we get two interpretations with different existentially quantified variables: that is a loss of information.

I agree that it may not matter. If all the relevant users know that any two cars of the same make have the same bodyframe material, then they can add that knowledge to what the data base tells them. But I’m not in the business of devising data bases that work only if users have lots of supplementary information in their heads. I suspect we may have to support, before very long, unbelievably stupid front-end inference engines. And they’ll think that if Peter is an apostle and the apostles are holy and the apostles are exemplified by James and the apostles are 12, then Peter is not only holy but also exemplified by James, and 12 to boot.

Now you may be tempted to say that CARMAKE should be interpreted as:

FOR EACH y, if y is the same car-make as m1, then y is the same bodyframe-material-car as b1, and FOR SOME z, z is the same car-make as y and z is the same method-tested-car as t1.
And that’s fine, in this case. Now please give the general form of interpretation of a normal record, and explain projection, restriction, join, and so on. That is, explain how their results are to be interpreted, given the interpretations of their operands; and work out what follows from that.[1]

But also think carefully: what, if anything, is wrong with the interpretation I’ve given? It requires a separation between physical and logical levels? There isn’t a one-to-one relationship between them? So logical data base design and physical data base design become very different exercises? What would you count as goodness then?

 

 

Origins of Functional Determination

 

You can imagine that I’m a lot less interested in functional determinacies than the relational people are. Though, of course, I’ld be interested in key relationships if I were into physical data base design, and particularly optimisation, so I’ve generously thrown out the odd suggestion – like defeasible determinacies – as I’ve hacked my way through their forest. But if you are interested in functional determinacies, you ought to ask yourself where they come from.

Well, ignoring the accidental ones – the unasserted, coincidental keying relationships – we have some that derive from business rules. For example, we might have a policy that each employee was assigned a single visual display unit to work at. That would make Employee Number determine VDU Id. And then we’ld see people designing a data base in which VDU Id was found in the EMPLOYEE record: a great idea, at the store level; it does wonders for performance. But it’s not the job of the logical data base designer to make their design dependent on business rules that might change tomorrow.

Then, even worse, there are other enterprise’s business rules: the determination of BODYFRAME by MAKE might be an example. Again, these could change: OK to exploit them, perhaps as defeasible, at the store level.

Sometimes the other enterprise is the government, central or local, with all their laws and regulations. Do you think anyone has ever been caught on that elegant, utterly Wardenesque functional determination of VAT rate by the empty collection of columns, i.e. there is just one VAT rate? You betcha! Vanity of vanities, all is vanity! Even government regulations pass away. That’s an absurd understatement if you happen to be working in the local government sector.

We have social conventions: everyone has just one surname. I wouldn’t bet on it. And then we get to hard reality. Hardish. Every car has one steering wheel? I believe there are prototype cars that you can drive in either direction, like some of those old trams. And some cars – I learnt to drive in one – have two brake pedals and two clutch pedals, and sometimes I wished my instructor could override my steering as well (so did she).

But surely every person has one head! I don’t want to get gruesome, but no.

So what determinacies are reliable enough to use in data base design? If business rules, laws, social conventions, and even what we took for reality, all let us down, what can we rely on? What can we rely on when all else fails?

Logic. Logic alone will never lead us astray. Some designers like to throw in a bit of commonsense as well. I do myself; but I don’t recommend it.

 

 

Duplicate Rows

 

 

(Professor Platoclast was asked whether his “de-emphasising” of primary, and even candidate, keys meant that he admitted duplicate rows.)

“De-emphasising”! I couldn’t manage anything as subtle as de-emphasis; I don’t de-emphasise primary keys, I reject them. Candidate keys I admit informally, and I’ld like to formalise them but it needs a modal logic: a logic with functions that aren’t truth functions. As for duplicate rows, that’s a surprisingly difficult problem, but largely because discussions of it are so confused.

 

 

“Duplicate” Needs Definition

 

First of all, we have to be able to say what duplicate rows are. It seems obvious to me that if Rl is a duplicate of R2 then we need two criteria of identity for rows. Suppose we had only one criterion: then Rl might be the same row as R2 by that criterion, that is, “Rl” and “R2” would merely be different names for one and the same row; but if there’s only one row, with no matter how many names, then it’s not a duplicate. On the other hand, Rl might be a different row from R2 by that single criterion. So we would have two rows, but they wouldn’t be duplicates; they’ld just be two rows.

Obviously, therefore, we need two criteria of identity, so that we can say Rl is a duplicate of R2 when it’s the same by one criterion, but not the same by the other. But those who don’t want duplicate rows can’t just say: if Rl is the same as R2 by some criterion then it’s the same by any criterion. That’s obviously absurd, because any two rows are the same by some criterion, for example by “... is a row, and so is ...”

And I think what those opposed to duplicate rows want to say is, using our criterion of identity for rows (comprising all and only the same fields), that if Rl is the same by that criterion as R2 then Rl is the same by any criterion as R2. Now even that is pretty doubtful. For let Rl be a row in our data base, and let us read in that row and display it on a screen, and let us call the displayed row R2. I think I want to say that R2 is the same row as Rl, otherwise there’s something wrong with our application program: we asked it to display a row and it displayed some other row. Yet clearly Rl is not the same as R2 by any criterion: “is displayed on a screen” and “was written by a display program” are true of R2 but false of Rl, whereas “was read by a display program” is true of Rl but false of R2.

I think you can probably see why I’m not too worried about duplicate rows. Suppose, whatever it means, that Rl is a duplicate of R2, and that Rl is in our data base and that R2 is in our data base. Now, either Rl is the same row as R2 or it isn’t, by our criterion of identity of rows. If it isn’t, then we have two rows; if it is, then Rl is the same row as R2. But really that’s all I’m interested in: is Rl, or is it not, the same row as R2? And – whatever the answer might be – l don’t care whether they’re duplicates or not, in any sense of “duplicate”. As I said, you can get duplicate rows only if you have two criteria of identity for rows; but I’m interested in only one such criterion, so the problem doesn’t worry me.

(The Professor was not allowed to get away with this response. Date had written a paper attacking duplicate rows;[2] SQL allowed duplicate rows; surely the problem could not be so easily dismissed.)

Very well; I admit the problem seems important to Mr Date, but he doesn’t have the theoretical apparatus to address the problem. He never gets round to mentioning the criteria of identity he’s using. And his attack is based on set theory, which he considers simple and sound, and I reckon to be complex and unsound.

His argument starts with a claim that individual objects must be identifiable (distinguished from other objects). If that were so, there would be a criterion of identity for individual objects: “is the same object as”. And that would mean that you could tell me, for instance, how many objects there are on this desk. But you can’t: you can tell me how many books, or how many sheets of paper, or how many sticks of chalk; but not how many objects, or things.

Then Mr Date argues that if we have duplicate rows in a relation we have to distinguish them by position. Actually he says that we have to distinguish them somehow, not necessarily by position, and then he assumes position.

But, as far as I can tell, people who want duplicate rows (in SQL, say) don’t want them distinguished by position.

 

 

Interpretation of Duplicate Rows

 

And then he says ... what does he say? Ah! He then goes on to the question of meaning, and this is where things go hopelessly wrong, because he raises the question but doesn’t answer it. He gives an example with duplicate rows, but doesn’t tell us what it means.

And this is the difficult thing in arguing against duplicate rows. We either have to say: duplicate rows have no meaning, or we have to give a meaning. If we say they have no meaning, ... Well, there’s an end to it. But it’s not very convincing, because people wouldn’t use duplicate rows if they had no meaning. On the other hand, if we give a meaning to duplicate rows, and possibly even translate each table with duplicate rows into a table without duplicate rows, then our last state is worse than our first. We have actually shown how to handle duplicate rows: we can translate them into non-duplicates, manipulate them, and translate them back. Then it seems absurd to criticise duplicate rows so violently.

Mr Date actually states a query on a table with duplicate rows and formalises that query in several different ways, with different results. But the trouble is: we can’t tell which, if any, of his formalisations are correct because he hasn’t given us any interpretation of his table. He claims that the duplicate rows give rise to ambiguity in the result; but the ambiguity comes merely from the failure to specify the meaning of the table.

 

 

A Case for Duplicate Rows

 

Let me try to build a case for duplicate rows, and you’ll see just how difficult it is to argue against them. Suppose I sell what are called “big ticket” items: beds, washing machines, that sort of thing. I might want to display a table of what I have in stock: I’ll show you a part of that table, highly simplified:

BTI:    CLASS  ITEM
 
        Bed    Std Dbl
        Bed    Std Dbl
I have, you see, two rows: because I keep one row for each item. Now, what criticisms could be raised?
I’m relying on order to distinguish the rows (or the beds); no, I’m not, because I don’t care if the rows (or the beds) appear in a different order.
 
The table has only one row, not two; say that if you will, but then you can’t even complain that I have duplicate rows. In any event, the table has two rows.
 
All right, the table has two rows, but they comprise the same fields; nonsense; the objection is a contradiction. If they comprise the same fields, they are the same row.
 
So the table has two rows, so they must comprise different fields; but it seems as if they comprise the same fields: not at all. If they comprised the same fields there would be only one row, but as there are clearly two rows it seems – indeed it must be – that they comprise different fields.
 
But each of the two fields appears the same in the records: yes. indeed; how very observant.
Do we want to say here: not only must two rows differ in at least one field. but they must always appear to do so? No; we couldn’t want to say that. Suppose we have records with long rows, and we have to page a screen right to left along them. At some point many rows might appear to have all the same fields, merely because we couldn’t see all the fields. In any event. even if we show all of the rows, two might appear the same to one user, but not to another: it’s a question. perhaps of acuity of vision. But what has all this to do with data base theory? Surely if the system is correct – theoretically correct – below the presentation level, then what the user sees or doesn’t see is irrelevant.

But is this system correct in the DML? Or does the DML distinguish only one row? Clearly, the DML distinguishes two rows: the table shows that. Then the two rows must differ in the values of at least one field? Quite so. But the table doesn’t show that they differ in at least one field: yes it does, or there wouldn’t be two rows shown.

The table doesn’t show in which field they differ. True. There are a number of possibilities. There might be a third field. not shown at all. This could be a cryptic field. I’ll put it in:

BTI:    CLASS  ITEM     ID
 
        Bed    Std Dbl  *
        Bed    Std Dbl  *
It doesn’t help much: but if you’ll just believe the ID values are different. then all is well. But then they’re not duplicates. So you have no cause to complain, but of course there is an important sense in which one row is the same as the other: there is no overt difference between them. In that sense they are duplicates: the same overt row but not the same row.

But there might not be an extra column. Perhaps the ITEM column contains two different values, each of which appears similar to the other. Again. we would have two rows that were the same overt row. We could say: ITEM is a partially cryptic field.

Look. you would allow this wouldn’t you:

BTI:    CLASS  ITEM     CT
 
        Bed    Std Dbl  2
Now , there are many ways of counting: one is to cut tally marks into a stick. Instead of “2” I’ll write the tally (why not? “2” is merely an ambiguous abbreviation of any tally comprising a pair of marks):
BTI:    CLASS  ITEM     TALLY
 
        Bed    Std Dbl  üü
I know the tally marks look the same, but they’re not. But what is wrong is: this is not normal, because a tally is compound, not atomic. And yes – I do have doubts about count fields in records. Let’s normalise:
BTI:    CLASS  ITEM     TMARK
 
        Bed    Std Dbl  ü
        Bed    Std Dbl  ü
Then we could actually combine ITEM and TMARK:
BTI:    CLASS  ITEM
 
        Bed    Std Dblü
        Bed    Std Dblü
But in any event, as long as the tally marks are there, as long as something is there to distinguish the rows, we don’t need to show them (it doesn’t matter which tally mark goes in which row):
BTI:    CLASS  ITEM
 
        Bed    Std Dbl
        Bed    Std Dbl

 

 

Arbitrary Choice Among Duplicates

 

Now you might complain: but how can I update one of these rows (or delete one of them), without affecting the other. That’s easy enough in a program, which looks at one row at a time. But how about the poor user? Well, it’s a matter of language design, but why shouldn’t we allow a user to select ARBITRARY, or select A. We might already allow, I think, something like:

SELECT I1.CLASS, I1.ITEM
  FROM BTI I1, BTI I2
 WHERE I1.CLASS = "Bed" AND I2.CLASS = "Bed"
   AND I1.ITEM LIKE "Std Dbl*"
   AND I2.ITEM LIKE "Std Dbl*"
 GROUP BY I1.CLASS, I1.ITEM
HAVING I1.ITEM = MAX (I2.ITEM)
And we don’t want “MAX”, we want “ ARB” – just pick any such:
SELECT I1.CLASS, I1.ITEM
  FROM BTI I1, BTI I2
 WHERE I1.CLASS = "Bed" AND I2.CLASS = "Bed"
   AND I1.ITEM LIKE "Std Dbl*"
   AND I2.ITEM LIKE "Std Dbl*"
 GROUP BY I1.CLASS, I1.ITEM
HAVING I1.ITEM = ARB (I2.ITEM)
Of course, we might choose to simplify that, just as people currently want simple ways of requesting the first row, according to some parameter, or even the first n rows. Anyway, there must be some way to allow a user to pick out an arbitrary row for update or delete.

But how could a user pick out a particular row? How distinguish between two items that appear the same? Well, they couldn’t: this appearance of “duplicate rows” is used only when the user can’t, and doesn’t need to, distinguish. Indeed, in some cases, the user mustn’t distinguish: for perishable items, we might want to force the system to extract them from the store first-in-first-out, but to hide that rule from the user and to enforce it unseen.

But notice: I have defended what you might choose to call “duplicate rows” by telling you what they mean. Indeed, I give you a translation into overtly different rows, or a clearly acceptable format, like:

BTI:    CLASS  ITEM     CT
 
        Bed    Std Dbl  2
And that tells you precisely how I want them processed: you can translate between representations with duplicates, and representations without. But you may say: what you have described are not duplicate rows at all, just ordinary, but partially cryptic, rows. Well, I don’t want to argue with that. The duplicate row controversy is one in which neither side will say what they mean. The proposals on both sides are too weak to stand any argumentative strain.

 

 

Duplicate Rows in SQL

 

SQL?... No, of course SQL doesn’t implement Relational Theory without cryptic fields, but then – I presume – it wasn’t intended to. They wanted the sort of duplicate rows that SQL has. But that doesn’t mean it ought to be changed to implement Relational Theory without cryptic fields. Perhaps it ought to be changed to implement Relational Theory with cryptic fields; or even Platoclast’s Picnic Theory with cryptic fields. The main thing is to work out what we’re doing and where we want to go, before we start rebuilding the vehicle.

 

SITE HOME PAGE

Go home. You’ve had the taxpayers’ money’s worth for today.

THE DATABASE PAGE

THE DATABASE PAPERS

 

Preface & Contents

 

DOWNLOAD

Download Lecture XXIV (rtf, Word for Windows compatible)

Platoclast on Data: Lecture XXV

 

Copyright © 1993, 2001 Adrian Larner. The author asserts all moral rights.