(U03)

www.btinternet.com/~adrian.larner/database/pcl28

PLATOCLAST
ON DATA

An Informal Symposium

 

 

Prof Platoclast:
Now that Judy and Douglas have arrived, let’s start and embarrass the rest of the latecomers. I don’t have anything new to teach, but as some of you have questions, worries, and – I hope – disagreements, I thought this would be a good time to air them. Anyone want to kick off?

 

 

Tables and Views

 

Darren:
When you were talking about views, sir ...
Pla:
Please don’t call me “sir”, dear boy. Call me “Professor”.

 

 

Table Types and Instances

 

Dar:
Er, Professor, it seemed to me there was a bit of confusion about what you meant by “table”. Was it merely a collection of rows, like the EMPLOYEE table at this point in time, or was it just the EMPLOYEE table full stop?
Pla:
How would Mr Date distinguish between those?
Dar:
I believe he speaks of a relation instance and a relation.
Pla:
Quite so, and Dr Codd says that a relation – in that sense – is a time-varying collection of rows. But how would we wish to make the distinction?
Dar:
I’m not sure.
Pla:
Well I think we’ld want to say to begin with that if I have a relation instance in my pocket then I have a relation in my pocket. And I couldn’t have a relation without having an instance of it. Isn’t that so?
Dar:
It is just as you say.
Pla:
And doesn’t that mean that they have the same criterion of application?
Dar:
Yes. So they must have different criteria of identity. It’s just another type/instance distinction.
Pla:
Well said. But what is the criterion of identity of a table type, in contrast to that of a table instance?
Dar:
I suppose it’s a matter of it having the same columns.
Pla:
Yes, roughly. Though we’ld want to think carefully about the criteria to use at the form and concept levels. At the concept level we would want to insist on the same record type names as well.
 
But I ought to mention that once we admit record type na:mes we rather open the door to a data base administrator deciding, for instance, to add a new column to all EMPLOYEE records – or even just to some EMPLOYEE records – but still calling them “EMPLOYEE”. People do that in current systems. Then the criterion of table type identity becomes merely sameness of record type names.

 

 

Views

 

Dar:
So people are wrong to distinguish stored tables from views on the grounds that stored tables consist of records, but views are just definitions?
Pla:
Wrong indeed! Worse than wrong, which is a normal human failing: they’re terribly confused. Both stored tables and views comprise records, and both stored table types and view types are conveyed by definitions of some sort. I suppose the confusion comes because we tend by “view” to mean “view type”, and by “table” to mean “table instance”. But it’s quite silly to say that the definition of a view simply is that view, even in the sense of “view type”. Only mathematicians think that definitions bring things into existence.

 

 

Adding Columns to Tables

 

Judith:
Are you saying, Geoff ... Professor, that it shouldn’t be allowed to add columns to a relation?
Pla:
Well personally I’m not very keen on it, but in practice I think it’s going to happen. For instance, we might keep location identifiers in DEPARTMENT records, but not in EMPLOYEE records. And that’s fine as long as every employee is located where their department is located. But when we find we have some employee remote from their department we get problems, so we have to – in some sense – add a location column to the EMPLOYEE record.
Jud:
And you wouldn’t want to call the new sort of record by a different name?
Pla:
Well, I would. But I don’t think the users would like it much.

 

 

Cartesian Product v. Natural Join

 

Jud:
Would it give problems to the DBMS if we kept the old name?
Pla:
It wouldn’t give problems to a decent Cartesian product based, or Combination based, DML; but think what would happen if we followed Father Warden’s advice and did natural joins on common column names. I’m thinking of a natural join of EMPLOYEE and DEPARTMENT: what columns do you think they’ld have in common?
Jud:
Just Department Number, I suppose.
Pla:
And when we added Location to Employee?
Jud:
Department Number and Location ...so we’ld get a different query.
Pla:
Exactly. Instead of all employees along with their departments we’ld get all employees that happened to be in the same location as their departments, along with those departments.
Jud:
But we could get round that by renaming columns, couldn’t we?
Pla:
Yes, in a sense. But the problem is that what the user would call “one and the same query” would change its interpretation: it might be a query that they’d used for many years, and suddenly – but not very obviously – it starts to give the wrong answer.

 

 

The Identity of Queries

 

Dar:
Do you think, Professor, that we need a clear criterion of identity for queries? You’ve appealed to the notion of “the same query”.
Pla:
Yes indeed, and adding columns to relations makes that criterion quite difficult to state. A query is a function on tables, so I think we can say that query q1(t), where t is a list of tables, is the same query as q2(t) when for all instances of the tables in t, q1 and q2 result in the same table instance. But then we have to add a proviso that this also holds even if any table in t has a column added. Interesting.

 

 

User Front-End with Natural Join

 

Jud:
Didn’t you say, though, that the idea of natural join on commonly named columns was quite a good idea in a user front-end?
Pla:
Yes indeed. I envisaged a user asking for the join of EMPLOYEE and DEPARTMENT, and the front-end would come back to them saying: “I assume you mean with the restriction WHERE EMPLOYEE.DEPTNO = DEPARTMENT.DEPTNO”, or – after the change – “AND EMPLOYEE.LOCN = DEPARTMENT.LOCN” as well.
Jud:
And then the user would say “Yes” or “No” ...
Pla:
As the mood took them. Exactly: and that gives us, I think, the best of both worlds.

 

 

Handling Very Large Tables

 

Douglas:
Professor, could I put in an objection to some of your ideas?
Pla:
Of course.
Dou:
It seems to me that some of the things you’ve proposed give us really massive tables. I mean the Combination of two tables of r and s rows gives us a table of r + s + rs rows. And if we assume automatic headlining of all projections of a table, then an r row, n column table would have up to r by 2n rows. Could users cope with that? And wouldn’t it perform badly?
Pla:
Well: you tell me how users and optimisers cope with current systems. Suppose I have a thousand row person table and I join it to itself twice, to find triples of persons. Don’t I first, in an SQL FROM clause say, create a Cartesian product with a thousand million rows? How do users cope with that?
Dou:
Well ...they don’t actually look at it. They apply restrictions and things to get it down to a reasonable size.
Pla:
Exactly. And the optimiser doesn’t just build a huge table: it looks at the join conditions and uses them to speed up the entire query. Probably it would sort on the join columns first: that’s a well-known technique.
 
In formulating a query the user thinks in terms of meanings, and doesn’t worry about size, or actual values. Or perhaps they think of some sample values. It’s just like our talking of, say, pairs of integers: we can just entertain the idea “pair of integers”, or even think “1 and 2, 3 and 8, 2 and 10, etc.” It doesn’t boggle our minds, even though we know there are lots and lots of such pairs.
 
Huge tables are only a problem when it comes to outputting them, and by then we’ve usually got them down to a reasonable size. Of course, there’s nothing in the DBMS to stop a user trying to see a table with a million records, if that’s what they want. Though I wouldn’t recommend it. But, I should say, a decent presentation handler might put some sort of limits on them, or at least warn them how big the result was and perhaps display only a screen full at a time.

 

 

Nested Queries

 

Dou:
Could I raise another question then? Why are you so opposed to nested queries?
Pla:
Well, I don’t know that I’m so opposed to them. But I’m convinced that every query that can be expressed with nested queries can also be expressed without them. That’s because we can think in this way: suppose we use a nested query in a restriction, for instance. Then we have some condition on a row, the row being restricted; and a table, the result of the nested query. We could simply combine that result table with our unnested query, effectively extending all rows (and therefore the row in question) with each row in the nested result table.And then, whatever we did in the nested query, we could do in collections and accumulations.
 
And then, I think, if we have one way to do all the queries we want, it’s unhelpful to have another way: to make the user choose which way is best. But it’s really a practical question: if nested queries really do make it easier for users, then I guess we ought to have nested queries.

 

 

Unnestung Queries

 

Dou:
You’re pretty convinced that every nested query has an equivalent unnested query?
Pla:
Yes, I am.
Dou:
Then in that case we might have a way to translate every nested query automatically into an unnested query?
Pla:
I think so: it would be an interesting thing to formalise and demonstrate.

 

 

Nesting as a Syntactic Variation

 

Dou:
But if it could be done, wouldn’t that mean we could think of the definition of a nested query as just a different way of formulating an unnested query, but no more significant that the difference between “x=2” and “2=x”?
Pla:
So we wouldn’t really be able to distinguish nested and unnested queries, merely a nested syntax and an unnested syntax for definition of one and the same query? And then my objection, such as it is, would be purely stylistic: like an objection to splitting infinitives.
Jud:
Or our objection to your use of singular “they”?
Pla:
Quite.
Dou:
And – if we could make it just a stylistic difference – then there couldn’t be any problem in optimising nested queries.

 

 

The Importance of Syntax

 

Pla:
Well, that’s true enough. So let’s say we agree that the difference is only syntactic. Does that mean it’s not important?
Jud:
Of course.
Pla:
I think not. What sort of difference is there between a relational data base, or any data base comprising merely normal records, and a hierarchical data base, with implicit connections between records? Is that a syntactic difference?
Dou:
I suppose were allowed your trick of talking in terms of paper forms?
Pla:
I suppose, if you get any sharper you’ll cut yourself.
Dou:
Then a hierarchical data base is just a report with level breaks, and a relational data base is just a matrix, and normal records are just rows. So their differences are merely syntactic.
Pla:
Precisely. And that shows how important syntax is. And how we can explain why we prefer relational to hierarchical, and purely normal records to relations: their syntax is much simpler .
Dou:
And you want to claim that the same holds true of the DML itself? You think it should have a very simple syntax?
Pla:
Absolutely! You see, if Dr Codd was right, if users can manage better with simpler data structures, then I see no reason to think that they want DMLs with complex syntax. A query definition, after all, is another piece of data.

 

 

Non-Uniform Format Needed For Meta-Data

 

Jud:
And you think a query definition, something like an SQL SELECT statement, should be a relation? Or a normal record?
Pla:
No. I suspect it would comprise several normal records
Dar:
But not a relation, because they’ld have different formats. You’re back there again, aren’t you Professor?
Pla:
Quite so. Indeed, as soon as we start to consider what they call “metadata”, data about data, whether queries or data definitions or constraints, we’ll find that we can hold all these metadata components – all the constraints say – in normal records, but not in normal records of common format.
Dar:
But is that important?
Pla:
Well I think it’s terribly important. You see, we want to claim: this method – relational or whatever – is good for all the data we might want to hold. Of course, there may be limits on manipulations, because we insist on having a really simple DML. But there shouldn’t be limits on content. If we have to say, “Here’s a fact we can’t record”, then we open the door to someone else coming along with a better theory.
 
And we do rather shoot ourselves in the foot if we can’t even record the sort of facts we actually utter about our records. It’s a bit like standing up on a soap box and saying, “Everything we want to say can be formulated without polysyllables; we can avoid sesquipedalianism.” And then when someone asks us to put that in short words, we just shrug in a rather embarrassed way.

 

 

A Philosophical Objection

 

Dar:
But don’t the mathematicians tell us that using a language to talk about that language itself can lead to contradictions?
Jud:
Darren, what are you going on about?
Pla:
What he’s going on about is, for instance, heterologous epithets. Let’s use English to talk about English epithets. Some epithets, such as “short” and indeed “English” are true of themselves: “short” is a short word, and “English” is an English word. Let’s call them homologous. And the ones that aren’t homologous, like “long” and “German”, are heterologous.
 
And then we ask whether “heterologous” is heterologous. If it is, then it’s true of itself, so it’s not heterologous but homologous. If it isn’t true of itself, then it is heterologous, so it is true of itself. We get a contradiction.
Dar:
Right right; and we mustn’t risk that, so we shouldn’t use our language as our meta-language, or – by extension – our data as our meta-data.
Pla:
On the contrary! Suppose we formulate a theory, like the theory that there are these epithets “homologous” and “heterologous”, with the meanings we gave to them, and all epithets are either homologous or heterologous. We see what happens. With any luck, we find a contradiction, and that shows us our theory must be false.
Jud:
That’s a bit like Russell’s paradox, isn’t it?
Pla:
Exactly! And one of the things the set-theoreticians did was resort to this meta-language nonsense to stop the paradox arising, as they’ld say.
Dar:
But what’s wrong with that?
Pla:
What’s wrong with painting over cracks? We have a chance to advance; we have a risky theory, a theory that goes out on a limb. If it fails, like set theory failed, we’ve made an advance: we’ve marked one path “No Through Road”.
 
We really ought to have these risky theories: if they stand up to self-application, they might even be true. Sir Karl Popper says a similar thing about scientific theories: they should be refutable, empirically. We can’t get empirical refutability in logical theories, so we certainly shouldn’t avoid the next best thing.
Dar:
But Popper accepts Tarski’s theory of truth, and that needs an infinite hierarchy of language, meta-language, meta-meta-Ianguage, ...
Pla:
Well, no-one’s perfectly consistent.

 

 

The End of the Conversaation

 

Jud:
Darren, did you know you’re not registered on Professor Platoclast’s course.
Dar:
Yes. I’m in the Philosophy department, and the course isn’t on our syllabus.
Jud:
How strange; it’s an optional course on our syllabus in Women’s Studies.
Dar:
Who on earth got Data Base Theory accepted as an optional in Women’s Studies?
Pla:
Let’s not get sidetracked here, Darren, dear boy. Why do you bother to attend, and even do some homework?
Dar:
For edification and delight.
Pla:
I’ll ignore the hedonism; have you learnt anything?
Dar:
I’ve learnt how much you’ve taken from Nelson Goodman, without acknowledgement. Beside all that relative identity stuff you got from Peter Geach. They’re pretty strange bedfellows.
Pla:
Well, thanks. I guess that brings us to the top of the hour, as they say. See you tomorrow.... Bye, Darren....
 
Yes Douglas.... Goodness me! You jest, surely?... What can I say? Does she really prefer to be called “Judith”?... How extraordinary!

 

 

On Nested Records

 

 

(This informal exposition followed a Graduate Seminar at which the presenter had described her researches on nested relations, i.e. relations whose values could be sets or, specifically, relations. In the questions after the seminar, Professor Platoclast had – rather savagely – shown how, by retaining normal records, but abandoning uniform format, he could, arguably more simply, specify all the queries claimed to be facilitated or enabled by using nested relations. The presenter had, under this pressure, been driven to the point of admitting that we had either to abandon first normal form, as she preferred, or to abandon uniform format, as Platoclast wanted.)

Lo, my friends, she delivered herself into my hands. You would, I think, be hard put to construct nested relations upon the theory that I have proposed to you: so we can have, as I claim, normal records in non-rectangular tables, without having nested relations. The question is: can we have nested relations without having non-rectangular tables? I mean: can I construct my tables on hers, without using any extra constructs? And I think I can.

Firstly, I won’t even demand nested relations, I’ll accept merely set-valued columns. Secondly, I’ll accept a constraint proposed by Professor Quine that any object is identified with its unit set, that is, with the set that contains merely that object, but only to the extent of avoiding ambiguous cardinality.[1] This means that if the object in question is a set of cardinality other than 1, then we don’t identify it with the set that contains it: so {p} is identified with {{p}} but {p,q} is not identified with {{p,q}}, because we would have ambiguous cardinality. So, if p isn’t a set, or p is a unit set, then we can identify it with {p}; otherwise we can’t.

Thirdly, I’ll confine myself to non-composite sets: sets with cardinality less than 2. I think you’ll see that all I’m left with are non-sets, say p, each of. which is identified with its unit set, {p}; the empty set, {}; and the set that contains just the empty set, {{}}. And all my values can now be sets.

In effect, wherever I would normally have a domain of non-sets – say characters – now I’ll have a domain of sets of characters. It will contain all the unit sets of characters – that is, all the characters – and the empty set.Optionally, we could extend such a domain to include the set that contains the empty set.

But what does this give me? It gives me-in every domain – the empty set, which I will use as a pad. So I can now get non-uniform rows in my tables, merely by padding. And it gives me, optionally, the set containing the empty set, which I will use as my null value. But that’s all I need!

So it’s true, as our friend admitted, that we must have either nested relations or non-uniform format. But what she didn’t see (and even I didn’t see it in time) was that a tiny, highly constrained part of the facilities needed for nested relations – merely non-composite sets – was quite enough to give us non-uniform format. We have to conclude therefore that we do need non-uniform format, but in that case we don’t need the horrid complexity of nested relations.

 

SITE HOME PAGE

What?... Yes, of course, using only non-composite sets isn’t really using set theory at all. You didn’t imagine I’ld sink to that, did you?

THE DATABASE PAGE

THE DATABASE PAPERS

 

Preface & Contents

 

DOWNLOAD

Download An Informal Symposium (rtf, Word for Windows compatible)

Platoclast on Data: Lecture XXIX

 

Copyright © 1993, 2001 Adrian Larner. The author asserts all moral rights.