|
(U03)
|
www.btinternet.com/~adrian.larner/database/pcl28
|
|
PLATOCLAST ON DATA
An Informal Symposium
|
|
|
|
|
- Prof Platoclast:
- Now that Judy and Douglas have arrived, let’s start and embarrass the rest of the latecomers.
I don’t have anything new to teach, but as some of you have questions, worries,
and I hope disagreements, I thought this would be a good time to air them.
Anyone want to kick off?
|
|
|
|
|
|
|
|
- Darren:
- When you were talking about views, sir ...
- Pla:
- Please don’t call me “sir”, dear boy. Call me “Professor”.
|
|
|
|
|
|
|
|
- Dar:
- Er, Professor, it seemed to me there was a bit of confusion about what you meant by “table”.
Was it merely a collection of rows, like the EMPLOYEE table at this point in time,
or was it just the EMPLOYEE table full stop?
- Pla:
- How would Mr Date distinguish between those?
- Dar:
- I believe he speaks of a relation instance and a relation.
- Pla:
- Quite so, and Dr Codd says that a relation in that sense
is a time-varying collection of rows. But how would we wish to make the distinction?
- Dar:
- I’m not sure.
- Pla:
- Well I think we’ld want to say to begin with that if I have a relation
instance in my pocket then I have a relation in my pocket.
And I couldn’t have a relation without having an instance of it. Isn’t that so?
- Dar:
- It is just as you say.
- Pla:
- And doesn’t that mean that they have the same criterion of application?
- Dar:
- Yes. So they must have different criteria of identity.
It’s just another type/instance distinction.
- Pla:
- Well said. But what is the criterion of identity of a table type, in contrast to that of a table instance?
- Dar:
- I suppose it’s a matter of it having the same columns.
- Pla:
- Yes, roughly.
Though we’ld want to think carefully about the criteria to use at the form and concept levels.
At the concept level we would want to insist on the same record type names as well.
- But I ought to mention that once we admit record type na:mes we rather open the door to
a data base administrator deciding, for instance,
to add a new column to all EMPLOYEE records or even just to some EMPLOYEE records
but still calling them “EMPLOYEE”.
People do that in current systems.
Then the criterion of table type identity becomes merely sameness of record type names.
|
|
|
|
|
|
|
|
- Dar:
- So people are wrong to distinguish stored tables from views on the grounds that stored tables consist of records,
but views are just definitions?
- Pla:
- Wrong indeed! Worse than wrong, which is a normal human failing: they’re terribly confused.
Both stored tables and views comprise records,
and both stored table types and view types are conveyed by definitions of some sort.
I suppose the confusion comes because we tend by “view” to mean “view type”,
and by “table” to mean “table instance”.
But it’s quite silly to say that the definition of a view simply is that view, even in the sense of “view type”.
Only mathematicians think that definitions bring things into existence.
|
|
|
|
|
|
|
|
- Judith:
- Are you saying, Geoff ...
Professor, that it shouldn’t be allowed to add columns to a relation?
- Pla:
- Well personally I’m not very keen on it, but in practice I think it’s going to happen.
For instance, we might keep location identifiers in DEPARTMENT records,
but not in EMPLOYEE records.
And that’s fine as long as every employee is located where their department is located.
But when we find we have some employee remote from their department we get problems,
so we have to in some sense add a location column to the EMPLOYEE record.
- Jud:
- And you wouldn’t want to call the new sort of record by a different name?
- Pla:
- Well, I would. But I don’t think the users would like it much.
|
|
|
|
|
|
|
|
- Jud:
- Would it give problems to the DBMS if we kept the old name?
- Pla:
- It wouldn’t give problems to a decent Cartesian product based, or
Combination based, DML;
but think what would happen if we followed Father Warden’s advice and did natural joins on common column names.
I’m thinking of a natural join of EMPLOYEE and DEPARTMENT:
what columns do you think they’ld have in common?
- Jud:
- Just Department Number, I suppose.
- Pla:
- And when we added Location to Employee?
- Jud:
- Department Number and Location ...so we’ld get a different query.
- Pla:
- Exactly. Instead of all employees along with their departments we’ld get all employees
that happened to be in the same location as their departments, along with those departments.
- Jud:
- But we could get round that by renaming columns, couldn’t we?
- Pla:
- Yes, in a sense. But the problem is that what the user would call “one and the same query”
would change its interpretation: it might be a query that they’d used for many years,
and suddenly but not very obviously it starts to give the wrong answer.
|
|
|
|
|
|
|
|
- Dar:
- Do you think, Professor, that we need a clear criterion of identity for queries?
You’ve appealed to the notion of “the same query”.
- Pla:
- Yes indeed, and adding columns to relations makes that criterion quite difficult to state.
A query is a function on tables, so I think we can say that query q1(t), where t is a list of tables,
is the same query as q2(t) when for all instances of the tables in t, q1 and q2 result in the same table instance.
But then we have to add a proviso that this also holds even if any table in t has a column added.
Interesting.
|
|
|
|
|
|
|
|
- Jud:
- Didn’t you say, though, that the idea of natural join on commonly named columns
was quite a good idea in a user front-end?
- Pla:
- Yes indeed. I envisaged a user asking for the join of EMPLOYEE and DEPARTMENT,
and the front-end would come back to them saying:
“I assume you mean with the restriction WHERE EMPLOYEE.DEPTNO = DEPARTMENT.DEPTNO”,
or after the change “AND EMPLOYEE.LOCN = DEPARTMENT.LOCN” as well.
- Jud:
- And then the user would say “Yes” or “No” ...
- Pla:
- As the mood took them. Exactly: and that gives us, I think, the best of both worlds.
|
|
|
|
|
|
|
|
- Douglas:
- Professor, could I put in an objection to some of your ideas?
- Pla:
- Of course.
- Dou:
- It seems to me that some of the things you’ve proposed give us really
massive tables. I mean the Combination of two tables of r and s rows gives us a table of r + s + rs rows.
And if we assume automatic headlining of all projections of a table, then an r row,
n column table would have up to r by 2n rows.
Could users cope with that? And wouldn’t it perform badly?
- Pla:
- Well: you tell me how users and optimisers cope with current systems.
Suppose I have a thousand row person table and I join it to itself twice, to find triples of persons.
Don’t I first, in an SQL FROM clause say, create a Cartesian product with a thousand million rows?
How do users cope with that?
- Dou:
- Well ...they don’t actually look at it. They apply restrictions and things to get it down to a reasonable size.
- Pla:
- Exactly. And the optimiser doesn’t just build a huge table: it looks at the join conditions
and uses them to speed up the entire query.
Probably it would sort on the join columns first: that’s a well-known technique.
- In formulating a query the user thinks in terms of meanings, and doesn’t worry about size, or actual values.
Or perhaps they think of some sample values.
It’s just like our talking of, say, pairs of integers: we can just entertain the idea “pair of integers”,
or even think “1 and 2, 3 and 8, 2 and 10, etc.”
It doesn’t boggle our minds, even though we know there are lots and lots of such pairs.
- Huge tables are only a problem when it comes to outputting them,
and by then we’ve usually got them down to a reasonable size.
Of course, there’s nothing in the DBMS to stop a user trying to see a table with
a million records, if that’s what they want.
Though I wouldn’t recommend it.
But, I should say, a decent presentation handler might put some sort of limits on them,
or at least warn them how big the result was and perhaps display only a screen full at a time.
|
|
|
|
|
|
|
|
- Dou:
- Could I raise another question then? Why are you so opposed to nested queries?
- Pla:
- Well, I don’t know that I’m so opposed to them.
But I’m convinced that every query that can be expressed with nested queries can also be expressed without them.
That’s because we can think in this way:
suppose we use a nested query in a restriction, for instance.
Then we have some condition on a row, the row being restricted; and a table, the result of the nested query.
We could simply combine that result table with our unnested query,
effectively extending all rows (and therefore the row in question) with each row in the nested result table.And then, whatever we did in the nested query,
we could do in collections and accumulations.
- And then, I think, if we have one way to do all the queries we want,
it’s unhelpful to have another way: to make the user choose which way is best.
But it’s really a practical question: if nested queries really do make it easier for users,
then I guess we ought to have nested queries.
|
|
|
|
|
|
|
|
- Dou:
- You’re pretty convinced that every nested query has an equivalent unnested query?
- Pla:
- Yes, I am.
- Dou:
- Then in that case we might have a way to translate every nested query automatically into an unnested query?
- Pla:
- I think so: it would be an interesting thing to formalise and demonstrate.
|
|
|
|
|
|
|
|
- Dou:
- But if it could be done, wouldn’t that mean we could think of the
definition of a nested query as just a different way of formulating an unnested query,
but no more significant that the difference between “x=2” and “2=x”?
- Pla:
- So we wouldn’t really be able to distinguish nested and unnested queries,
merely a nested syntax and an unnested syntax for definition of one and the same query?
And then my objection, such as it is, would be purely stylistic: like an objection to splitting infinitives.
- Jud:
- Or our objection to your use of singular “they”?
- Pla:
- Quite.
- Dou:
- And if we could make it just a stylistic difference then there couldn’t
be any problem in optimising nested queries.
|
|
|
|
|
|
|
|
- Pla:
- Well, that’s true enough.
So let’s say we agree that the difference is only syntactic. Does that mean it’s not important?
- Jud:
- Of course.
- Pla:
- I think not.
What sort of difference is there between a relational data base,
or any data base comprising merely normal records, and a hierarchical data base,
with implicit connections between records?
Is that a syntactic difference?
- Dou:
- I suppose were allowed your trick of talking in terms of paper forms?
- Pla:
- I suppose, if you get any sharper you’ll cut yourself.
- Dou:
- Then a hierarchical data base is just a report with level breaks,
and a relational data base is just a matrix, and normal records are just rows.
So their differences are merely syntactic.
- Pla:
- Precisely. And that shows how important syntax is.
And how we can explain why we prefer relational to hierarchical,
and purely normal records to relations: their syntax is much simpler .
- Dou:
- And you want to claim that the same holds true of the DML itself?
You think it should have a very simple syntax?
- Pla:
- Absolutely! You see, if Dr Codd was right, if users can manage better with simpler data structures,
then I see no reason to think that they want DMLs with complex syntax.
A query definition, after all, is another piece of data.
|
|
|
|
|
|
|
|
- Jud:
- And you think a query definition, something like an SQL SELECT statement, should be a relation?
Or a normal record?
- Pla:
- No. I suspect it would comprise several normal records
- Dar:
- But not a relation, because they’ld have different formats.
You’re back there again, aren’t you Professor?
- Pla:
- Quite so. Indeed, as soon as we start to consider what they call metadata”, data about data,
whether queries or data definitions or constraints,
we’ll find that we can hold all these metadata components all the constraints say in normal records,
but not in normal records of common format.
- Dar:
- But is that important?
- Pla:
- Well I think it’s terribly important.
You see, we want to claim: this method relational or whatever is good for all the data
we might want to hold.
Of course, there may be limits on manipulations, because we insist on having a really simple
DML.
But there shouldn’t be limits on content.
If we have to say, “Here’s a fact we can’t record”,
then we open the door to someone else coming along with a better theory.
- And we do rather shoot ourselves in the foot if we can’t even record the sort of facts we actually utter
about our records.
It’s a bit like standing up on a soap box and saying,
Everything we want to say can be formulated without polysyllables; we can avoid sesquipedalianism.”
And then when someone asks us to put that in short words, we just shrug in a rather embarrassed way.
|
|
|
|
|
|
|
|
- Dar:
- But don’t the mathematicians tell us that using a language to talk about that language itself can lead
to contradictions?
- Jud:
- Darren, what are you going on about?
- Pla:
- What he’s going on about is, for instance, heterologous epithets.
Let’s use English to talk about English epithets.
Some epithets, such as “short” and indeed “English” are true of themselves:
“short” is a short word, and “English” is an English word.
Let’s call them homologous.
And the ones that aren’t homologous, like “long” and “German”, are heterologous.
- And then we ask whether “heterologous” is heterologous.
If it is, then it’s true of itself, so it’s not heterologous but homologous.
If it isn’t true of itself, then it is heterologous, so it is true of itself. We get a contradiction.
- Dar:
- Right right; and we mustn’t risk that, so we shouldn’t use our language as our meta-language,
or by extension our data as our meta-data.
- Pla:
- On the contrary! Suppose we formulate a theory,
like the theory that there are these epithets “homologous” and “heterologous”,
with the meanings we gave to them, and all epithets are either homologous or heterologous.
We see what happens. With any luck, we find a contradiction, and that shows us our theory must be false.
- Jud:
- That’s a bit like Russell’s paradox, isn’t it?
- Pla:
- Exactly! And one of the things the set-theoreticians did was resort to
this meta-language nonsense to stop the paradox arising, as they’ld say.
- Dar:
- But what’s wrong with that?
- Pla:
- What’s wrong with painting over cracks?
We have a chance to advance; we have a risky theory, a theory that goes out on a limb.
If it fails, like set theory failed, we’ve made an advance: we’ve marked one path “No Through Road”.
- We really ought to have these risky theories:
if they stand up to self-application, they might even be true.
Sir Karl Popper says a similar thing about scientific theories: they should be refutable, empirically.
We can’t get empirical refutability in logical theories, so we certainly shouldn’t avoid the next best thing.
- Dar:
- But Popper accepts Tarski’s theory of truth,
and that needs an infinite hierarchy of language, meta-language, meta-meta-Ianguage, ...
- Pla:
- Well, no-one’s perfectly consistent.
|
|
|
|
|
|
|
|
- Jud:
- Darren, did you know you’re not registered on Professor Platoclast’s course.
- Dar:
- Yes. I’m in the Philosophy department, and the course isn’t on our syllabus.
- Jud:
- How strange; it’s an optional course on our syllabus in Women’s Studies.
- Dar:
- Who on earth got Data Base Theory accepted as an optional in Women’s Studies?
- Pla:
- Let’s not get sidetracked here, Darren, dear boy. Why do you bother to attend, and even do some homework?
- Dar:
- For edification and delight.
- Pla:
- I’ll ignore the hedonism; have you learnt anything?
- Dar:
- I’ve learnt how much you’ve taken from Nelson Goodman, without acknowledgement.
Beside all that relative identity stuff you got from Peter Geach. They’re pretty strange bedfellows.
- Pla:
- Well, thanks. I guess that brings us to the top of the hour, as they say.
See you tomorrow....
Bye, Darren....
- Yes Douglas....
Goodness me! You jest, surely?...
What can I say? Does she really prefer to be called “Judith”?... How extraordinary!
|
|
|
|
|
On Nested Records
|
|
|
|
|
(This informal exposition followed a Graduate Seminar
at which the presenter had described her researches on nested relations,
i.e. relations whose values could be sets or, specifically, relations.
In the questions after the seminar, Professor Platoclast had rather savagely shown how,
by retaining normal records, but abandoning uniform format,
he could, arguably more simply,
specify all the queries claimed to be facilitated or enabled by using nested relations.
The presenter had, under this pressure,
been driven to the point of admitting that we had either to abandon first normal form,
as she preferred, or to abandon uniform format, as Platoclast wanted.)
Lo, my friends, she delivered herself into my hands.
You would, I think, be hard put to construct nested relations upon the theory that I have proposed to you:
so we can have, as I claim, normal records in non-rectangular tables, without having nested relations.
The question is: can we have nested relations without having non-rectangular tables?
I mean: can I construct my tables on hers, without using any extra constructs? And I think I can.
Firstly, I won’t even demand nested relations, Ill accept merely set-valued columns.
Secondly, Ill accept a constraint proposed by Professor Quine that any object is identified with its unit set,
that is, with the set that contains merely that object,
but only to the extent of avoiding ambiguous
cardinality.[1]
This means that if the object in question is a set of cardinality other than 1,
then we don’t identify it with the set that contains it:
so {p} is identified with {{p}} but {p,q} is not identified with {{p,q}},
because we would have ambiguous cardinality.
So, if p isn’t a set, or p is a unit set, then we can identify it with {p}; otherwise we can’t.
Thirdly, Ill confine myself to non-composite sets: sets with cardinality less than 2.
I think you’ll see that all I’m left with are non-sets, say p, each of. which is identified with its unit set, {p};
the empty set, {}; and the set that contains just the empty set, {{}}.
And all my values can now be sets.
In effect, wherever I would normally have a domain of non-sets say characters
now Ill have a domain of sets of characters.
It will contain all the unit sets of characters that is, all the characters and the empty set.Optionally, we could extend such a domain to include the set that contains the empty set.
But what does this give me?
It gives me-in every domain the empty set, which I will use as a pad.
So I can now get non-uniform rows in my tables, merely by padding.
And it gives me, optionally, the set containing the empty set, which I will use as my null value.
But that’s all I need!
So it’s true, as our friend admitted, that we must have either nested relations or non-uniform format.
But what she didn’t see (and even I didn’t see it in time) was that a tiny,
highly constrained part of the facilities needed for nested relations
merely non-composite sets was quite enough to give us non-uniform format.
We have to conclude therefore that we do need non-uniform format,
but in that case we don’t need the horrid complexity of nested relations.
|
|
|
|
SITE HOME PAGE
|
What?...
Yes, of course, using only non-composite sets isn’t really using set theory at all.
You didn’t imagine I’ld sink to that, did you?
|
|
THE DATABASE PAGE
|
|
THE DATABASE PAPERS
|
|
|
Preface & Contents
|
|
|
DOWNLOAD
|
Download
An Informal Symposium (rtf, Word for Windows compatible)
|
|
Platoclast on Data: Lecture XXIX
|
|
|
Copyright © 1993, 2001 Adrian Larner.
The author asserts all moral rights.
|