
An Introduction to Database Systems, Sixth Edition
CJ Date

Addison-Wesley, 1995, 839pp, softbound, £24.95

ISBN 0-201-82458-2

An Extended Review by Adrian Larner for the Computer Journal

A DATABASE BOOK REVIEW

 

“But I thought Date was the BIBLE!” – Remark of a final year student on hearing a criticism of the fifth edition.

Not the Bible, but Date’s Introduction to Database Systems is up there with the Bible in the best-seller lists. And thank goodness not the Bible: it is hard enough to handle one edition of the Sermon on the Mount, but a new edition of Date’s text appears every four years or so, and it changes – at times radically – to reflect new technology and new fashion in the database world (and, particularly in this new edition, to record changes in Date’s thinking). Some of us still treasure the third edition, with over a hundred pages devoted to IBM’s Information Management System: perhaps the best exposition of it ever written, and surprisingly eirenic. But Date has something new to be eirenic about.

For the reader that is not already acquainted with an earlier edition, let this suffice: Buy it; read, mark, learn, and inwardly digest it; for the aspiring writer, strive to write half so well and fear no reviewer. This review looks at some of the major changes in the sixth edition. The chapters on the relational model have been completely re-written, and brought forward for study before SQL: “the gulf between [the two] has grown so wide that ... it would be ... misleading to treat SQL first.... I would have preferred to relegate [SQL] to an appendix”. The pre-relational systems, themselves long relegated to appendices, have now been dropped entirely. There is a completely new treatment of domains. The new Part VI, of about 90 pages, is on object-oriented systems. There’s the rub.

VARIABLES: We may briefly characterise a variable, in the sense of a program variable, as a designator and a value. The designator is commonly a name, but can be an address such as a pointer, or an object id. It is the designator that determines the identity of the variable, in the sense that if x is a variable and y is a variable, and (within the same scope) x and y differ only in value, so they have the same designator, then x is the same variable as y. Functional programmers do without variables. Imperative programmers use them but, if they are good imperative programmers, do not take them very seriously. The typical statement in an imperative program without side effects is a function from one program state (“working storage”) to another program state of the same format – for simplicity say a tuple of values. The variable names are merely indices into the tuples, so an assignment statement like “J := J+1”, superstitiously shunned by the functional programmer, is merely a function mapping one program state to another, the second differing from the first only in that the J-value of the second is one more than the J-value of the first. Or, as we usually and informally say, retreating from our talk of variables (which is merely a façon de parler), “the new J-value is the old J-value plus one”. But object-oriented programmers take variables very seriously indeed. That is what an object is; it is a variable with an object id for designator.
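The reading of “J := J+1” as a function between program states can be sketched in a few lines. This is only an illustration, assuming a two-field program state; the names State and increment_j are hypothetical and appear nowhere in Date's text.

```python
from typing import NamedTuple


class State(NamedTuple):
    """A program state: a tuple of values. The field names are mere indices."""
    i: int
    j: int


def increment_j(state: State) -> State:
    """The statement "J := J+1", read as a function from state to state."""
    return state._replace(j=state.j + 1)


old = State(i=10, j=4)
new = increment_j(old)  # the new J-value is the old J-value plus one
```

Nothing is overwritten here: old survives unchanged beside new, which is all that the talk of “assignment” amounts to on this reading.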

A moment’s reflection will show that the relational model of data, long championed by Date, is almost entirely on the functional, rather than the object-oriented side. A row (tuple) in a relation is exclusively content addressable; it is not permitted to have anything like a designator (no record sequence number, no address). This is what excludes duplicate rows and row ordering, and makes update of a row exactly equivalent to deletion of one row and insertion of another. The model allows even a candidate key (unique identifier forming part of a row) to be updated.

Within the scope of a single row, an attribute (column) name can serve as a designator of a “field”, because the model imposes first normal form (in the first sense – we will come to the second sense later): no row may contain two values of the same attribute. But attributes are better regarded as more like data types. By contrast, relations (tables) – at least base relations (those in the database, in contrast to mere views) – do have names. And these names are essential (in the sense usefully described in the fifth edition, now alas omitted): they convey information. Consequently a relational database can contain duplicate rows, as long as they appear in different relations. Indeed, attaching a name, such as “REQUEST”, to each of many records is the sole function of the relation construct (of the REQUEST relation). Thus we might very easily have avoided essential relation names entirely by taking each row in each base relation and somehow tagging it with its “relation name”, e.g. appending to each REQUEST row a field of a new attribute, REQUEST_FORM#, with a unique value for each row. This would have given us a database comprising base rows, but no base relations, and no essential relation names. A relation like REQUEST could easily be specified: it would be merely the collection of base rows that contain a REQUEST_FORM#, with that attribute projected away. So we could still use the name “REQUEST” (e.g. in an SQL FROM clause) to designate each of many rows (each request record), so achieving “file-at-a-time” processing; but the name “REQUEST” would no longer be essential.
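A minimal sketch of such a database of tagged base rows, assuming rows are modelled as sets of (attribute, value) pairs. The attributes CUSTOMER, ITEM and INVOICE_FORM#, the sample data, and the helper names make_row and derived_relation are all hypothetical, introduced only for illustration.

```python
def make_row(fields):
    """A row is a frozenset of (attribute, value) pairs: no designator, only content."""
    return frozenset(fields.items())


# The whole database is one set of base rows; there are no base relations.
database = {
    make_row({"REQUEST_FORM#": 1, "CUSTOMER": "Smith", "ITEM": "bolt"}),
    make_row({"REQUEST_FORM#": 2, "CUSTOMER": "Jones", "ITEM": "nut"}),
    make_row({"INVOICE_FORM#": 1, "CUSTOMER": "Smith", "ITEM": "bolt"}),
}


def derived_relation(db, tag):
    """Rows that contain the tag attribute, with that attribute projected away."""
    return {
        frozenset((a, v) for a, v in r if a != tag)
        for r in db
        if any(a == tag for a, _ in r)
    }


# "REQUEST" is now a derived, inessential name.
REQUEST = derived_relation(database, "REQUEST_FORM#")
```

The Smith request and the Smith invoice would be duplicate rows without their tags; with them, the single set of base rows holds both, and the information once carried by the relation name is carried by an attribute.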

However, as things stand, the relational model includes both relations (as structured values) and relation variables having relations as values. And Date now makes this explicit. But is it wise? If it were merely a matter of exposition, of explaining the relational model, it would perhaps be justifiable. But Date shows clearly – in this edition more than in any of its predecessors – that his intent is not merely to expound but to defend and, where necessary, amend the relational model; to some extent, albeit not entirely, in response to the challenge of object-orientation.

But if the model is to be amended, and specifically moved along “the variable dimension”, between the functional end and the object-oriented end, we need to consider both possible directions of movement. We cannot be sure that the right response to object-orientation is compromise, admitting more sorts of variables: but that is the way Date is going. “Objects ... both mutable (variables) and immutable (values), are clearly essential.” However, “Object IDs [are] unnecessary, and ... undesirable ...” It is not clear how we can have the “clearly essential” variables without object ids: what other designator should we use?

Date is looking for a rapprochement between relational and object-oriented. To this end he equates object classes with domains, so – accordingly, we might expect – objects with values. But this leaves no scope for variables, i.e. mutable objects. In discussing rows (tuples) he rather mysteriously refers to tuple assignments, which seems to imply a need for tuple variables. These assignments, he says, “are performed (implicitly) during INSERT and UPDATE operations.”

While Date’s answer to the question, “What sorts of variable should we have in a database?” is unsatisfactory, and uncharacteristically unclear, at least he has highlighted the question. The correct answer to the question must be found if we are either to decide, or to effect a rapprochement, between the relational and object-oriented approaches. At one (“functional”) extreme we have “the database” as sole variable, comprising merely base records (as above, with “FORM#” attributes to “show what relation they are in”); at (or almost at) the other we have a large collection of variables, or “mutable objects”, aggregated (it seems) into other variables in indefinitely various ways. The traditional relational model is very close to the functional extreme, admitting only relation variables (Codd's “time-varying relations”).

DOMAINS: A domain, Date now boldly asserts, is a data type. It should be an encapsulated abstract data type. Indeed it should: for too long the notion of (relational) “domain” has been left unclear, and many current relational implementations serve us very badly in this respect, having a closed set of non-encapsulated “data types” (in the old-fashioned sense of “data representations”, distinguishing for instance binary from decimal numbers). If the influence of object-orientation on the relational model, and relational implementations, is no more than to establish proper data type encapsulation, it will have served us well. “Encapsulation implies data independence”, says Date. He is too kind: it implies only data representation independence, largely ignored in the relational approach.

However, the relational model imposes a constraint on values that is additional to the constraints of abstract data typing: they must be atomic. Many abstract data types are, of course, structures (non-atomic). This constraint is first normal form (1NF), in the second sense. The connection between the senses is that a row in second sense 1NF but not in first sense 1NF may be simply converted to be in first sense 1NF but not in second sense 1NF by replacing each attribute, A, by A-SET, the sole value of A-SET being the set of the values of A.
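The conversion between the two senses can be made concrete. The sketch below assumes a row that breaks first sense 1NF is given as a list of (attribute, value) pairs in which an attribute may recur; the function name to_set_valued and the sample attributes are hypothetical.

```python
from collections import defaultdict


def to_set_valued(pairs):
    """Replace each attribute A by A-SET, whose sole value is the set of A's values."""
    grouped = defaultdict(set)
    for attribute, value in pairs:
        grouped[attribute].add(value)
    return {f"{attribute}-SET": frozenset(values) for attribute, values in grouped.items()}


# Two values of PHONE: atomic values, but not first sense 1NF.
repeating = [("NAME", "Smith"), ("PHONE", "111"), ("PHONE", "222")]

# One value per attribute, but each value is now a structure (a set).
converted = to_set_valued(repeating)
```

The result has exactly one value for each attribute, so it satisfies the first sense, but every value is a set, so it fails the second.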

The case for atomicity (second sense 1NF) is well known, and clearly made by Date. If some values are themselves structures of values then we have (at least) two kinds of atomic values: those that are and those that are not components of other values. And for each of these kinds we need a distinct kind of INSERT, of UPDATE, and of DELETE; and we are obliged to make specification decisions about how we hold each value (i.e. as, or not as, a component of some other value). If all values are atomic, we have no such specification decision to make (and get wrong); and a far simpler data manipulation language. And yet Date, here greatly influenced by his collaborator, Hugh Darwen, now admits non-atomic values in relations.

Strangely, he talks of values as “scalars”, which seems to imply atomicity. Indeed – here closely following Darwen – he actually claims that they are atomic, and that their atomicity is achieved by encapsulation. Values, he says, “can be as complex as we like.... The only requirement is that any internal structure [they] might possess must be invisible to the DBMS ...” Not so. The argument given above – Date’s own argument – turns on simplicity of specification and of update. If we have a value in a structure that is “invisible to the DBMS” then it will need functions for insertion, update, and deletion that are themselves not part of the DBMS.

Date, quite properly, wants to retain the constraint of atomicity, but he wants to avoid any concept of absolute atomicity, and he cannot see how to formalise atomicity except in terms of “structure invisible to the DBMS”. But he has ended up by abandoning atomicity. He appeals to a paper of Darwen’s (Date’s reference 19.3), but that paper, although claiming to treat of encapsulated relations as values, actually describes an arguably rather elegant treatment of non-encapsulated relations as values. Indeed, the first two examples of the proposed data language constructs show such a structured value being composed, and supposedly encapsulated, and then that “encapsulated” relation being tested for equality with another, perfectly ordinary, relation.

Although perhaps not formalisable, the concept of (non-absolute) atomicity is clear enough. A value is atomic if it has no (putative) component that can be inserted, updated, or deleted while leaving all its other (putative) components unchanged. Relational data is, notoriously, flat. The atomicity requirement is, in effect, that anything to be changed lies on the surface.

Where has Date gone wrong? He has, perhaps, misunderstood what encapsulation hides. It hides implementation (internal representation), and nothing else. Specifically, it does not hide structure; except, trivially, structure used merely for implementation. Looking at (for instance) the specification – the public part – of an abstract data type like STACK or QUEUE, we see quite clearly that these types are structures. The separation of public specification and private implementation makes the former even more clearly an exhibition of the structure of the defined data type. It is lack of encapsulation that hides structure: without encapsulation one cannot tell whether a data type is a structure (say, a pair of reals) or an atom (say, a complex number merely implemented as a pair of reals). But as soon as we hide the implementations, and publish the specifications, it is clear that COMPLEX_NUMBER is atomic, and PAIR OF REAL is not. (The latter clearly has exactly two components, one of which can be changed, leaving the other unaffected. The former either has no component, so is trivially atomic; or its real part, imaginary part, modulus, and angle have equal claims to componenthood: but no one of them can be changed independently of all the others; so it is atomic, as defined above.)
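The contrast between the two types can be illustrated, under the definition of atomicity given above, with Python's built-in complex type and a hypothetical PairOfReal class. The sketch is not drawn from Date's or Darwen's proposals.

```python
import cmath
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PairOfReal:
    """A structure: exactly two components, each replaceable on its own."""
    first: float
    second: float


pair = PairOfReal(3.0, 4.0)
changed_pair = replace(pair, first=0.0)  # the second component is unaffected

z = complex(3.0, 4.0)
w = complex(0.0, z.imag)  # an attempt to "change only the real part"

# The modulus and the angle, which have an equal claim to componenthood, both move.
modulus_changed = abs(z) != abs(w)
angle_changed = cmath.phase(z) != cmath.phase(w)
```

Holding the imaginary part fixed does not leave “all the other components” unchanged, which is why the complex number counts as atomic and the pair does not.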

Date’s proposed “CREATE DOMAIN” permits only one implementation for each domain (each data type). In designing a data language we might, for purely pragmatic (but perhaps not very practical!) reasons, apply such a constraint. But for purposes of understanding it is important to remember that the relationship between data types and implementations is many-to-many. This will avoid any temptation to such moves as “domain check override” or certain coercions: given a comparison such as “A = B”, where A and B are of different data types, there can be no possibility of interpreting it as “the implemented form of A is the same as the implemented form of B”: if a data type can have more than one implementation then there is, in general, nothing that is the implemented form of A or of B.
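A small sketch may make the many-to-many point concrete. It assumes one data type (a set of small non-negative integers) with two hypothetical implementations, TupleSet and BitmaskSet; none of these names comes from the book.

```python
class TupleSet:
    """The set type, implemented as a sorted tuple of members."""

    def __init__(self, members):
        self._rep = tuple(sorted(set(members)))

    def members(self):
        return frozenset(self._rep)


class BitmaskSet:
    """The same set type, implemented as an integer bitmask."""

    def __init__(self, members):
        self._rep = sum(1 << m for m in set(members))

    def members(self):
        return frozenset(i for i in range(self._rep.bit_length()) if (self._rep >> i) & 1)


def same_set(a, b):
    """Equality as specified for the type: same members, whatever the representation."""
    return a.members() == b.members()


a = TupleSet([2, 0])
b = BitmaskSet([0, 2])
quantity = 5  # a value of a different type that happens to share b's representation
```

Here a and b are equal as sets although their representations differ, while b and quantity share the representation 5 without being comparable at all. A comparison of “implemented forms” gets both cases wrong.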

When discussing relation-valued attributes, Date speaks of “values that are relations”. What does he mean by this? As these values are atomic, even by Date’s lights, they are not specified as relations. So presumably they are “implemented as relations”, and on this his argument must turn. But, that something is a relation is entirely a question of specification. Something specified as a relation may be implemented in many different ways: no one of those ways is paradigmatically “implementation as a relation”. We used to be able to depend on relational bigots, if on no-one else, to distinguish clearly between specification and implementation. That was one of the principal distinctions between them and the proponents of the pre-relational systems; that was the foundation on which they understood data independence issues. The object-oriented, with a few praiseworthy exceptions, know nothing of data independence (other than representation independence), and are frequently confused between specification and implementation. But we would not have expected their baleful confusions to be catching.

INTERPRETATION: What may be the most significant change in the sixth edition is a new interpretation of rows in relations (not absolutely new: it was probably Codd’s originally intended interpretation). Instead of a row in a relation “representing an entity” of the entity type represented by the relation, it is now interpreted as a proposition of the first order logic. The proposition is formed by inserting the values in the row (or, perhaps, names of the values) into a predicate represented by the relation containing the row. This allows interpretation of data manipulations (which the “entity” interpretation did not), in the sense that given some base relations of known interpretation, and a view formed from them by one or more manipulations (joins, restrictions, projections, etc), we can now formally derive the interpretation of rows in the view.
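How the interpretation of a view might be derived can be sketched as follows, assuming a predicate is a sentence template and the predicate of a join is the conjunction of the operands' predicates. The supplier and part data, the templates, and the functions join and propositions are hypothetical, and integrity constraints are left out of the predicates entirely.

```python
def make_row(fields):
    """A row as a frozenset of (attribute, value) pairs."""
    return frozenset(fields.items())


def propositions(predicate, rows):
    """Instantiate the predicate with each row's values to obtain propositions."""
    return {predicate.format(**dict(r)) for r in rows}


def join(rel_a, rel_b):
    """Natural join; the derived predicate is the conjunction of the two predicates."""
    pred_a, rows_a = rel_a
    pred_b, rows_b = rel_b
    rows = set()
    for ra in rows_a:
        for rb in rows_b:
            a, b = dict(ra), dict(rb)
            # Rows combine only when they agree on every shared attribute.
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                rows.add(make_row({**a, **b}))
    return (f"{pred_a} and {pred_b}", rows)


located = ("supplier {S} is located in {CITY}",
           {make_row({"S": "S1", "CITY": "London"})})
supplies = ("supplier {S} supplies part {P}",
            {make_row({"S": "S1", "P": "P1"}), make_row({"S": "S2", "P": "P2"})})

view = join(located, supplies)
view_propositions = propositions(*view)
```

Each row of the view is thereby read as one proposition whose form follows mechanically from the interpretations of the base relations.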

However: (1) Date includes integrity (e.g. uniqueness) constraints in the predicate, so any change in “business rules” may require re-interpretation of an enterprise’s data. (2) The interpretations of most of the data manipulations are left “as an exercise for the reader”, and not such an easy exercise when the row interpretations include integrity constraints. (3) The proposed interpretation results in connection traps becoming not mistakes in informal reasoning but formally provable (so the new interpretation is unsound, without some very messy adjustments): your reviewer leaves this demonstration as an exercise for the reader. (4) Once we drop the “entity” interpretation, the (always dubious) metaphysical foundation for the entity integrity rule disappears. Yet Date has made a step forward; these are problems to be solved, not cause for retreat.

NULLS: Date describes and rejects the many-valued logic interpretation of nulls. Good; it was always obviously unsound. He does not propound his own “default values” approach, but he still deprecates the use of the expression, “null value”. This is strange because his “default” values are not, in the usual sense, defaults. They are – there seems little more that we can say of them – nulls. One of the advantages of abandoning nulls as “marks”, i.e. not values but something that marks where a value might have been but is not, is that we get back to what Date used to regard as “fundamental”: that all information in a relational database is conveyed by values in the attributes of rows (except, as we have seen, for the information conveyed by attachment of a relation name, via the relation, to each of its rows).

 

 

SUMMARY: If you want to regard Date as a divinely inspired author of scripture, best stick with the fifth edition. But Date does not want to be regarded as divinely inspired. As he unashamedly states (quoting Lord Russell to boot), he has changed his mind. No doubt he will change it again. What you think of the sixth edition depends on what you want from a textbook: if you are prepared (as your reviewer is) to sacrifice a little dependability and finish for the pleasures of the intellectual chase, with its inevitable stumbles, this edition is for you. And the seventh is going to be fun.

 


 

Copyright © 1996, 2001 Adrian Larner. The author asserts all moral rights.
