www.btinternet.com/adrian.larner/database/newerm1

A New Foundation for the ER Model
A database paper by Adrian Larner

Abstract
A formal theory, based on first order logic, is proposed as an interpretive foundation for the Entity-Relationship (ER) model. Under the proposed interpretation, records are construed as existence assertions in which the kinds of entities asserted to exist are specified and/or their relationship is stated; in addition, a number of identities (one defining each attribute) are asserted of those entities. This theory allows an interpretation of derived records to be formally concluded from base records according to logical inference rules associated with data manipulation operations. In contrast with the Relational interpretation theory (the Domain Relational Calculus), the derivations are safe (false inferences, that is, join traps, are avoided). The theory provides clarifications and explanations in a number of areas of ER theory, including entity and attribute definitions; meta-theoretical definitions, including those of “entity” and “attribute”; higher normalisation; entity dependence (referential integrity); and the teaching of ER analysis of data.

MOTIVATION

Technically, the second places in the predicates “... represents ...” and “... is interested in ...” are termed intentional (pertaining to wanting, aiming, etc.). Formalisation of intentional predicates is ill-understood. (All predicates of the First Order Logic are non-intentional.) See P. T. Geach, Logic Matters, Blackwell, 1972, Section 4, “Intentionality”. The definitions of entity and attribute and the informal explanations of representation can be found in most standard texts on the ER Model. Some go back to P. P. Chen’s seminal “The Entity-Relationship Model: Towards a Unified View of Data”, ACM TODS 1, No. 1, March 1976.
A new, formal foundation for the Entity/(Attribute/)Relationship (ER) Model is proposed. In general, it is difficult to justify the need for a new foundation for a well-established model: can there be anything seriously wrong with the current foundation? It is perhaps even more difficult to demonstrate the value of a proposed foundation, for the payoff of such a theory is not simple and single, but complex and various.

What is wrong with the ER model? In practical terms, not much. Although many of us analyse and design for Relational database management systems, we use the ER model for our analysis, and for our interpretations of the relational structures: tables, rows, and columns. It works very well. But some of us have serious doubts, not of what we are doing, but of what we say about what we are doing, and especially of what we say when we are teaching how to do it.

We are rather like those extraordinary teachers of singing. They are, no doubt, good singers, and more than competent teachers. But listen to them when they are teaching: “When you sing, sing from the forehead. When you breathe, breathe from the hips.” This may be effective advice; given their results, it presumably is effective. But as a theory of singing it leaves something to be desired. And we really do say equally extraordinary things. What sort of problems do we have?
“the analyst has failed to give the interpretation if they do not produce the person ...”: non-sexist singular “they”. To the grammatically conservative I can say only, “Thou art i’ the right; I would not willingly offend thee.” Singular “they” will for ever offend our sensibilities, as much as singular “you” (or singular “we”, as Her Majesty remarked to me only the other day).

THE RELATIONAL INTERPRETATION

Oddly enough, before we had the ER model, we had a standard form of interpretation of records. With the introduction of the Relational model came (although it was rarely made explicit) an interpretation of normal records, and of their manipulations in a DML, that was, slightly sugared, the first order predicate calculus (FOPC). The interpretation is that of the Domain Relational Calculus. A relation, in the logician’s sense, is a predicate, of one or more places; for instance:
Alas, the FOPC, the foundation of this interpretation, requires that the names inserted in the places of predicates be proper names. But “P1” is a common name (like, say, “dog”: a perfectly good name, but it designates each of many objects). The system (the data, as shown) distinguishes two different things, both named “P1” (the one supplied by S1, the other by S2): so “P1” is not a proper name. We do not generally ensure that all the values we insert in databases are interpretable as proper names. So we always have potential join traps, most of which are avoided quite informally by users who understand their application domain. In the relational model, a join trap is not a misinterpretation by a user: it is the failure of a user to reject the misleading interpretation that follows from the interpretation of the model. This does not bode well for “intelligent” (i.e. stupider than human) front-ends.
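
The join trap just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names S1, S2 and “P1” follow the text, but the colour column, the three-column base table, and the helper natural_join are hypothetical and are not taken from the paper. The sketch treats “P1” as if it were a proper name, joins two projections on it, and obtains records that were never asserted.

```python
# Hypothetical base records: (supplier, part name, colour).
# The "P1" supplied by S1 and the "P1" supplied by S2 are two
# different things that share a common name.
base = {
    ("S1", "P1", "red"),
    ("S2", "P1", "blue"),
}

# Two projections of the base records.
supplies = {(s, p) for (s, p, _) in base}  # (supplier, part name)
colours = {(p, c) for (_, p, c) in base}   # (part name, colour)


def natural_join(left, right):
    """Join (supplier, part) with (part, colour) on equal part names."""
    return {(s, p, c) for (s, p) in left for (q, c) in right if p == q}


joined = natural_join(supplies, colours)

# Records produced by the join that were never asserted in the base data.
spurious = joined - base

print(sorted(joined))
print(sorted(spurious))
```

Under the FOPC reading, each spurious row (for example, that S1 supplies a blue P1) would count as a valid conclusion; it is the user's knowledge of the domain, not the model, that rejects it.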

THE ENTITY/RELATIONSHIP INTERPRETATION

A row of a table, in the ER model (assuming that relational structures, but not the interpretations, are used), is said to “represent”, or “stand for”, an entity (a thing? an existent?) in “the real world”; or (in the case of some tables) a relationship between two (or, in some versions of the ER model, two or more) entities. The columns hold attributes of the entity, or identifiers of the related entities (and, if attribute-bearing relationships are allowed, attributes of the relationship). We can understand restriction well enough: it gives us some of the entities or relationships that we started with. But what entity or relationship does projection give? What entity or relationship does Cartesian product, natural join, or composition give? Consider the projection on Surname of the join on Religion of PERSON with itself:
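
As a rough illustration of the question, the following Python sketch computes a derived table of this kind. Only the names PERSON, Surname and Religion come from the text; the rows, the person_id column and the variable names are invented for the example and should not be read as the author's own data.

```python
# Hypothetical PERSON rows: (person_id, surname, religion).
person = [
    (1, "Smith", "Quaker"),
    (2, "Jones", "Quaker"),
    (3, "Patel", "Hindu"),
]

# Join PERSON with itself on Religion, then project the two Surname columns.
surname_pairs = {
    (a_surname, b_surname)
    for (_, a_surname, a_religion) in person
    for (_, b_surname, b_religion) in person
    if a_religion == b_religion
}

print(sorted(surname_pairs))
```

Each resulting row is a pair of surnames of co-religionists. It is not obvious which entity, or which relationship between entities, such a row could be said to “represent”.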

CRITERIA OF DEFINITION

For the rejection of absolute identity, and the introduction of criteria of application and identity, see P. T. Geach, Logic Matters, Section 7, “Identity Theory”, and other works. I am indebted to my colleague, Nigel Roberts, for the coinage of the useful collective term, “criteria of definition”, and for the exploration of criteria of definition in data analysis and its teaching: see N. D. Roberts, “The Reality Modelling Approach to Entity-Relationship Analysis”, in Proc. Sixth ISTIP Conf., University of Hertfordshire, April 1994.
Most neglected of all the fields on our Entity Definition forms is: Entity Description. Let us replace it by Entity Definition. What does it take to define an entity? We certainly need a criterion of application. When we know the criterion of application of a concept, e.g. “cat” or “moon” or “book”, we know when we have got one (i.e. when we have spotted a cat, or seen the moon, or found a book: when we have got one and not none). A criterion of application of “book” might include talking books, but exclude graphic novels, magazines, and newspapers. Sometimes a criterion of application is obvious (but we must be careful: is a lion a “cat”?).

A criterion of application decreases the vagueness of a concept; but is it enough to define the concept? Imagine a child that said “cat” every time it saw a cat (and at no other time), and likewise said “moon” when it saw the moon. We are tempted to say: the child has grasped the concepts “cat” and “moon”. But suppose the child thinks that there is only one cat, with an extensive wardrobe of fur coats; and that there are many moons: round, half-round, crescent; golden, silver, blue perhaps. So a criterion of application is not enough: we also need a criterion of identity. This tells us what counts as the same cat, or the same moon. It tells us when we have got one (i.e. one and not two); it removes, not vagueness, but ambiguity. Is my copy of “Persuasion” the same book as your copy of “Persuasion”? When we do data analysis we (formally or informally) postulate criteria of identity (they are largely undefined in natural languages).

The use of these criteria of definition (of application and of identity) shows considerable promise in the teaching and performance of ER analysis. For example, the dependence of an order line entity on its order entity (a many-to-one relationship with cascade delete) is explicable in terms of their criteria of identity. If I pick out an order line and you pick out an order line, my order line is the same order line as yours only if the order containing my order line is the same order as the order containing yours. The criterion of identity of “order line” involves, or is dependent on, the criterion of identity of “order”: it is this dependence of identity that causes the dependence of entity (of existence); hence the cascade delete. But the dependence would still hold even if no deletes were allowed.

In a subtype (“ISA”) relationship, the supertype and subtype entities have the same criterion of identity, but different criteria of application: to be a person is not to be a house owner; but if A is a person that owns a house, and so is B, then A is the same person as B if and only if A is the same house owner as B.
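
The two points above (dependent identity for order lines, shared identity for subtypes) can be sketched in Python. This is a minimal illustration, not the paper's formalism: the classes, the key fields order_no, line_no and person_id, and the function is_house_owner are all assumptions chosen for the example. Equality between objects stands in for a criterion of identity, and a boolean predicate stands in for a criterion of application.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Order:
    order_no: int  # assumed criterion of identity for "order"


@dataclass(frozen=True)
class OrderLine:
    order: Order   # identity of a line depends on the identity of its order
    line_no: int


@dataclass(frozen=True)
class Person:
    person_id: int                               # criterion of identity
    owns_house: bool = field(compare=False)      # not part of identity


def is_house_owner(p: Person) -> bool:
    """Criterion of application for the subtype 'house owner'."""
    return p.owns_house


line_a = OrderLine(Order(100), 1)
line_b = OrderLine(Order(200), 1)  # same line number, different order
line_c = OrderLine(Order(100), 1)  # same line number, same order

print(line_a == line_b)
print(line_a == line_c)

a = Person(7, owns_house=True)
b = Person(7, owns_house=True)
# Both satisfy the subtype's criterion of application, and they are the same
# house owner exactly when they are the same person.
print(is_house_owner(a) and is_house_owner(b) and a == b)
```

Because Order is a field of OrderLine, two lines can only compare equal when their orders do, which mirrors the dependence of identity described in the text. The subtype adds a predicate but no new equality rule.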
Copyright © 1994, 2001 Adrian Larner. The author asserts all moral rights.