(U01)

www.btinternet.com/adrian.larner/database/newerm1

A New Foundation for the ER Model

A database paper by Adrian Larner

 

Abstract

 

 

A formal theory, based on first order logic, is proposed as an interpretive foundation for the Entity-Relationship (ER) model. Under the proposed interpretation, records are construed as existence assertions in which the kinds of entities asserted to exist are specified and/or their relationship is stated; in addition, a number of identities (one defining each attribute) are asserted of those entities. This theory allows an interpretation of derived records to be formally concluded from base records according to logical inference rules associated with data manipulation operations. In contrast with the Relational interpretation theory (the Domain Relational Calculus), the derivations are safe (false inferences – join traps – are avoided). The theory provides clarifications and explanations in a number of areas of ER theory, including entity and attribute definitions; meta-theoretical definitions, including those of “entity” and “attribute”; higher normalisation; entity dependence (referential integrity); and the teaching of ER analysis of data.

 

 

MOTIVATION

Technically, the second places in the predicates, “... represents ...” and “... is interested in ...”, are termed intentional (pertaining to wanting, aiming, etc.) Formalisation of intentional predicates is ill-understood. (All predicates of the First Order Logic are non-intentional). See PT Geach, Logic Matters, Blackwell 1972, Section 4 Intentionality. The definitions of “entity” and “attribute” and the informal explanations of “representation” can be found in most standard texts on the ER Model. Some go back to PP Chen’s seminal The Entity-Relationship Model – Towards a Unified View of Data in ACM TODS 1, No 1, March 1976.

A new, formal, foundation for the Entity/(Attribute/)Relationship (ER) Model is proposed. In general, it is difficult to justify the need for a new foundation of a well-established model: can there be anything seriously wrong with the current foundation? It is perhaps even more difficult to demonstrate the value of a proposed foundation, for the payoff of such a theory is not simple and single, but complex and various.

What is wrong with the ER model? In practical terms, not much. Although many of us analyse and design for Relational database management systems, we use the ER model for our analysis, and for our interpretations of the relational structures: tables, rows, and columns. It works very well. But some of us have serious doubts, not of what we are doing, but of what we say about what we are doing, and especially of what we say when we are teaching how to do it.

We are rather like those extraordinary teachers of singing. They are, no doubt, good singers; and more than competent teachers. But listen to them when they are teaching: When you sing; sing from the forehead. When you breathe; breathe from the hips. This may be effective advice; given their results it presumably is effective. But as a theory of singing it leaves something to be desired. And we really do say equally extraordinary things.

What sort of problems do we have?

We are hard put to say what we mean by our terms of art: entity, attribute, relationship. Each record (row in a table) is to represent an entity or relationship (instance). But, in general, x can represent an E, without there being anything, y, such that x represents y, and y is an E. The noun phrase, Moonwind the Unicorn, represents a unicorn; but (alas) there is no unicorn that it represents. Likewise “the enterprise is interested in an E” does not imply that there is an E that they are interested in: they want to employ a clerk, but it does not follow that there is a clerk that they want to employ. We are tempted to say, “Not a specific clerk”, and somehow we understand that: but there are no non-specific clerks about; any attempt to employ such an indeterminate creature will be vain. Yet an entity is supposed to be an existent. Also, an attribute is intended to be, not an entity, but a “property” of an entity: but why should a record not represent such a property? why should the enterprise not be interested in such a property? why should such a property not exist? So are attributes really distinct from entities? In analysis we readily redefine entities as attributes; and relationships as entities: what are we doing (if anything) other than merely designing records?

“the analyst has failed to give the interpretation if they do not produce the person ...”: non-sexist singular “they”: – to the grammatically conservative I can say only, Thou art i’ the right; I would not willingly offend thee. Singular “they” will for ever offend our sensibilities, as much as singular “you” (or singular “we”, as Her Majesty remarked to me only the other day).

We may begin analyses with unproblematic entities: persons, houses, cars. Plausibly, a person is an existent and a person record represents a person. But we soon move on to vacancies: a vacancy record represents a vacancy, but is a vacancy an existent? Or to invoices: an invoice (record) does not represent an invoice, nor a demand for payment: it is an invoice; it is a demand for payment (and copying it from a magnetic medium to paper does not make it something different). It makes sense to say, “Is there really a person represented by this registration?” but not to say, “Is there really a demand for payment represented by this invoice?”
 
If a user asks an analyst to interpret a record (that “represents an entity”), the analyst gives an explanation of what the record means, in English (say). Yet, strictly, the analyst has failed to give the interpretation if they do not produce the person or the house (or the vacancy?) that – according to a straightforward understanding of the ER model – is represented by the record. But if the analyst can give an interpretation in words, why should we not say: it is the words that constitute the record’s interpretation? Why need we go any further, and insist on having a (more or less dubious) entity, existing in the “real world”, and represented by the record?

 

But suppose we can, ignoring all these quibbles, interpret each base record (each row in each stored table) as an entity (or a relationship). Now, using a data manipulation language (DML) we construct other records – by restriction, projection, and joins. How do we derive the interpretations of these records? How do we know what entities (or relationships) they represent? If a PERSON table has columns PERSON_ID, SURNAME, and RELIGION, and we join it to itself on equality of RELIGION value, and project the distinct SURNAME columns: what entity does each row represent? or is it a relationship, and – if so – between what entities?
Let us be brutally frank: the ER model is informal, incomplete (we have no interpretation of data manipulations), and it sounds like bad metaphysics. It needs to be formal, complete, and not metaphysical at all. Who knows whether the world consists of entities with attributes and associated by relationships? It might be a single, complex, eleven-dimensional field of energy; but we still have to design the employee data base.

 

 

THE RELATIONAL INTERPRETATION

 

Oddly enough, before we had the ER model, we had a standard form of interpretation of records. With the introduction of the Relational model came (although it was rarely made explicit) an interpretation of normal records and of their manipulations in a DML that was – slightly sugared – the first order predicate calculus (FOPC). The interpretation is that of the Domain Relational Calculus. A relation, in the logician’s sense, is a predicate, of one or more places; for instance:

... is father of ...
An example of a proposition (a true or false sentence), formed from this predicate, is:
Abraham is father of Isaac.
Such a proposition is true if the predicate is true of the things designated by the names (Abraham and Isaac, designated by “Abraham” and “Isaac”) inserted in the places of the predicate, true pairwise of them in this case, as the predicate has two places. A relation, in the mathematician’s sense (a relation in extension, a relation instance in the database sense), is the set of ordered pairs (or, more generally, n-tuples for an n place predicate) of which the predicate holds true. So, we start from the logician’s propositions, all formed from the same predicate:
Abraham is father of Isaac; Isaac is father of Jacob; Isaac is father of Esau; ...
We abstract out the predicate, giving it a name (say, FATHERHOOD); we give the places names (FATHER and PERSON); and we thereby reduce each proposition to a pair of names:
FATHERHOOD:    FATHER   PERSON
 
               Abraham  Isaac
               Isaac    Jacob
               Isaac    Esau
               ...      ...
Thus we get the relation, or table, that we know and love. And interpretation is no problem; it means: Abraham is father of Isaac, and Isaac is father of Jacob, and Isaac is father of Esau, and .... What is more, we can manipulate it, and each manipulation comes provided with its own method of interpretation (i.e. the method by which we interpret the result of the manipulation). Thus, for instance, the restriction, “FATHER = Isaac”, gives us:
... is father of ..., AND the former is Isaac.
or, using variables as placeholders,
x is father of y, AND x is Isaac.
So restriction is a conjunction (ANDing), and so is Cartesian product. We interpret the Cartesian product of FATHERHOOD (say F1) with itself (say F2) as:
x is father of y AND is father of .
Projection is interpreted as the existential quantification of all the variables (placeholders) that are not projected. So the FATHER projection of FATHERHOOD is interpreted as:
$y x is father of y
 
(In English, “... is father of someone”, or, “... is a father”. Notice that only the unbound variables – those that are not quantified – represent columns, places of the predicate.)
We can now derive the interpretation of natural join (F1.PERSON with F2.FATHER):
$ x is father of y and y = and is father of (or equivalently, x is father of y and y is father of ).
And of composition:
$y x is father of y and y is father of .
 
(In English: ... is paternal grandfather of ....)
So the Relational model gives us a firm, formal foundation: we have a general form of interpretation for all relations; we state the desired interpretation of each base relation; and we can then derive the interpretation of each view or query, according to the manipulations used in its specification. Moreover, the rules of derivation are logical implications (of the FOPC), so we can never derive a falsehood from a truth. If all our base records have true interpretations, all our derived records – the answers to our queries – will have true interpretations; which is just what we would wish (given that we can have no protection against simply false base records). Why seek for more in the way of interpretation? Because one person’s firm foundation is where another can rest their lever. Consider the relations:
SUPPLY:    SUPPLIER   PART
 
           S1         P1
           S2         P1
           S2         P2
           ...        ...
 
 
USE:    PART   PROJECT
 
        P1     J1
        P2     J1
        P1     J2
        ...    ...
These have interpretations (assume all instances true):
(SUPPLY)   x supplies y.
 
(USE)        is used in z.
We form their composition, according to the standard interpretation:
$y $ x supplies y and y = and is used in z.
 
or $y x supplies y and y is used in z.
In English, one instance is: There is something supplied by S1 and used in J1; or, S1 supplies something used in J1. But this does not follow from our base data; it could be false while the base data (shown above) was true (if none of the P1s supplied by S1 were used in J1): the classic join trap.

Alas, the FOPC, the foundation of this interpretation, requires that the names inserted in the places of predicates be proper names. But “P1” is a common name (like, say, “dog” – a perfectly good name, but it designates each of many objects). The system (the data, as shown) distinguishes two different things, both named “P1” (the one supplied by S1, the other by S2): so “P1” is not a proper name.

We do not generally ensure that all the values we insert in databases are interpretable as proper names. So we always have potential join traps, most of which are avoided – quite informally – by users who understand their application domain. In the relational model, a join trap is not a misinterpretation by a user: it is the failure of a user to reject the misleading interpretation that follows from the interpretation of the model. This does not bode well for intelligent (i.e. stupider than human) front-ends.

 

 

THE ENTITY/RELATIONSHIP INTERPRETATION

 

A row of a table, in the ER model (assuming that relational structures, but not the interpretations, are used), is said to “represent”, or “stand for”, an entity (a thing? an existent?) in “the real world”; or (in the case of some tables) a relationship between two (or, in some versions of the ER model, two or more) entities. The columns hold attributes of the entity, or identifiers of the related entities (and, if attribute-bearing relationships are allowed, attributes of the relationship).

We can understand restriction well enough: it gives us some of the entities or relationships that we started with. But what entity or relationship does projection give? What entity or relationship does Cartesian product, natural join, or composition give? Consider that projection of the Surnames of the join on Religion of PERSON with itself:

P1.SURNAME   P2.SURNAME
 
Robinson     Robinson
Smith        Smith
Smith        Thomas
Thomas       Smith
Thomas       Thomas
What does one row of this table represent? Clearly these are not person-pair entities. Smith and Smith might be one person, or two persons and one pair of persons of common faith considered in two orderings. Could they be pairs of surnames? – in that case a join can construct entities from attributes. Could the table represent a relationship? – what relationship? between what entities (persons? groups of persons of common surname?) The ER model is incomplete in interpretation: no interpretation of data manipulations is defined. So – unlike the well defined Relational interpretation – it gives us no place to rest our lever; but that is because it gives us no place to rest.

 

 

CRITERIA OF DEFINITION

For the rejection of absolute identity, and the introduction of criteria of application and identity, see PT Geach, Logic Matters, Section 7 Identity Theory, and other works.

I am indebted to my colleague, Nigel Roberts, for the coinage of the useful collective term, “criteria of definition”, and for the exploration of criteria of definition in data analysis and its teaching: see ND Roberts, “The Reality Modelling Approach to Entity-Relationship Analysis” in Proc Sixth ISTIP Conf, University of Hertfordshire, April 1994.

Most neglected of all the fields on our Entity Definition forms is: Entity Description. Let us replace it by “Entity Definition”. What does it take to define an entity? We certainly need a criterion of application. When we know the criterion of application of a concept, e.g. “cat” or “moon” or “book”, we know when we have got one (i.e. when we have spotted a cat, or seen the moon, or found a book – when we have got one and not none). A criterion of application of “book” might include talking books, but exclude graphic novels, magazines, and newspapers. Sometimes a criterion of application is obvious (but we must be careful: is a lion a “cat”?) A criterion of application decreases the vagueness of a concept; but is it enough to define the concept? Imagine a child that said “cat” every time it saw a cat (and at no other time), and likewise said “moon” when it saw the moon. We are tempted to say: the child has grasped the concepts, “cat” and “moon”. But suppose the child thinks that there is only one cat, with an extensive wardrobe of fur coats; and that there are many moons – round, half-round, crescent; golden, silver, blue perhaps.

So a criterion of application is not enough: we also need a criterion of identity. This tells us what counts as the same cat, or the same moon. It tells us when we have got one (i.e. one and not two); it removes, not vagueness, but ambiguity. Is my copy of “Persuasion” the same book as your copy of “Persuasion”? When we do data analysis we (formally or informally) postulate criteria of identity (they are largely undefined in natural languages).

The use of these criteria of definition – of application and identity – shows considerable promise in the teaching and performance of ER analysis. For example, the dependence of an order line entity on its order entity (a many-to-one relationship with cascade delete) is explicable in terms of their criteria of identity. If I pick out an order line and you pick out an order line, my order line is the same order line as yours only if the order containing my order line is the same order as the order containing yours. The criterion of identity of “order line” involves, or is dependent on, the criterion of identity of “order”: it is this dependence of identity that causes the dependence of entity (of existence); hence the cascade delete – but the dependence would still hold even if no deletes were allowed. In a subtype (“ISA”) relationship, the supertype and subtype entities have the same criterion of identity, but different criteria of application: to be a person is not to be a house owner; but if A is a person that owns a house, and so is B, then A is the same person as B if and only if A is the same house owner as B.

 

 

Return to the start of A New Foundation for the ER Model.
 

 

Continue reading A New Foundation for the ER Model
with the section on THE “EPI” INTERPRETATION
.

 

 

SITE HOME PAGE

 

 

THE DATABASE PAGE

 

THE DATABASE PAPERS

 

DOWNLOAD

Download A New Foundation for the ER Model in Restricted Text Format (rtf, Word for Windows compatible)

Another database paper ...

 

Copyright © 1994, 2001 Adrian Larner. The author asserts all moral rights.

The decorative image of a key (cc004239.gif) used on this page was obtained from IMSI's MasterClips/MasterPhotos© Collection, 1895 Francisco Blvd East, San Rafael, CA 94901-5506, USA.