|
(U01)
|
www.btinternet.com/adrian.larner/database/newerm2
|
|
A New Foundation for the ER Model
continued
A database paper by Adrian Larner
|
|
|
|
|
|
|
THE EPI INTERPRETATION
|
|
|
Here is the proposed interpretation of a row in a table.
It is a proposition of a certain form, called “EPI” for Existential quantifications (“EXISTS”),
Predicate (giving a criterion of application),
and Identities; in its most general form:
- (E)
- $x1
$x2 ...
$xm
- (P)
- P (x1, x2, ... xm)
- (I)
- & x1
=1 v1
& x2
=2 v2 ...
& xp
=p vp
There are any number (m) of existence assertions (E),
a predicate (P) asserted of the thing(s) said to exist,
(xi from x1 to xm),
and for each column (j, from 1 to p)
an identity asserted between one of the things that was said to exist
each xj
is one of the xi and the value, vj, in that column.
Notice that the value, vj, is a common name:
it names each thing that is identical (in the sense, “=j”) to anything that it names
(just as “dog” names each thing that is identical, in the sense, “is the same species as”, to anything that it names).
To take a very simple example, a PERSON record with columns PersonId, Surname, and Religion:
- $x
x is a person & x is the same person as 47273 &
x has the same surname as Robinson & x has the same faith as Buddhism.
So we say that something exists; it falls under the criterion of application of “person”, it is a person;
it is the same person as 47273, i.e. 47273 names (perhaps) a number of things
but their differences are so slight as to make us classify them under the identity, “is the same person as”;
it has the surname, “Robinson” the identity classes as “the same”
any persons that have the same surname;
and it has the faith, Buddhism the identity classes as “the same” any persons of common faith.
Notice that the only constraint placed on the column identities (=j)
is that each must be symmetric and transitive:
IF x =j y THEN y =j x
IF x =j y
& y =j z
THEN x =j z
Consequently we have an interpretation of an attribute:
an attribute is an equivalence relation (an identity),
which classifies entities (the things asserted to exist) into equivalence classes;
and a value of that attribute is a common name that designates each entity in one of those equivalence classes.
One consequence of this interpretation is that an attribute is defined by giving a criterion of identity
(so now we know what to do about Attribute Description on our data definition forms).
This is intuitive enough:
would we wish to say that the attribute, “Religion”,
was well-defined if the analyst could not explain what it meant
to join two PERSON records on equality of Religion value?
|
|
|
|
|
RELATIONSHIPS AND ENTITY DEPENDENCY
|
|
I am indebted to my former colleague in IBM United Kingdom Ltd,
William Reynolds, for drawing my attention to the importance of dependency links,
which the EPI attempts to explain.
I am also indebted to William for his definition of entity:
a thing, in the sense of an object of thought.
Following this definition, we are led away from reality (metaphysics) to human thought,
or more manageably human language. (Spoken communications)
|
An entity like PERSON, where the EPI interpretation uses a single quantified variable,
is (usually) an independent entity.
On an ER diagram with the many-to-one dependency links running down the page from
“parent” (one) to “child” (many) entities (the so-called “live crows” convention),
such an independent entity has no parents.
Suppose that both PERSON and HOUSE are independent entities.
An EPI interpretation of HOUSE is left to the reader; we assume here a key of HouseId.
Suppose we want a table representing the relationship of (possibly shared) current ownership:
- $x $y
x owns y & x is the same person as 47273
& y is the same house as H2371
- where “H2371” is a HouseId value.
And we have a neat distinction between an independent entity (or subtype
of such an entity), with a single quantified variable;
and a relationship, with two or more such variables in the predicate (“... owns ...”).
But suppose that we wish to show period of ownership, not just current ownership:
of what is “period” an attribute (a classification)?
Clearly of neither person (x) nor house (y). We introduce a third variable:
- $w
$x
$y
w is an ownership by x of y & x is the same person as 47273 &
y is the same house as H2371 &
w is of the same period as 19 Aug 1974 to 30 Dec 1985
This gives us a dependent OWNERSHIP entity,
with parents PERSON and HOUSE.
It formalises that well-known move in data analysis:
the resolution of a (typically many-to-many) relationship into an entity,
with two one-to-many links, and we can see why they are links (relationships that do not bear attributes).
|
|
|
|
|
DATA MANIPULATIONS
|
|
|
Now we must provide an interpretation of the operations of our DML.
Again, we treat restrictions and Cartesian products as conjunctions.
A restriction is a predicate, say:
- Q(u1,
u2, ...
uq )
We simply conjoin it with the original predicate, thus:
- (E)
- $x1
$x2 ...
$xm
- (P)
- P (x1, x2, ... xm)
& Q(u1,
u2, ...
uq )
- (I)
- & x1
=1 v1
& x2
=2 v2 ...
& xp
=p vp
Each uk
is one of the xi in a language like SQL
it would be designated by the column name, say column j
and is accordingly the same xi as
xj.
Now consider the Cartesian product with a record interpreted as:
- (E)
- $y1
$y2 ...
$yn
- (P)
- Q (y1, y2, ... yn)
- (I)
- & u1
=m+1 vm+1
& u2
=m+2 vm+2 ...
& uq
=m+q vm+q
Again, each of the uk
is one of the yh.
(The column subscripts run from m+1 to m+q to avoid conflict
with those of the other relation.)
We simply conjoin the interpretations, and shuffle the E,
P,
and I components to restore the standard sequence:
- (E)
- $x1
$x2 ...
$xm
$y1
$y2 ...
$yn
- (P)
- P (x1, x2, ... xm)
& Q (y1, y2, ... yn)
- (I)
- & x1
=1 v1
& x2
=2 v2 ...
& xp
=p vp
& u1
=m+1 vm+1
& u2
=m+2 vm+2 ...
& uq
=m+q vm+q
Projection has quite a different interpretation,
but it is still an implication of the FOPC:
we simply drop the identities associated with the columns that are not projected.
Thus, for example, the projection of the first g columns of the table with the
general interpretation (above) gives us:
- (E)
- $x1
$x2 ...
$xm
- (P)
- P (x1, x2, ... xm)
- (I)
- & x1
=1 v1
& x2
=2 v2 ...
& xg
=g vg
We have merely dropped:
- & xg+1
=g+1 vg+1
& xg+2
=g+2 vg+2 ...
& xp
=p vp
|
|
|
|
|
COMPOSITIONS AND JOIN TRAPS
|
|
|
We can now show how natural join and composition are interpreted.
Take the SUPPLY and USE
records, with the EPI interpretations:
- (SUPPLY)
$x
$y x supplies y
&
x is the same supplier as S1 & y is the same part as P1
- (USE)
$y´
$z y´ is used in z
&
y´ is the same part as P1 & z is the same project as J1
In the restriction, that SUPPLY.Part = USE.Part,
obviously, we express SUPPLY.Part as y,
and USE.Part as y´ .
But most importantly we interpret the equality, =,
not as absolute identity but as is the same part as:
the (criterion of) identity that defines the Part columns.
Applying Cartesian product and this restriction, we get the equijoin:
- $x
$y
$y´
$z
x supplies y & y´ is used in z & y is the same part as y´
&
x is the same supplier as S1 & y is the same part as P1 &
y´ is the same part as P1 & z is the same project as J1
To obtain the natural join we use projection to drop one Part column,
i.e. to drop either
y is the same part as P1 or
y´ is the same part as P1.
To obtain the composition we drop both Part columns, thus:
- $x
$y
$y´
$z
x supplies y & y´ is used in z & y is the same part as y´
&
x is the same supplier as S1 & z is the same project as J1
- In English: S1 supplies something that is the same part as something used in J1.
But we avoid the conclusion that S1 supplies something that is used in J1
|
|
|
|
|
EVALUATION
|
|
|
There can, of course, be no knock-down argument to demonstrate the value of the EPI interpretation,
or of any other foundation.
What we are looking for are a variety of gains throughout the ER model:
- The EPI interpretation is formal, as is that of Relational DMLs;
indeed it is the very same logic, the FOPC,
but somewhat differently applied.
In particular, it avoids the use of absolute identity,
and this enables values to be interpreted as common rather than as proper names.
Consequently, although it covers data manipulations,
and therefore derived records as well as base records, it does not give rise to join traps.
- A method of defining (rather than, more vaguely, “describing”) entities is provided:
criteria of definition, i.e. of application and identity.
Attributes also are defined using a criterion of identity.
The links (non-attribute bearing relationships)
between entities are describable in terms of the criteria of definition of the connected entities.
- We can say precisely what we mean by a record “representing” an entity (instance):
the interpretation of the record is a proposition asserting (1) that something exists
(hence “entity” an existent),
(2) that it is a certain sort of thing (falls under a given criterion of application),
and (3) that it meets certain criteria of identity (one for each attribute).
- Notice that if an attribute or attributes form a candidate key of a record,
then its criterion of identity (or the conjunction of their criteria of identity)
is the criterion of identity of the entity represented by the record.
The foundation of higher normalisation theory, from 2nd to Boyce/Codd normal form is determinacy.
It will be appreciated that attribute c determines attribute c´ under this condition:
for all x, x´: IF x =c x´
THEN x =c´ x´
- We can distinguish between independent entities (and their subtypes) on the one hand,
and relationships or dependent entities on the other,
because the interpretations of the former use only one existentially quantified variable.
We have a formal interpretation of the resolution of many-to-many relationships into entities:
the introduction of an additional variable to which are applied
attribute identities that do not hold of any of the originally related entities.
- Metaphysics and metaphysical correctness are replaced by logic and logical correctness:
instead of asking “What is an entity?”, or “Are there such things in reality?”,
we ask “What is asserted to exist, with what criteria of definition?”
Whether the thing asserted to exist is a record-keeping entity
(like an invoice, where the thing asserted to exist by the record is the record itself),
or a real world entity (like a person), is no part of the formal theory.
This avoids any problems with dubious entities
(be they vacancies, intimations or mortality, or anything else):
once provided with criteria of definition, logical propriety is assured.
- Finally, we can even use the EPI approach to distinguish “types” from “instances”
without resort to the complexities of set theory.
A person instance is something that falls under the criterion of application, “... is a person”,
and has the criterion of identity, “... is the same person as something” (or even, “as itself”).
The person type is something that falls under the same criterion of application, but has the criterion of identity,
“... is a person, and so is ...”:
the type is an entity with a very coarse classification
(a single equivalence class into which each person falls).
-
|
|
|
|
|
CONCLUSION
|
|
|
The ER model has proved extremely useful in practice,
especially in the analysis of data, the design of databases, and the informal interpretation of records.
The interpretation proposed here (EPI)
is that a record be interpreted as a proposition asserting the existence of one or more things (entities),
a predicate giving their criterion of application and/or their relationship,
and a number of identities (attributes) that hold of them.
This interpretation is formal (it is expressed in the first order logic),
and it enables the interpretations of derived records to be found effectively and safely (without join traps).
The EPI interpretation appears promising as an aid to understanding and teaching the analysis of data;
it clarifies and systematises the definitions of entities and attributes;
it provides explanations of the terms of art of the ER model (entity, attribute, relationship);
and it gives insight into the nature and importance of higher normalisation.
|
|
|
|
|
Return to the start of A New Foundation for the ER Model.
|
|
|
Reread this section of A New Foundation for the ER Model.
|
|
|
|
|
SITE HOME PAGE
|
|
|
|
THE DATABASE PAGE
|
|
|
THE DATABASE PAPERS
|
|
|
DOWNLOAD
|
Download
A New Foundation for the ER Model
in Restricted Text Format (rtf, Word for Windows compatible)
|
|
Another database paper ...
|
|
|
Copyright © 1994, 2001 Adrian Larner.
The author asserts all moral rights.
|
|
The decorative image of a key
(cc004239.gif) used on this page was obtained from IMSI's
MasterClips/MasterPhotos© Collection, 1895 Francisco Blvd East, San Rafael,
CA 94901-5506, USA.
|