(U01)

www.btinternet.com/adrian.larner/database/ddnota1

Notations for Data Diagrams

A data analysis paper by Adrian Larner

 

Firstly ...

 

 

... a word on notations in general: if you have a notation to use, whether to read or to write, you must make sure that you understand it. To understand it is to be able to translate it into English if reading it, or to translate from English into the notation if you are writing it.

 

Entities

 

 

Let us begin with a symbol used in most data diagrams, the rectangle, usually in “landscape” orientation (long sides horizontal), and named:

What does it mean? (Ignore the fancy edge: that’s web page designer’s licence to prettify.) It represents a kind of record, a person record. When we say that it represents a kind of record, we mean that it represents each record of that kind. In much the same way, the name “person” designates (that is, names) each thing of a certain kind in the real world.

There is, of course, a connection between the real world object, or entity (a person), and a person record in a file or database. If we examined a person record we might find that it comprised, perhaps, these fields:

PersonNumber:  89280
Surname:       Smith
GivenName:     William
BirthDate:     7th March 1950
Notice that a field is like a question and an answer (“What is the Surname of this entity?” – “Smith”). So the whole record is like a filled-in questionnaire: it is a filled-in form. But what does it mean? It means this rather rigidly structured English sentence:
There is something that is the person 89280, has the surname “Smith”, has the given name “William”, and has the date of birth 7th March 1950.
This sentence gives three kinds of information:
Something exists (“There is something”)
 
That thing is a person
 
That thing is classified in four ways: it is in (1) the class of things that are the same person as person 89280 (a class that contains only one person); (2) the class of things that have the same surname as everything surnamed “Smith” (a class that contains lots of people – all the Smiths); (3) the class of things that have the same given name as everything named “William” (all the Williams); and (4) the class of things born on 7th March 1950.
The word, “entity” means “something that exists” (in the real world). What we call an “entity” in a database is a sentence (a record like the one shown) that says that something exists. We call this sort of record a person record because what is says to exist it also says to be exactly one person. The record tells us about four attributes of the thing it says to exists, that is, four ways to classify that thing: by person, by surname, by given name, and by date of birth; and it says which of each of those classes it belongs to: 89280, “Smith”, “William”, and 7th March 1950.

We can now say that “person record”, and therefore the rectangle in the diagram, represents each sentence recorded by the database owner, having this form:

There is something that is the person ..., has the surname ..., has the given name ..., and has the date of birth ....
The database owner is, of course, the business – the enterprise – whose database it is, and who are responsible for entering data into it. Notice that we should therefore think of a “real world” entity as something said to exist by the enterprise. This is why it is often said that a database represents an enterprise’s view of the world.

Sometimes we think of the person rectangle in the diagram as representing the file, or table, that contains the person records. Strictly, this is a mistake: it does not matter whether we keep all the person records in one file of their own (as we usually do), or in two or more files, or perhaps in a single file or database that holds all kinds of record. But, of course, if we keep each kind of record in its own file then there will be one file (or table) for each rectangle in our data diagram. But remember: we write “person” in the rectangle because it represents a single record (that is, each person record, any person record), but not all the person records together: so we do not write “persons” in the rectangle. The general rule is that an entity name must be the name of a thing, singular not plural, and not the name of a stuff. Therefore “person” and not “persons” and not “people”; “employee” or “staff member” and not “staff” and not “employees”. Here is a simple test of a good name, say “such-and-such”: does it make sense to ask, “How many such-and-suches are there?” So “How many persons ...?” or “How many employees ...?”, but not “How many peoples ...?” or “How many staffs ...?” or “How many employeeses ...?”

Actually, it is useful not only to check if that question makes sense, but also to answer it, using one of the only three numbers known to systems analysts (we analysts are a simple folk): none (0, zero), one (1), or more-than-one (“m”, that is, 2 or 3 or 4 or...) If the answer to “How many such-and-suches ...?” is “0” then we won’t need a such-and-such entity; if the answer is that big number, “m”, then we will need a such-and-such entity on our diagram. If the answer is “1” then we will probably need the entity (that is, we will need a such-and-such record), but we will not put it in our diagram. We call an entity that has only one example – only one instance of its type – a singulary entity. The commonest singulary entity we find is: “ourselves”, that is, the enterprise owning the database. It is important to remember this special singulary entity – our own company – and it is also important not to put it in the diagram, because it makes the diagram very cluttered.

 

Associations

 

 

If you think about the sorts of things an enterprise needs to say, and to keep a record of, for instance, “Sold one hundredweight of potatoes to Mr William Smith on 19th August 1874,” you will appreciate that it needs to be able to say more than that things exist, what sorts of things they are (persons, products, buildings, etc.) and how they can be classified. It needs to be able to associate or relate or link one thing to another. In the example we relate a person – Mr Smith – with a product – potatoes. (We shall use these three words – association, relationship, link – in a strictly defined way: you may find somewhat different usage in other places.) The major differences between notations for data diagrams are in how they express associations – connections – between related entities.

 

Data Mapping Notation

 

 

This simple notation is recommended for the initial “mapping” of an enterprise. It can also be used later in systems analysis, and in design. We begin with two kinds of entity: Person and Product:

We have a relationship between them: the person buys or purchases the product. Or we may express the converse relationship, the product is sold to or is bought by the person. For the purposes of this notation, we have to treat such a relationship as an entity. We give it an appropriate name – the name of a thing, remember – such as “purchase”. We think of the (real world) thing, a Purchase, as: a person, on some occasion, buying some quantity of a product. We draw the new entity on the diagram in the usual way, as a labelled rectangle. To show that it relates a person and a product, we “hang” it from them, like this:

Notice that direction – up and down – is important in this notation. If Purchase were drawn above Person and Product the diagram would have a quite different meaning. (Other notations take no account of direction, as you will see.) What fields might we find in a purchase record?

PersonNumber:   89280
ProductNumber:  P5637
Quantity:       1cwt
Date:           19th August 1999
This purchase record has the same PersonNumber as some person record: it shows which person made the purchase. Remember the definition of a purchase: a person, on some occasion, buying ... So there has to be such a person, and if we are to keep that purchase record, there has to be such a person record in our database. We say that the purchase record is dependent on the associated person record. Notice that this connection, shown by a line in the diagram, is what we mean by a link. It is nothing but the identifier (the primary key) of a person record being present (as a foreign key) in the purchase record. (In an object-oriented system, objectIds would be used instead of keys: Purchase would hold its Person objectId.)

Not surprisingly now, the purchase record has the same ProductNumber as some product record: it shows which product was purchased. Remember the definition of a purchase: a person ... buying some quantity of a product. So, just as there had to be such a person, there has to be such a product, and if we are to keep that purchase record, there has to be such a product record in our database.

The purchase record is therefore dependent not only on the associated person record but also on the associated product record, the link being shown by a line in the diagram. ProductNumber is, of course, the identifier (the primary key) of a product record and a foreign key in the purchase record.

In general, entities fall into two major kinds: independent entities, which are not dependent on any other entity, also called simple or strong entities; and dependent entities, dependent on sometimes one, usually two, and very occasionally three or more other entities; dependent entities are also called complex or weak entities.

What does the purchase record mean?

Some things exists – call them a, b, and c.
 
a is a person, c is a product, and b is a purchase of c by a.
 
a is in the class of things that are the same person as person 89280; c is in the class of things that are the same product as product P5637; and b is in the class of purchases of a quantity of 1 hundredweight of a product, and in the class of events happening on 19th August 1999. (Of course, b is also in the class of purchases by the person 89280, and in the class of purchases of product P5637. But that follows from the other information given.)
Notice that Quantity and Date are attributes neither of the person nor of the product but of the purchase, while PersonNumber and ProductNumber are attributes of (that is, ways of classifying) the purchase as well as of the person and product respectively.

We can now distinguish between the concepts, association, relationship, and link. If we think of a purchase and of the person making the purchase, we can see that the person is, so to speak, “written in” to the purchase: the purchase is that person purchasing a particular product on a particular occasion – Mr Smith buying those potatoes. We cannot know what purchase it it without knowing what person it is. (This follows from our chosen definition of “purchase”.)

To put it another way, the association from purchase to person conveys no information: the information about what person made the purchase is written into the purchase itself. The same is true of all links, including of course that from purchase to product. A link therefore cannot have any attributes, other than those of the linked entities. It is usually difficult to name a link, or express it as a relationship, because when we have defined the linked entities – Person 89280 and such-and-such purchase made by person 89280 – there is nothing to add. What could we call it? “is the person making the purchase”?, “purchaser”? “is by”? “makes”? – whatever these express is already present in the definition of “purchase”.

Contrast such a link with a relationship, such as “purchases” (which we expressed with the purchase entity). This does convey information: not merely is there the person, William Smith, and the product, potatoes, but – and here is the extra information – the one purchased some of the other on a certain occasion. Also, this relationship can have attributes – a quantity, a date – and these are attributes neither of the person nor of the product.

We use the word, association, to mean a connection between entities, whether it is a link or a relationship. Because of the way that relationships are handled in the data mapping notation – being expressed as entities – all associations shown as connections (lines) in the data mapping notation are links. One rather fortunate effect of this, for those that like simple rules, is that in creating a data dictionary from a data diagram in this notation one writes a table definition for each entity, and for nothing else.

 

Characteristics of Links

 

 

Because of the definition of the child or dependent (that is, “hanging”) entity, Purchase, on its parent entity, Person, if a person record is deleted then each purchase record linked to it is also deleted: this is called “cascade delete”. Because a purchase is defined with reference to exactly one person, a purchase record is associated with (linked to, and dependent on) exactly one person record. No restriction is placed on how many purchases a person may make. Recalling that systems analysts use only three counting numbers – 0, 1, and m – where “m” is pronounced “many” and means “more than one”, the order of the association from purchase to person is 0..m to 1. It means merely that each purchase is associated with exactly 1 person, and each person is associated with 0, 1, or more purchases. (Some people use the terms multiplicity or cardinality instead of “order”.)

If you wish, you may use the convenient abbreviations “c” (conditional) for 0..1, “m” for 0..m, and “s” (some) for 1..m. There are then 10 different orders that an association might have (or sixteen, if you count reflections):

c:c                       1:c (and c:1)               1:1
c:m (and m:c)    c:s (and s:c)
1:m (and m:1)    1:s (and s:1)
m:m                     m:s (and s:m)              s:s
Almost all links have the order m:1 upwards. And all the other links have the order c:1, also upwards. To show that a link is one-to-one (that is, c:1), we may write a little “1” beside the lower end of the link, or an explanation:

or

We would expect that one Account could have joint holders: the “1” shows that the enterprise does not allow this. The comment “first named” serves the same purpose, and also tells us: if an account really does have joint holders, we pretend it does not by treating the first named person as the sole holder. The commonest reason for such c:1 links are “current” and “preferred”. Perhaps a Department might have many managers over a period of time, but we choose to record only the current manager; or perhaps a person might live at different addreses, but we choose to record only their preferred address (the address at which they want us to contact them).

Notice that other notations do not express the concept of dependency. The interpretation in the data mapping notation of “each purchase is linked up to (that is, dependent on) exactly one person” is that each purchase is associated with only one person for all time. But, in general, if we know that the order of an association, say from A to B, is something to 1 (c:1, 1:1, m:1, or s:1), all we know is that each A is at any time associated with one B; but (in contrast to a dependency of A on B) an A could be associated with one B at one time and another B at another time. (Notice that such an association must be a relationship, not a link, because it has an attribute of its own: the period over which it holds.)

 

Characteristics of Relationships

 

 

In general, in contrast to links, relationships are not dependencies. The order of a relationship (until some “business rule” is applied) is m:m (0..m to 0..m): any number in either direction. Treatment of a relationship as a new, dependent, entity, linked to (and thereby “hung from”) the related entities, expresses a m:m association (quite correctly) as a 1:m association and a m:1 association: and both of these associations are links.

 

UML Notation

 

 

In the UML class diagram notation no distinction between links and relationships is made: all associations are treated in the same way, although specialisations (subtypes) and aggregations (part/whole associations) are specially marked – but we do not discuss these here. The orders of associations are marked on the diagram, using an asterisk for “m”. Association names are written by the lines, with an arrow to show (for example) which entity purchases which. Objects (entities) may have their attributes and methods (applicable functions) written in the rectangle. The “person purchasing product” diagram would therefore be written in either of the following ways:

Remember that, in contrast to the data mapping notation:

The association “makes” from person to purchase is not said by the UML notation to be a dependency; nor is it said not to be a dependency: the question is left open.
 
UML does not distinguish links from relationships.
 
The position of the object (entity) symbols does not matter: purchase could be written (as here) beside person, or above it, or below it.
UML also permits an intermediate notation, showing purchase as an association and as an entity, called an “association class”:

 

Reading the UML Notation

 

 

First a quick revision. In reading the data mapping notation, we may say simply, “purchase is dependent on person” (reading up the link). It implies the constraint that

each purchase is linked to exactly one person for all time.
If we have a little “1” or an explanatory phrase, we read the constraint going down as
each account is linked to at most one holding.
A UML association must be read in both directions. Here are the rules:
From (first UML diagram) person to product: we see that the order inscription at product is (0..*), that is, m. It means: no constraint. If we choose, we can say, that each person purchases zero, one, or more products. But we knew that before we looked at the diagram! Similarly, reading from product to person, with the order inscription (0..*) at person: no constraint. (Each purchase is made by zero, one, or more persons.) A similar lack of constraint is found (second UML diagram) person to purchase and product to purchase.
 
From (second UML diagram) purchase to person: we see that the order inscription is not (0..*). So it could express either one or two constraints. If it begins with “1” – if it is (1) or (1..*) – then it expresses the constraint:
 
Each purchase is made by at least 1 person.
 
Same association: If the order inscription ends with “1” – if it is (1) or (0,1) – then it expresses the constraint:
 
Each purchase is made by at most 1 person.
 
Same association, also purchase to product: in fact, we have both constraints. Each purchase is made by at least one and by at most one person (and is of at least one and at most one product), so we may combine them and say:
 
Each purchase is made by exactly 1 person.
Each purchase is of exactly 1 product.

 

Degree and Membership Class

 

 

Some more technical terms: The degree of an association tells us the upper limit of its order inscriptions: “1” or “m” (“*” in UML). So, checking the list above, an association, say from A to B, may have degree 1-to-1, 1-to-m, m-to-1, or m-to-m.
 

Degree 1-to-1
means that each A is associated with at most one B, and
each B is associated with at most one A.
 
Degree 1-to-m
means that each B is associated with at most one A.
 
Degree m-to-1
means that each A is associated with at most one B.
 
Degree m-to-m
means that neither of those constraints apply.
(If you express order using “1”, “c”, “m”, and “s”, simply translate “1” and “c” to “1” and “m” and “s” to “m” to get the degree.)

Some systems (rather inconveniently) mark degree rather than order, so they also need some way to express the “at least” constraints. For this, they use the (not very well named) concept of membership class. If in an association from A to B:

Each A is associated with at least one B
A has a membership class of obligatory.
 
Each B is associated with at least one A
B has a membership class of obligatory.
If an entity in a relationship does not have a membership class of obligatory, then it has a membership class of non-obligatory. To add further confusion, some authors say “mandatory” for “obligatory” and “optional” for “non-obligatory”. Again checking the list above, membership class is equivalent to having an order inscription beginning with “1” (obligatory) or “0” (non-obligatory). However, this gives the membership class of the other entity. The “makes” association from person to purchase is of order (1) to (0..*) - that is, 1:m. The “0” in the order inscription of purchase shows that the membership class of person in makes is non-obligatory; the “1” in the order inscription of person shows that the membership class of purchase in makes is obligatory. (Each purchase is made by at least one person).

The underlying idea of “membership class” is that the making of a purchase by a person is an ordered pair, such as:

(William Smith, William Smith’s purchase of potatoes on 19/8/1874)
And “makes” is understood as the set of all such ordered pairs. Obviously, we could have a person who has never made a purchase, and who is therefore not a member of one of those ordered pairs that is a member of the set, “makes”. It is non-obligatory for a person to have such membership. By contrast, if we have a purchase, then it must have been made, so it must be a member of one of those ordered pairs that is a member of the set “makes”. It is obligatory for a purchase to have such membership.

(Look: Don’t blame me. I didn’t invent it; I’m just trying to explain it. Take a deep breath, and continue.)

 

 

SITE HOME PAGE

 

 

THE DATABASE PAGE

 

THE DATABASE PAPERS

 

DOWNLOAD

Download Notations for Data Diagrams in Restricted Text Format (rtf, Word for Windows compatible)

Next part of
Notations for Data Diagrams

 

Copyright © 1999, 2001 Adrian Larner. The author asserts all moral rights.

The decorative image of a key (cc004239.gif) used on this page was obtained from IMSI's MasterClips/MasterPhotos© Collection, 1895 Francisco Blvd East, San Rafael, CA 94901-5506, USA.