|
(U01) |
www.btinternet.com/adrian.larner/database/ddnota1 |
|
Notations for Data Diagrams A data analysis paper by Adrian Larner |
||
|
|
||
Firstly ... |
|
|
|
|
... a word on notations in general: if you have a notation to use, whether to read or to write, you must make sure that you understand it. To understand it is to be able to translate it into English if reading it, or to translate from English into the notation if you are writing it. |
|
|
|
||
Entities |
|
|
|
|
Let us begin with a symbol used in most data diagrams, the rectangle, usually in “landscape” orientation (long sides horizontal), and named:
What does it mean? (Ignore the fancy edge: thats web page designers licence to prettify.) It represents a kind of record, a person record. When we say that it represents a kind of record, we mean that it represents each record of that kind. In much the same way, the name “person” designates (that is, names) each thing of a certain kind in the real world. There is, of course, a connection between the real world object, or entity (a person), and a person record in a file or database. If we examined a person record we might find that it comprised, perhaps, these fields:
We can now say that “person record”, and therefore the rectangle in the diagram, represents each sentence recorded by the database owner, having this form:
Sometimes we think of the person rectangle in the diagram as representing the file, or table, that contains the person records. Strictly, this is a mistake: it does not matter whether we keep all the person records in one file of their own (as we usually do), or in two or more files, or perhaps in a single file or database that holds all kinds of record. But, of course, if we keep each kind of record in its own file then there will be one file (or table) for each rectangle in our data diagram. But remember: we write “person” in the rectangle because it represents a single record (that is, each person record, any person record), but not all the person records together: so we do not write “persons” in the rectangle. The general rule is that an entity name must be the name of a thing, singular not plural, and not the name of a stuff. Therefore “person” and not “persons” and not “people”; “employee” or “staff member” and not “staff” and not “employees”. Here is a simple test of a good name, say “such-and-such”: does it make sense to ask, “How many such-and-suches are there?” So “How many persons ...?” or “How many employees ...?”, but not “How many peoples ...?” or “How many staffs ...?” or “How many employeeses ...?” Actually, it is useful not only to check if that question makes sense, but also to answer it, using one of the only three numbers known to systems analysts (we analysts are a simple folk): none (0, zero), one (1), or more-than-one (“m”, that is, 2 or 3 or 4 or...) If the answer to “How many such-and-suches ...?” is “0” then we won’t need a such-and-such entity; if the answer is that big number, “m”, then we will need a such-and-such entity on our diagram. If the answer is “1” then we will probably need the entity (that is, we will need a such-and-such record), but we will not put it in our diagram. We call an entity that has only one example only one instance of its type a singulary entity. The commonest singulary entity we find is: “ourselves”, that is, the enterprise owning the database. It is important to remember this special singulary entity our own company and it is also important not to put it in the diagram, because it makes the diagram very cluttered. |
|
|
|
||
Associations |
|
|
|
|
If you think about the sorts of things an enterprise needs to say, and to keep a record of, for instance, “Sold one hundredweight of potatoes to Mr William Smith on 19th August 1874,” you will appreciate that it needs to be able to say more than that things exist, what sorts of things they are (persons, products, buildings, etc.) and how they can be classified. It needs to be able to associate or relate or link one thing to another. In the example we relate a person Mr Smith with a product potatoes. (We shall use these three words association, relationship, link in a strictly defined way: you may find somewhat different usage in other places.) The major differences between notations for data diagrams are in how they express associations connections between related entities. |
|
|
|
||
Data Mapping Notation |
|
|
|
|
This simple notation is recommended for the initial “mapping” of an enterprise. It can also be used later in systems analysis, and in design. We begin with two kinds of entity: Person and Product:
We have a relationship between them: the person buys or purchases the product. Or we may express the converse relationship, the product is sold to or is bought by the person. For the purposes of this notation, we have to treat such a relationship as an entity. We give it an appropriate name the name of a thing, remember such as “purchase”. We think of the (real world) thing, a Purchase, as: a person, on some occasion, buying some quantity of a product. We draw the new entity on the diagram in the usual way, as a labelled rectangle. To show that it relates a person and a product, we “hang” it from them, like this:
Notice that direction up and down is important in this notation. If Purchase were drawn above Person and Product the diagram would have a quite different meaning. (Other notations take no account of direction, as you will see.) What fields might we find in a purchase record?
Not surprisingly now, the purchase record has the same ProductNumber as some product record: it shows which product was purchased. Remember the definition of a purchase: a person ... buying some quantity of a product. So, just as there had to be such a person, there has to be such a product, and if we are to keep that purchase record, there has to be such a product record in our database. The purchase record is therefore dependent not only on the associated person record but also on the associated product record, the link being shown by a line in the diagram. ProductNumber is, of course, the identifier (the primary key) of a product record and a foreign key in the purchase record. In general, entities fall into two major kinds: independent entities, which are not dependent on any other entity, also called simple or strong entities; and dependent entities, dependent on sometimes one, usually two, and very occasionally three or more other entities; dependent entities are also called complex or weak entities. What does the purchase record mean?
We can now distinguish between the concepts, association, relationship, and link. If we think of a purchase and of the person making the purchase, we can see that the person is, so to speak, “written in” to the purchase: the purchase is that person purchasing a particular product on a particular occasion Mr Smith buying those potatoes. We cannot know what purchase it it without knowing what person it is. (This follows from our chosen definition of “purchase”.) To put it another way, the association from purchase to person conveys no information: the information about what person made the purchase is written into the purchase itself. The same is true of all links, including of course that from purchase to product. A link therefore cannot have any attributes, other than those of the linked entities. It is usually difficult to name a link, or express it as a relationship, because when we have defined the linked entities Person 89280 and such-and-such purchase made by person 89280 there is nothing to add. What could we call it? “is the person making the purchase”?, “purchaser”? “is by”? “makes”? whatever these express is already present in the definition of “purchase”. Contrast such a link with a relationship, such as “purchases” (which we expressed with the purchase entity). This does convey information: not merely is there the person, William Smith, and the product, potatoes, but and here is the extra information the one purchased some of the other on a certain occasion. Also, this relationship can have attributes a quantity, a date and these are attributes neither of the person nor of the product. We use the word, association, to mean a connection between entities, whether it is a link or a relationship. Because of the way that relationships are handled in the data mapping notation being expressed as entities all associations shown as connections (lines) in the data mapping notation are links. One rather fortunate effect of this, for those that like simple rules, is that in creating a data dictionary from a data diagram in this notation one writes a table definition for each entity, and for nothing else. |
|
|
|
||
Characteristics of Links |
|
|
|
|
Because of the definition of the child or dependent (that is, “hanging”) entity, Purchase, on its parent entity, Person, if a person record is deleted then each purchase record linked to it is also deleted: this is called “cascade delete”. Because a purchase is defined with reference to exactly one person, a purchase record is associated with (linked to, and dependent on) exactly one person record. No restriction is placed on how many purchases a person may make. Recalling that systems analysts use only three counting numbers 0, 1, and m where “m” is pronounced “many” and means “more than one”, the order of the association from purchase to person is 0..m to 1. It means merely that each purchase is associated with exactly 1 person, and each person is associated with 0, 1, or more purchases. (Some people use the terms multiplicity or cardinality instead of “order”.) If you wish, you may use the convenient abbreviations “c” (conditional) for 0..1, “m” for 0..m, and “s” (some) for 1..m. There are then 10 different orders that an association might have (or sixteen, if you count reflections):
We would expect that one Account could have joint holders: the “1” shows that the enterprise does not allow this. The comment “first named” serves the same purpose, and also tells us: if an account really does have joint holders, we pretend it does not by treating the first named person as the sole holder. The commonest reason for such c:1 links are “current” and “preferred”. Perhaps a Department might have many managers over a period of time, but we choose to record only the current manager; or perhaps a person might live at different addreses, but we choose to record only their preferred address (the address at which they want us to contact them). Notice that other notations do not express the concept of dependency. The interpretation in the data mapping notation of “each purchase is linked up to (that is, dependent on) exactly one person” is that each purchase is associated with only one person for all time. But, in general, if we know that the order of an association, say from A to B, is something to 1 (c:1, 1:1, m:1, or s:1), all we know is that each A is at any time associated with one B; but (in contrast to a dependency of A on B) an A could be associated with one B at one time and another B at another time. (Notice that such an association must be a relationship, not a link, because it has an attribute of its own: the period over which it holds.) |
|
|
|
||
Characteristics of Relationships |
|
|
|
|
In general, in contrast to links, relationships are not dependencies. The order of a relationship (until some “business rule” is applied) is m:m (0..m to 0..m): any number in either direction. Treatment of a relationship as a new, dependent, entity, linked to (and thereby “hung from”) the related entities, expresses a m:m association (quite correctly) as a 1:m association and a m:1 association: and both of these associations are links. |
|
|
|
||
UML Notation |
|
|
|
|
In the UML class diagram notation no distinction between links and relationships is made: all associations are treated in the same way, although specialisations (subtypes) and aggregations (part/whole associations) are specially marked but we do not discuss these here. The orders of associations are marked on the diagram, using an asterisk for “m”. Association names are written by the lines, with an arrow to show (for example) which entity purchases which. Objects (entities) may have their attributes and methods (applicable functions) written in the rectangle. The “person purchasing product” diagram would therefore be written in either of the following ways:
Remember that, in contrast to the data mapping notation:
|
|
|
|
||
Reading the UML Notation |
|
|
|
|
First a quick revision. In reading the data mapping notation, we may say simply, “purchase is dependent on person” (reading up the link). It implies the constraint that
|
|
|
|
||
Degree and Membership Class |
|
|
|
|
Some more technical terms: The degree of an association tells us the upper limit
of its order inscriptions: “1” or “m” (“*” in UML).
So, checking the list above, an association,
say from A to B,
may have degree 1-to-1, 1-to-m, m-to-1, or m-to-m. Some systems (rather inconveniently) mark degree rather than order, so they also need some way to express the “at least” constraints. For this, they use the (not very well named) concept of membership class. If in an association from A to B:
The underlying idea of “membership class” is that the making of a purchase by a person is an ordered pair, such as:
(Look: Don’t blame me. I didn’t invent it; I’m just trying to explain it. Take a deep breath, and continue.) |
|
|
|
||
|
|
|
|
|
|
||
|
|
||
|
|
Download Notations for Data Diagrams in Restricted Text Format (rtf, Word for Windows compatible) |
|
|
|
||
|
Copyright © 1999, 2001 Adrian Larner. The author asserts all moral rights. |
||
|
The decorative image of a key (cc004239.gif) used on this page was obtained from IMSI's MasterClips/MasterPhotos© Collection, 1895 Francisco Blvd East, San Rafael, CA 94901-5506, USA. |
||