(U01)
www.btinternet.com/~adrian.larner/database/objent

Objects and Entities
A data analysis paper by Adrian Larner

Introduction

If you are studying database specification and object oriented analysis and design, you will no doubt have observed that the entities and relationships shown on entity/relationship diagrams seem very like the objects and associations shown on class diagrams. How are they related? How do they differ?

Entities

Let us first be clear about entities and relationships. We classify these further, into:
When we talk about entities, we are talking about records (that is, rows). We may talk about records individually, as when we say that the record for student 9561340 contains a home postcode of LE1 2PB. However, when we are merely specifying a database rather than using it, and do not yet have any records, we ignore the differences between records of the same kind, such as Student records; that is to say, we ignore the different values in their attributes, and talk more generally or abstractly of the Student record. When we talk like this, we say that we are talking of record (or entity) types. Hereafter we shall distinguish between our talk about individual records, when we shall say record instance or merely record, and our more general talk about records of common types, when we shall say record type. We know that a record is an aggregate of (let us say) fields, where a field is a value of some attribute. But how are we to interpret records? What do they mean? How are we to understand them? To this question, two main answers have been given:
Notice that just as we can talk about record instances and types, so we can talk about real world entity instances (such as the actual student, 9561340) and entity types (the student). Having distinguished talk of instances from talk of types, we might ask how instances and their types are related. To this question too, there are two different answers:
Let us track through all these interpretations and explanations. On an entity/relationship diagram we see an independent entity depicted: a rectangle containing the word Student. The word designates each of a number of records that we plan to have in our database, records which will almost certainly go in one table. The rectangle abstractly represents such a record (abstractly because we do not, at this stage, distinguish one student record from another); that is to say, the rectangle containing the word Student is a specification of each student record, albeit a very brief and abstract one. Because the word Student designates each of those records, it denotes the class of them (the Student table). Each student record is a proposition telling us that there exists a (real world) student, and giving us further information about that student. If you go along with the second bulleted item above, you will also say that, in addition to the individual (real world) students, there exists an extra real world thing: the class of students.
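The three roles distinguished here (the specification, the individual records, and the class the word Student denotes) can be separated in a short Python sketch. It is illustrative only: StudentRecord, as_proposition and the second student's data are invented for the example. The dataclass plays the part of the rectangle, each instance is one record read as a proposition, and a set of instances plays the part of the Student table.

```python
from dataclasses import dataclass


# The record type: a specification of each student record (the "rectangle").
# The attribute names are illustrative, not taken from any real schema.
@dataclass(frozen=True)
class StudentRecord:
    student_number: str
    home_postcode: str

    def as_proposition(self) -> str:
        # Each record asserts that a real world student exists, with these facts.
        return (f"There exists a student numbered {self.student_number} "
                f"whose home postcode is {self.home_postcode}.")


# Record instances: individual records, distinguished by their attribute values.
# The second record is invented data.
r1 = StudentRecord("9561340", "LE1 2PB")
r2 = StudentRecord("9561341", "LE2 7RH")

# The class in the traditional (collective) sense: the set of such records,
# i.e. the Student table. Its members are the records; the type is not a member.
student_table = {r1, r2}

for record in sorted(student_table, key=lambda r: r.student_number):
    print(record.as_proposition())
```

Note that the specification (StudentRecord) and the class (student_table) are distinct things, which is exactly the distinction the rest of the paper relies on.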

Objects

Now let us consider object class diagrams, which are very like entity/relationship diagrams. What differences are there? One minor difference is that we often see attributes listed in the rectangles representing object classes, but rarely in the rectangles representing entity types; but that is a mere detail. Far more important is that we see methods listed for object classes, but never for entity types. This is because the methods of an object constitute its behaviour, and every object has identity, state, and behaviour. In an Object Oriented system, process (behaviour) is tightly bound to data (state). By marked contrast, databases do not permit tight binding of data and process: they separate data and process by means of views (external schemas). This is why database specifications do not refer to methods. Consider, therefore, an object class diagram where the rectangles contain nothing but the object names: Student, say, for the student object class. As far as the student object instances are concerned, we may interpret this exactly as we would interpret the same symbol on an entity/relationship diagram: it is a specification of each student object instance, and the name Student designates each of those student object instances. There are two kinds of Object Oriented system, differing in the way that they treat object classes:
The first problem is a matter of unorthodox and eccentric use of terms. An object class, as implemented in a class object, is not a set or class: it is not a collection of which its instances are members. If in such a system you wanted an object that was the set of student object instances, you would have to specify a new object instance to be that set and to contain those instances. What, then, in this second kind of Object Oriented system, are object classes?
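Python happens to be a language in which classes exist at run time as objects, so it can illustrate the point. In this minimal sketch (Student and StudentSet are hypothetical names), the Student class object creates instances but does not contain them; a set of students has to be specified as a separate object.

```python
class Student:
    """In Python the class itself is an object (a class object) at run time."""

    def __init__(self, student_number: str, surname: str) -> None:
        self.student_number = student_number
        self.surname = surname


# Creating instances is the one job the class object obviously does.
a = Student("9561340", "Smith")
b = Student("9561341", "Jones")

print(type(Student))   # <class 'type'>: the class is an object like any other

# But the class object is not a collection of its instances: it has no members.
try:
    len(Student)
except TypeError as exc:
    print("Student is not a collection:", exc)


# A set of student objects must be specified as a new, separate object.
class StudentSet:
    def __init__(self) -> None:
        self._members: list[Student] = []

    def add(self, s: Student) -> None:
        self._members.append(s)

    def __len__(self) -> int:
        return len(self._members)

    def __contains__(self, s: object) -> bool:
        return s in self._members


students = StudentSet()
students.add(a)
students.add(b)
print(len(students))   # 2
```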

Object Classes

You may have heard "object" defined as "instance of a class" (that is, instance of an object class). Taking "class" in its usual (collective) sense, the definition is true, if we accept that there are classes at all: each object of some kind is an instance (in the sense of a member) of the class of objects of that kind. True, but hardly informative enough to serve as a definition; it would be like defining "animal" as "instance of a class of animals". The definition becomes even more suspect if we interpret "class" (that is, object class) as what is implemented in a class object, which is not a set or collection at all, and so not a class in the usual sense of that word. For clarity, let us hereafter use "class" in this new sense (which we have yet to clarify, but which is certainly not the sense of "set"). If classes are themselves objects, it is impossible to define "object" as "instance of a class" without circularity, because the class is itself an object, that is, a class object. It seems, therefore, that in order to understand what an object class is, it is first necessary to understand what an object is, that is, an object instance. As we have already said, an object has identity, state, and behaviour. Consider an ordinary program variable, such as a variable called i, used to control a loop, and having successive integer values within the loop, 0, 1, 2, ... 9, and (on exit) 10:
An object is very like a variable, but with these two differences:
And now, Object Classes (in those systems in which they are implemented as class objects): what are they for? Indeed, why should there be class objects at all? Clearly, class objects are not necessary, because some Object Oriented systems work without them. But when they are present, their principal function appears to be to create new instances. So to create a new student object one would send a create message to the Student class object. This is why class objects are sometimes described as factories: the student factory manufactures students. But, apart from any other oddness in the analogy, to speak of class objects in this way is to reduce them to a single method, create. It is like calling a number object an adder. In terms of data, what is a class object? In other words, what does its state comprise? It seems obvious that if the class object is to be used to create instances, or for that matter to recognise whether or not an object is of the pertinent type, then its state must comprise a specification of an object of that class. That is to say, the state of the Student class object is pretty much the same as the column specification of an SQL Student table: (StudentNumber CHAR(9), Surname CHAR VARYING(20), GivenName CHAR VARYING(16), BirthDate DATE, ...) It is what we find, abbreviated perhaps, on a class (or entity/relationship) diagram: what is written in a rectangle (plus what is written in the associated table definition). Some other metaphors that are used to describe class objects are: moulds, stamps, blueprints, templates. These are all misleading, and not only because they pertain almost entirely to the Create method. They all imply that the state of a class object is the same shape as (technically, isomorphic with) the state of its instance objects; but this is false. (The specification of an array of ten elements need not itself comprise ten elements.) 
"Mould" and "stamp" have the further misleading implication that the class is an inverted form of the instance! "Template" has the further misleading implication that the class is the same size as the instance, as if a class of fixed-length one-megabyte images had itself to take up a megabyte. In terms of the creation of instances, it would be better to say that the specification that is the state of a class object is used as a recipe. It says how an instance is to be created, but is not the same shape as the instance. It says, for instance, that to make an instance you take a StudentNumber field of 9 characters, a Surname field of up to 20 characters, and so on. (Recipe is Latin for "take".) In any event, if we say that a rectangle on a class diagram represents an object class, we have to remember that it does so in quite a different way from that in which a rectangle on an entity/relationship diagram represents a type of entity. The rectangle marked Student on a class diagram exhibits the Student object class: like the Student object class, it is a specification of each student object instance. The rectangle marked Student on an entity/relationship diagram is also a specification of each student record instance (and that is what is meant by saying that it represents a type of record: it represents each instance of that type); but it does not represent another, very similar, specification. We may say, if we choose, that it represents the class of student records, but only in the traditional sense of class: the collection of them (the Student table). More specifically, the name Student in the rectangle denotes that class. Nothing in a class diagram represents the set of student objects, unless we add an extra object class for that purpose.
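The difference between a recipe and a mould can be sketched in Python. This is a minimal sketch, not how any particular Object Oriented language implements class objects: ClassObject, FieldSpec, conforms and create are hypothetical names, and instances are plain dictionaries. The state of each class object is a short list of field specifications, used both to recognise instances and to create them. The image class shows that a recipe of one entry can describe instances of a megabyte.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One line of the recipe: take a field called `name`, of `kind`, sized `size`."""
    name: str
    kind: str   # "CHAR" (padded to size), "VARCHAR" (up to size), "BYTES" (exactly size)
    size: int


class ClassObject:
    """A hypothetical class object whose state is a specification (a recipe)."""

    def __init__(self, name: str, spec: list[FieldSpec]) -> None:
        self.name = name
        self.spec = spec

    def conforms(self, instance: dict) -> bool:
        # Recognise whether an instance is of this class by checking it against the recipe.
        if set(instance) != {f.name for f in self.spec}:
            return False
        for f in self.spec:
            value = instance[f.name]
            if f.kind in ("CHAR", "BYTES") and len(value) != f.size:
                return False
            if f.kind == "VARCHAR" and len(value) > f.size:
                return False
        return True

    def create(self, **values) -> dict:
        # The "factory" use: follow the recipe to build a new instance.
        instance = dict(values)
        for f in self.spec:
            # Like SQL CHAR(n), pad short character values with spaces.
            if f.kind == "CHAR" and f.name in instance:
                instance[f.name] = instance[f.name].ljust(f.size)
        if not self.conforms(instance):
            raise ValueError(f"values do not conform to the {self.name} specification")
        return instance


# Abbreviated, like the column specification in the text.
student_class = ClassObject("Student", [
    FieldSpec("StudentNumber", "CHAR", 9),
    FieldSpec("Surname", "VARCHAR", 20),
    FieldSpec("GivenName", "VARCHAR", 16),
])
s = student_class.create(StudentNumber="9561340", Surname="Smith", GivenName="Jo")

# Fixed-length one-megabyte images: a one-entry recipe, million-byte instances.
image_class = ClassObject("Image", [FieldSpec("Pixels", "BYTES", 1_000_000)])
img = image_class.create(Pixels=bytes(1_000_000))
print(len(image_class.spec), len(img["Pixels"]))  # 1 1000000
```

The recipe is not isomorphic with what it describes: the specification stays the same small size however large its instances are.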

Subtyping

What do we mean by saying that, for instance (and in the real world), Postgraduate is a subtype of Student? We mean two things:
Is subtyping the same as subsetting? Subsetting is a restricted sort of subtyping: the use of set theory limits us to a single identity, so we cannot count using "is the same copy as" and "is the same edition as". Everything is therefore counted in the same way, and we do not need to test the second condition given above. And what about subclassing, still using "class" in the Object Oriented sense? (Some authors call this object subtyping.) It certainly is neither subsetting nor subtyping, and one presumes that it pertains to object classes, that is, to specifications of objects. One is tempted to say, by analogy with subtyping, that, if the Postgraduate class is a subclass of the Student class (and remembering that these classes are specifications), then:
What in an entity/relationship diagram is a subtype? Suppose we have the two linked entity types, Student and Postgraduate, of which the following are required to hold true:
It is important to note that the subtype relationships shown on an entity/relationship diagram pertain to the real world entities, not to the records in the database (it is better to say that the relationship between the records is specialisation rather than subtyping). Even if Postgraduate is a subtype of Student, and is so shown on the diagram, Postgraduate record is not a subtype of Student record. If, however, we consider the set of primary keys (say StudentNumber) found in the Student records, and the set of them that are foreign keys (and probably also primary keys) in the Postgraduate records, these two sets are superset and subset. But that, of course, holds true of any relationship that is obligatory at the foreign key end. What is special about subtyping is that the foreign key StudentNumber in Postgraduate is a candidate key (and often it is the primary key). Specialisation, or subtyping, in an entity/relationship diagram is what we may call specialisation by extension. The attributes in a postgraduate record could be used to extend those in its associated student record to give what we might call a full postgraduate record, thus:
Although specialisation by extension can be used to represent, or model, subtyping (of real world entities), only specialisation by constraint actually gives us subtyping (of records). As mentioned above, objects cannot be subtyped; that is, they cannot be specialised by constraint. What is called subtyping or subclassing in Object Oriented systems is usually specialisation by extension. It should be noted that a postgraduate object corresponds not to a Postgraduate record but to what was defined above as a FullPostgraduate record, and (assuming no other specialisation of Student), for certain purposes, it seems, a student object corresponds not simply to a Student record but either to a FullPostgraduate record or (for non-postgraduate students) to a Student record. However, if one attempted to assign the value of a postgraduate object to a student object, the extension data (that of the equivalent Postgraduate record) would not be assigned. In a relational database, it is not possible to hold both Student and FullPostgraduate records in a common table (because all rows in a table have to have exactly the same attributes). But an Object Oriented system could have a collection object that held all student and postgraduate objects. If one kind of record is specialised by extension to give another kind, the relationship between the two is neither supertype to subtype nor superset to subset. What, then, is it? It is, fairly obviously, that each specialised record has as part a non-specialised record, just as each FullPostgraduate record has a part that is a Student record; and in addition we would say that no other FullPostgraduate record has the same Student record as part. Now, how does this part relationship work in an Object Oriented system? (Remember that it is a part relationship in the system, modelling a supertype relationship in the world.) As far as state and behaviour are concerned, the relationship is simple. 
A postgraduate object has a state of which the state of a student object is part (just as in the relational case, as long as we remember that the postgraduate object is equivalent to the FullPostgraduate record). A similar relationship holds with respect to behaviour: a postgraduate object has all the behaviour of a student object and some extra behaviour, or at least that is the intention. We find a difference when we consider identity. In a relational database, we would identify the records by primary key, StudentNumber. This ensures that no two students (whether postgraduates or not) have exactly the same student data (the same attribute values in the Student record). But in an Object Oriented system, we would probably be tempted to use the object identifier for this purpose. This would mean that we had to take special steps to ensure that the state of each student object (or relevant part of the state of any postgraduate object) differed from that of any other student object: almost certainly, we would do this by using a (human readable) StudentNumber, and the create method of the Student class object would be written to enforce this uniqueness (along with any update methods of the student objects that might modify the StudentNumber value). Interestingly, we might consider how, in such an Object Oriented system, we would show that a non-postgraduate student had become a postgraduate. (In a relational database we would simply insert the required Postgraduate record, and the FullPostgraduate view would change accordingly.) Clearly, we have to create a postgraduate object (by sending a message to the Postgraduate class object, which itself would have to co-operate with the Student class object). Into this new object we would have to copy the current state of the student object (StudentNumber, Name, Addresses, and so on). 
Notice immediately that object identity within our system does not model student identity in the world: we have two objects, with different object identifiers of course, for one and the same student. Student identity is modelled by StudentNumber identity within the system (just as in a relational system). This solution (creating a new object and copying state) does not merely prevent object identity from modelling real world identity (which may not be very important). It also raises other problems: references (pointers) to the original student object no longer lead to the object that we wish to represent the student, which is now the postgraduate object. Either these references must be modified (which means that we must have back-pointers to them), after which the student object can be deleted; or we must have a reference to the new (postgraduate) object in the old (student) object. Apart from any other problems, this second course would still leave us with redundant data (replicated in the old object and the new), with a risk of inconsistency, and with two ways to represent a postgraduate student: either by a postgraduate object alone, or by an old student object and a new postgraduate object. As specialisation of records by constraint is trivially easy in a relational system (the constraint being merely a restriction condition), and impossible in an Object Oriented system, the specialisations shown on entity/relationship and object class diagrams are not subtypings of records or of objects but whole/part relationships of records, or of the states and behaviours of objects, which may represent subtype/supertype relationships in the real world. The implementation of such relationships between objects gives rise to a number of problems, not resolved here. But we should now be able to formulate the relationship between two class objects when one is specified to be a specialisation by extension (a subclass) of the other:
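The promotion of a student to a postgraduate in an Object Oriented system, as described in this section, can be sketched in Python, where subclassing is specialisation by extension. This is a minimal sketch under stated assumptions, not the only way a system might do it: the attributes name and thesis_title, the promote method, and the class-level registry used to enforce StudentNumber uniqueness are all hypothetical. It shows the three points made above: the copy is a second object, student identity survives only as StudentNumber equality, and an existing reference still leads to the old student object.

```python
class Student:
    """Student class object; its create method enforces StudentNumber uniqueness."""

    _numbers_in_use: set[str] = set()   # hypothetical registry shared by the class

    def __init__(self, student_number: str, name: str) -> None:
        self.student_number = student_number
        self.name = name

    @classmethod
    def create(cls, student_number: str, name: str) -> "Student":
        if student_number in Student._numbers_in_use:
            raise ValueError(f"StudentNumber {student_number} already in use")
        Student._numbers_in_use.add(student_number)
        return cls(student_number, name)


class Postgraduate(Student):
    """Specialisation by extension: the state of a Student plus extra state."""

    def __init__(self, student_number: str, name: str, thesis_title: str) -> None:
        super().__init__(student_number, name)
        self.thesis_title = thesis_title

    @classmethod
    def promote(cls, old: Student, thesis_title: str) -> "Postgraduate":
        # Co-operates with Student: the number is already (rightly) registered,
        # so copy the current state of the old object into a brand new object.
        return cls(old.student_number, old.name, thesis_title)


s = Student.create("9561340", "J. Smith")
tutor_list = [s]                        # some other object refers to the student
p = Postgraduate.promote(s, "Objects and entities")

print(p is s)                                   # False: two objects, one student
print(p.student_number == s.student_number)     # True: identity is StudentNumber
print(isinstance(tutor_list[0], Postgraduate))  # False: the old reference is stale
```

Fixing the stale reference would need one of the two courses discussed above: back-pointers so that references can be updated, or a forward reference from the old object to the new one.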

Aggregation (Part/Whole Relationships)

We have seen that specialisation (subtyping) by extension is actually achieved, in terms of records or of object states and behaviours, by a part/whole relationship (the whole being the subtype, the part the supertype). Such relationships are very common in relational databases: they are joins. But sometimes, on class diagrams, we see certain associations marked as part/whole (that is, aggregation) relationships. These are intended to represent real world aggregations. All that needs to be said is that there is nothing special about these relationships: knowing that a real world relationship is an aggregation tells us nothing about how it should be represented in a system. In other words, we should treat special aggregation notation as purely informal.
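The claim that a part/whole relationship between records is simply a join can be sketched with Python's built-in sqlite3 module. The names Student, Postgraduate, StudentNumber, Surname and FullPostgraduate follow the paper; the ThesisTitle column and the sample rows are hypothetical. The sketch also shows the relational counterpart of promotion: a single INSERT, with no copying and no change of identity.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when asked

con.executescript("""
    CREATE TABLE Student (
        StudentNumber CHAR(9)     PRIMARY KEY,
        Surname       VARCHAR(20) NOT NULL
    );

    -- StudentNumber is a foreign key to Student and also the primary key here,
    -- so no two Postgraduate records share the same Student record as part.
    CREATE TABLE Postgraduate (
        StudentNumber CHAR(9)     PRIMARY KEY REFERENCES Student (StudentNumber),
        ThesisTitle   VARCHAR(80) NOT NULL
    );

    -- The whole (a full postgraduate record) is the join of the part (the
    -- Student record) with its extension (the Postgraduate record).
    CREATE VIEW FullPostgraduate AS
        SELECT s.StudentNumber, s.Surname, p.ThesisTitle
        FROM Student AS s
        JOIN Postgraduate AS p ON p.StudentNumber = s.StudentNumber;
""")

con.executemany("INSERT INTO Student VALUES (?, ?)",
                [("9561340", "Smith"), ("9561341", "Jones")])
print(con.execute("SELECT COUNT(*) FROM FullPostgraduate").fetchone()[0])  # 0

# A student becomes a postgraduate: one INSERT, no copying of Student data,
# and the FullPostgraduate view changes accordingly.
con.execute("INSERT INTO Postgraduate VALUES (?, ?)",
            ("9561340", "Objects and entities"))
print(con.execute("SELECT * FROM FullPostgraduate").fetchall())
# [('9561340', 'Smith', 'Objects and entities')]
```

The primary key on Postgraduate.StudentNumber is what makes this a specialisation rather than an ordinary one-to-many join: it stops a second Postgraduate record from claiming the same Student record as its part.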

Copyright © 1994, 2001 Adrian Larner. The author asserts all moral rights.