
www.btinternet.com/~adrian.larner/database/objent

Objects and Entities

A data analysis paper by Adrian Larner

 

Introduction

 

 

If you are studying database specification and object oriented analysis and design, you will no doubt have observed that the entities and relationships shown on entity/relationship diagrams seem to be very like the objects and associations in class diagrams. How are they related? What are their differences?

 

Entities

 

 

Let us first be clear about entities and relationships. We classify these further, into:

Independent Entities: An entity is independent if each relationship in which it is involved is non-obligatory for that entity.
 
Links: These are relationships that can have no attributes, and for which no table is required.
 
Dependent Entities: Anything else! That is, either an entity involved in a relationship that is obligatory for the entity, or a relationship that can have attributes or for which a table is required.

Almost invariably, for each entity (in the sense we shall now use, “dependent or independent entity”, first or third item, above) there is one table, and for each table there is one entity. This is because (1) different entities usually have different attributes (that is, columns) and a table (technically, a relation) may not contain records with different attributes; and (2) if we find we have two tables with the same attributes we usually respecify them as one table.

When we talk about entities, we are talking about records (that is, rows). We may talk about records individually, as when we say that the record for student 9561340 contains a home postcode of LE1 2PB. However, when we are merely specifying a database, rather than using it, and do not have any records yet, we ignore the differences between records of the same kind, such as Student records; that is to say, we ignore the different values in their attributes, and talk more generally or abstractly of “the Student record”. We say that when we talk like this we talk of record (or entity) types. Hereafter, we shall distinguish between our talk about individual records, when we shall say “record instance” or merely “record”, and our more general talk about records of common types, when we shall say “record type”.

We know that a record is an aggregate of (let us say) fields, where a field is a value of some attribute. But how are we to interpret records? What do they mean? How are we to understand them? To this question, two main answers have been given:

Each record represents an entity (a thing) in the world. This is the assumption that lies behind the idea of “data modelling” and the Entity/Relationship theory: that a database models or maps the world (or, better, an aspect of part of the world).
 
Each record is, in a highly regimented language (Relationese!) a sentence that is either true or false (we hope, true). Technically, such a sentence is called a proposition. Such a sentence can be fairly easily translated into English. To use the example above, we might read the record in
 
   SELECT StudentNumber, HomePostcode
     FROM Student
     WHERE StudentNumber = '9561340'

 
as the proposition (in English), “There is a student with number 9561340 whose home postcode is LE1 2PB.”

There is a connection between these two interpretations: the propositions that are independent entity records tell us of one thing and say that it exists; this thing is, in the other interpretation, the real world entity that the record supposedly “models”. The propositions that are dependent entity records tell us about more than one thing, and say how they are related; that is, they tell us of relationships.
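The reading of a record as a proposition can be tried out in a few lines. What follows is only a minimal sketch in Python, using the standard sqlite3 module. The two-column table layout and the helper name as_proposition are assumptions made for illustration; the paper specifies neither.

```python
import sqlite3

# In-memory database holding the single example record (layout assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (StudentNumber TEXT PRIMARY KEY, HomePostcode TEXT)"
)
conn.execute("INSERT INTO Student VALUES ('9561340', 'LE1 2PB')")


def as_proposition(row):
    """Translate one Student row into the English sentence it asserts."""
    number, postcode = row
    return (f"There is a student with number {number} "
            f"whose home postcode is {postcode}.")


row = conn.execute(
    "SELECT StudentNumber, HomePostcode FROM Student WHERE StudentNumber = ?",
    ("9561340",),
).fetchone()
sentence = as_proposition(row)
print(sentence)
```

The translation is mechanical: each column contributes one clause, and the record as a whole is either true or false of the world.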

Notice that just as we can talk about record instances and types, so we can talk about real world entity instances (such as the actual student, 9561340) and entity types (“the student”). Having distinguished talk of instances from talk of types, we might ask how instances and their types are related. To this question too, there are two different answers:

There are only instances! There are not, in addition, types. To talk of types (whether of records or of real world entities) is merely to talk generally or abstractly of instances (“abstractly” because we have ignored – abstracted away from – their differences).
 
There are types in addition to instances. The type Student is the class (or set) of students, a collection of which each student is a member.

Notice that, whichever of these answers we accept, the common name, “student” is said to designate (to name or title) each student, and to denote the class of students. (But, if we accept the first answer, when we say that “student” denotes the class of students we can mean no more than: “student” designates each student.) Likewise, in our database definition, the common name, “Student”, designates each row in the Student table and denotes the class of these rows, which is nothing but the Student table itself.

Let us track through all these interpretations and explanations. On an entity/relationship diagram we see an independent entity depicted: a rectangle containing the word, “Student”, which designates each of a number of records (that we plan to have) in our database, and almost certainly they will go in one table. The rectangle abstractly represents such a record (abstractly because we do not, at this stage, distinguish one student record from another), that is to say, the rectangle containing the word, “Student” is a specification of each student record (albeit a very brief and abstract specification). The word “Student”, because it designates each of those records, denotes the class of them (the Student table). Each student record is a proposition telling us that there exists a (real world) student, and giving us further information about that student. If you go along with the second bulleted item, above, you will also say that in addition to the individual (real world) students there exists an extra real world thing, the class of students.

 

Objects

 

 

Now let us consider object class diagrams, which are very like entity/relationship diagrams. What differences are there? One minor difference is that we often see attributes listed in the rectangles representing object classes, but rarely in the rectangles representing entity types; but that is a mere detail. Far more important is that we see methods listed for object classes, but never for entity types. This is because the methods of an object constitute its behaviour, and all objects have identity, state, and behaviour. In an object oriented system, process (behaviour) is tightly bound to data (state). By marked contrast, databases do not permit tight binding of data and process: they separate data and process by means of views (external schemas). This is why database specifications do not refer to methods.

Consider, therefore, an object class diagram where the rectangles contain nothing but the object names: “Student”, say, for the student object class. We may certainly interpret this exactly as we would interpret the same symbol on an entity/relationship diagram, as far as the student object instances are concerned: it is a specification of each student object instance, and the name “Student” designates each of those student object instances.

There are two kinds of Object Oriented system, differing in the way that they treat object classes:

Object classes are not components of any application.
 
Object classes are components of applications, and they – or something one-one related to them – are implemented as objects, that is, as additional object instances, which we will call “class objects” (because “object class object instance” seems a little long-winded).

Consider a system having three student object instances. In the first kind of system, that means we have three objects (three object instances). In the second kind of system it means that we have four objects, the three student objects and the Student class object. The first kind of system should give us no problem of understanding: it is, conceptually, in respect of instances and classes, just like the relational database system. The second kind, which we shall now discuss, presents more problems.

The first problem is a matter of the unorthodox and eccentric use of terms. An object class, as implemented in a class object, is not a set or class: it is not a collection of which its instances are members. If in such a system you wanted an object that was the set of student object instances you would have to specify a new object instance to be that set, and contain those instances. What then, in this second kind of Object Oriented system, are object classes?
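Python happens to be a system of the second kind, so it can serve to illustrate both points: the class is a fourth object, and it is not a collection of its instances. The sketch below is illustrative only; the attribute student_number and the sample numbers are assumptions.

```python
class Student:
    """A bare student class; the single attribute is hypothetical."""

    def __init__(self, student_number):
        self.student_number = student_number


# Three student object instances...
students = [Student("9561340"), Student("9561341"), Student("9561342")]

# ...plus the class object itself, which is an object in its own right.
class_is_object = isinstance(Student, object)

# The class object is not a collection of its instances: it supports
# neither iteration nor membership tests.
try:
    _ = students[0] in Student
    class_is_collection = True
except TypeError:
    class_is_collection = False

# If we want the set of student objects, we must build it as a further object.
student_set = set(students)
```

Here student_set is a fifth object, created and maintained separately from the Student class object.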

 

Object Classes

 

 

You may have heard “object” defined as “instance of a class” (that is, “instance of an object class”). Taking “class” in its usual (collective) sense, it is true – if we accept that there are classes – that each object of some kind is an instance (in the sense, a member) of the class of objects of that kind. True, but hardly informative enough to serve as a definition. It would be like defining “animal” as “instance of a class of animals”. The definition becomes even more suspect if we interpret “class” (that is, “object class”) as what is implemented in a class object, which is not a set or collection at all, not a “class” in the usual sense of that word. For clarity, let us hereafter use “class” in this new sense (which we have yet to clarify, but which certainly is not the sense of “set”). If classes are themselves objects, it is impossible to define “object” as “instance of a class”, because the “class” is itself an object, that is, a class object.

It seems therefore that in order to understand what an object class is, it is first necessary to understand what an object is, that is, an object instance. As we have already said, an object has identity, state, and behaviour. Consider an ordinary program variable, such as a variable called “i”, used to control a loop, and having successive integer values within the loop, 0, 1, 2, ... 9, and (on exit) 10:

i := 0
DO WHILE i ¬= 10
   ...
   i := i + 1
ENDDO

The identity of the variable, i, is simply that of its name, “i” (within a certain scope, perhaps one activation, that is, execution of the program): no matter what value it contains (0 at entry, then 1, then 2, ...), it is the same variable. The state of i is simply the value it contains at any given point in the execution of the program. (In general, such values may be of unlimited complexity; an integer value is a very simple case.) The behaviour of i comprises (1) all the processes that one can perform on an integer – adding, multiplying, testing for equality, and so on – and (2) the process of assignment of an integer value to i (or any combination of these). Notice that a specimen of behaviour of i can equally be a specimen of behaviour of some other variable, even of some other type of variable. Thus, if our loop were intended to sum the values of an array, a, we might code “a(i)” to mean “element number i of a”. The process involved in obtaining this element is part of the behaviour of i and also part of the behaviour of a.
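For readers who prefer something runnable, the same loop can be sketched in Python, summing an array as suggested above. The array contents are made up. The analogy is also a little loose, because a Python name is rebound to a new integer on each assignment rather than holding a changing value in place.

```python
a = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]   # example data, assumed

total = 0
i = 0                    # identity: the name "i"; state: the value 0
states = []              # record each successive state of i inside the loop
while i != 10:
    states.append(i)
    total += a[i]        # indexing: behaviour shared by i and by a
    i = i + 1            # assignment changes the state, not the identity
```

On exit the state of i is 10, while the name, and so the identity, has stayed the same throughout.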

An object is very like a variable, but with these two differences:

Instead of a name, like “i”, an object has a (not human-readable) object identifier. We say that an object is an anonymous (that is, nameless) variable.
 
The behaviour of an object comprises only part of the behaviour of the equivalent variable. This is because (strictly speaking) each specimen of behaviour in an Object Oriented system belongs to (or “is the responsibility of”) one and only one object. This is compromised to some degree in any Object Oriented programming language. Objects of the same class have similar behaviours.

An object is therefore an anonymous variable along with the subset of the processes applicable to variables of that type chosen to belong to it. Each of these processes is called a “method”, and it is invoked by sending a message to the object to which it belongs.
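A small sketch may make the three aspects concrete. The Counter class below is hypothetical and serves only to show that two objects may agree in state and behaviour and still differ in identity; Python's built-in id() stands in for the object identifier.

```python
class Counter:
    """A hypothetical object: an integer state plus the methods chosen for it."""

    def __init__(self):
        self._value = 0              # state

    def increment(self):             # a method: behaviour belonging to this object
        self._value += 1

    def value(self):
        return self._value


first = Counter()
second = Counter()
first.increment()                    # "send the increment message" to first

same_state = first.value() == second.value()        # states differ at this point
second.increment()
equal_state_now = first.value() == second.value()   # states now match
still_distinct = first is not second                # identity is separate from state
object_ids_differ = id(first) != id(second)         # identifiers, not names
```

Of everything one can do with an integer, only incrementing and reading belong to a Counter object; that restriction is the “subset of the processes” chosen to belong to it.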

And now, Object Classes (in those systems in which they are implemented as class objects): what are they for? Indeed, why should there be class objects at all? Clearly, class objects are not necessary, because some Object Oriented systems work without them. But when they are present, their principal function appears to be to create new instances. So to create a new student object one would send a “create” message to the Student class object.

This is why class objects are sometimes described as “factories”: the student factory manufactures students. But, apart from any other oddness in the analogy, to speak of class objects in this way is to reduce them to a single method, “create”. It is like calling a number object an “adder”.

In terms of data, what is a class object? In other words, what does its state comprise? It seems obvious that if the class object is to be used to create instances, or for that matter to recognise whether or not an object is of the pertinent type, then its state must comprise a specification of an object of that class. That is to say, the state of the Student class object is pretty much the same as the column specification of an SQL Student table: (StudentNumber CHAR(9), Surname CHAR VARYING(20), GivenName CHAR VARYING(16), BirthDate DATE, ...) It is what we find, abbreviated perhaps, on a class (or entity/relationship) diagram: what is written in a rectangle (plus what is written in the associated table definition).
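In a language with class objects, this specification can be inspected directly. The sketch below uses Python's standard dataclasses module. The field names follow the column list above, the Python types merely stand in for the SQL types (the length limits are not enforced), and the sample values are made up.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Student:
    StudentNumber: str
    Surname: str
    GivenName: str
    BirthDate: date


# The specification is held by the class object, not by any instance.
specification = [(f.name, f.type) for f in fields(Student)]

# Sending the "create" message: in Python, calling the class object.
s = Student("9561340", "Smith", "Jo", date(1975, 1, 1))
```

The list held in specification is what would be written in the rectangle on a class diagram: a description of each instance, and not itself a student.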

Some other metaphors that are used to describe class objects are: moulds, stamps, blueprints, templates. These are all misleading, and not only because they pertain almost entirely to the Create method. They all imply that the state of a class object is the same shape as (technically, “isomorphic with”) the state of its instance objects; but this is false. (The specification of an array of ten elements need not itself comprise ten elements.) “Mould” and “stamp” have the further misleading implication of the class being an inverted form of the instance! “Template” has the further misleading implication of the class being the same size as the instance, as if a class of fixed length one megabyte images had itself to take up a megabyte.

In terms of creation of instances, it would be better to say that the specification that is the state of a class object is used as a recipe. It says how an instance is to be created, but is not the same shape as the instance. It says, for instance, that to make an instance you take a StudentNumber field of 9 characters, a Surname field of up to 20 characters, and so on. (“Recipe” is Latin for “Take”.)

In any event, if we say that a rectangle on a class diagram represents an object class, we have to remember that it does so in quite a different way from that in which a rectangle on an entity/relationship diagram represents a type of entity. The rectangle marked “Student” on a class diagram exhibits the Student object class: like the Student object class, it is a specification of each student object instance. The rectangle marked “Student” on an entity/relationship diagram is also a specification of each student record instance (and that is what is meant by saying that it “represents a type of record”: it represents each instance of that type); but it does not represent another very similar specification. We may say, if we choose, that it represents the class of student records, but in the traditional sense of “class”, the collection of them (the Student table): more specifically, the name, “Student”, in the rectangle denotes that class. Nothing in a class diagram represents the set of student objects, unless we add an extra object class for that purpose.

 

Subtyping

 

 

What do we mean by saying that, for instance (and in the real world), Postgraduate is a subtype of Student? We mean two things:

Each postgraduate is a student.
 
If you choose any postgraduate and I choose any postgraduate, your postgraduate is the same postgraduate as mine if and only if your postgraduate is the same student as mine. In other words, we identify postgraduates and students (count them as the same) in the same way.

Contrast subtyping – that is, supertype/subtype relationships – with type/instance relationships, such as Copy (of a book) as an instance of Edition. Here we meet the first requirement, as each copy is an edition, but not the second. You and I might easily choose two copies, but only one edition: your choice could be the same edition as mine, without being the same copy.
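The failure of the second condition can be seen by counting. In the sketch below, the accession numbers and the edition title are invented for illustration. The same two choices are counted once by “is the same copy as” and once by “is the same edition as”.

```python
from collections import namedtuple

# Hypothetical data: each copy has its own accession number and an edition.
Copy = namedtuple("Copy", ["accession_number", "edition"])

yours = Copy(accession_number=101, edition="Second edition")
mine = Copy(accession_number=102, edition="Second edition")

same_copy = yours.accession_number == mine.accession_number
same_edition = yours.edition == mine.edition

# Counting the same two choices under each criterion of identity.
copies_chosen = len({yours.accession_number, mine.accession_number})
editions_chosen = len({yours.edition, mine.edition})
```

Two copies, one edition: the counts disagree, so Copy to Edition is not a subtype relationship. For Postgraduate and Student the two counts would always agree.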

Is subtyping the same as subsetting? Subsetting is a restricted sort of subtyping: the use of set theory limits us to a single identity, so we cannot count using “is the same copy as” and “is the same edition as”. So everything is counted in the same way, and we do not need to test the second condition given above.

And what about subclassing, using “class” still in the Object Oriented sense? (This may be called “object subtyping” by some authors.) It certainly is neither subsetting nor subtyping, and one presumes that it pertains to object classes, that is, to specifications of objects. One is tempted to say, by analogy with subtyping, that, if the Postgraduate class is a subclass of the Student class (and remembering that these “classes” are specifications) then:

Each object specified by the Postgraduate class (that is, each postgraduate object) is an object specified by the Student class (that is, a student object).
 
If you choose any postgraduate object and I choose any postgraduate object, your postgraduate object is the same postgraduate object as mine if and only if your postgraduate object is the same student object as mine.

But it may be that the second condition is not applicable: it is not at all clear whether object subclassing is intended to be similar to subtyping, or similar to instancing, or is a mixture of the two. In any case, whatever object subclassing is intended to be, it cannot actually be subtyping: although values (states of objects) can be subtyped, variables (including the anonymous variables we call “objects”) cannot be subtyped.

What in an entity/relationship diagram is a subtype? Suppose we have the two linked entity types, Student and Postgraduate, of which the following are required to hold true:

For each postgraduate record there is exactly one student record.
For each student record there is no more than one postgraduate record.
Deletion or update of a student record cascades to any linked postgraduate record.

In this case, Postgraduate is a subtype of Student. In other words, a subtype to supertype relationship in an entity/relationship diagram is merely an ordinary dependency – like OrderLine on Order – with the extra constraint that the relationship is not one-to-many but one-to-one.
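The three requirements translate fairly directly into key and referential constraints. The following is a sketch only, run through Python's sqlite3 module. The Surname and ThesisTitle columns are assumptions, and SQLite enforces foreign keys only when the pragma shown is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # off by default in SQLite
conn.executescript("""
    CREATE TABLE Student (
        StudentNumber TEXT PRIMARY KEY,
        Surname       TEXT
    );
    CREATE TABLE Postgraduate (
        -- Primary key: no more than one postgraduate record per student.
        -- Foreign key: exactly one student record per postgraduate record.
        StudentNumber TEXT NOT NULL PRIMARY KEY
            REFERENCES Student (StudentNumber)
            ON DELETE CASCADE ON UPDATE CASCADE,
        ThesisTitle   TEXT
    );
    INSERT INTO Student VALUES ('9561340', 'Smith');
    INSERT INTO Postgraduate VALUES ('9561340', 'An assumed thesis title');
""")

# Deleting the student record cascades to the linked postgraduate record.
conn.execute("DELETE FROM Student WHERE StudentNumber = '9561340'")
remaining = conn.execute("SELECT COUNT(*) FROM Postgraduate").fetchone()[0]

# A postgraduate record with no student record is refused.
try:
    conn.execute("INSERT INTO Postgraduate VALUES ('0000000', 'No such student')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
```

Making the foreign key the primary key as well is what turns the ordinary one-to-many dependency into a one-to-one relationship.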

It is important to note that the subtype relationships shown on an Entity/Relationship diagram pertain to the real world entities, not to the records in the database (better to say that the relationship between the records is specialisation rather than subtyping). Even if Postgraduate is a subtype of Student and so shown on the diagram, Postgraduate record is not a subtype of Student record. If, however, we consider the set of primary keys (say StudentNumber) found in the Student records, and the set of them that are foreign keys (and probably also primary keys) in the Postgraduate records, these two sets are superset and subset. But that, of course, holds true of any relationship that is obligatory at the foreign key end. What is special about subtyping is that the foreign key – StudentNumber in Postgraduate – is a candidate key (and often it is the primary key).

Specialisation, or “subtyping”, in an entity/relationship diagram is what we may call “specialisation by extension”. The attributes in a postgraduate record could be used to extend those in its associated student record to give what we might call a “full postgraduate record”, thus:

CREATE VIEW FullPostgraduate AS
   SELECT *
     FROM Student JOIN Postgraduate USING (StudentNumber)
     -- USING keeps a single StudentNumber column in the view

But it must be stressed that neither FullPostgraduate nor Postgraduate is a subtype or subset of Student. To obtain subtyping or subsetting of records we need to use what is called “specialisation by constraint”. This is far simpler, but for obvious reasons would not be shown on an entity/relationship diagram:

CREATE VIEW LeicesterStudent AS
   SELECT *
     FROM Student
     WHERE HomePostcode LIKE 'LE%'   -- begins with 'LE'

LeicesterStudent is a subtype and a subset of Student. Put simply, the difference is that in specialising by extension we find we have more to say about Postgraduates (which is why they have additional attributes in a postgraduate record); but in specialising by constraint we find we have less to say about Leicester students (because we could omit part of their HomePostcode, and some of their address as it happens). The constraint is, of course, merely the restriction (WHERE) condition.
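Both kinds of specialisation can be compared side by side. The sketch below runs the two views in SQLite through Python's sqlite3 module. The sample rows and the ThesisTitle column are assumptions, and the join is written with USING so that the shared StudentNumber column appears only once in the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (StudentNumber TEXT PRIMARY KEY, HomePostcode TEXT);
    CREATE TABLE Postgraduate (StudentNumber TEXT PRIMARY KEY, ThesisTitle TEXT);
    INSERT INTO Student VALUES ('9561340', 'LE1 2PB'), ('9561341', 'NG7 2RD');
    INSERT INTO Postgraduate VALUES ('9561341', 'An assumed thesis title');

    -- Specialisation by extension: more columns than Student.
    CREATE VIEW FullPostgraduate AS
        SELECT * FROM Student JOIN Postgraduate USING (StudentNumber);

    -- Specialisation by constraint: same columns, fewer rows.
    CREATE VIEW LeicesterStudent AS
        SELECT * FROM Student WHERE HomePostcode LIKE 'LE%';
""")


def columns(table):
    """Column names of a table or view, taken from the cursor description."""
    return [d[0] for d in conn.execute(f"SELECT * FROM {table}").description]


student_rows = set(conn.execute("SELECT * FROM Student"))
leicester_rows = set(conn.execute("SELECT * FROM LeicesterStudent"))

is_subset = leicester_rows <= student_rows
extended_columns = columns("FullPostgraduate")
```

Every LeicesterStudent row is itself a Student row, whereas a FullPostgraduate row has an extra column and so cannot be a row of Student at all.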

Although specialisation by extension can be used to represent, or model, subtyping (of real world entities), only specialisation by constraint actually gives us subtyping (of records). As mentioned above, objects cannot be subtyped, that is, they cannot be specialised by constraint. What is called “subtyping” or “subclassing” in Object Oriented systems is usually specialisation by extension.

It should be noted that a postgraduate object corresponds not to a postgraduate record but to what was defined above as a FullPostgraduate record, and (assuming no other specialisation of Student), for certain purposes (it seems) a student object corresponds not simply to a Student record but either to a FullPostgraduate record or (for non-postgraduate students) to a Student record. However, if one attempted to assign the value of a postgraduate object to a student object, the extension data (that of the equivalent Postgraduate record) would not be assigned. In a relational database, it is not possible to hold both Student and FullPostgraduate records in a common table (because all rows in a table have to have exactly the same attributes). But an Object Oriented system could have a collection object that held all student and postgraduate objects.

If one kind of record is specialised by extension to give another kind, the relationship between the two is not supertype to subtype, nor superset to subset. But what is it? It is, fairly obviously, that each specialised record has as part a non-specialised record, just as each FullPostgraduate record has a part that is a Student record; and in addition we would say that no other FullPostgraduate record has the same Student record as part.

Now, how does this “part” relationship work in an Object Oriented system? (Remember that it is a part relationship in the system, modelling a supertype relationship in the world.) As far as state and behaviour are concerned, the relationship is simple. A postgraduate object has a state of which the state of a student object is part (just as in the relational case, as long as we remember that the postgraduate object is equivalent to the FullPostgraduate record). A similar relationship holds with respect to behaviour: a postgraduate object has all the behaviour of a student object and some extra behaviour, or at least that is the intention.

We find a difference when we consider identity. In a relational database, we would identify the records by primary key, StudentNumber. This ensures that no two students (whether postgraduates or not) have exactly the same student data (the same attribute values in the Student record). But in an Object Oriented system, we would probably be tempted to use the object identifier for this purpose. This would mean that we had to take special steps to ensure that the state of each student object (or relevant part of the state of any postgraduate object) differed from that of any other student object: almost certainly, we would do this by using a (human readable) StudentNumber, and the create method of the Student class object would be written to enforce this uniqueness (along with any update methods of the student objects that might modify the StudentNumber value).
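One way the create method might enforce this uniqueness is sketched below. The registry _numbers_in_use and the method name create are hypothetical. The sketch also leaves the ordinary constructor unguarded, so it shows the idea rather than a complete safeguard.

```python
class Student:
    _numbers_in_use = set()      # hypothetical registry kept by the class object

    def __init__(self, student_number):
        self.student_number = student_number

    @classmethod
    def create(cls, student_number):
        """The "create" method: refuse a StudentNumber that is already taken."""
        if student_number in cls._numbers_in_use:
            raise ValueError(f"StudentNumber {student_number} already exists")
        cls._numbers_in_use.add(student_number)
        return cls(student_number)


first = Student.create("9561340")
try:
    Student.create("9561340")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

The object identifier plays no part in this check. Uniqueness is a matter of state, and it has to be policed by code, where a relational system would simply declare a primary key.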

Interestingly, we might consider how, in such an Object Oriented system, we would show that a non-postgraduate student had become a postgraduate. (In a relational database we would simply insert the required Postgraduate record, and the FullPostgraduate view would change accordingly.) Clearly, we have to create a postgraduate object (by sending a message to the Postgraduate class object, which itself would have to co-operate with the Student class object). Into this new object we would have to copy the current state of the student object (StudentNumber, Name, Addresses, and so on). Notice immediately that object identity within our system does not model student identity in the world: we have two objects, with different object identifiers of course, for one and the same student. Student identity is modelled by StudentNumber identity within the system (just as in a relational system).

This solution – creating a new object and copying state – does not merely prevent object identity modelling real world identity (which may not be very important). It also raises other problems: references (pointers) to the original student object no longer lead to the object that we wish to represent the student, which is now the postgraduate object. Either these references must be modified (which means that we must have back-pointers to them), and then the student object can be deleted; or we must have a reference to the new (postgraduate) object in the old (student) object, and – apart from any other problems – this would still leave us with redundant data (replicated in the old object and the new), and a risk of inconsistency, and we would have two ways to represent a postgraduate student, either by a postgraduate object alone or by an old student object and a new postgraduate object.
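These difficulties are easy to reproduce. In the sketch below, the promote function, the tutor_group list and the attribute names are all assumptions chosen for illustration. A student becomes a postgraduate by creation and copying, and an existing reference is left pointing at the old object.

```python
class Student:
    def __init__(self, student_number, name):
        self.student_number = student_number
        self.name = name


class Postgraduate(Student):
    def __init__(self, student_number, name, thesis_title):
        super().__init__(student_number, name)
        self.thesis_title = thesis_title     # extension data (hypothetical)


def promote(student, thesis_title):
    """Create a postgraduate object and copy the student's state into it."""
    return Postgraduate(student.student_number, student.name, thesis_title)


student = Student("9561340", "Smith")
tutor_group = [student]                      # an existing reference to the student object

postgraduate = promote(student, "An assumed thesis title")

two_objects = postgraduate is not student
same_student = postgraduate.student_number == student.student_number
stale_reference = tutor_group[0] is student  # still leads to the old object

# Copied state can drift apart unless both objects are updated together.
postgraduate.name = "Smith-Jones"
inconsistent = tutor_group[0].name != postgraduate.name
```

Nothing in the language repairs tutor_group; the application must find and update every such reference itself, or tolerate two objects for one student.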

As specialisation of records by constraint is trivially easy in a relational system (the constraint being merely a restriction condition), and impossible in an Object Oriented system, the specialisations shown on entity/relationship and object class diagrams are not subtypings of records or of objects but whole/part relationships of records or of the states and behaviours of objects, which may represent subtype/supertype relationships in the real world. The implementation of such relationships between objects gives rise to a number of problems, not resolved here. But we should now be able to formulate the relationship between two class objects when one is specified to be a specialisation by extension (a “subclass”) of the other:

The specialised class object specifies, in respect of state and behaviour, those parts of the object constituting the required extension.
 
The specialised class object also specifies (directly, or more likely indirectly via the non-specialised class object) those parts and behaviour constituting a non-specialised object.
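The two clauses can be observed in a language whose class objects are open to inspection. What follows is a sketch in Python with the standard dataclasses module. The attributes, and the method describe, are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Student:
    StudentNumber: str
    Surname: str

    def describe(self):
        return f"{self.Surname} ({self.StudentNumber})"


@dataclass
class Postgraduate(Student):
    ThesisTitle: str          # the extension (hypothetical attribute)


student_spec = [f.name for f in fields(Student)]
postgraduate_spec = [f.name for f in fields(Postgraduate)]

# State: what the specialised class adds, and what it takes over.
extension = [name for name in postgraduate_spec if name not in student_spec]
inherited = [name for name in postgraduate_spec if name in student_spec]

# Behaviour: describe is specified only in Student, and reached indirectly.
declared_directly = "describe" in vars(Postgraduate)
available = hasattr(Postgraduate, "describe")
specified_via = Postgraduate.__bases__
```

The Postgraduate class object holds only the extension, and refers to the Student class object for the rest.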

 

Aggregation (Part/Whole Relationships)

 

 

We have seen that specialisation – subtyping by extension – is actually achieved (in terms of records, or object states and behaviours) by a part/whole relationship (the whole being the subtype, the part the supertype). Such relationships are very common in relational databases: they are joins. But sometimes, on class diagrams, we see certain associations marked as part/whole (that is, aggregation) relationships. These are intended to represent real world aggregations.

All that needs to be said is that there is nothing special about these relationships: knowing that a real world relationship is an aggregation tells us nothing of how it should be represented in a system. In other words, we should treat special aggregation notation as purely informal.

 

 


 

Copyright © 1994, 2001 Adrian Larner. The author asserts all moral rights.
