(U03)

www.btinternet.com/~adrian.larner/database/pcl11

PLATOCLAST
ON DATA

Lecture XI
Nulls

 

 

I am now going to talk about data at the form level. A form or record – I will assume a normal record – comprises its fields, each a question and answer. The form is, we might say, entirely filled in; otherwise we would have a question without an answer, and that would be less than a field, and then fields wouldn’t be atomic, and we wouldn’t be talking about formatted – two-dimensional – data at all.

If you think in terms of paper forms then you don’t have to insist on all their fields being literally filled in: you may say, if you wish, that leaving a blank is one sort of answer. For instance, the question might be: Tick here if over the age of 70. Leaving a blank then is an answer: Not over 70.

We may assume, as the forms are in our data base, that they have passed all edit checks. So a field will hold a value (possibly “blank”) if and only if that value passed the edit check. And I think this means, as Dr Codd would say, the value is in the domain of the field.

Now suppose that we wish to handle several records together: we would think of them collectively as a table, and would so display them to the user. In a Relational system, such a table is, as we have remarked, rectangular: every row intersects every column; every row is of the same format.

 

 

Non-Uniform Format

 

The operation used to collect records together into a table is called “Union”. Strictly, Union collects records from two or more tables into one, but that “strictly” is because set-theoreticians distinguish between a record and the set (table) that has that record as sole member. Relational Theory imposes Union Compatibility: two relations can be unioned (their rows collected into a single table) only if they have all and only the same columns. This is necessary, of course, to ensure closure over relations: to make sure that each operation on a relation or relations results in a relation.

But suppose that we wish to collect records of different formats into a table: the resulting table will not be a relation; it will be two-dimensional but not rectangular. Some of its rows will fail to intersect some of its columns; i.e. some of its rows will lack field-types that other of its rows have. I want to stress here that in proposing such non-rectangular tables I am denying the importance of closure over relations. But that is because I have doubts about relations, not because I have doubts about closure.

We know that, given a relation – a rectangular table – by picking out one row (by its key, say) and one column (by its name) we will pick out exactly one value, neither more nor less. If we allowed polynormal records, we could pick out more than one value, but – for the moment – we don’t. If we allow non-rectangular, though two-dimensional, tables we will pick out either one value, if the chosen row and column intersect, or no value, if they do not intersect.

And now we have – I stress, at the form level – a sort of Nullity. It is merely this: we tried to pick out a value by giving a row and a column, but we failed (we did not “succeed in picking out a Null”!). We could say: the row is “null at” the column. But notice that there is no funny value; there is not even anything missing: the row, as a record or form, is completely filled in; it lacks nothing.

I really do want to stress the utter lack of mystery about this “Nullity”: it happens whenever we speak collectively of different sorts of things. We may speak of cups and saucers as items of crockery; and some items of crockery have handles; but saucers are not defective in lacking handles though we might choose to say that a saucer is null at handle.

In treating this sort of Null therefore we have to cope only with the way we speak of records: these Nulls arise only because we choose – no doubt for good reasons – to talk collectively of records of different format. And you may feel – at this point you should feel – that Dr Codd was probably right to insist on relations, and simply to avoid this sort of Null.

The question of relations versus tables (I will say “table” meaning two-dimensional, not necessarily rectangular, table) is very like the question of normal versus polynormal records. The latter add complexity: we should avoid them if possible. We have yet to see what is possible. And yet, there is this difference: the question of normality or polynormality seems to be one of structure, essential structure. But the question of relations or tables is one of language, of our DML: shall we admit a simpler language in which we speak collectively only of commonly formatted records, or a more complex language in which we can speak of any normal records?

 

 

Operations

 

Notice that the operations on records that we have already considered, Restriction, Projection, Cartesian Product, and so forth, extend very easily from normal records to relations. But they extend less easily to tables. Cartesian product is all right, but consider projection: we state a list of column names, the columns we want projected. Apply such a list to a record lacking some of the columns: do we want all the stated columns that the record happens to have (perhaps none at all)? or do we want a result only if the record contains all the stated columns? This is relatively easily solved (of course, with added complexity): we need to be able to specify multiple column lists, each list being applied to each record and producing a result only if the record has all the stated columns. Alternatively, or additionally, we might specify a list of columns in which some could be marked as mandatory and the others as optional: a record would give a result if it had all the mandatory columns, and that result would contain all the mandatory columns and those of the optional columns that the record contained.

But I’m not certain whether we should really say that projections of tables are more complex than projections of relations. I mean, suppose we have a query to be performed on some tables or on some relations. We might find that the query needed three column lists to be applied at one blow to one table, or three projections to be performed separately on three relations. So the query might turn out no more complex, and perhaps even simpler, on the table.

Restriction presents a more difficult problem, for two reasons. Firstly, a simple restriction condition, say “C@D”, where “C” and “D” are column names, may refer to one or two absent values: but that problem can be solved by using the “]” and “[” operators, as we’ve seen. Secondly, we shall probably want to use expressions, such as “C+D”, “C-5”, or “SUBSTRING (C, 3, 2)”, so functions (addition, subtraction, substringing) could be applied to absent values. I am not, for the moment, going to attempt to handle this problem: its solution is part of the general problem of how to handle operations on various types of data, not even – necessarily – in a DML. So suffice it for now to say that the problem of evaluating “C-5”, when “C” refers to no value, is not a problem in data base theory but a problem in numeric data type processing. We’ll come back to it.

So, if you’ll trust me for the moment to solve the expression evaluation problem, it seems that we can get the mechanisms needed to handle tables with absent fields, rather than just relations; if necessary. But we are some way from showing whether they are necessary.

Well, that was absent fields, and data at the form level. Remember: at the form level there are records, each complete in itself. The “Nulls” we have considered, and now called “absent fields”, appear only when (1) we take a tabular, or collective, view of records, and (2) such a view is not constrained to be a relation (i.e. of uniform format).

We now turn to the concept level. Naturally, if we take a tabular view of records at the concept level, and we do not constrain this view to be a relation, we shall get absent fields, just as we did at the form level.

 

 

Nulls

 

But there is another sort of Null that we can find at the concept level, a Null that is unrelated to tabulation; a Null that can appear within a single record. This Null is not an absent field: it couldn’t be, could it? It must be a field with a null value. Dr Codd doesn’t talk about null values. He says that Nulls are not values; he calls them “Marks”.[1] We will consider the possible causes for his eccentricity later.

Of course, if a field is not absent – and a field can’t be simply absent from a record, only absent from a record with respect to a table – then it has a field-type and a value. That, it seems to me, is utterly basic. That precedes and underlies Relational Theory, or any other theory of formatted data. I don’t say that nothing will make me change my view on that, but anything which did would require a total revolution in our thinking. And though the problem of Nulls is important, it’s not that important. You will appreciate that a record at the concept level is – i.e. it is the same record as – a record at the form level (though probably not a kept record at the form level). So, as each record at the form level comprises fields, each with a type and value, the same must hold true of records at the concept level. So Nulls – if they are not absent fields – are fields with null values. But what are null values?

As it happens, that is the crucial question: what are null values? It amazes me that – so far as I can tell – solutions to the problem, or problems, of Nulls have been attempted without an answer to that question being offered. Instead, people introduce particular examples that are supposed to convince us that we need some sort of “funny thing” in our data bases, and that funny thing is a Null. I don’t go in for funny things myself (at least, not professionally), but I’ll have a look at some examples.

 

 

Missing Information

 

Suppose we keep PERSON records, and among the fields in them are SURNAME and BIRTHDATE. What happens when we want to create a record for someone whose date of birth is unknown? This is supposed to be a difficult question, but actually it’s an easy one. We look at the values allowed to be entered in BIRTHDATE. If we can pick one of these values, then we are OK; if we cannot, then we simply cannot create the record, and there’s an end to it.

But perhaps I ought to expand on that a little. Suppose the only values allowed are dates. Then we could pick an arbitrary date, by guesswork, and risk having false data: that would be formally correct. We could do that and revise the definition of the BIRTHDATE column to “Actual or estimated date-of-birth”: that would be formally and factually correct. If we don’t make that revision, and we don’t allow estimation, and we won’t risk false data, then we cannot create the record; and again, that’s that. I mean: we still have a problem in that we can’t create a record. But uncreatable records certainly don’t give us problems in data base theory.

Suppose values other than dates are allowed, perhaps blank (remember: we treat blank as a value!), perhaps “unknown”, perhaps “?” Then we can use one of these non-date values.

Now you’ll find in most of the literature that when an author starts asking a question like, “What happens when we want to create a record for someone whose date of birth is unknown?” they’re going to introduce you to funny things – Nulls. But I haven’t: I’ve answered the question without considering Nulls at all; without mentioning Nulls, or anything even remotely humorous.

A similar question is: What happens when we want to create a record for someone who has no surname? But now you can answer that for yourself; of all the options, blank is probably the best. Please notice that when I say “blank” I don’t mean those characters “spaces”, nor the empty character string. I just mean the answer you give when you leave a field blank in the course of filling in a form.

I have a problem: I know what Nulls are; and I want to correct a very common and totally mistaken view of what Nulls are. But I find it difficult to state this view because it’s not merely mistaken but totally confused and incoherent. Dr Codd and Mr Date talk about “missing information”: let’s start from there. They would say that the unknown date of birth is missing information. Well, I suppose it is: but there can’t be a problem about how to record missing information, can there? Apart from the big problem, that is. It can’t be recorded at all. But what can’t be recorded doesn’t give us problems in data base theory: we have enough of those with things that can be recorded.

Let’s put the problem starkly: Fill in a PERSON record for this person with unknown date of birth! The BIRTHDATE field must contain their actual, known, date of birth. Well, you just can’t create the record, and there’s an end to it.

 

 

Modalities

 

So “recording missing information” doesn’t give us problems (though missing information – not having some information we want – might give us problems, but not in data base theory). Perhaps the question is: How to record that information is missing? Or, how to record why information is missing? But that, or why, information is missing is itself perfectly straightforward information. If we want to record whether a person has or has not provided their date of birth we can quite easily have a “BIRTHDATE PROVIDED” column (values Tick and Blank, or Yes and No). If we want to record what I’ll call the “modality” of the BIRTHDATE data in a record we can have a “BIRTHDATE MODE” column, with values like OK, Estimated, Unknown, Believed false, etc. Notice that a modality could apply whether or not a BIRTHDATE value was a date. Probably “Estimated” and “Unknown” would be the same modality value: we would say “Unknown” if the BIRTHDATE value were not a date – but a blank, say – and we would say “Estimated” if it were a date. Likewise the SURNAME MODE column might contain OK when SURNAME contained the person’s surname, or when it contained blank and the person had no surname.

How about this? We have a BIRTHDATE PROVIDED column; if this has the value “Yes” then BIRTHDATE has to be filled in with a date; otherwise it has the value “No”, and BIRTHDATE doesn’t have to have a value? Sounds fine to me, but if BIRTHDATE has no value then it’s not a field, not part of the record. So there would be PERSON records with a BIRTHDATE field and PERSON records without a BIRTHDATE field. In which case, we wouldn’t even need BIRTHDATE PROVIDED. But we would probably want a PERSON table still: it would have to contain records of different formats. We couldn’t have a PERSON relation, and Dr Codd wouldn’t like that.

 

 

What is the Problem of Nulls?

 

Look, let’s try another tack. Sometimes we choose to, or have to, record values like blank, unknown, not applicable, or whatever. And these values – which we call Nulls – give us all sorts of problems that other values don’t. How about that?

Let’s put aside all the problems that aren’t data theory problems: I mean, I often forget people’s names, and that’s a missing information problem for me, but I don’t think I’m going to solve it in the FOPC. And, as I think I’ve shown above, we either can get these values – blank, unknown, or whatever – into our records (so where’s the problem?) or we can’t. If we can’t then we have a problem, but not a problem in data theory. Or, if it is a problem in data theory, then merely relaxing the relational constraint – allowing non-rectangular tables – will solve it. But the only thing that drives us that far is that we refuse to do on disk what we would happily do on paper: leave a blank when there’s nothing else to write.

I think we had better start looking at the solutions people propose to the Null problem: perhaps that will give us a clearer picture of what they think the problem is. Dr Codd says, for instance, that the restriction condition “BIRTHDATE = 12/01/44” should come out neither true nor false but “unknown “, when BIRTHDATE is Null. But Dr Codd thinks “=” means “is the same as”, and if BIRTHDATE is Null in the sense of having a value like Blank or Unknown, then “BIRTHDATE = 12/01/44” is clearly false. I may not understand much about absolute identity, but I understand that much.

I don’t quite understand Dr Codd’s argument, but I suppose it goes something like this: “BIRTHDATE = 12/01/44” means that the person represented by the record in question has the date of birth 12/01/44. So if they do have that date of birth then it should come out true and if they don’t it should come out false, and if we don’t know ... But that’s not a third option: either they do or they don’t, there isn’t a third option. Try again: if it’s known that they have that date of birth then it should come out true, and if it’s known that they don’t then it should come out false, and if it’s not known then it should come out unknown.

Let’s notice what Dr Codd is doing. Firstly, he’s changing the meaning of “=” from “is the same as” to “is known to be the same as”. That’s what justifies his use of “true”. Secondly, he’s changing the meaning of “false”, which normally means “not true”, so “BIRTHDATE = 12/01/44” would normally be called false if it were not true, which means (under the new interpretation of “=”) that BIRTHDATE is not known to be the same as 12/01/44. And these radical redefinitions allow him to introduce “unknown”, supposedly as a third truth-value. Of course it’s not a third truth value: “false” has been redefined so that it is not a truth value at all. The old truth value, “false”, has been split into the new false and unknown.

Or perhaps – who can say? – Dr Codd has left the interpretation of “=” unchanged, but abandoned truth values altogether, using “true” to mean “known that”, “false” to mean “known that not”, and “unknown” to mean “not known whether”. You’ll appreciate that it’s just a minor modification to the FOPC to abandon truth values entirely. (Don’t just note that remark down, Judy. It was intended to be ironical.)[2]

In any event, if the date of birth of someone is not recorded (whether merely not recorded – Blank or Unknown – or even recorded wrongly) then it’s a hopeless task to ask a DBMS to tell us whether it is or is not 12/01/44. We might as well ask for a system that gave us the answer to the riddle of the universe. And we can see that all this talk of “knowledge” comes down to what’s recorded, rather than what’s known. And surely our DBMS will tell us what’s recorded: what there is at the form level. There won’t be any unknowns.

Mr Date, I should say, is a lot more circumspect. He has a proposal – to use default values instead of Nulls – and I’ll tell you now it’s ever such a good idea to have default values.[3] It is, however, a solution entirely orthogonal to the problem of Nulls: but I can’t prove that to you until I’ve solved the problem of Nulls. And I can’t solve it until I’ve said what it is.

 

 

Nulls and Joins

 

Now, I’ve spent a long time teasing you about the problem of Nulls: pretending to introduce it as others introduce it, and then stonewalling so that it doesn’t emerge. But here is a place where it emerges clearly, at least as an example. Consider a table of Tennis Players, and let it have a SKILL LEVEL column, with values High, Medium, Low, Novice, and Unknown. I want to join the table with itself to show the possible pairs of players, who should have the same skill level.

Before you ask, there is no prize for guessing that the notion of criteria of identity will be used in solving the problem of Nulls. I think you see the problem: if Barbara and Joan both have skill level High, or both Medium, or both Low, or both Novice, then we do say: they have the same skill level. But if they both have skill level Unknown?

Well, it’s getting to be obvious isn’t it? “Unknown” – although a permissible value in the SKILL LEVEL column – is not a skill level. So if we take “=” to mean “has the same skill level as” (the column criterion of identity), we find that “Unknown = Unknown” is false. But “Unknown º Unknown” is true.

Of course Dr Codd and Mr Date couldn’t solve this problem. They couldn’t see how to deny “Unknown = Unknown”.[4] But then, they take “=” to mean absolute identity. And no, I’m not going to tell you how to do the query on the Tennis Player table: it can’t be done, not completely done, because there isn’t enough information recorded.

I should say that this problem with Joins, specifically with Natural Joins, is quite devastating to current Relational Theory. That’s because, in many cases, a relation is not stored if it could be decomposed into two relations and recreated by a Natural Join, typically on a key of one of the two relations. So what get stored are a number of relations, which in practice (often through views) have to be joined. And now you can see – without appeal to all that stuff about keys representing the self-identity of entities – why Dr Codd’s first integrity rule excludes Nulls from keys: it’s because these important Natural Joins are on keys, and Nulls wreck Natural Joins.

I will go into all this data base design by decomposition, Natural Joins, “higher normalisation”, and the like in a later lecture.[5] A lot of it is just a horrid mistake, so we won’t worry about it for the moment: it is enough for now to note that Nulls give problems with Natural Joins. No, it’s not enough. You should also worry about the very concept of Natural Join: when we do a “natural” join that involves an equality between two columns, which equality is “natural”? Is it the column criterion of identity, “=” say, or its completion, “º”? I’ll tell you: sometimes the one seems natural, sometimes the other. But what “feels natural” is not always a good guide to formal theory construction. (Judy: add “characteristically Platoclastian understatement” in parentheses.)

 

 

Semantic Nulls

 

So we have some insight, just a peep really, into the problem of Nulls. But I return to the question: what are Nulls? Almost everyone seems to take it that Nulls are values (or marks, or things) like Blank, Unknown, Not Applicable, Withheld, and so on. I think we might say, in a vague sense, that they are improper answers to questions, perhaps socially improper, like “Get lost!” Let me characterise this theory as the Semantic Theory, i.e. that there are a range of values, including but not exhausted by the above, whose meanings mark them as Null. Thus, if I find a value “Unknown” in a column, then it is a Null. I have, in my relatively short life, argued hard and long against the Semantic Theory. If it is true, then we are pretty well doomed for these reasons:

1
The formalisation of a single concept such as “known”, “recorded”, or “asserted” – a modal concept – requires a logic that is ill-understood (I’m working on it) and whose mechanisation would use more computing power than we could possibly afford. And, incidentally, such a logic cannot be finitely-many-valued (we have a proof of that).[6]
2
The collection of Semantic Nulls appears to be unlimited.
Of course, the collection of ordinary values, Semantic Non-Nulls, is also unlimited. But we have a general logic that covers them, the FOPC. But notice what that means: the logic runs disinterpreted. It doesn’t understand that red is a colour: for all it knows, “Red” is a make of car, and “COLOUR” means “Car Make”. A general logic to cover Semantic Nulls would have to treat “Unknown” under “BIRTHDATE” just as if “Unknown” meant “Withheld” and “BIRTHDATE” meant “Religion”.

But if such a general logic could be found, then the Semantic Theory would be dead anyway: the whole point of the theory is that there is a range of Null meanings. The very existence of a general logic would show that this was not so, but that there was one common feature of Nulls.

Well, I’m fed up arguing against the Theory of Semantic Nulls, so I’m going to accept it. And then I’ll show that it is totally irrelevant.

 

 

Improper Values

 

The theory I’m going to propound is about what I will still call “Improper Values”, but in a well-defined sense. I’ve already said what I mean by a value being proper to a column: V is proper to C if the column criterion of identity of C holds between V and something (and therefore between V and V). I will say that V is an Improper Value of C if V is not proper to C. I will also give the criterion of identity of an Improper Value. Let “=c” be the criterion of identity of C, then V is proper to C if V=cV; and V is the Improper Value of C if VºcV AND NOT V=cV. You will observe, again, that the definition of the completed criterion of identity ensures that each column has only one improper value.

I now show that the concepts of Semantic Null and Improper Value are orthogonal. A value can be neither, either, or both.

 

 

Proper Semantic Nulls

 

I begin with “Unknown” and “None”, clearly Semantic Nulls. I consider a table of Customers with a column INCOME having values High. Medium. Low, and Unknown. I also have a table of Treatments keyed on INCOME and with columns CREDIT and DUNNING. The former gives the credit limit, £1000, £400, £100, or £l00, and the style of dunning letter sent (when necessary), Fawning. Polite, Rude, or Polite, accordingly as the INCOME is High, Medium, Low, or Unknown.

To discover a customer’s credit limit, or the style of dunning letter to send, the Customer record is joined with the Treatment table on INCOME. This works, despite Unknown being a Semantic Null. When the user is asked, “But do you say that two Customers with INCOME Unknown have the same income?” they reply: “Yes, for all practical purposes we say that.” The value Unknown is therefore proper to INCOME: it is a Semantic Null but not an Improper Value. And it gives no problem on the Natural Join, indeed it is a key value (so it breaches the first integrity rule, if that rule really applies to Semantic Nulls). But it doesn’t give rise to any integrity problems.

A glassmaker has a Product table with column COLOUR. Values of COLOUR are Red, Blue, ..., None. “Do you say that a sheet of glass with COLOUR None has the same colour as any other sheet of glass with COLOUR None?” we ask. “Of course.” comes the reply. “None” is clearly a Semantic Null but not an Improper Value. We could have an Increments table keyed on COLOUR, with columns showing COST and PRICE increments for each colour, including None. No problem on Natural Join.

 

 

Improper Semantic Non-Nulls

 

I think you will just grant me that there are values that are both Semantic Nulls and Improper Values; and, of course, values that are neither. I have just shown that there are Semantic Nulls that are not Improper Values, and I now turn to Improper Values that are not Semantic Nulls.

A hospital keeps PREFERRED FIRST NAME in its Patient records. This column has the value “Baby” for as yet unnamed newborn babies. The users do not say that two such babies have the same preferred first name. A Household relation has the column SIZE with values 1 to 9, meaning “number of persons in the household”, except that 9 means “9 or more”. The users do not say that two households with “SIZE: 9” have the same number of persons. A Personnel file has a SALARY field, with the value 0 for voluntary workers. Two voluntary workers are not said to have the same salary; they have no salary at all. An Employee file has a LOCATION field, whose values are London, Birmingham, etc. depending on their usual office location, or Home if they work from home. Two employees with “LOCATION: Home” are not said to have the same work location.

I do not think that even the wildest holder of the Semantic Theory would wish to classify “Baby”, 9, 0 (well, perhaps), or Home, as Nulls. So the concepts of Semantic Null and Improper Value are indeed orthogonal. But which one is it that gives problems with Natural Join?

Well, it’s obvious that “Improper Value” is pretty much defined as “value that gives a problem with Natural Join”. Obviously if a value meets the criterion of identity of its column, it’s not going to give a problem with Natural Join; and if it doesn’t, then it is going to give a problem. And that question is quite independent of whether the value is a Semantic Null or not. In fact, merely by virtue of being a Semantic Null a value gives no problems whatsoever. Though naturally Semantic Nulls often do give problems with Natural Joins: they give such problems just when they are also Improper Values. So shall we just ignore Semantic Nulls, with a sigh of relief?

And I hear such a sigh: thank you. Oh ...I see ...end of the lecture.

 

 

Semantic Nulls

 

 

(The editor is unable to date the following fragments of exposition.)

All right, let’s be unbelievably optimistic about the semantic theory of nulls. Let’s suppose that we could find them all, or find all the important ones: unknown, withheld, none, not applicable, and so on. Now I suppose it would be reasonable to say that if n1 was a semantic null and n2 was a semantic null, then if n1 had all and only the characteristics of n2, and n1 was to be processed under all circumstances in the same way as n2, then n1 would be the very same semantic null as n2.

And now imagine that we wanted to state all the characteristics and processing rules applicable to these semantic nulls. How would we do it? One pretty obvious, and surely unexceptionable, way would be to have a table with one row for each semantic null, and columns showing their characteristics and rules. And what would the primary key of such a table be? Clearly, the null itself.

What a pity that the Entity Integrity rule doesn’t allow that. Never mind, we’ll give a name to each semantic null: we’ll give the name “Call-me-unknown” to unknown, “Call-me-none” to none, and so on. And these names are perfectly ordinary, non-null values. Well, that’s just fine: I think I’ll simply forget the nulls and use their names instead wherever the nulls now occur. So I’ll find every unknown value, or – in Dr Codd’s terms – every missing value with mark “unknown”, and replace it with the perfectly straightforward value “Call-me-unknown”, and I’ll adjust the domains of columns accordingly. Now we don’t have any semantic nulls: problem solved!

I’m not certain, naturally, what problem is solved, but presumably it’s whatever problem semantic nulls were supposed to give. We would still have the problem that I want to solve: the problem of improper values. Because if unknown is not the same date as anything then Call-me-unknown is not the same date as anything. So exchanging a non-null for a null – in the semantic sense – doesn’t make that problem go away.

Notice, incidentally, that we could have – in our table of nulls – not only their names as primary key column, but also the nulls themselves in another column. But that column couldn’t be even a candidate key column, because it contains nulls you see. So we’ld be hard put, having found a null in some other table, to look it up in the table of nulls. Using Dr Codd’s logic it wouldn’t even give an equal comparison: it would give some sort of “maybe-equal” comparison...

 

 

Which Semantic Null?

 

The proponents of semantic nulls – it seems to me – claim a remarkable ability to distinguish between them. “Date of Birth unknown”, they say, “so you need an Unknown Null”. But suppose I call the column, “Known Date of Birth”: presumably I then need a “None”, as there is no known date of birth. Or they say that in an EMPLOYEE record with JOB and COMMISSION, if the JOB isn’t “Sales Rep” the COMMISSION is “Not Applicable”. But what happens if we project away JOB? The COMMISSION was “Not Applicable to employees other than sales representatives”; but it’s certainly applicable to employees, or the column wouldn’t be in the EMPLOYEE record. So does the “Not Applicable” null have to become a “None”?

 

SITE HOME PAGE

Or assume that we keep separate record types for employees with commission, and those without. Then the COMMISSION in the latter is not null at all, but absent.

THE DATABASE PAGE

THE DATABASE PAPERS

 

Preface & Contents

 

DOWNLOAD

Download Lecture XI (rtf, Word for Windows compatible)

Platoclast on Data: Lecture XII

 

Copyright © 1993, 2001 Adrian Larner. The author asserts all moral rights.