Gumbleton One Name Study: Database project

When I joined the Guild of One Name Studies (GOONS), I took seriously its aim "to promote the preservation and publication of [one-name genealogical] data, and to maximise its accessibility to interested members of the public". Hence I became very interested in the use of the internet as the ideal medium for providing widespread access to data.

I had made patchy use of computers for storing genealogical data since the early 1980s, but now wanted to make the bulk of my research available on the Web. Initially, I put a lot of information into word-processed documents and used built-in filters (e.g. menu items such as "Save as HTML") to convert the information to web pages. However, I was very dissatisfied with the results and ended up preparing most of the web pages by hand. The trouble is that this creates a maintenance problem: when you want to add a new item, you have to add it both to your 'master' information in a word-processor document and to the relevant web page. This provides plenty of opportunity for inconsistencies to arise between the two versions.

I tried several commercial genealogical packages, to see if they would help, but none of them seemed to be the right tool for the job. The problem seemed to be that they start at 'the wrong end'! For a conventional genealogist, the central body of data is the people who are linked together through relationships. Sources are referenced to support assertions about these relationships. One name genealogy (for me, at least) seems to start at the other end of the problem: I have a central body of source information which refers to the name but can only gradually be associated with particular people.

It has been this need to cope, not just with source data, but also with information about individuals, that has been the undoing of my current website. I started to create web pages that included biographical information about individuals and, naturally, it made sense to link this information to a transcript of its source. So, if I am stating that an individual was married on a particular date, it would be sensible to provide a link to the place on the website where I have transcribed that person's marriage certificate. What's more, hypertext lends itself very well to this kind of linking... up to a point. And that point is one of complexity. As you add more records and more individuals, the number of links between them becomes very large and starts to reveal a weakness of HTML encoding: namely, that all the information about the relationships between items is scattered across the pages themselves and cannot easily be managed. So, for example, if I edit a particular page, there is no easy way to discover which pages link to it and whether these links will need to be updated to reflect the changes to the page.

By this stage, I knew that I needed a more sophisticated technology and decided to experiment with a Relational Database Management System (RDBMS): a software data repository which stores information in tables and allows it to be accessed and indexed in complex ways. Plenty of people use databases to store indexes, but my plan was to store the entire contents of the website and, at the click of a mouse, to be able to regenerate the website automatically every time I updated any records. To start with, I took a couple of example pages from my website, stored their main items of data in database tables, and wrote a simple program that would pull the data out of the database and assemble it back into the original pages. This worked well enough, so I embarked on a more substantial experiment. I selected an appropriate database package: almost any would suffice, but I chose MySQL, which has the advantage of being licensed under the GPL and so is, to all intents and purposes, free. MySQL also interfaces very easily with the Perl programming language, which I use extensively for manipulating text files. So I now had an RDBMS and a way of customising information that goes in and out of it. I would, however, admit that these particular technologies are not for the fainthearted: they are certainly outside the realm of shrink-wrap software, even if they don't quite qualify as geek-ware!

My next step was to choose a manageable subset of information with which to check the feasibility of this approach. In terms of data sources, most of the material that was already on my website (GRO births, deaths & marriages, censuses, wills, etc) could be quite quickly converted and stored in the database. For the group of individuals, I then chose all the Gumbletons whose births were registered by the GRO from 1837 to 1901. This amounted to about 230 people: a large enough sample to test the approach but just about small enough to make manual corrections to data, if necessary. It also represented a period of history when a lot of information about individuals can be found and, furthermore, includes (almost) all the people who will appear on the 1901 census, when that becomes available.

The current design of the database is as follows: firstly it is divided, conceptually, into two main areas which store information about records and people, respectively. These two categories are reflected in the design of the physical website, which has two major items of contents: the 'compendium' of Gumbleton records and the details of people. Each page in the compendium is represented within the database as a 'collection' of data which includes a web page template and various records. The data extraction software assembles the webpages from these templates and records. Records can be nested to an arbitrary level, so that the whole 1841 census can be referred to as a record, in which a household is a lower-level record and the detail about an individual is an even lower level. Each type of record has an associated template that is used to represent it on a web page. By allowing records to be broken down in this way, it is possible to assign a unique identifier to each fine-grain record as well as to broader record categories. These identifiers are used within the system for linking things together -- more on this later.
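The nesting of records described above can be illustrated with a small sketch. This is not the actual Perl/MySQL code behind the website (which is not reproduced in this article); it is a minimal Python illustration, and the class name, record types, identifiers, templates and sample entries are all hypothetical. It shows one template per record type, with each record rendering its lower-level records recursively and carrying its own unique identifier.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class Record:
    rec_id: str      # unique identifier, usable as a link target
    rec_type: str    # selects the template used to render this record
    title: str
    children: list = field(default_factory=list)  # lower-level records


# Hypothetical templates, one per record type.
TEMPLATES = {
    "census": Template('<h1 id="$rec_id">$title</h1>\n$children'),
    "household": Template('<h2 id="$rec_id">$title</h2>\n<ul>\n$children</ul>\n'),
    "entry": Template('<li id="$rec_id">$title</li>\n'),
}


def render(record):
    """Render a record and, recursively, every record nested inside it."""
    inner = "".join(render(child) for child in record.children)
    return TEMPLATES[record.rec_type].substitute(
        rec_id=record.rec_id, title=record.title, children=inner
    )


# Invented sample data: a census containing one household of two entries.
census = Record("c1841", "census", "1841 census", [
    Record("c1841-h1", "household", "Household 1 (sample)", [
        Record("c1841-h1-p1", "entry", "Person A (sample)"),
        Record("c1841-h1-p2", "entry", "Person B (sample)"),
    ]),
])

page = render(census)
print(page)
```

Because every level has its own identifier, a link can point at the whole census, at one household, or at a single entry within it.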

Details of people are stored differently, with one central table that stores the main information, such as name, birth and death dates and occupations for each individual. Again there are HTML templates which determine how this information is rendered as web pages. Each person's data also includes (if known) the identity of their parents, and the computer program that generates the 'people pages' is able to find the parents and create hyperlinks to them. Because of the existence of these parent links, the program is equally able to locate each individual's children and include references to them from the parent's web page. Hence, each person's information does not need to include references to multiple children, so long as each child's information refers to its two parents.
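A sketch of how children can be found from parent links alone is given here. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table layout, column names, file-naming scheme and sample rows are assumptions made for illustration, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE people (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        birth_year INTEGER,
        father_id  INTEGER REFERENCES people(id),  -- NULL if not known
        mother_id  INTEGER REFERENCES people(id)   -- NULL if not known
    )
""")
# Invented sample rows: two parents and two children.
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?)", [
    (1, "Father (sample)", 1840, None, None),
    (2, "Mother (sample)", 1842, None, None),
    (3, "Child A (sample)", 1865, 1, 2),
    (4, "Child B (sample)", 1868, 1, 2),
])


def children_of(person_id):
    """Find children by searching for rows that name this person as a parent."""
    return conn.execute(
        "SELECT id, name FROM people "
        "WHERE father_id = ? OR mother_id = ? ORDER BY birth_year",
        (person_id, person_id),
    ).fetchall()


def child_links(person_id):
    """Turn the children into hyperlinks (the file naming is hypothetical)."""
    return [f'<a href="person{cid}.html">{name}</a>'
            for cid, name in children_of(person_id)]


print(child_links(1))
```

Nothing about the children is stored on the parents' rows; the reverse lookup is done at page-generation time, so adding a child means adding one row only.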

As well as the two main sets of tables, relating to records and people, there is a third table that connects the two: this is the 'Events' table. A particular record may be associated with a number of events: for example, a marriage record may represent an event for the bride, the groom and the witnesses. Each of these entries in the events table provides the link between the record and the person to whom the event relates. When the web page for an individual is generated, the program searches for all the events that refer to them and summarises these on the page.
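The role of the events table can be sketched as follows. Again this uses Python's sqlite3 module in place of MySQL, and the table names, columns, roles and sample rows are all assumed for the purpose of illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE records (id TEXT PRIMARY KEY, description TEXT);
    -- Each row links one record to one person, with that person's role.
    CREATE TABLE events (
        record_id TEXT    REFERENCES records(id),
        person_id INTEGER REFERENCES people(id),
        role      TEXT
    );
    -- Invented sample data.
    INSERT INTO people VALUES
        (1, 'Groom (sample)'), (2, 'Bride (sample)'), (3, 'Witness (sample)');
    INSERT INTO records VALUES
        ('m-001', 'Marriage certificate (sample)'),
        ('c-001', 'Census entry (sample)');
    INSERT INTO events VALUES
        ('m-001', 1, 'groom'), ('m-001', 2, 'bride'), ('m-001', 3, 'witness'),
        ('c-001', 1, 'head of household');
""")


def events_for(person_id):
    """All events that refer to one person, with the record behind each."""
    return conn.execute(
        "SELECT e.role, r.id, r.description FROM events e "
        "JOIN records r ON r.id = e.record_id "
        "WHERE e.person_id = ? ORDER BY r.id",
        (person_id,),
    ).fetchall()


def people_in(record_id):
    """The reverse question: everyone to whom a given record relates."""
    return conn.execute(
        "SELECT p.name, e.role FROM events e "
        "JOIN people p ON p.id = e.person_id "
        "WHERE e.record_id = ? ORDER BY p.id",
        (record_id,),
    ).fetchall()


print(events_for(1))
print(people_in("m-001"))
```

Keeping the links in one table means the question "what refers to this record?" becomes a single query, which is exactly what hand-written HTML links could not answer.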

To summarise, then, whenever a page is generated for an individual, the program sticks together the basic information about that person, plus references to any children who identify this person as a parent, plus any events that identify this person. All this data is then rendered by means of a configurable template.
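The assembly step summarised above might look something like the following sketch. It assumes the three ingredients (basic details, children, events) have already been fetched from the database, and the template text, function name, link targets and sample values are hypothetical rather than taken from the real system.

```python
from html import escape
from string import Template

# A configurable page template; changing the layout means editing only this.
PAGE_TEMPLATE = Template(
    "<h1>$name</h1>\n"
    "<p>Born $born, died $died</p>\n"
    "<h2>Children</h2>\n<ul>\n$children</ul>\n"
    "<h2>Events</h2>\n<ul>\n$events</ul>\n"
)


def build_person_page(person, children, events, template=PAGE_TEMPLATE):
    """Combine basic details, child links and event summaries into one page.

    children: list of (person_id, name) tuples
    events:   list of (role, record_id, description) tuples
    """
    child_items = "".join(
        f'<li><a href="person{cid}.html">{escape(name)}</a></li>\n'
        for cid, name in children
    )
    event_items = "".join(
        f'<li>{escape(role)}: '
        f'<a href="compendium.html#{rec_id}">{escape(desc)}</a></li>\n'
        for role, rec_id, desc in events
    )
    return template.substitute(
        name=escape(person["name"]),
        born=person.get("born", "unknown"),
        died=person.get("died", "unknown"),
        children=child_items,
        events=event_items,
    )


# Invented sample input.
page = build_person_page(
    {"name": "Sample Person", "born": 1840},
    children=[(3, "Child A (sample)")],
    events=[("groom", "m-001", "Marriage certificate (sample)")],
)
print(page)
```

Since the page is rebuilt from the data each time, correcting a record in the database and regenerating is enough to bring every page that mentions it up to date.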

So how well has this worked? The project is still not complete: the results to date can be seen on the Gumbleton.com website. Some things have worked well:

Some things haven't worked so well:

The real proof of the pudding is in the eating, and it would be very useful if other people who look at this website would provide feedback on what they think of it. Any feedback would be much appreciated. Please email steve.m.west@btinternet.com.

© Steve West, June 2001
steve.m.west@btinternet.com