Developing a Database Interface:
The Continental Origins of English Landholders
Introducing Prosopography
The concept of prosopographical study, where an historian examines, particularly through naming conventions, an historical character's position within his family, institutions and society, has been in existence since the sixteenth century. An historian, for example, discovers a name in a manuscript similar to one seen in a previous manuscript. How can the historian establish whether the two documents refer to the same person? The prosopographical historian does this by exploring the available primary sources to identify the figure's parents, siblings and children. If the family members overlap, then the figure mentioned in the two separate documents is presumed to be the same. As in other disciplines, advances in computer technology have expanded the traditional parameters of the field. Because prosopography demands investigating and analysing a large number of documents, scattered around various libraries, it can be an arduous and time-consuming task. With these documents digitised, however, researchers can compare and cross-reference the historical figures much more quickly.
Prosopography is well suited to the study of the pre-modern period, where a relative paucity of documents, as well as complex naming practices, means that the precise placement of an historical character within society is often difficult to trace. Dr Katharine Keats-Rohan's Continental Origins of English Landholders (COEL) database is designed to help the prosopographical researcher. The COEL database houses documents and records, most notably elements of the Domesday Book, pertaining to the acquisition of English land by the Norman conquerors in the century following 1066. Additionally, it includes commentary by Dr Keats-Rohan on these records. Because of the complexity of both the original sources and the type of analysis that a prosopographer wants to make, the data could not simply be loaded into an existing package such as Access or Paradox as supplied. It was necessary instead to exploit the customisable possibilities of Microsoft Access, extending and enhancing the original database program until a suitable database interface was established. This case study examines the planning and decisions that went into developing a customised database that could accommodate heterogeneous data, and an interface that would allow the prosopographical researcher to extract and analyse the relevant data with the minimum of difficulty.
A Trial Period
On the project's inception, in 1992, there had been no thought of using computers as analytical tools. Technology's involvement only came when Dr Keats-Rohan, having gathered a large, and rather unwieldy, card index system, realised the potential for recording such data in database form. As stated above, initial attempts were made with existing packages such as Paradox. This was fine for simply transposing the card index into digital form, but Dr Keats-Rohan quickly saw how a well-designed database could facilitate some of the more complex research needs of the prosopographer. It became apparent that Paradox could not adequately cope with the full range of relevant sources. Dr Keats-Rohan wanted a database that contained some tabular records (including individual names extracted from the Domesday Book) and some full text documents (such as the Cartae Baronum, records relating to the property and tenants of medieval landholders) that could be several hundred words long. Such lengthy chunks of text were too large for Paradox to handle satisfactorily. Specific functions that would have aided the prosopographer's task (for example, a graphical tool outlining family relationships) were also not available in the usual database programs. Even simpler operations, like text-searching, were difficult to execute in some databases. Therefore, after a year's work exploring the possibilities of existing programs, Dr Keats-Rohan took the decision to create a more flexible system. Microsoft Access was the tool for doing this. While Access is used by some as a straightforward database, many historians, such as Dr Keats-Rohan, are realising its customisable nature. Users can, by employing the programming code understood by Access, develop their own shell that is specifically suited to their own data. By enhancing Access so as to develop her own structure, Dr Keats-Rohan would have a system that would exploit the richness of her academic data.
While this initial year produced no tangible results, the experiences furnished Dr Keats-Rohan with a thorough knowledge of the issues involved in constructing and managing databases: the differences between relational and flat-file databases, and source- and model-orientated database systems. She considered this learning curve a crucial aspect in the development of what would become a very successful database. Other such projects, she added, might gain from beginning with a pilot study. This would provide clues for the best ways for the database shell to be structured, and also uncover any possible problems that would hinder its creators when developing the full-scale database.
Incorporating Precise Academic Needs
While developing a customisable shell in Access obviously produced many additional technical issues, it also allowed Dr Keats-Rohan to create an interface that integrated the specific academic needs of the prosopographer. The earlier trial and error period not only taught her about databases, but also clarified the academic issues at play in creating a computer-driven prosopographical tool. These could now be enmeshed into the design of the database.
Deciding how the records within the database system would be structured was an important part of this. Which information would be extracted from the Domesday Book? Obviously, the fields had to relate to the interests of the prosopographer, i.e. concentrating on the named characters in the sources. Thus, in Figure 1, one can see fields for 'forename', 'surname', 'manor of residence' (in both the original Latin and translated English) and various other important pieces of data. Intelligently organising the distribution of fields in the database is essential if the database is going to be a useful tool.
For Dr Keats-Rohan, one very particular issue to be addressed was the tendency of medieval historians to accept previous interpretations of medieval documents (rather than re-question given assumptions) and thereby duplicate earlier academic errors. She insisted that the COEL database should not encourage historians to fall into this trap. Two related conditions were therefore essential to the development of the database. Firstly, whatever position in the database the researcher is in, she should be only a double-click away from the original primary source. Secondly, the database interface should never offer definitive interpretations, but only clarify the possible interpretations open to the historian.

Figure 1 - Names on the COEL database extracted from the Domesday Book
The database was therefore constructed on three inter-dependent levels. At the first level were the original sources, including full text transcriptions of the original medieval documents and tabular records of persons extracted from lengthy documents such as the Domesday Book. Figure 1 shows some of the names extracted from the Domesday Book. The second record in this table, for example, shows the existence of someone with the Christian name 'nigellus' from the manor of Walingford. The second level (as shown in Figure 2) is a list of each name mentioned in the sources, retaining the full appellation in the Latin form. The 23rd record shows the mention of a 'roberto' from Essex with the surname 'lincolne episcopo', presumably Bishop of Lincoln. The third level is the interpretative work done by Dr Keats-Rohan on the names existing in the original sources. The names in the Level 2 list are merged if they turn out to describe the same person, and each individual is grouped according to their relationships within their family. As can be seen in Figure 3, the relatives of Cecilia Bigod are listed on the screen: her parents, husband, siblings and children. To each person or family a commentary, composed by Dr Keats-Rohan, has been added, referring the person or family to other data (whether primary or secondary) not included in the COEL database.
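The three levels described above can be sketched as a small data model. This is only an illustrative sketch in Python, not the Access and Visual Basic implementation used by COEL: the class names, field names, identifiers ("S1", "S2") and the link between the sample mentions and their sources are all assumptions made for the example. Only the quoted names and places come from the text. The point it illustrates is that every Level 3 interpretation keeps a path back to a Level 1 source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRecord:
    """Level 1: one entry from a primary source (tabular row or full text)."""
    source_id: str
    document: str
    text: str


@dataclass(frozen=True)
class NameMention:
    """Level 2: one name as it appears in a source, kept in its Latin form."""
    mention_id: int
    forename: str
    surname: str
    place: str
    source_id: str  # link back to the Level 1 record


@dataclass
class Person:
    """Level 3: an interpretation that merges mentions into one individual."""
    label: str
    mention_ids: list[int] = field(default_factory=list)
    commentary: str = ""


def sources_for(person, mentions, sources):
    """Return the Level 1 records that stand behind a Level 3 person."""
    by_id = {m.mention_id: m for m in mentions}
    source_ids = sorted({by_id[i].source_id for i in person.mention_ids})
    return [sources[s] for s in source_ids]


# Sample values: names and places are those quoted in the text;
# identifiers, documents and transcriptions are placeholders.
sources = {
    "S1": SourceRecord("S1", "Domesday Book", "(transcription omitted)"),
    "S2": SourceRecord("S2", "(source not stated)", "(transcription omitted)"),
}
mentions = [
    NameMention(2, "nigellus", "", "Walingford", "S1"),
    NameMention(23, "roberto", "lincolne episcopo", "Essex", "S2"),
]
bishop = Person(
    label="Robert (presumably Bishop of Lincoln)",
    mention_ids=[23],
    commentary="An interpretation only; check it against the source.",
)

for record in sources_for(bishop, mentions, sources):
    print(record.source_id, record.document)
```

Keeping the interpretation (Person) separate from the evidence (NameMention and SourceRecord) is what would allow a different historian to merge the same mentions differently without altering the sources.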

Figure 2 - A sample from the full list of names on COEL database
The genealogical relationships and many detailed commentaries that Dr Keats-Rohan has included (as in Figure 3) make the COEL database a slightly unusual one; it is more common to find databases which house only the digitised data, without any interpretations by the creator. Unlike traditional historical resources, the COEL database is both a primary and a secondary source. This gives the database the quality of being a genuine electronic book - like any traditional book it offers historical insights, but readers can verify the reasoning supporting these insights by switching to the primary sources. If users do not agree with these insights it is possible for them to record this - Level 3 permits users to insert their own exposition of the primary sources. At no point are historians locked into accepting the interpretation of other historians. Users of the COEL database system can decide if two separate appellations in two separate documents actually refer to the same person. They can also judge if one character is actually a relative of another. Level 3 is the point on the database where the prosopographer can mark these relationships between the various historical figures that inhabit the post-conquest documents. It is entirely possible that a researcher could add records to Level 3 that document opinions entirely different to those supplied by Dr Keats-Rohan.

Figure 3 - Possible family relationships based on the evidence held on Level 1
Dr Keats-Rohan's development of her database system was driven by the belief that though computers can enlarge the scope of humanities research, they cannot answer questions in disciplines that are heuristic in essence. The COEL database is not answering questions about prosopographical research but making it easier for the prosopographer to do that research. For example, the computer can create for the user a graphical representation of families based on the relationships decided upon by the user. Figure 4, for example, demonstrates the genealogy of the Bigod family. All of the relatives are definitively labelled. Roger's sons, for instance, are marked in light blue: Wilhem, Roger II, Humfrid and Hugo Comes. If the historian is not sure of an exact familial relationship, the database can reflect this. The colour code on the left of Figure 4 shows that the user can assign a 'probable' tag to a relationship. The system always allows the user to cater for the inherent ambiguity in its Level 1 sources.

Figure 4 - A graphical depiction of possible family relationships
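The idea that relationships are recorded as interpretations, each carrying a certainty tag and an author, can be sketched as follows. This is a hypothetical illustration in Python rather than the code behind COEL: the `Relationship` and `Certainty` names, the "editor" and "reader" labels and the dissenting entry are assumptions introduced for the example. The four sons of Roger are those named in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Certainty(Enum):
    DEFINITE = "definite"
    PROBABLE = "probable"


@dataclass(frozen=True)
class Relationship:
    person: str
    relation: str          # e.g. "son", "husband"
    relative: str
    certainty: Certainty
    asserted_by: str       # whose interpretation this is


def relatives_of(person, relationships, asserted_by=None):
    """Group a person's relatives by relation, optionally for one interpreter."""
    grouped = {}
    for r in relationships:
        if r.person != person:
            continue
        if asserted_by is not None and r.asserted_by != asserted_by:
            continue
        grouped.setdefault(r.relation, []).append((r.relative, r.certainty.value))
    return grouped


# Relationships as labelled in Figure 4, attributed here to an "editor".
interpretations = [
    Relationship("Roger", "son", name, Certainty.DEFINITE, "editor")
    for name in ("Wilhem", "Roger II", "Humfrid", "Hugo Comes")
]
# Hypothetical entry by a user who regards one relationship as only probable.
interpretations.append(
    Relationship("Roger", "son", "Humfrid", Certainty.PROBABLE, "reader")
)

print(relatives_of("Roger", interpretations, asserted_by="editor"))
print(relatives_of("Roger", interpretations, asserted_by="reader"))
```

Because each record names its author, two conflicting readings of the same evidence can sit side by side, and a family diagram can be drawn from whichever set the user selects.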
Programming the Database
Dr Keats-Rohan hired a professional programmer to tailor Access to COEL's needs. It was a complex process for the programmer, requiring a thorough knowledge not only of Visual Basic (the code used to customise Access), but also of a variety of other programming languages. It was an expensive process too, but it was entirely necessary in the light of the desire to develop a customised database shell. Difficulties during the construction of the database shell were minimised by remaining focussed on the simple yet fundamental aim of the database - a database which displays the primary source to the user, extracts certain elements from the sources and then permits possible analytic interpretations. With a clear idea of the design organised beforehand, the development process was relatively unproblematic. When difficulties arose, returning to the initial concept of the three-layered structure clarified the problem. A well-planned model was essential to the success of the database.
There were two discrete parts to this process. The shell of the database, without any content being added, was constructed first. Having completed this skeleton, the COEL database could be fleshed out with the content. Initially, the text sources were entered manually; with developments in Optical Character Recognition (OCR), the sources (some of which were taken from existing printed versions, some of which Dr Keats-Rohan prepared herself) were scanned in. Research assistants helped with the painstaking task of inserting the tabular records. Finally, Dr Keats-Rohan inserted her own research and commentary.
The employment of a professional programmer also permitted the project to have various offshoots. A portable demonstration version of the COEL database, i.e. something that can be fitted on a floppy disc, was designed, allowing Dr Keats-Rohan to showcase her database at various national and international conferences. (This demonstration program can be downloaded from the Prosopography homepage.) Perhaps more importantly, this technical expertise has allowed the current development of a customisable version of the COEL database. Users of the database will be able to re-organise how the data is examined so historians can make different analyses of the data already existing in the COEL database (a geographical historian, for example, might be more interested in references to place names rather than people's names). But the possibilities for the customisable version of the database are even more flexible than this. Because content and structure were developed independently, they can be separated, leaving other users to insert their own data: therefore, for example, the system will be able to operate just as a powerful archiving tool, recording sources at Level 1. This is a significant advance on many previous historical databases, which often work only for a very specific academic dataset. The flexibility of COEL pushes its potential user group beyond prosopographers. Dr Keats-Rohan believes that even non-humanities scholars will be able to utilise the system, and hopes that in marketing it she will be able to regain some of the initial financial outlay.
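The re-organisation of existing data around a different field, such as place rather than person, can be sketched very simply. This is a minimal Python illustration under stated assumptions, not a description of the customisable COEL version itself: the `index_by` helper and the dictionary field names are hypothetical, and the two sample records reuse the names and places quoted earlier in this case study.

```python
from collections import defaultdict


def index_by(records, key):
    """Group the same records under whichever field the researcher chooses."""
    index = defaultdict(list)
    for record in records:
        index[record[key]].append(record)
    return dict(index)


# Sample records (field names are assumptions for this sketch).
records = [
    {"forename": "nigellus", "place": "Walingford"},
    {"forename": "roberto", "place": "Essex"},
]

by_person = index_by(records, "forename")   # a prosopographer's view
by_place = index_by(records, "place")       # a geographical historian's view

print(sorted(by_place))
```

The records themselves are untouched; only the key used to organise them changes, which is the sense in which content and structure can be kept independent.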
While the technical expertise is doing much to help create this customisable version, it was essential to have a clear design policy at the database shell's inception to allow the system to evolve in such a fashion. This is not only true for the customisable version, but the COEL project as a whole. Having a clear conception of how the database is to respond to the needs of the researcher is a prerequisite of its construction. Good planning allows an intelligent database to be produced and provide the scholar with the support she requires.
Links
1. The Unit for Prosopographical Research
http://www.linacre.ox.ac.uk/proso.html
2. Sean Townsend, Cressida Chappell and Oscar Struijvé, Digitising History: A Guide to Creating Digital Resources from Historical Documents - http://hds.essex.ac.uk/g2gp/digitising%5Fhistory/
3. Katharine Keats-Rohan, Historical Text Archives and Prosopography: The COEL Database System, unpublished manuscript
Many thanks to Katharine Keats-Rohan of the University of Oxford,
Hamish James of the History Data Service
and Raymond Rood of Datacraft UK