AHDS History

 

Relating the Past: Efficient Database Design

Content written on 30th May 2001 by Alastair Dunning

Some historical sources lend themselves to being modelled in a database. Censuses, wills and taxation records are all recorded in a structured fashion, and this structure can easily be carried over into a database. When dealing with several historical sources at once, the matter is more problematic. While the sources may purport to record the same kind of thing, the level of detail may diverge widely, and no clear common structure may be apparent. Nevertheless, such an array of sources can still be modelled in electronic form using a relational database. Unlike the single-table database (called a flat-file database), which may be used to record a single structured source such as a census or a series of wills, the relational database contains a host of smaller inter-linked tables, developed so as to reflect the variety of the source material.

The AHRB-funded project Religion, Dynasty and Patronage in Rome, c.440 - c.840 incorporates several heterogeneous sources and therefore employs a relational database of some complexity. The project is based at Manchester University, and its intellectual aim is to examine the changes in society that followed the fall of the Roman Empire. The database records the many aspects of patronage (employment, gifts, bequests, favours and donations) that lubricated the social machinery of the era. Popes, barons, bishops, envoys and merchants secured either political support or social favour by dispensing and receiving patronage, and the process was such an important aspect of early medieval (and later) society that records of it appear frequently in documents of the time. Sources such as the Liber Pontificalis and various inscriptions and charters are all fruitful, even if many of them were not composed with the deliberate aim of documenting patronage details. The Manchester team therefore wishes to bring these records together in the database, not only to provide a catalogue of references relevant to patronage, but also to allow historians to ask meaningful questions over a much wider range of data than had previously been available.

Examining the Sources

Figure 1 - A sample of an edited version of Pope Gregory's Letters


The Patronage project is based on an earlier project (Family and Monastery in Sixth-Century Rome), which had the same aim of recording details of patronage but concentrated on one source only: the letters of Pope Gregory the Great, in themselves a fairly substantial resource. Figure 1 shows a section from Gregory's letters, part of an edited version of the letters prepared by a German scholar in 1990. The text that has been underlined refers to instances of patronage. One can see that the first example of patronage (from lines three to seven) is much shorter than the second, which continues for the rest of the page. Creating a flat-file database from such a potentially varied source would be difficult, and the resulting database would be hard to maintain and hard to retrieve information from. A relational database provides a much more flexible, but still structured, way of holding a wide variety of data.

Deciding on Entities

Figure 2 - The design for the Family and Monastery Project


At first glance the design in Figure 2 appears a rather daunting diagram, a collection of abbreviations and arrows pointing in contrary directions. The design is, in fact, a distillation of what the team believes to be the crucial elements of any instance of patronage found in the source. Each box represents a single table, or entity, within the relational database; the headings within each box refer to the fields that are part of that table. The arrows indicate which tables are linked - you will note that tables that are linked always share one field. The database system is able to connect records in different tables by looking for records that share a common field and have the same value for that field. When it came to data entry, the project team went through Gregory's letters, extracting and inserting data according to their database design. So the three tables on the central line (Person, Role and Institution) recorded each particular figure involved in the patronage, the roles he or she played within various patronage operations, and the related institutions.

Other elements from the original source were recorded, including the person's title, and his or her relationship to other persons in the database. The team also recorded the text of the Latin source. This was split and recorded in two separate fields (both called 'Text' and placed in P_notes and I_notes) and related to the information about the source itself.
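The linking of tables through a shared field, as described above, can be sketched in a few lines of Python using the standard sqlite3 module. This is only a minimal illustration, not the project's actual schema: the table names Person, Role and Institution come from the design, but the field names (person_id, institution_id, description) and the sample rows are assumptions invented for the example.

```python
import sqlite3

# Hypothetical, simplified tables. Field names and rows are illustrative
# assumptions, not the Family and Monastery project's real design or data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE institution (institution_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE role (
        role_id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(person_id),
        institution_id INTEGER REFERENCES institution(institution_id),
        description TEXT
    );
    INSERT INTO person VALUES (1, 'Example Person A'), (2, 'Example Person B');
    INSERT INTO institution VALUES (10, 'Example Monastery');
    INSERT INTO role VALUES (100, 1, 10, 'donor'), (101, 2, 10, 'abbot');
""")


def roles_at(institution_name):
    """Return (person, role) pairs for one institution.

    Records in different tables are connected wherever the shared
    field (person_id or institution_id) holds the same value.
    """
    rows = conn.execute(
        """
        SELECT person.name, role.description
        FROM role
        JOIN person ON person.person_id = role.person_id
        JOIN institution ON institution.institution_id = role.institution_id
        WHERE institution.name = ?
        ORDER BY person.name
        """,
        (institution_name,),
    )
    return rows.fetchall()


print(roles_at("Example Monastery"))
```

The Role table holds no names itself; it only carries the identifiers it shares with Person and Institution, which is what allows one person to appear in many roles without being entered twice.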

Expanding the Project

This design worked well for Gregory's letters. But as soon as the project expanded, the team realised that the design was not adequate to record the details taken from other primary sources. The Patronage team therefore had to re-think their database design, creating a model that would be developed enough to cope with the full variety of sources. To develop such a model, planning was essential. Such planning involved envisaging the quirks and ambiguities in each source so that the team could capture the full richness of the documentary material. Leaping straight into full data-entry (usually the most time-consuming and hence the most costly part of building a database) without sufficient planning would have meant discovering data that did not fit the database design. As a result, this would have necessitated modifying a partially-built database - a very complex task, fraught with the potential for mistakes. Careful planning of the work is the way of avoiding such pitfalls.

Having further pondered the nature of the primary sources, the team altered and added to the entities of the database. For example, the pilot design had placed Person, Role and Institution as the central entities, since Gregory's letters had tended to record instances of patronage in this way. Yet in other early medieval sources, these entities are not necessarily all part of a patronage transaction; a relationship might have been recorded as being between two institutions or two people. Thus the entities Donor and Recipient were developed, which could articulate the patronage relationship without the parties having to be defined as persons or institutions.

Simplifying Concepts

This updated model, however, turned out to be only a temporary design. Analysis revealed that the data was much more fluid than this model allowed. For the second model, the Patronage team worked on the assumption that every patronage relationship involved a donor and a recipient, and these two entities were worked into the design accordingly. Yet the mechanism of patronage was more complex than this. Often, there was an enabler involved in the relationship, acting as a broker between patron and client. Additionally, there were several instances where patronage was triggered by someone's death - the patronage was inspired by the memory of the deceased. The team decided that such figures were an integral part of the relationship; yet a database structure based only on donor and recipient omitted them.

The project thus returned to including a family of tables (indicated by the light purple colours in the illustration) headed by a central table entitled Roles. Within this, the team would be able to label the disparate players involved in a patronage relationship. The same philosophy applied to other parts of the database. Rather than being very specific entities, the tables were broad enough to cater for different values, such as Roles, Transactions and Participants. Around these broad entities, smaller tables were attached to store frequently-repeated data, such as Library Location, related to the larger table Source. An example of the broadening of entities might concern the Person and Institution entities from the earlier model, which were both described by the 'Type of Participant' field within the Participants table of the later designs. Getting a good balance between the abstract and the specific is very important - too specific and the data will not conveniently fit into the table; too abstract and too much data of a disparate nature will slot into one table.

Figure 3 - The current database design for the Patronage project


One other problem concerned the recording of the Latin text relating to the instance of patronage. In the earlier example, text was split up according to Person or Institution, or Donor or Recipient. This was a tricky task, however. Sometimes the reference to the person could be sandwiched in between the reference to the institution. A new table (Text) was therefore created which could hold all of the relevant text in one space, along with various related tables to hold sundry information. This makes it easy for the user to consult an entire transcription of the source they have used for the database; a vital component if database creators wish to indicate to their users how they have interpreted the primary sources.

One final point. Although database design may seem a purely logical affair, it is also helped by thinking carefully about the language used to describe the tables within a relational database. Earlier designs had thought of patronage as involving gifts from patron to client, or even vice versa. Yet on occasion, patronage could involve the sale of land, and Gift was not the correct term for such events. The Gift table therefore became the Transactions table.
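The broad, role-based model described in this section might be sketched as follows. The sketch assumes a much reduced version of the design: the Participants, Roles and Transactions entities and the 'Type of Participant' field are taken from the text, but every column name, the table name patronage_transaction and all sample rows are hypothetical.

```python
import sqlite3

# A reduced, hypothetical version of the role-based design. Only the broad
# entity names come from the case study; columns and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE participant (
        participant_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        participant_type TEXT NOT NULL      -- 'person' or 'institution'
    );
    CREATE TABLE patronage_transaction (
        transaction_id INTEGER PRIMARY KEY,
        description TEXT NOT NULL           -- a gift, a sale of land, ...
    );
    CREATE TABLE role (
        transaction_id INTEGER REFERENCES patronage_transaction(transaction_id),
        participant_id INTEGER REFERENCES participant(participant_id),
        role_label TEXT NOT NULL            -- donor, recipient, enabler, ...
    );
    INSERT INTO participant VALUES
        (1, 'Donor A', 'person'),
        (2, 'Monastery B', 'institution'),
        (3, 'Broker C', 'person'),
        (4, 'Deceased D', 'person');
    INSERT INTO patronage_transaction VALUES (1, 'sale of land');
    INSERT INTO role VALUES
        (1, 1, 'donor'),
        (1, 2, 'recipient'),
        (1, 3, 'enabler'),
        (1, 4, 'commemorated');
""")


def players(transaction_id):
    """List every participant in a transaction with the role each played."""
    return conn.execute(
        """
        SELECT role.role_label, participant.name, participant.participant_type
        FROM role
        JOIN participant ON participant.participant_id = role.participant_id
        WHERE role.transaction_id = ?
        ORDER BY role.role_label
        """,
        (transaction_id,),
    ).fetchall()


for label, name, kind in players(1):
    print(label, name, kind)
```

Because the role is a value in a field rather than a table of its own, a new kind of player (an enabler, or a person commemorated) needs only a new label, not a change to the structure of the database.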

Categorisation and Documentation

Developing a database requires clear thinking not only at the design stage, but at the point of data entry, too. A degree of standardisation is required so that, for example, a figure whose name is spelt differently in the sources is not represented as two separate people within the database. This is especially important if several people, each of whom might have their own concept of standardisation, are doing the data entry. The Patronage team also had to develop rules for expanding abbreviations and other shortened forms in the text. Other instances of data entry require categorisation. For example, church property often formed part of a patronage bequest or donation. Does one record 'chalice' and 'goblet' as separate items, or are they included under a catch-all 'drinking vessels'? Similarly, actions were often instigated as a result of patronage. Does one include 'cleaning of the mosaic' in the same category as 'cleaning of the church', or should it be in the same category as 'creating a mosaic', or, indeed, are they three separate categories? The problems in categorisation are the same as elsewhere in database design - be too specific and one will not be able to ask meaningful questions of the data; be too abstract and the richness of the data will be lost.
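One common way of enforcing such standardisation at data entry is a shared authority list that maps each variant form to an agreed standard form. The sketch below is an assumption about how this could be done, not a description of the Patronage team's own procedure; the lists, their entries and the function name standardise are all invented for illustration.

```python
# Hypothetical authority lists. The entries are invented examples; a real
# project would agree and document its own lists before data entry begins.
NAME_AUTHORITY = {
    "gregorius": "Gregory",
    "gregorio": "Gregory",
    "gregory": "Gregory",
}
ITEM_CATEGORIES = {
    "chalice": "drinking vessel",
    "goblet": "drinking vessel",
}


def standardise(value, authority):
    """Return (standard form, original form) for one entered value.

    Lookup ignores case and surrounding spaces. A value missing from the
    authority list is passed through trimmed, so that it can be reviewed
    rather than silently forced into a category. The original form is
    always returned alongside, so the interpretation stays visible.
    """
    key = value.strip().lower()
    return authority.get(key, value.strip()), value


print(standardise("Gregorius", NAME_AUTHORITY))
print(standardise("goblet", ITEM_CATEGORIES))
print(standardise("paten", ITEM_CATEGORIES))
```

How coarse the categories in such a list should be is exactly the judgement discussed above: the code can apply a decision consistently, but it cannot make the decision.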

Categorisation and standardisation should ideally be accompanied by the transcription of the original source. Every time the Patronage team standardised part of the source, they were making an interpretation with which another historian may disagree. The transcriptions provided in the Text table permit the scholar to check what has been altered. But in some cases there is ambiguity within the text itself. Letters and numbers can become faded on a manuscript - is the year referred to AD 690 or AD 699? Thus the team have also included an Alternative Text Readings table to flag up the indistinctness of a source and to record an alternative reading to the one they have decided is most likely. Including tables such as these is part of a larger process of scrupulous documentation. Besides the text, the Patronage team include extensive details on the historical document and, if applicable, the secondary source from which they have transcribed it.
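A table of alternative readings linked to the transcribed text might look something like the sketch below. The Text and Alternative Text Readings tables are named in the design, and the AD 690 / AD 699 ambiguity is the example given above; the table and column names used here, and the placeholder transcription, are assumptions made for the illustration.

```python
import sqlite3

# Hypothetical layout for the Text and Alternative Text Readings tables.
# Column names and the placeholder transcription are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_text (
        text_id INTEGER PRIMARY KEY,
        transcription TEXT NOT NULL
    );
    CREATE TABLE alternative_reading (
        reading_id INTEGER PRIMARY KEY,
        text_id INTEGER NOT NULL REFERENCES source_text(text_id),
        preferred TEXT NOT NULL,     -- the reading the editors judged most likely
        alternative TEXT NOT NULL,   -- the other plausible reading
        note TEXT                    -- why the source is indistinct
    );
    INSERT INTO source_text VALUES (1, 'placeholder transcription dated AD 690');
    INSERT INTO alternative_reading
        VALUES (1, 1, 'AD 690', 'AD 699', 'final numeral faded');
""")


def readings_for(text_id):
    """Return every flagged ambiguity recorded against one transcription."""
    return conn.execute(
        """
        SELECT preferred, alternative, note
        FROM alternative_reading
        WHERE text_id = ?
        ORDER BY reading_id
        """,
        (text_id,),
    ).fetchall()


print(readings_for(1))
```

Keeping the alternatives in their own table means a transcription with no doubtful passages carries no empty fields, while a badly damaged one can have as many flagged readings as it needs.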

This process of meticulous documentation links back to one of the original aims of the project, for, armed with extensive documentation, the database also acts as a catalogue. Many of the individual references to patronage (as well as the figures, cities and institutions involved) are tucked away in obscure secondary sources, or lie unarchived in distant libraries. Previously, it would have taken the historian much bibliographic detective work to locate some of the references particular to her work. But having access to the Source table within the database allows the researcher to pinpoint a reference that would have taken hours to discover previously. With this, even the historian who entirely disagrees with attempts to model data in electronic form can take something from the database.

Intellectual Landscape

The experience of the Patronage team in developing several models for their database is not unusual. Experimentation is essential, and this will inevitably include some wrong turns on the way to a workable design. An iterative process of design and testing, with samples of real data, can iron out many problems before the main data entry begins. This will save much time and effort and, apart from identifying technical issues, may also highlight further research questions worth investigating. Now that this part of the project is accomplished, the Patronage team have begun their close reading of the sources from which they are extracting the material relevant to their historical questions. Once the work is complete, the team hope to see their database accepted as a valuable resource within their intellectual community.

The project team does not presume that their creation will give definitive answers on questions of patronage in the early Christian era. Rather, the database is helpful as a research tool in two ways. Firstly, as mentioned, it points users, in a matter of seconds, to particular references in the primary sources. Secondly, the database allows users to discover patterns in the social make-up of the era, whether over a very broad sweep of time, or focused on one particular location. Using these clues, historians will be able to construct new hypotheses about early medieval Rome and then consult, in greater detail, the manuscripts and other sources themselves. From there they will be able either to confirm or to reject these new ideas. Good design - entailing a model that does not impose restrictions on data entry by being too specific, but rather allows the primary material to shape its contours - is the key to achieving a database that can do this, and thus to providing a digital resource that stimulates new questions for the historian to examine and answer.

The project has received funding from the University of Manchester Research and Graduate Support Unit and the Arts and Humanities Research Board. Team members, involved at various stages of the project, have been Marios Costambeys, Clare Pilsworth, Conrad Leyser and Julia Hillner. The team continues to work closely together.

Thanks to the team, as well as to Hamish James, for their help in writing this study.