Managing Change with Digital Data: The Case of the Essex Sites and Monuments Record
The role of the Archaeology Data Service within the AHDS has since been taken up by [AHDS Archaeology].
----------------------------------------------------------------------------

Taking advantage of the latest technology

Digital technology has changed enormously over the past twenty years, and continues to do so. For example, successive generations of database systems have offered more accurate and sophisticated ways of organising and analysing data. But for those who have been storing and managing datasets over a long period of time, the benefits afforded by more powerful systems have been accompanied by one large problem. Each time a new database system is adopted, those maintaining the data must migrate it from the old system to the new: no easy task if the two systems employ very different structures. This problem is not unique to database management systems, but is more apparent here than elsewhere because they have been in use much longer than other software tools. The lessons learned and problems encountered in this field are thus of relevance to anyone working with expensive digital data in the medium or long term.

A pertinent example of these issues comes from the Archaeology Section of Essex County Council. The county maintains an extensive database summarising all data relating to the local historical environment for the purposes of planning and research. Managed by Paul Gilman, this Sites and Monuments Record (SMR) has been through many changes as new technologies and new opportunities have become available. Since being translated from a paper record in 1984, the Essex SMR has been through four metamorphoses. Such a large number of transformations is only likely to happen to a large dataset, where there is the financial and technical support to execute them. For shorter-term projects, which are more common in the academic sector, there is less need to make such a series of changes. But any project with a longer lifespan is likely to have to update its software at some point (whether because of the emergence of new data, new technology, or a new set of questions needing to be asked). Such long-term research projects are often the more expensive and higher-profile ones, which have much to lose if the transition is not handled carefully. The example in this case study, the Essex SMR, illustrates the reasons for changing and the challenges faced by those implementing the changes.

Why Change?

Table 1 - Database systems used for the Essex SMR

Table 1 indicates the five database systems that the Essex archaeologists have used since their records went electronic in 1984. Each one was adopted for a variety of overlapping reasons, the common motivation being the increased capabilities of each new system. Even if it is something of a cliché to note that advances in hardware technology allow the development of more powerful pieces of software, it is nonetheless true. Each new generation of software that Essex adopted could work with larger amounts of data, and more quickly. More recently, this increased power in handling data has become as much a qualitative as a quantitative matter. Essex's most recent database system, Exegesis, developed by Exegesis SDM Ltd from the database shell provided by Microsoft Access, is designed to allow the storage of data according to a widely accepted and more flexible data structure - the event/monument/archive framework. Whereas the initial systems (Version 1 and Superfile) modelled their data in a "flat-file" system, based mainly on the attributes of given monuments, Exegesis structures its data in a relational database according to not only the attributes of monuments, but also the archaeological work done upon each monument (events), and the written articles, books etc. that resulted from that research (the archive). This event/monument/archive structure is more complicated, but it enables users to handle the relevant data much more effectively. Exegesis has the added bonus of greater compatibility with Geographical Information Systems (GIS), allowing data in the SMR to be represented, managed and modelled in an environment that is sensitive to geographical considerations.
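The difference between a flat-file record and the event/monument/archive structure can be made concrete with a small sketch. The class and field names below are illustrative assumptions and do not reproduce the actual Exegesis schema; the point is only that events and archive items become separate entities linked to monuments by identifiers, rather than being folded into a single monument record.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Monument:
    """A site or monument, described by its own attributes."""
    monument_id: int
    name: str
    period: str


@dataclass
class Event:
    """A piece of archaeological work, e.g. an excavation or survey."""
    event_id: int
    description: str
    # Identifiers of the monuments this work relates to (hypothetical field).
    monument_ids: List[int] = field(default_factory=list)


@dataclass
class ArchiveItem:
    """A report, article or book that resulted from an event."""
    archive_id: int
    title: str
    event_id: int


def events_for_monument(monument_id: int, events: List[Event]) -> List[Event]:
    """Return every event linked to the given monument."""
    return [event for event in events if monument_id in event.monument_ids]


def archive_for_monument(
    monument_id: int, events: List[Event], archive: List[ArchiveItem]
) -> List[ArchiveItem]:
    """Follow monument -> events -> archive items, as a relational join would."""
    event_ids = {e.event_id for e in events_for_monument(monument_id, events)}
    return [item for item in archive if item.event_id in event_ids]
```

In a flat-file layout the same information would have to be repeated inside each monument record; here one event can be linked to several monuments, or to none, without duplicating its details.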

Coupled with the changes in handling data, newer databases have provided a more convenient interface for their editors. From the primitive Version 1, which was really based on word-processed files and demanded the insertion of unusual grammatical symbols to distinguish between fields and records, through to the professional layout of Exegesis, successive systems have made it easier for the database editor to enter and examine data.

Another motivation for change has been an increased standardisation in classifying data. National archaeological bodies have seen the need to create nationwide standards in the recording and labelling of archaeological data, so that a common language could be used to understand and analyse disparate SMRs. Whilst implementing these standards is, as we shall see below, a time-consuming task, their existence invites database editors to update and change their databases so as not to be left out as others progress according to a common standard. This was one of the reasons Essex transferred to SMR Online, as the system had been recommended by the national lead organisation for SMRs, the Royal Commission on the Historical Monuments of England (RCHME), and implemented many of its standards. Further changes to the Essex SMR have also been partially stimulated by the standards devised and agreed by the RCHME.

Finally, change can become essential for negative reasons, such as recurring technical problems. Essex dispensed with Superfile predominantly because it was an unstable program, liable to crash without warning, and SMR Online and Monarch required too much technical assistance ever to be fully implemented.

Difficulties with Migration - Technical Aspects

Migrating data from one system to another is a tricky business. Different systems interpret and structure data in different ways, so a form of translation is required when data is moved between systems. The size of the Essex SMR meant that technical assistance was often required to implement this translation. This was especially true of the later systems, the much-increased capabilities of which demanded more sophisticated methods of migration. When the RCHME attempted to supply the Monarch system to SMRs such as Essex in 1991/2, they hired a programming company (SPS Ltd.) to assist them with the switch. Explaining the precise archaeological contours of the SMR to non-archaeologists proved a tricky task, slowing down the process of migration.

Other technical obstacles arose even when Essex did not use external help. In-house technical expertise was difficult to rely upon and the added cost of importing new programs and hardware bit into the archaeology section's finances. Paul Gilman also found that, whilst migration was in process, the continuing task of creating new records was drastically slowed down.

Imperfections in the New Database

Even the most thorough programmer would be unable to transpose the data fully from one system to another. One of the principal reasons for moving system is the greater complexity and depth offered by the new system, complexity and depth that will only work if there is sufficient data available to analyse. It is therefore sometimes the case that the data from the old system is not sufficient to take advantage of the facilities offered by the new system, or even to comply with its basic requirements. One simple example is Essex's migration from Superfile to Monarch. When adding dates of excavations in Superfile, the archaeologists had only to specify the month and year of the excavation. Monarch, however, demanded a day as well as a month for both the start and finish of the dig. The Essex Archaeology Section had not retained the paper records in many cases, and therefore had to insert either the first and last days of the relevant months, or days based on memory or guesswork.
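The date problem can be illustrated with a minimal sketch: an old record that holds only a month and year is widened to the first and last day of the relevant months so that it satisfies a system requiring full dates. The function name and the `estimated` flag are assumptions made for illustration; the source does not say that Essex recorded which dates had been filled in, but keeping such a marker is one way to stop guessed values being mistaken for recorded ones.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass
class DigDates:
    start: date
    finish: date
    estimated: bool  # True when the day values were filled in, not recorded


def widen_month_year(
    start_year: int, start_month: int, finish_year: int, finish_month: int
) -> DigDates:
    """Expand month/year values to full dates covering the whole months."""
    # monthrange returns (weekday of first day, number of days in month).
    last_day = calendar.monthrange(finish_year, finish_month)[1]
    return DigDates(
        start=date(start_year, start_month, 1),
        finish=date(finish_year, finish_month, last_day),
        estimated=True,
    )
```

A dig recorded only as "February 1986" would then be stored as running from 1 February 1986 to 28 February 1986, with the flag showing that the days are placeholders.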

As the scope of database systems has broadened, such difficulties have become more specialised. As mentioned above, the most recent system, Exegesis, allows archaeologists to distinguish between particular monuments and the various excavations and research work that enabled those monuments to be identified, i.e. the event/monument/archive structure. Previously such disparate items had been clumped together in one record. For example, the Superfile system that predated Exegesis did not have that degree of sophistication, and therefore records of excavations had no distinguishing features that would allow Exegesis to identify them as different entities. This was especially true of 'negative events' - records documenting archaeological excavations that uncovered nothing of value. When the data was carried into Exegesis, the program interpreted the 'negative events' in the Superfile data as being separate monuments, and they were recorded in the updated SMR as such.

The shortcomings in migration mean that the content in a new database system is not perfect. Paul Gilman, who has been responsible for overseeing many of these changes, accepts this as being inevitable in any transferral of data, particularly of a large dataset. A programme of manual correction accompanies many migrations of data, where the database moderators examine the new database, checking and amending mistakes in the records. Essex has done this, to an extent, for every migration. But for a database as sizeable as this, manual correction is a daunting task, the completion of which is beyond their available resources. Often, records are only corrected when errors are discovered, meaning that many of the problems may take many years to identify and rectify. This is feasible in this case, because the data is primarily used for management purposes. However, this will become an increasing problem in the future as access is widened to include the public. For other data, where statistical or geo-spatial analyses are being executed across the whole data set, such light maintenance is not an option. Any excess of bad data will distort the results, and will therefore need to be corrected before any kind of serious calculations can commence. Failure to recognise such problems could undermine and ultimately overwhelm a research project.

Standardisation

The migration of the Essex Sites and Monuments Record in the 1990s to more sophisticated relational databases required a much greater degree of standardisation than existed in earlier but more straightforward "flat-file" databases. Because relational databases are created from a collection of smaller tables, they must have overlapping terms so that the tables can be cross-searched. Essex therefore had to execute the essential task of standardising the names, places, and terminology of this data. Exegesis also requires detailed standardisation in terms of its geographical descriptions, so that GIS can be employed in a consistent and precise manner. Having standardised data also makes it easier for data to be submitted to, and quickly understood by, other bodies, such as English Heritage's National Monuments Record and local museums.

Much of the debate over standards in archaeology centres on the specialised task of creating terminologies to describe various sites and artefacts. Through much of the 1980s and early 1990s archaeologists in East Anglia had been developing their own terminology list. This list, classifying archaeological finds, was incorporated into their early electronic SMRs, including the versions used in Essex. However, when Essex decided to introduce the Monarch system in 1991/2, which was developed centrally by the RCHME, they found that it would only work with a nationally agreed terminology also developed by the RCHME. Some records, therefore, had to be altered to fit with these national standards. While this was a time-consuming task, the adoption of a standardised national list eased the later migration to Exegesis. But even then, the switch to Exegesis demanded that Essex take further decisions about terminology. The terminology list supplied with Exegesis did not completely correspond with the one for Monarch. For example, in Exegesis there was no separate category for pottery types. So while there had been one record of 'Belgic pottery' in the earlier database system, the staff had to decide upon a term in the national list that could reasonably be used to define this particular find. It was placed under the family of finds headed 'vessel' - a category much less precise than the original. Such decisions were made in full awareness that there was no definitive answer to how certain objects should be re-classified. What was important here was that Essex devised a system of rules so that any data requiring re-definition could be dealt with in a logical and consistent manner. Consistency, Paul Gilman emphasised, is vital in transferring data to a new system.
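A rule-based approach of this kind can be sketched as a lookup table that is applied identically to every record. Only the 'Belgic pottery' to 'vessel' rule comes from the account above; the function name, the return shape and the idea of keeping the original term alongside the new one are assumptions made for illustration, not a description of how Essex actually implemented its rules.

```python
from typing import Dict, Optional, Tuple

# Rules mapping legacy terms to terms in the newer list.
# Only this single entry is taken from the case study.
TERM_RULES: Dict[str, str] = {
    "belgic pottery": "vessel",
}


def reclassify(
    legacy_term: str, rules: Dict[str, str] = TERM_RULES
) -> Tuple[Optional[str], str, bool]:
    """Return (new_term, original_term, needs_review).

    The original term is kept so that the precision lost in moving to a
    broader category can still be recovered later. Terms with no rule are
    flagged for a human decision rather than guessed at.
    """
    original = legacy_term.strip()
    key = original.lower()
    if key in rules:
        return rules[key], original, False
    return None, original, True
```

Because every record passes through the same table, two editors re-classifying the same legacy term will always reach the same result, which is the consistency the paragraph above calls for.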

The final point on standardisation concerns coding. Database systems designed for colossal amounts of data, such as Oracle, tend to promote standardisation by asking users to code their data, e.g. using numerical codes to define types of archaeological find. Paul Gilman, however, does not recommend this strategy. With codes, the editor of the database is constantly referring to subsidiary tables or pieces of paper, which are normally too large to be memorised, and could also be lost. In the experience of Essex it is far better to use clear, common-sense names, the meaning of which is transparent to those creating, and later editing, the content.

Importance of documentation

The existence of good documentation smoothes out the difficulties in moving data between different systems, especially when the data has been collected over a long period of time. Because of staff changes and the passing of time, it may not be immediately apparent why the fields of an older database were organised in a particular way, and yet it might be crucial to getting the full richness of the data. Providing documentation during the insertion, and during the migration, of the data, explains to later users the rationale behind preparing the data in a certain fashion.

One problem Essex encountered illustrates this. When programmers arrived in 1997 to help move the Essex SMR from Monarch to Exegesis, they needed to discuss certain issues with those who had carried out the previous migration, SPS Ltd. The SPS team, however, had been dissolved by then, and had left no precise details on how the migration had been achieved. Without this knowledge, the later migration was more expensive and took longer to achieve.

Along with documentation, good preservation techniques are necessary during migration, as they should be at any stage during the system's lifecycle. It is essential that there are reserve copies of the data (as Essex has had for each migration) to fall back upon in case of glitches in the migration process.
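As one possible way of keeping a reserve copy, the sketch below copies an exported data file into a backup directory and confirms, by comparing SHA-256 checksums, that the copy is byte-for-byte identical before any migration work begins. The function names and the assumption that the data has been exported to a single file are illustrative; the source does not describe how Essex made its copies.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_reserve_copy(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and verify the copy against the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Reserve copy of {source} does not match the original")
    return target
```

Storing the checksum alongside the copy would also allow the reserve file to be re-verified later, before it is relied upon to recover from a failed migration.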

The Collective Approach

Many of the difficulties facing the Essex Archaeology Section were alleviated by their involvement in a national project. Advice from the RCHME, from other local government archaeologists and from the ADS has set high standards for Sites and Monuments Records all over the country. For example, the event/monument/archive model that is supplied by the Exegesis program, and was developed by the Association of Local Government Archaeological Officers and the RCHME, is utilised by many SMRs, so there are a number of colleagues with whom Essex can discuss the problems and ramifications of importing this new software. The RCHME has also created thesaurus lists, so a standardised terminology for classifying sites and monuments is now in operation. Essex was one of the first county SMRs to adopt much of the RCHME's digital advice and has also benefited from the RCHME's financial assistance in implementing new databases. Essex County Council Archaeology Section has been fortunate in this respect - other digitisation and research projects do not have this wider network of support, but advice and guidance can often be sought from an institution's own computing services or from organisations like the AHDS.

Operating in a more collective environment has its drawbacks, however. Whereas an individual is free to choose when to update their database system, those working in a larger group might have to change because of a general consensus. While Essex is confident that the event/monument/archive structure of Exegesis will be a standard for some time, this is not the same for the related GIS. GIS are immensely powerful and useful tools, but at present there are few defined standards for the geographical description of the data stored in the SMR. For example, there are no agreed standards for symbols on maps, nor are there agreed standards for recording the accuracy and precision of data. Consequently, members of staff are unsure how much information they should retain or display when creating new maps. Essex is currently recording perhaps a little more data than they will ultimately need. Therefore if GIS standards are defined at a reasonably precise level, Essex will have enough data to respond effectively. Retaining "extra" data, while time-consuming now, will be rewarded if the system is to be restructured at a later date.

Conclusion

The example of the Essex SMR has highlighted that managing change in systems is inevitable and should not be taken lightly. There are various problems to be tackled during the course of migration, relating to standardisation, documentation and technical issues, all of which need to be planned. Accepting and incorporating these tasks into the project will make the task easier, hastening the time when editors and users will be able to exploit the advantages supplied by the new system.

Many thanks to Paul Gilman of the Archaeology Section of Essex County Council and William Kilbride of the [Archaeology Data Service].
