Retrieving the Newham Archive
The Newham Museum Archaeological Service and its predecessors, the Passmore Edwards and Manor Valley Museums, were active in archaeological fieldwork across North East London for several decades. Their numerous fieldwork activities accumulated significant amounts of data from the London area. Most of their work was carried out in advance of housing development, so the sites under investigation could not be re-examined after the fieldwork had been completed. In 1997, the future of the service was threatened, and despite hopes that it might be merged with another unit, it was closed in 1998. Closure came abruptly, with only a few days' notice. The computers upon which their work had been carried out, and on which much of the data was stored, were seized by the local council and promptly sold. The staff of the unit went their separate ways, taking new posts largely unconnected with their previous work, but only after a desperate salvage operation in which the entire contents of the assorted hard disks were copied onto floppy disks. This single, far-sighted action saved almost ten years' worth of work from oblivion. The London Borough of Redbridge, who had no facilities for digital archiving, nor any familiarity with archaeology, then presented the discs to the Archaeology Data Service (ADS - http://ads.ahds.ac.uk), hoping that the ADS could retrieve the files from the archive. On inspecting the data, the ADS found numerous problems. In some cases, the data had become corrupt, or had been created with obsolete software. In many cases there was no documentation.
The example of the Newham Archive demonstrates that forgoing efficient documentation during data creation endangers the existence of that very data. The Newham data has only been saved through the hard work and expertise of the ADS, a task that has been costly and difficult. But these costs would have been avoided if good practices had been adopted at the point of creation. Studying the Newham Archive as an example of how data can be lost gives an indication of what these practices are, practices that will safeguard data in the future, and offers advice that is not only useful to archaeologists, but also pertinent to a much wider constituency.
Figure 1 - The Newham Archive on the ADS website
The Newham Archive arrived at the Archaeology Data Service on 239 discs, holding 6350 files. Amongst other items, it contained graphics, text documents and data, in a variety of electronic formats. Around 180 different sites were represented from digs in Waltham Forest in the north-east of the city to Beckton in the east. The subjects of the files were equally varied, ranging from detailed site databases to CAD drawings to scraps of ideas and administrative notes.
Given that the files were up to ten years old, it was not surprising to find some of them in poor condition. Twenty-five of the files were corrupted, around a dozen of which were unrecoverable. This, however, is a low proportion: under half of one per cent of the 6350 files. The 3.5" discs on which the data was provided offer a relatively safe means of storing data in the short term. Older formats, such as 5.25" floppy discs, are more delicate.
The files had been created with ten years' worth of software, much of which is now obsolete. For instance, a sizeable proportion of the documents had been composed in the late-1980s word-processing package WordStar. The ADS was only able to retrieve these files after applying specialist software and much hard work. This problem of obsolete software applied not only to the text documents in the Newham Archive, but also to graphics files produced with TurboCAD, early versions of which differ substantially from more recent editions.
These problems can be avoided with the sensible application of procedures to safeguard the information. A regular cycle of creating back-ups helps avoid the problem of corrupted files. Ideally, the back-ups should be kept in a different storage medium to the master copies, physically located in a separate place, and their functionality should be regularly checked. Within this cycle of back-ups, owners of digital data need to keep data 'fresh', i.e. on discs or CDs appropriate to contemporary hardware. They also need to keep files readable, by aligning them with the latest version of each software package, or, at the end of a project, moving them to open formats that are readable by any relevant program.
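One way to check the functionality of back-ups regularly is to compare checksums of the master files against their back-up copies. The following is only a minimal sketch in Python, not a description of any procedure used by the ADS or the Newham Service: the function names and the directory layout (a master folder mirrored by a back-up folder) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_backup(master_dir: Path, backup_dir: Path) -> dict[str, list[str]]:
    """Compare every file under master_dir with its copy under backup_dir.

    Hypothetical helper: assumes the back-up mirrors the master's folder
    structure. Returns relative paths grouped as 'ok', 'missing' or
    'mismatched'.
    """
    report = {"ok": [], "missing": [], "mismatched": []}
    master_files = sorted(p for p in master_dir.rglob("*") if p.is_file())
    for master_file in master_files:
        relative = master_file.relative_to(master_dir)
        backup_file = backup_dir / relative
        key = relative.as_posix()
        if not backup_file.is_file():
            # The file was never copied, or has since been lost.
            report["missing"].append(key)
        elif sha256_of(master_file) != sha256_of(backup_file):
            # The two copies differ: one may be corrupt or out of date.
            report["mismatched"].append(key)
        else:
            report["ok"].append(key)
    return report
```

Run on a schedule, a check of this kind flags a missing or damaged back-up while the master copy is still available to repair it. It does not, of course, reveal whether the software needed to read the files has become obsolete.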
While the majority of the recovered files in the Newham Archive functioned in the technical sense, this did not in itself turn them back into usable archaeological data. Many of the files arrived without any kind of documentation. The Newham Archive covered a wide range of work, and without explanatory information it was impossible to deduce the precise nature of each file's subject matter. Although software packages often provide the capability of giving document summaries, these were not used, nor were there any separate log books doing this task. Additionally, the files were scattered around in irregular places, and not filed according to any logical pattern. Nor were naming conventions adhered to, further complicating the task of the ADS in cataloguing the archive. In many further cases, the data appeared incomplete. For example, there would be references to certain documents or databases that did not appear on the discs handed to the ADS. Such problems all threatened the existence of the work done by the Newham Service; if data makes no sense, it is worthless.
The lack of documentation was most conspicuous when examining database files. For example, one large database was populated entirely with numbers, save for one record that mentioned a 'patella', i.e. the kneecap. The database was presumably a collection of bones collected from a particular site (possibly the Stratford-Langthorne Abbey, but then again, possibly not), but there was no documentation to explain where the site could be, nor what any of the other bones were. In cases such as this, documentation is an obligation not only to preserve the sense of the database, but to demonstrate that archaeologists are sensitive to the obvious ethical issues when dealing with human remains.
The shortage of documentation wasn't confined to databases. Even text documents were ambiguous in their status. While the subject matter of many documents was immediately apparent, the context in which the document was written could still be unclear. Was a document an initial summary of private thoughts, a working copy or a final version? Guesses could be hazarded, but were not necessarily conclusive. For example, the ADS discovered one article written on a series of excavations done at Tilbury Fort. As it appeared finished, the ADS prepared to present the article on their website. It later transpired that the text was a draft version of an article that was just about to be published, and so, for copyright reasons, the article was withdrawn from the website. In other cases, more than one version of the same document appeared, and it could not be ascertained which was the more recent. Only by providing the relevant documentary information could the Newham archaeologists have hoped to preserve their digital material and communicate it to other scholars.
The loss of data from the Newham Archive is as much to do with poor project planning as it is to do with preservation. The archaeologists were not prepared for a sudden closure; there was no exit strategy providing details of what to do with the data in such a situation. Needless to say, preserving and documenting data should not be an additional chore done at the end of a project, but an on-going process that is integral to the creation of the data. This is the best method, in fact the only method, of ensuring the safety of one's data.
Recording Greater Amounts of Data
The state of the Newham Archive reflected the archaeologists' use of computers. None of the data there was being prepared for electronic publication - rather the computer was being utilised as a tool to aid hard-copy publication. All the databases were for personal analysis and the CAD drawings were being composed only as a stepping stone to paper publication. Once the computer data had been synthesised, the original files, one presumes, would have been deleted. Because the fieldwork that informs such data cannot be repeated, losing the files is losing the primary contact with the original sites.
Conscious of this potential loss of data, recent archaeological policy, such as English Heritage's Management of Archaeological Projects (MAP 2), gives archaeologists clear guidelines on what data should be preserved after fieldwork. Depending on the scale of the project, different types of archive are produced, including site summaries, matrices, specialist reports and diaries. According to MAP 2, these archive types should all be preserved, whether the material is publicly disseminated or not, and whether it is created using computers or pen and paper.
With this increasing amount of information produced by archaeologists, the electronic format becomes a more popular and attractive option for recording that data. A carefully considered digital archive allows this information to be catalogued and accessed in a manner not available in a paper archive. Databases or CAD drawings are much more flexible than their paper or microfiche equivalents. More precise and complex questions can be answered much more rapidly. Moreover, the advent of electronic publication allows for seamless links between the traditional data archive and publication, giving rise to a new dynamic between the two. Finally, an increasing familiarity with computers and IT means that this data is more accessible in its digital form than ever before.
The ADS is well placed to help disseminate and protect digital data created by archaeologists, but this work could be greatly assisted if the lessons of the Newham data are learned. If we are to take advantage of the new technologies, it is essential to think of preservation and in particular about documentation. Summary pages and logbooks of the data are necessary; naming conventions should be adopted; and files should be presented in a rational fashion. Where possible, documents, databases and graphic files should be presented on up-to-date software, though a competent archivist can often assist the process of migration. An archivist cannot provide supporting documentation: this can only come from the original creator. More than anything else, the Newham case shows that documentation is a sine qua non of digital preservation.
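The recommendations above (a log of the data, a naming convention, and a record of each file's status) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a scheme prescribed by the ADS or MAP 2: the filename pattern, the status values and the names `FileRecord` and `describe` are all invented for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical naming convention, invented for this sketch:
#   <SITECODE>_<year>_<content type>_v<two-digit version>.<extension>
# for example "ABC_1996_report_v02.doc".
NAME_PATTERN = re.compile(
    r"^(?P<site>[A-Z]{2,6})_(?P<year>\d{4})_(?P<kind>[a-z]+)"
    r"_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)

# Assumed status values, so that a draft cannot be mistaken for a final text.
ALLOWED_STATUSES = {"draft", "working", "final"}


@dataclass
class FileRecord:
    """One logbook entry describing a single file in the archive."""
    filename: str
    site: str
    year: int
    kind: str
    version: int
    status: str
    description: str


def describe(filename: str, status: str, description: str) -> FileRecord:
    """Build a logbook entry, rejecting files that break the convention."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status {status!r}")
    if not description.strip():
        raise ValueError("a description is required for every file")
    return FileRecord(
        filename=filename,
        site=match["site"],
        year=int(match["year"]),
        kind=match["kind"],
        version=int(match["version"]),
        status=status,
        description=description.strip(),
    )
```

A call such as `describe("ABC_1996_report_v02.doc", "draft", "Draft article text")` yields an entry that records the site, the year, the version and the fact that the text is a draft. The important design choice is that the entry is written by the creator at the moment the file is made, since that is information an archivist cannot reconstruct later.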
The demise of the Newham Museum Archaeological Service is a worst-case scenario. It is unusual for an organisation to be shut so suddenly, for the data to be in such a condition and for the staff to be dispersed so quickly. Nevertheless, it emphasises the fact that there is more to digital data than just its creation and manipulation. As with an excavated artefact, digital data has to be carefully documented and preserved for it to function as a legitimate archaeological resource.
1. The Archaeology Data Service Catalogue containing the Newham Archive
2. English Heritage (1991): Management of Archaeological Projects, 2nd edition
3. Archaeology Data Service (1998): Guidelines for Depositors http://ads.ahds.ac.uk/project/userinfo/deposit.html
4. Reeve, J., Richards, J., Robinson, D., Wise, A. (forthcoming): Digital Archives from Excavation and Fieldwork: A Guide to Good Practice, 2nd edition
This Case Study has been written by Alastair Dunning with much assistance from William Kilbride and Keith Westcott of the Archaeology Data Service.