Digitisation: A Project Planning Checklist

The content of this document was last edited in 1999, and it is no longer being updated. However, it has been left in place as many parts of it may still be of interest.
  1. Introduction
  2. Project design
  3. Project implementation
  4. Long-term maintenance and use
  5. Further reading
  6. Appendix. Estimating digital reformatting costs

This document offers practical guidance to those considering a digitisation project. It takes the form of a checklist of strategic issues that need to be addressed in a project's design phase. The issues follow the life course of a digital resource from its inception through its development, maintenance and use, because decisions taken about a digital resource at any one stage of its life have ramifications for decisions taken, or still to be taken, about it at other stages.

The planning phase is critical to the success of any project: it determines whether, how, and at what cost digital resources are created and, just as importantly, how those resources will be used once created. Issues that need to be addressed include:

A simple but essential cost-benefit analysis which may involve:

Where digitisation projects entail the production of digital surrogates for items within existing collections, an element of selectivity is involved. That selectivity should be guided by clear and consistently applied criteria which may take account of:

An initial review of the technical requirements needed to ensure that a digital resource actually serves the purposes for which it is made. The review may take account of the following with regard to the creation, management, and delivery or use of a data resource:

Having defined the aims, content, and technical aspects of a digitisation project, it should be possible to estimate costs and to assess how and from what source(s) these may be met. (See the Appendix, which supplies a costing model for use by managers of projects digitising paper-based information.)

Although implementation is largely a technical and administrative matter, it is essential that techniques and administrative practices are suited to a project's aims and to the funding and technologies available to it. Accordingly, implementation strategies need to be assessed as part of project design.

This phase will involve review and selection of data creation strategies (e.g. OCR, keyboard entry, digital photography, conducted in-house, contracted out, etc.) and related hardware and software. The review will also involve selection of those standards and best practices that will help digitisation projects maximise their achievements while minimising their costs. Standards and best practices deserve particular consideration because many practitioners find them bewildering. Selection will depend in part on what kind of data resource is being created (standards appropriate for digital images are different from those appropriate for electronic texts or GIS), and in part on the uses to which a data resource is intended to be put (imaging standards appropriate for web delivery of thumbnails are different from those used for archive-quality digital reproductions). There are also different kinds of standards which serve very different purposes, as follows:

Once created, data need to be managed on a day-to-day basis. How and where data are stored will be determined by how, and how frequently, they are to be used. A number of storage/use scenarios exist and need to be considered during the project design phase. They include:

Data resources must be discoverable before they can be used. What information is available will depend upon what documentation standards are adopted. How information is made available will depend upon users' resource discovery requirements and the tools selected to meet them. Amongst the tools that may be provided are:

How data are delivered to and used by end users will be contingent upon how and why they were created or acquired, how they are stored (e.g. on-, near- or off-line), and what software and hardware are needed to access them. User scenarios may include:

Having created a digital resource, project managers will want to ensure that it is used and maintained effectively. Data usage, support, and maintenance practices depend heavily on why the data were created in the first place, and should be chosen to suit the digitisation project's aims. Accordingly, they need to be considered as part of the project design phase.

Technologies are arguably changing more rapidly than scholarly culture. Accordingly, some digital resources may remain under-utilised for a time after they are created. Obstacles to use that may need to be overcome include:

Data resources are typically very expensive to develop. The investment may be repaid, however, if the data can be made available without loss of content despite changes in hardware, software, and network technologies. Long-term preservation may be achieved by a number of means, either in-house or through deposit with an archive. However it is achieved, the prospects for and costs of long-term preservation will be determined to a large extent by decisions taken during a project's design phase. Strategies for preserving data include:

Managing a digital resource over the long term involves a degree of administration which needs to be planned from the outset. Consideration may need to be given to version control, order processing, and rights management and protection.

Owing to the costs involved in digitisation, whether and to what extent a data resource may be used to generate revenue are becoming key issues in project planning. How to design and implement cost recovery models is accordingly a concern in the long-term maintenance of any digitisation project.

Based on Research Libraries Group, Worksheet Estimating Digital Reformatting Costs (1997, revised May 1998)

Ten-step programme
Selection of materials
  • Identify materials
  • Determine legal restrictions
  • Investigate the availability of digital and other versions
  • Eliminate items which are in poor condition or incomplete
  • Determine the appropriate conversion process (e.g. film then scan, disband originals, etc.)
  • Calculate staff time for selection of materials = cost 1
Determine the size of the collection
  • Count number of titles, volumes and pages to be imaged, from bound or unbound documents
  • Count number of frames, fiche or reels of micro-images to be converted
  • Count number of finding aids required
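The counts above can be tallied with a short script once an item list exists. The Python sketch below is a hypothetical illustration, not part of the RLG worksheet: the `Volume` and `MicroformUnit` structures, the `collection_size` function and the example figures are invented placeholders, and a real project would more likely read its counts from a spreadsheet or catalogue export.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Volume:
    """One bound or unbound item to be imaged (hypothetical structure)."""
    title: str
    pages: int
    bound: bool = True


@dataclass
class MicroformUnit:
    """One reel or fiche of micro-images to be converted (hypothetical structure)."""
    label: str
    kind: str  # assumed to be either "reel" or "fiche"
    frames: int


def collection_size(volumes: list[Volume], microforms: list[MicroformUnit],
                    finding_aids: int = 0) -> dict:
    """Return the collection counts the worksheet asks for."""
    return {
        "titles": len({v.title for v in volumes}),  # distinct titles
        "volumes": len(volumes),
        "pages_bound": sum(v.pages for v in volumes if v.bound),
        "pages_unbound": sum(v.pages for v in volumes if not v.bound),
        "pages_total": sum(v.pages for v in volumes),
        "reels": sum(1 for m in microforms if m.kind == "reel"),
        "fiche": sum(1 for m in microforms if m.kind == "fiche"),
        "frames": sum(m.frames for m in microforms),
        "finding_aids": finding_aids,
    }


if __name__ == "__main__":
    # Placeholder items for illustration only
    vols = [
        Volume("Parish register", 320),
        Volume("Parish register", 288),
        Volume("Estate letters", 145, bound=False),
    ]
    micro = [MicroformUnit("R1", "reel", 1200), MicroformUnit("F1", "fiche", 98)]
    size = collection_size(vols, micro, finding_aids=2)
    print(size["pages_total"], size["frames"])  # 753 1298
```

Keeping bound and unbound pages separate is a design choice: the two usually call for different handling and equipment, so they are likely to be priced differently when imaging costs are estimated.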
Prepare documents
  • Retrieve documents from storage
  • Remove documents from circulation
  • Record physical condition of documents
  • Collate and identify missing pages and damage
  • Repair and replace missing or illegible pages
  • Prepare intermediates (e.g. photocopies, transparencies)
  • Disband originals (when required)
  • Create documentation for bibliographic control, indexing, tagging and encoding information (when required)
  • Calculate staff time for preparing documents = cost 2
Determine imaging requirements (benchmarking)
  • Assess essential document attributes to determine scanning requirements (resolution, bit depth, enhancements, file format, compression)
  • Confirm results by scanning a sample
  • Perform inspection of sample on screen and in print
  • Calculate staff time for benchmarking = cost 3
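The resolution and bit depth fixed during benchmarking also largely determine storage requirements. As a rough sketch (an illustration, not part of the worksheet), the uncompressed size of a page image is width in pixels × height in pixels × bit depth ÷ 8 bytes; compression in the chosen file format will reduce this by an amount that depends on the format and the content. The function names and the example page size below are assumptions.

```python
def uncompressed_size_bytes(width_in: float, height_in: float, dpi: int, bit_depth: int) -> int:
    """Approximate uncompressed raster size, in bytes, of one scanned page.

    width_in, height_in: page dimensions in inches
    dpi: scanning resolution in pixels per inch
    bit_depth: bits per pixel (1 = bitonal, 8 = greyscale, 24 = colour)
    """
    pixels_wide = round(width_in * dpi)
    pixels_high = round(height_in * dpi)
    total_bits = pixels_wide * pixels_high * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


def collection_storage_gb(pages: int, bytes_per_page: int) -> float:
    """Total uncompressed storage for a collection, in gigabytes (10**9 bytes)."""
    return pages * bytes_per_page / 10**9


if __name__ == "__main__":
    # Hypothetical benchmark: 8.5 x 11 inch pages scanned at 300 dpi
    colour = uncompressed_size_bytes(8.5, 11, 300, 24)
    bitonal = uncompressed_size_bytes(8.5, 11, 300, 1)
    print(colour)   # 25245000
    print(bitonal)  # 1051875
    print(collection_storage_gb(10_000, colour))  # 252.45
```

The example shows why benchmarking matters for costs: at the same resolution, a 24-bit colour master is 24 times larger than a bitonal one before compression.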
Determine requirements for and create metadata
  • Create catalogue entries for digital resources
  • Determine file naming and structuring strategies (e.g. individual images vs. groups of images)
  • Create additional indexes (e.g. index at article level for journal literature) or revise/enhance existing finding aids
  • Calculate staff time for preparing metadata = cost 4
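A file naming strategy is easiest to apply consistently when names are generated and checked by software rather than typed by hand. The Python sketch below assumes one hypothetical convention: a lower-case collection code, a four-digit item number and a four-digit image sequence number, with one directory per item (i.e. images grouped by item). The convention, the padding width and the function names are illustrative only.

```python
import re
from pathlib import PurePosixPath

# Hypothetical convention: <collection>/<item>/<collection>_<item>_<seq>.<ext>
NAME_PATTERN = re.compile(
    r"(?P<collection>[a-z0-9]+)_(?P<item>\d{4})_(?P<seq>\d{4})\.(?P<ext>[a-z0-9]+)"
)


def image_path(collection: str, item: int, seq: int, ext: str = "tif") -> PurePosixPath:
    """Build the path for one image; zero-padding keeps names in scanning order when sorted."""
    if not (0 <= item <= 9999 and 0 <= seq <= 9999):
        raise ValueError("item and sequence numbers must fit in four digits")
    name = f"{collection}_{item:04d}_{seq:04d}.{ext}"
    if NAME_PATTERN.fullmatch(name) is None:
        raise ValueError(f"collection code or extension breaks the convention: {name!r}")
    return PurePosixPath(collection, f"{item:04d}", name)


def parse_name(name: str) -> dict:
    """Recover collection, item, sequence and extension from a conforming file name."""
    m = NAME_PATTERN.fullmatch(name)
    if m is None:
        raise ValueError(f"non-conforming file name: {name!r}")
    return {
        "collection": m["collection"],
        "item": int(m["item"]),
        "seq": int(m["seq"]),
        "ext": m["ext"],
    }


if __name__ == "__main__":
    p = image_path("parreg", 12, 5)
    print(p)                   # parreg/0012/parreg_0012_0005.tif
    print(parse_name(p.name))  # {'collection': 'parreg', 'item': 12, 'seq': 5, 'ext': 'tif'}
```

Because `parse_name` rejects anything that does not match, the same pattern can be reused later when checking the accuracy and consistency of file naming during quality control.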
Determine imaging costs
  • Assess costs of external or internal service providers = cost 5
Determine text conversion costs
  • Define nature and extent of text conversion (e.g. full-text of all or specific documents)
  • Assess costs of external or internal service providers = cost 6
Determine SGML encoding costs
  • Define nature and extent of coding and accuracy requirements
  • Assess costs of external or internal service providers = cost 7
Determine finding aid conversion costs
  • Define nature and extent of finding aid conversion and encoding
  • Assess costs of external or internal service providers = cost 8
Post-process quality checking
  • Load digital files
  • Conduct data integrity checks
  • Perform on-screen and paper inspection
  • Ascertain accuracy and consistency of file naming, structuring, text conversion and encoding
  • Integrate corrections into the digital file sequence
  • Create derivatives for network access
  • Calculate staff time and non-personnel costs (e.g. hardware) for quality checking = cost 9
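One common way to conduct the data integrity checks listed above is to record a checksum for every file when it is loaded and recompute it later. The Python sketch below uses SHA-256 from the standard library `hashlib` module; the choice of algorithm, the manifest layout (a dictionary mapping relative path to digest) and the function names are assumptions for illustration, not something the worksheet prescribes.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path relative to root (POSIX style) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Compare files under root with a stored manifest; return problems (empty if all is well)."""
    current = build_manifest(root)
    problems = []
    for name, expected in manifest.items():
        if name not in current:
            problems.append(f"missing: {name}")
        elif current[name] != expected:
            problems.append(f"changed: {name}")
    for name in sorted(set(current) - set(manifest)):
        problems.append(f"unexpected: {name}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "page_0001.tif").write_bytes(b"placeholder image data")
        manifest = build_manifest(root)         # record digests when files are loaded
        print(verify_manifest(root, manifest))  # []
        (root / "page_0001.tif").write_bytes(b"corrupted")
        print(verify_manifest(root, manifest))  # ['changed: page_0001.tif']
```

A checksum only shows that a file has not changed since it was recorded; it cannot show that a scan is legible or complete, which is why the on-screen and paper inspection remains a separate step.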
Estimate additional local costs
  • Project management and tracking
  • Programming and systems support
  • Shipping and insurance
  • Purchasing storage devices, media and software
  • Other
  • Total = cost 10

Total cost = (cost 1 + cost 2 + … + cost 10) + indirect costs
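The total can be kept in a small script so that it is recalculated whenever an individual estimate changes. The Python sketch below is an assumption-laden illustration: the worksheet does not say how indirect costs are calculated, so here they are simply passed in as one figure, and staff-time items are modelled as hours multiplied by an hourly rate. `StaffTask`, `direct_cost`, `total_cost` and all figures are hypothetical placeholders, not real costings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StaffTask:
    """A staff-time line item: estimated hours multiplied by an hourly rate."""
    hours: float
    hourly_rate: float

    @property
    def cost(self) -> float:
        return self.hours * self.hourly_rate


def direct_cost(costs: dict[int, float]) -> float:
    """Sum worksheet costs 1-10, refusing to total an incomplete worksheet."""
    missing = sorted(set(range(1, 11)) - set(costs))
    if missing:
        raise ValueError(f"worksheet costs not yet estimated: {missing}")
    return sum(costs[i] for i in range(1, 11))


def total_cost(costs: dict[int, float], indirect: float = 0.0) -> float:
    """Total cost = (cost 1 + ... + cost 10) + indirect costs."""
    return direct_cost(costs) + indirect


if __name__ == "__main__":
    # Placeholder figures for illustration only
    costs = {
        1: StaffTask(hours=40, hourly_rate=15.0).cost,   # selection of materials
        2: StaffTask(hours=120, hourly_rate=15.0).cost,  # preparing documents
        3: StaffTask(hours=16, hourly_rate=20.0).cost,   # benchmarking
        4: StaffTask(hours=80, hourly_rate=18.0).cost,   # metadata
        5: 12000.0,  # imaging
        6: 8000.0,   # text conversion
        7: 0.0,      # SGML encoding not required in this example
        8: 1500.0,   # finding aid conversion
        9: 2500.0,   # quality checking, staff and hardware
        10: 3000.0,  # additional local costs
    }
    print(direct_cost(costs))                  # 31160.0
    print(total_cost(costs, indirect=5000.0))  # 36160.0
```

Raising an error for missing items, rather than treating them as zero, is deliberate: a cost that has not yet been estimated is different from one that is genuinely nil (such as cost 7 above), and the distinction is easy to lose in a running total.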