Introduction to Creating Digital Resources
Purpose of this Guide
This guide introduces key issues that anyone planning to create a digital resource for research, learning or teaching in the arts and humanities should consider. In addition to this introductory guide, the AHDS has a range of other guidance designed for anyone planning a digital resource in the arts and humanities:
- AHDS Guide to Good Practice series
- AHDS Case Studies
- AHDS Information Papers
Those preparing a technical appendix for an Arts and Humanities Research Council application will also find the AHDS Notes on Writing the AHRC Technical Appendix useful.
The AHDS provides subject focused specialist advice through its five Centres for History, Visual Arts, Performing Arts, Archaeology and Literature, Languages and Linguistics. If you are uncertain which Centre to approach, please contact the AHDS Executive.
Creating 'Fit for Purpose' Digital Resources
A place can be found within the arts and humanities for virtually any kind of digital resource, ranging from transcribed texts to immersive virtual reality models. These resources are created using a wide variety of tools and techniques, and with so many possibilities it can be difficult to select the best approach, or even to understand how best to assess the options. Keep in mind the maxim 'fit for purpose'. The tools and techniques that you use to create a digital resource should be determined by the intended purpose of that resource; it is important to ensure that technology supports the research or pedagogical objectives of a project and does not come to dominate them.
One of the benefits of thinking about building a digital resource that will be fit for purpose, instead of one that is led by the available technology and expertise, is that it helps keep the focus on the research, learning and teaching objectives of the project, rather than on the technical methodology that is used to achieve them. It should be possible to describe the goal of a digital resource creation project in the arts and humanities in non-technical language, and from this it should be possible to describe non-technical criteria that can then be used to assess the possible technical approaches.
Digital resources are often expensive to create. As well as ensuring that the finished resource meets the requirements of the project, it is worth spending some time making the resource flexible enough to be used by others in the future, perhaps for entirely different purposes. Probably the most important part of this is documentation. A well documented resource engenders confidence, and is both easier to reuse and more likely to be reused.
Documentation should cover provenance, methodology, technical standards and unresolved problems. Documentation is discussed in more detail in the AHDS Information Paper, Metadata for your Digital Resource.
The Project Team
A digital resource creation project relies on the expertise of the project team to achieve its goals. Digital resource creation projects in the arts and humanities require both subject expertise and technical expertise.
When technical work is outsourced, it is important that project staff develop a solid working knowledge of the relevant technical issues so that they can understand and properly assess the recommendations of their technical contractors. When technical staff are brought into a project, it is important that their work is not carried out in isolation from the subject matter of the project. One of the most valuable commodities a digital resource creation project can have is staff who understand both the subject area and the technical issues.
Remember that technical decisions are interwoven with the overall subject focused objectives of the project, so it is important to ensure that technical decisions are not made in isolation from wider considerations.
Many projects do not adequately plan how they will create their digital resource. Planning is often left too late and there is too little of it. Early planning does not need to be especially detailed to be useful. A short one or two page project brief that describes the project's purpose, its intended audience and the likely content will help to clarify the project's objectives, and will provide a useful summary that can be shared with the AHDS and other technical experts. This type of document will be sufficient for technical specialists to suggest possible technical approaches and to highlight potential pitfalls. Seeking out this type of advice early on means that the project will have enough time to examine several possible technical solutions and decide which one will best meet the project's objectives.
Your project should be guided by a firm understanding of the content and the intended purpose of the digital resource. This understanding will help you manage some of the trade-offs involved in creating a digital resource:
- Amount and detail versus time and cost of creation
- Complexity of the digital resource versus ease of use
- Flexibility of the digital resource versus suitability for a specific use
- Content creation with current technology versus future possibilities
Trials, Prototypes and Pilots
Creating a digital resource is a practical activity, and the value of trying things out ahead of time should not be underestimated. Trials, prototypes and pilots are all useful, particularly in the areas of digitisation, software and hardware selection, and data entry. Including a formal piloting stage in your project can be especially worthwhile. It can be used to test the feasibility of different methods of digitisation, or to check how your target audience responds to the resource.
One area where trials and pilots are extremely useful is IPR (Intellectual Property Rights) - establishing in practice what you will need to do to ensure that you have all the rights and permissions needed to access and use material owned or held by others.
The Project Timetable
The basis for a sound project timetable is a clear understanding of how long each task will take and the order in which they must be completed. The time allocated to each task must be based on realistic estimates of the effort required. Trials, prototypes and pilots can all help to inform the project timetable.
Once the main tasks have been identified, and the time needed to complete them has been estimated, a project timetable can be drawn up. The timetable should show how long each task will take, the order in which tasks will be started and finished, and what the members of the project team are doing at any given point in time. Links between tasks should be clearly specified in the project timetable. A timetable like this will help to reveal problems such as staff who are committed for more than 100% of their time, or tasks scheduled to begin before pre-requisite tasks have been finished.
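The two timetable problems mentioned above can be checked mechanically once the timetable is written down as data. The following is a minimal sketch only: the task names, staff roles, week numbers and time fractions are all hypothetical, and the functions `overcommitted` and `starts_too_early` are illustrative names rather than part of any project management tool.

```python
from collections import defaultdict

# Hypothetical timetable entries:
# (task, person, first week, last week, fraction of that person's time)
ASSIGNMENTS = [
    ("pilot scanning", "researcher", 1, 2, 0.5),
    ("clear rights", "researcher", 1, 4, 0.6),
    ("scan images", "technician", 4, 13, 1.0),
]

# Hypothetical links between tasks: task -> tasks that must be finished first.
PREREQUISITES = {"scan images": ["clear rights", "pilot scanning"]}


def overcommitted(assignments):
    """Return (person, week) pairs where someone is booked for over 100%."""
    load = defaultdict(float)
    for _task, person, first, last, fraction in assignments:
        for week in range(first, last + 1):
            load[(person, week)] += fraction
    # A small tolerance avoids false alarms from floating point rounding.
    return sorted(key for key, total in load.items() if total > 1.0 + 1e-9)


def starts_too_early(assignments, prerequisites):
    """Return (task, prerequisite) pairs where the task starts too soon."""
    first_week, last_week = {}, {}
    for task, _person, first, last, _fraction in assignments:
        first_week[task] = min(first, first_week.get(task, first))
        last_week[task] = max(last, last_week.get(task, last))
    return sorted(
        (task, needed)
        for task, needs in prerequisites.items()
        for needed in needs
        if first_week[task] <= last_week[needed]
    )


print(overcommitted(ASSIGNMENTS))
print(starts_too_early(ASSIGNMENTS, PREREQUISITES))
```

In this made-up timetable the researcher is booked for 110% of their time in weeks 1 and 2, and scanning is scheduled to start in the same week that rights clearance is due to finish.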
By identifying all the interdependencies between tasks in your project, the critical path of the project can be identified. The critical path is the sequence of tasks that must all be completed on time for the entire project to be completed on time. A delay to any task on the critical path will delay later tasks, and the entire project will fall behind schedule.
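For a small project the critical path can be worked out by hand, but the calculation is also easy to sketch in code. The example below assumes each task has an estimated duration in weeks and a list of prerequisite tasks. The task names and durations are hypothetical, and `critical_path` is an illustrative function, not a reference to any particular scheduling package.

```python
# Hypothetical tasks: name -> (duration in weeks, prerequisite task names).
TASKS = {
    "clear rights": (4, []),
    "pilot scanning": (2, []),
    "scan images": (10, ["clear rights", "pilot scanning"]),
    "create metadata": (6, ["pilot scanning"]),
    "build website": (5, ["scan images", "create metadata"]),
}


def critical_path(tasks):
    """Return (total duration, ordered task names on the critical path)."""
    finish = {}    # earliest finish time of each task
    previous = {}  # the prerequisite that determines when each task can start

    def earliest_finish(name, visiting=()):
        if name in finish:
            return finish[name]
        if name in visiting:
            raise ValueError(f"circular dependency involving {name!r}")
        duration, prerequisites = tasks[name]
        start, latest = 0, None
        for prerequisite in prerequisites:
            done = earliest_finish(prerequisite, visiting + (name,))
            if done > start:
                start, latest = done, prerequisite
        finish[name] = start + duration
        previous[name] = latest
        return finish[name]

    for name in tasks:
        earliest_finish(name)

    # Walk back from the task that finishes last to recover the path.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = previous[node]
    return total, path[::-1]


total_weeks, path = critical_path(TASKS)
print(total_weeks, path)
```

With these made-up figures the project takes 19 weeks, and the critical path runs through rights clearance, scanning and building the website. Metadata creation has slack: it could slip by several weeks without delaying the project.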
A good project timetable is a useful tool for monitoring progress. Regular reports by those responsible for each task can be compared with the progress anticipated in the project timetable, allowing problems to be identified and resolved as early as possible. Progress can only be monitored with up-to-date information, and this is best provided through some formal (but possibly quite simple) framework of meetings and reporting intended to share information about progress between project members.
A large team of academics, research students and technical specialists can be involved in the project, and good project management is vital in ensuring that work is coordinated and delivered on time. Projects should clearly identify who has the authority to act if the project is not going to plan. Usually, a single project manager is best, possibly supported by a management or advisory committee in the case of larger projects.
Digital resource creation projects need to be managed with a degree of formality. Reports and meetings should be used to track progress against the project timetable and to help identify problems as early as possible. In many projects, technical tasks are undertaken and managed somewhat separately from other parts of the project, and in this situation it is important to ensure that the timing and objectives of technical work remain coordinated with other project objectives. Planning the technical work in distinct stages, moving from design documents through to prototypes and then periodic reviews of on-going work before the final resource is completed, can help. Each of these milestones provides an opportunity to judge how much concrete progress has been made.
Developing a Digital Resource
It is useful to think about designing a digital resource in terms of the following three elements:
- The underlying data the resource contains
- The software (and hardware) that is needed to make sensible use of the data
- The user interface through which the user interacts with the software to retrieve, search, and manipulate the data
The software and user interface can be thought of as layers that lie on top of the actual data, making it easier to work with the data but, at the same time, constraining the way in which a user can access and manipulate the underlying data your digital resource contains. Consider, as an example, a collection of digital images based on the works held in an art gallery. The digital resource could simply consist of a set of TIFF images. However, you will also want to provide users with some information about each image. You could create HTML pages that contain both the image and the information about it, but now the user will need access to a web browser to view the resource properly. You may then wish to allow users to search for specific images by keyword. This could be done by creating a database of keywords that can be queried from the webpage, but now you will need to run a database server alongside the web server. Each additional piece of functionality complicates the resource further.
A digital resource should be designed so that its core content is as independent as possible from the means of accessing that content. This will help keep the finished resource as flexible as possible, allowing it to change and develop to meet unanticipated requirements while avoiding becoming locked into obsolete software, hardware or methods of interacting with data. To achieve these aims, we need to start by thinking about the underlying structure and organisation of the data without worrying about how it will ultimately be presented or stored for specific uses. Your design goal should be to hold master versions of all your data in forms that can be converted to meet varying purposes.
| Type of Data | Environments |
| Texts | Web, print, textual analysis software |
| Datasets | Spreadsheets, databases, statistical analysis, dynamic web site |
| Audio | Streamed from Web, desktop PC, editing software |
| Still Images | Web, published print quality, OCR (for vector images: GIS, CAD) |
| Moving Images | Streamed from Web, progressive download, DVD player, desktop PC, editing software, TV broadcast |
Designing a digital resource begins with answering the question: what are you developing? Is it a database, a website, a GIS (Geographical Information System), a catalogue or some other type of resource? Each type of digital resource entails a different approach to organising information (a data model), which will be appropriate for some tasks, but not for others. Perhaps the simplest data model is embodied in plain text files. These files simply store a sequence of numeric codes which represent characters. All you need to know is which character each code represents. A far more sophisticated data model, and one that many people have some experience of, is the relational data model, used by most database software applications, such as Microsoft Access. This data model imposes a range of constraints on how the content of a database can be organised (it must be arranged in discrete fields, each record must be unique and so on) which ensure that data is organised consistently and predictably so that the validation, searching and display of data can be automated.
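The contrast between these two data models can be seen in a few lines of code. The sketch below is illustrative only: it uses Python's built-in `sqlite3` module as a stand-in for any relational database, and the table name, field names and catalogue entries are hypothetical.

```python
import sqlite3


def character_codes(text):
    """Return the numeric code that represents each character in a string."""
    return [ord(character) for character in text]


def build_catalogue():
    """Create an in-memory table whose rules the database itself enforces."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE artwork ("
        " accession_no TEXT PRIMARY KEY,"  # each record must be unique
        " title TEXT NOT NULL,"            # every record must have a title
        " year INTEGER)"
    )
    # Hypothetical catalogue entry, for illustration only.
    conn.execute(
        "INSERT INTO artwork VALUES (?, ?, ?)",
        ("A001", "Harbour at Dusk", 1874),
    )
    return conn


def try_insert(conn, row):
    """Attempt to add a record; return False if it breaks a constraint."""
    try:
        conn.execute("INSERT INTO artwork VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(character_codes("Art"))
catalogue = build_catalogue()
print(try_insert(catalogue, ("A002", "Untitled", None)))
print(try_insert(catalogue, ("A001", "Duplicate entry", 1900)))
```

The plain text model accepts any sequence of characters, whereas the relational table rejects the second record because its accession number is already in use. This kind of constraint is what makes automated validation and searching dependable.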
Different data models are implemented using different sets of standards, file formats and software, so it is very important to understand the type of resource you are building before proceeding. For example, laying out a table as HTML for a web page is appropriate if you want people to read the table, but storing the table as delimited text and loading it into a spreadsheet would be more appropriate if you plan to perform complex calculations on the table.
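The table example above can be sketched in code: a single master copy held as delimited text is used both for a calculation and to generate an HTML table for reading. The parish figures below are invented for illustration, and the function names are hypothetical.

```python
import csv
import io
from html import escape

# Hypothetical master data held as delimited (comma-separated) text.
MASTER = (
    "parish,year,baptisms\n"
    "St Mary,1801,42\n"
    "St Mary,1802,38\n"
    "St John,1801,17\n"
)


def read_rows(text):
    """Parse delimited text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def to_html_table(rows):
    """Render the rows as an HTML table intended for reading."""
    header = "".join(f"<th>{escape(name)}</th>" for name in rows[0])
    body = "".join(
        "<tr>"
        + "".join(f"<td>{escape(value)}</td>" for value in row.values())
        + "</tr>"
        for row in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"


def total(rows, column):
    """Add up a numeric column, the kind of task HTML is poorly suited to."""
    return sum(int(row[column]) for row in rows)


rows = read_rows(MASTER)
print(total(rows, "baptisms"))
print(to_html_table(rows))
```

Because the HTML is derived from the delimited text rather than the other way round, the presentation can be regenerated or replaced without touching the master data.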
| Resource Type | Things to investigate |
| Texts | XML, TEI, Dublin Core, PDF |
| Datasets | Relational data model, SQL, normalisation, XML |
| GIS | Vector and raster data models, polygon topology, Open GIS standards |
| Library/Archive Catalogue | XML, OAI, Dublin Core, subject specific metadata schemas (e.g. DDI, VRA Core), XSLT, controlled vocabularies |
| Audio Clips | Lossless compression, MP3, sampling rates, bit rate |
| Still Images | Resolution and colour depth, TIFF, PNG, lossless compression, NISO technical metadata, VRA Core 3.0 metadata, Dublin Core |
| Moving Images | Compression, MPEG, frame rate, resolution and colour depth, screen size, 'codecs' |
For more detailed information about different types of digital resources and the issues involved in designing them, you should read the AHDS Guide to Good Practice series. There are some basic characteristics that apply across all types of digital resource that suggest it has been well-designed:
- Repetitive tasks can be easily automated
- Data structures are consistent, well defined and documented
- Data is created according to consistent rules
- The presentation of data can be easily changed
If you find that one of these points doesn't apply to your digital resource, then you can probably improve it, making it more efficient (and your work less taxing!).
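Where data is created according to consistent, documented rules, checking those rules is itself a repetitive task that can be automated. The following is a minimal sketch: the field names, identifier pattern and list of permitted media are hypothetical examples of project rules, not AHDS requirements.

```python
import re

# Hypothetical project rules: each field must fully match its pattern.
RULES = {
    "identifier": re.compile(r"IMG-\d{4}"),
    "date": re.compile(r"\d{4}(-\d{2}(-\d{2})?)?"),
    "medium": re.compile(r"oil|watercolour|print"),
}


def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field, pattern in RULES.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not pattern.fullmatch(value):
            problems.append(f"{field}: unexpected value {value!r}")
    return problems


print(check_record({"identifier": "IMG-0001", "date": "1874", "medium": "oil"}))
print(check_record({"identifier": "img1", "date": "1874-05"}))
```

Running a check like this over every new batch of records helps catch differences in practice between team members before they spread through the resource.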
All projects need hardware, but most projects need not be too concerned about the exact specifications of desktop PCs, laptops and printers. Standard computing hardware is now very powerful, and will meet most needs, although projects should always seek advice about purchasing these items from their organisation's I.T. support service.
More attention needs to be paid to the purchase of scanners, digital cameras and other digitisation tools. These devices directly determine the quality of your digital master versions, so care should be taken to compare and test different devices before making a purchase.
Digital resource creation projects in the arts and humanities may need a wide range of software, and it is not possible to discuss each category in detail here. A good rule to apply to any situation is to select software that implements relevant standards and allows data to be easily imported and exported. By adopting standards you will make it easier to share your data with others, and it will be easier for them to understand the data. Standard formats are also the most likely to import into another piece of software without any loss of formatting or structure. By selecting software with lots of export options you can minimise the risk of your data becoming dependent on an inappropriate or obsolete piece of software (and again, make it easier to share your data with others).
Creating, Acquiring and Digitising Content
The content of a digital resource may be created from scratch, digitised from existing analogue sources or taken from existing digital material. However you obtain content for your digital resource, the process should follow documented procedures and be consistent over time. Simple techniques, such as template documents and automation using macros, can help to maintain consistency and reduce the likelihood of errors. This is especially important if more than one person will be doing the same work in parallel as it is remarkably easy for differences in practice to creep in.
Digitisation is the central component of many digital resource creation projects in the arts and humanities. Still images (photographs, artwork) and written documents are the commonest targets for digitisation, but audio and moving image recordings are also digitised, along with a range of more esoteric sources.
Probably the most important task associated with digitisation is obtaining all the necessary rights and permissions needed to gain access to, digitise, and use the resulting digital surrogates as you wish to. While investigating these issues, it is sensible to also assess the practical difficulties that the material you plan to digitise may present. Fragile material, materials that cannot be moved, and materials that are of unusual sizes and shapes will all pose additional problems that may affect the amount of material you can digitise, or the way in which you decide to digitise it.
The aim of digitisation is to create an accurate digital surrogate for the original object. Because the cost, effort and technical challenges involved in digitisation increase as the digitised surrogate is made more accurate, you will need to focus on the intended purpose of the digital surrogate in order to make sensible trade-offs between accuracy and other considerations (chiefly, time and cost). There are two main factors in the digitisation process that will affect the accuracy of the final digital surrogate: the characteristics of the digitisation tool (scanner, digital camera etc.) you use, and the file format used to store the digital surrogate.
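One concrete way to see the trade-off between accuracy and cost is to estimate the storage needed for an uncompressed image. The size in bytes is the width in pixels multiplied by the height in pixels multiplied by the number of bits per pixel, divided by eight. The sketch below applies this arithmetic; the page size and resolutions are illustrative assumptions, and real files will differ once headers and compression are taken into account.

```python
def uncompressed_size_bytes(width_in, height_in, ppi, bits_per_pixel=24):
    """Estimate the size of an uncompressed scan.

    width_in, height_in: size of the original in inches
    ppi: scanning resolution in pixels per inch
    bits_per_pixel: colour depth (24 for 8 bits each of red, green, blue)
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


# A hypothetical 8 x 10 inch photograph scanned at two resolutions.
for ppi in (300, 600):
    size = uncompressed_size_bytes(8, 10, ppi)
    print(f"{ppi} ppi: {size / 1_000_000:.1f} million bytes")
```

Doubling the resolution quadruples the storage required, so a decision about resolution should be made with the intended purpose of the surrogate and the size of the whole collection in mind.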
| Material to Digitise | Tools | Things to Investigate |
| Live performance | Digital camcorders | Digital Video (DV), MPEG, YUV colour space, resolution, FireWire (IEEE 1394), lossy compression |
| Paintings, pictures, diagrams | Scanner, digital camera | (Optical) resolution, dynamic range, RGB colour space |
| Written documents | Scanner, optical character recognition software, keyboard (transcription) | Transcription, (optical) resolution, dynamic range, RGB colour space, Unicode, double-keying, spelling and grammar software, optical character recognition (OCR) software |
| Audio recording | Sound card with analogue to digital converter | Hardware compression, voice vs music |
| Moving picture recording | Video capture card | Signal standards supported (PAL, NTSC, SECAM), hardware compression |
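Double-keying, listed in the table above, means having the same document transcribed independently by two people and comparing the results, on the assumption that two typists are unlikely to make the same mistake in the same place. The comparison step can be sketched with Python's standard `difflib` module. The sample transcriptions are invented, and `keying_differences` is an illustrative function name.

```python
import difflib


def keying_differences(first, second):
    """Compare two transcriptions line by line.

    Returns a list of (line number in first, lines from first, lines from
    second) for every place where the two versions disagree.
    """
    a, b = first.splitlines(), second.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    differences = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append((a1 + 1, a[a1:a2], b[b1:b2]))
    return differences


# Hypothetical transcriptions of the same passage by two typists.
FIRST = "In the yeare 1642\nthe parish of St Mary\nrecorded nine burials"
SECOND = "In the yeare 1642\nthe parish of St Marv\nrecorded nine burials"

for line_no, ours, theirs in keying_differences(FIRST, SECOND):
    print(line_no, ours, theirs)
```

Each reported difference still needs to be resolved by checking the original document; the code only shows where to look.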
More information is available from the Information Paper on the Digitisation Process.
Creating New Content
As well as being useful for editing digitised material, software such as word processors, HTML editors, CAD packages and other tools can be used to create new digital content from scratch. When more than one person will be creating material, it is important to use the same software, or to establish how content can be shared and integrated before work begins. This is especially important when a large number of contributors from outside the project team will be providing content.
It is important that everybody creating material understands the terms and conditions under which it will be used. Requiring all contributors to sign a formal licence, detailing their rights and those of the project, is a good idea. A number of model licences for different situations exist.
Documenting the Resource
Comprehensive documentation, such as user guides, interview scripts, codebooks and performance notes, is vital if a digital resource is to be shared and remain usable in the long term. Indeed, good documentation often proves its worth when you return to a digital resource you have designed yourself after an absence. The AHDS strongly recommends that all projects devote a reasonable part of their total effort to documenting the digital resources they create.
Documenting a resource should not be left until it is completed, but should be seen as an integral part of its development. Tasks should be documented as they occur, when the activity is fresh in the mind. This approach guards against information being misplaced, forgotten, or taken away from the project if key staff depart.
Documentation should cover provenance of sources, methodology of digitisation, design of databases, XML schemas and other data structures, and give details of code books, controlled vocabularies, abbreviations and other project specific knowledge. Documentation should also include key correspondence and formal agreements relating to the creation and use of the resource's content. Online delivery systems, software and source code should, of course, be accompanied by suitable technical documentation.
Resource Discovery Metadata
In addition to general documentation most types of digital resource should also be accompanied by some formal, structured, resource discovery metadata. Metadata is 'data about data'. Resource discovery metadata is information that describes your digital resource and helps potential users find it, similar to the type of information you find in a library catalogue. If your resource is a collection of texts, images, or some other type of material where users will need to search a large set of items, you will need to create resource discovery metadata for each item as part of the digital resource creation process. There are now many formal standards for resource discovery metadata intended for different subject areas and levels of detail. You should at least create a basic metadata record for each item that conforms to the Dublin Core standard.
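As an illustration of what a basic item-level record might look like, the sketch below builds a small XML record using the fifteen element names of the Dublin Core Metadata Element Set. The wrapping `record` element, the function name and the sample values are assumptions made for this example; a real project should follow the encoding guidelines of whichever service will hold its metadata.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Build an XML string from (element name, value) pairs.

    The enclosing <record> element is a hypothetical wrapper chosen for
    this sketch; Dublin Core itself does not prescribe one.
    """
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(record, encoding="unicode")


# Hypothetical values describing one digitised image.
print(dublin_core_record([
    ("title", "Harbour at Dusk"),
    ("date", "1874"),
    ("type", "StillImage"),
    ("format", "image/tiff"),
    ("identifier", "IMG-0001"),
]))
```

Elements may be repeated, for example to record several subject keywords, which is why the function takes a list of pairs and not a dictionary.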
As mentioned before, documentation is discussed in more detail in the AHDS Information Paper, Metadata for your Digital Resource.