-- Recounting Digital Tales: Chaucer Scholarship and The Canterbury Tales Project --
----------------------------------------------------------------------------
Content written on by . The role of the Oxford Text Archive has since been taken up by [AHDS Literature, Languages and Linguistics].
----------------------------------------------------------------------------
"Learning was all he cared for or would heed,"
The Scholar, as described in --The General Prologue--

The application of computing in humanities research occasionally runs up against a knee-jerk response. Manuscripts, frescos and artefacts, the response goes, are surely meant to be interpreted and admired rather than counted and dissected; are the humanities not engagingly heuristic rather than coldly logical? But plenty of research using digital technology punctures this criticism, and the [Canterbury Tales Project] is an excellent example. The team, based at De Montfort University in Leicester, is engaged in the massive task of transcribing and digitising all 84 surviving manuscript versions of Chaucer's famous tales. The work is divided into parts (e.g. --The General Prologue--, --The Miller's Tale--), each of which is being produced and published on a separate CD-ROM. Each CD-ROM not only lets users explore electronic versions of the text but also provides programs to aid textual analysis. These programs allow scholars to gauge quickly the similarities and differences in the manuscripts' language. This in turn can suggest new links among the manuscripts, guiding scholars towards fresh hypotheses about the Tales. The --Canterbury Tales Project-- thus invigorates Chaucer research on two planes: firstly, the electronic versions of the Tales give much greater access to the manuscripts; secondly, the accompanying software offers tools that can yield useful insights into the medieval masterpiece. This case study attends to the latter, giving a concise example of how computers can assist humanities research.
Figure 1 - A detail of what is believed to be one of the oldest Canterbury Tales manuscripts, commonly known as the 'Hengwrt Chaucer' (Aberystwyth, National Library of Wales, Peniarth MS 392D, folio 13v). Digitisation has brought out faint detail to the right of the image: a drypoint gloss that is invisible to the naked eye.
Image copyright of the National Library of Wales

Extending Older Traditions

Dr Peter Robinson, Director of the Canterbury Tales Project, has been leading the way in computer-driven research using the transcribed version of the Tales. As with many topics in literary studies, questions of textual genealogy are of prime importance. Which of the 84 manuscripts is closest to Chaucer's original intention? Which were copied by scribes in Chaucer's circle? Which were produced decades after Chaucer had died? Two scholars, Manly and Rickert, made a valiant effort at answering such questions in their work on the text of the Tales, published in 1940.¹ They tried to determine this genealogy by examining the similarities and variations among the numerous manuscripts. Put simply, if a set of manuscripts showed marked similarities in their readings, there was a strong possibility that they had been copied from a common manuscript, called the exemplar. Unique variations in a manuscript, by contrast, indicated that it had a distinct place in the genealogy. By gradually working through the manuscripts, Manly and Rickert developed a hierarchy suggesting the lineage of the Chaucer manuscripts. Although their theories provoked much interest, they failed to answer some of the major questions in Chaucer studies. The amount of information to be analysed was simply too large to yield satisfactory results by hand: the 84 manuscripts contained thousands of different readings and variations, all of which needed to be collated and compared in order to pursue this method of research. The framework for such research has now changed with the advent of powerful computer technology. With the computer's capacity for managing large amounts of information, scholars such as Dr Robinson are in a much better position to carry out the type of research begun earlier in the twentieth century.
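The principle behind Manly and Rickert's method (manuscripts that share many readings were probably copied from a common exemplar) can be sketched as a simple agreement score between pairs of manuscripts. This is a minimal Python illustration, not their procedure or the project's software; the sigils A to D and all of the readings are invented for the example, loosely modelled on Middle English spellings.

```python
from itertools import combinations

# Hypothetical collation: the regularised reading each manuscript has at
# five variant locations (same order in every list). Invented data.
READINGS = {
    "A": ["aprill", "shoures", "soote", "droghte", "perced"],
    "B": ["aprill", "shoures", "soote", "drouhte", "perced"],
    "C": ["aueryll", "shouris", "swete", "drouhte", "perced"],
    "D": ["aueryll", "shouris", "swete", "drouhte", "persed"],
}

def agreement(a, b):
    """Fraction of variant locations at which manuscripts a and b agree."""
    pairs = list(zip(READINGS[a], READINGS[b]))
    return sum(x == y for x, y in pairs) / len(pairs)

def rank_pairs():
    """Every pair of manuscripts with its agreement score, most similar first."""
    scores = [((a, b), agreement(a, b)) for a, b in combinations(sorted(READINGS), 2)]
    return sorted(scores, key=lambda item: item[1], reverse=True)

for (a, b), score in rank_pairs():
    print(f"{a}-{b}: {score:.2f}")
```

In this toy data A-B and C-D come out as the closest pairs, which would hint at two families. As the case study goes on to argue, such a score is a starting point for judgement, not a verdict.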
The --Collate-- Software Package

In the digital versions of --The Canterbury Tales--, each separate word in each manuscript is (or will be) collated and regularised by a program called --Collate--. An example from the CD-ROM of --The General Prologue-- shows how this works. Take the first line of --The General Prologue-- as it appears in the manuscript in Christ Church College Library, Oxford. Thirty-four of the other manuscripts contain this line, but the word for April appears in several different forms, for instance Aprylle, Aueryll and Aprill. In all, there are 19 variations in the 34 relevant manuscripts. --Collate-- allows the researcher to regularise these variations, a procedure used by many textual scholars but much quicker to carry out by computer. Looking more closely at the 19 variations (or variants, as textual scholars call them), the researcher can see family resemblances between some of them. The user can therefore regularise the variants by separating them into distinct groups, as has been done for each tale so far produced on CD-ROM. When the Canterbury Tales team, led by Dr Robinson, performed this step, the variants were reduced to two spellings: Aprill and Aueryll. The team then regularised the rest of --The General Prologue--. It was a lengthy task, but not nearly as arduous as the comparable work Manly and Rickert did by hand. Regularisation produced a more manageable set of data: each word in each manuscript had been subtly realigned to reflect its membership of a larger family of words. Although regularisation simplifies the data, it still leaves the scholar with colossal amounts to analyse: the 7,000 words of --The General Prologue-- produced around 16,000 regularised variants. The Canterbury team therefore turned to another form of computer analysis to interpret this regularised data.
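Regularisation can be pictured as a lookup table that maps each attested spelling to the regularised form the editor has chosen. A minimal Python sketch, assuming a hand-built table: Aprill, Aueryll and Aprylle are the spellings named above, but placing Aprylle in the Aprill group, the manuscript labels MS1 to MS4 and their readings are all hypothetical, and this is not how --Collate-- itself is implemented.

```python
from collections import Counter

# Hypothetical regularisation table: attested spelling -> regularised form.
# Aprill and Aueryll are the two forms named in the text; assigning
# "Aprylle" to the Aprill group is an assumption for this example.
REGULARISE = {
    "Aprill": "Aprill",
    "Aprylle": "Aprill",
    "Aueryll": "Aueryll",
}

# Hypothetical readings of the word for April in four manuscripts.
first_line_word = {"MS1": "Aprylle", "MS2": "Aprill", "MS3": "Aueryll", "MS4": "Aprill"}

def regularise(readings):
    """Replace each manuscript's spelling by its regularised form.

    Spellings missing from the table are kept unchanged, so that an
    unclassified variant stays visible to the editor.
    """
    return {ms: REGULARISE.get(word, word) for ms, word in readings.items()}

regularised = regularise(first_line_word)
print(Counter(first_line_word.values()))  # spellings as attested
print(Counter(regularised.values()))      # spellings after regularisation
```

Three distinct spellings collapse to two regularised forms. Unlisted spellings pass through unchanged, so a spelling the editor has not yet classified is not silently forced into a group.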
The --PAUP-- Software Package

Adapted from its use in evolutionary biology, the program --PAUP-- (an acronym for the impressive-sounding --Phylogenetic Analysis Using Parsimony--) can analyse the mass of regularised words in each manuscript, indicating which manuscripts share similar patterns of regularised words and which show significant deviations. The resulting diagrams, such as the one in Figure 2, are called cladograms, from the Greek --klados--, meaning shoot.

Figure 2 - A PAUP analysis executed on sections of --The General Prologue--

Figure 2 shows the result of --PAUP-- analysing all variations in lines 1 to 250 of 21 of the manuscripts that contain --The General Prologue--. (Each manuscript is marked with a two- or three-character sigil related to the library where it is held.) Manuscripts on neighbouring stems are similar, while those at opposite ends of the tree differ considerably; the longer the branch or stem, the greater the difference. Two very general branches, each occupying half of the diagram (one stretching from Ad1 to Ht and the other from Hg to Ha2), suggest two large families of manuscripts. Within these there are manuscripts that are very similar (such as En3 and Ad1 at the very top), the odd manuscript that stands somewhat apart (Ht) and other tightly packed clusters (Ha2, Mg, Lc, Pw, La). The position of the Ellesmere (El) and Hengwrt (Hg) manuscripts near the base of the tree suggests that, although they have the odd individual variation, they are more likely to be among the older manuscripts in the Chaucer tradition, upon which the others are founded. --PAUP--, therefore, devises a possible genealogy for the Chaucer manuscripts, indicating which manuscripts might have provided exemplars for other groups of scribes to work from.
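The principle --PAUP-- is named after, parsimony, prefers the tree that explains the data with the fewest changes of reading. The scoring half of that idea can be sketched with Fitch's small-parsimony algorithm, a standard textbook method that counts the minimum number of changes a given tree requires at each variant location. This Python sketch is not --PAUP--'s code: the four sigils, the two candidate trees and the readings are hypothetical, and a real analysis must also search through a vast number of possible trees rather than scoring just two.

```python
# Fitch small-parsimony scoring on a rooted binary tree.
# A tree is either a leaf (a manuscript sigil, as a str) or a pair
# (left, right) of subtrees. Trees and readings are hypothetical.

def fitch(tree, readings):
    """Return (possible ancestral readings, minimum changes) for one location."""
    if isinstance(tree, str):                  # leaf: the manuscript's own reading
        return {readings[tree]}, 0
    left_set, left_cost = fitch(tree[0], readings)
    right_set, right_cost = fitch(tree[1], readings)
    common = left_set & right_set
    if common:                                 # subtrees can agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change needed

def tree_length(tree, variant_table):
    """Parsimony score: minimum changes summed over all variant locations."""
    return sum(fitch(tree, column)[1] for column in variant_table)

# Hypothetical regularised readings at three variant locations.
variant_table = [
    {"A": "Aprill", "B": "Aprill", "C": "Aueryll", "D": "Aueryll"},
    {"A": "soote", "B": "soote", "C": "swete", "D": "swete"},
    {"A": "perced", "B": "persed", "C": "perced", "D": "perced"},
]

grouped = (("A", "B"), ("C", "D"))  # A with B, C with D
mixed = (("A", "C"), ("B", "D"))    # A with C, B with D

print("grouped tree:", tree_length(grouped, variant_table))
print("mixed tree:", tree_length(mixed, variant_table))
```

Here the tree grouping A with B and C with D needs 3 changes, against 5 for the alternative, so parsimony prefers it. --PAUP-- applies the same principle across dozens of manuscripts and thousands of regularised variants, where checking every possible tree exhaustively is impractical.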
--PAUP-- can do this for the whole array of Chaucer manuscripts, and it can also help with more particular questions, for example about a smaller set of manuscripts or a specific tale. With 84 separate manuscripts and 24 tales (which are often analysed in smaller sections), there are many literary arguments to which --PAUP-- can lend some support. However, blindly accepting the answers provided by --PAUP-- should set alarm bells ringing in every scholar's mind. The process of genealogy-creation involves many assumptions that need to be addressed. The broad assumption behind running --PAUP-- is that if two manuscripts are similar in their variants, they must be linked somehow; yet their common features could simply be coincidence. Additionally, medieval scribal practice is more complex than the --PAUP-- model allows. Some scribes copied from several manuscripts, more than one hand may have worked on a single manuscript, other scribes misread the manuscripts they were copying, Chaucer himself may have made later additions to one or two of the manuscripts, and many manuscripts have probably been lost. A program such as --PAUP-- has difficulty reflecting these complexities. This is where the --Collate-- program (which helped with the regularisation) comes back in. --Collate-- allows users to search for variants that are common only to a particular set of manuscripts. Figure 3 shows a complex search partly based on the tree in Figure 2, in which the user enters queries into eight fields. If a set of manuscripts descends from a common exemplar, its members should share variants that are absent from other manuscripts. Here the user is looking for variants that appear in the manuscripts in the top half of the tree in Figure 2, stretching from Ad to Ii. If --Collate-- finds a large number of such variants, that would help confirm that these manuscripts form a group.
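The search described above, for variants shared by a chosen set of manuscripts and absent from all others, comes down to set operations over the collation. A minimal Python sketch, assuming a simplified rule in which a variant qualifies only if every target manuscript carries it and no other manuscript does; --Collate--'s own query options may differ, and the variant identifiers and sigils are hypothetical.

```python
# Hypothetical collation: variant id -> set of manuscripts that carry it.
VARIANTS = {
    "v1": {"A", "B", "C"},
    "v2": {"A", "B", "C", "D"},
    "v3": {"A", "B", "C"},
    "v4": {"D", "E"},
    "v5": {"A", "B"},
}
ALL_MSS = set().union(*VARIANTS.values())  # every sigil in the collation

def exclusive_variants(target):
    """Variants found in every target manuscript and in no other manuscript."""
    target = set(target)
    others = ALL_MSS - target
    return sorted(
        vid for vid, carriers in VARIANTS.items()
        if target <= carriers and not (carriers & others)
    )

print(exclusive_variants({"A", "B", "C"}))
```

A looser rule (for instance, present in most of the target manuscripts but in none of the others) would only require changing the condition inside exclusive_variants.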
Figure 3 - The Collate program in action

As the lower half of the illustration shows, the computer found 34 variants that met the user's criteria. The computer, however, can give no further clue as to whether this indicates a more-than-coincidental resemblance between the manuscripts. The rest is left to human judgement. As Dr Robinson points out, there are around 7,000 words in each version of --The General Prologue--. To what extent can a set of manuscripts that exclusively share 34 variants be considered a family? While computers can give the impression of answering difficult questions, in the final analysis it is still human judgement that counts. The computer simply allows the user to make a historical judgement on a wider range of evidence; the scholar must decide whether the presence of 34 variants indicates a meaningful pattern or mere coincidence. --PAUP-- is not meant to provide a definitive answer to the genealogy of --The Canterbury Tales--, but it can guide the researcher towards a specific set of questions. Likewise, --Collate-- can suggest whether a set of manuscripts forms a recognisable group but cannot confirm it. Interpretation is still the job of the researcher. So that this point is not forgotten, the assumptions built into the computer model must be made transparent. --PAUP-- offers a hierarchy of Chaucer manuscripts based on the assumption that similar variants mean similar scribal histories. This is patently not always the case, and claiming certainty for computer-generated theories would only discredit them. Showing how --PAUP-- can raise new questions in Chaucer studies is a different matter: when computer-generated evidence is linked to other types of evidence, there is the chance of building a more coherent argument.
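Whether 34 exclusive variants is many or few has no purely computational answer, but one way a scholar might calibrate the judgement (a technique not described in the case study) is to compare the observed count with the counts for randomly chosen groups of the same size. A minimal Python sketch of such a Monte Carlo baseline, with hypothetical data and a hypothetical function name, baseline_share; a low share does not prove common descent, since random groups ignore everything else known about the manuscripts.

```python
import random

# Hypothetical collation: variant id -> set of manuscripts carrying it.
VARIANTS = {
    "v1": {"A", "B", "C"},
    "v2": {"A", "B", "C"},
    "v3": {"A", "B", "C"},
    "v4": {"D", "E"},
    "v5": {"A", "D"},
    "v6": {"B", "C", "E", "F"},
}
ALL_MSS = sorted(set().union(*VARIANTS.values()))  # every sigil in the collation

def count_exclusive(group):
    """Number of variants carried by every member of group and by no other manuscript.

    Because every carrier set is drawn from ALL_MSS, this is simply the
    number of variants whose carrier set equals the group exactly.
    """
    group = set(group)
    return sum(1 for carriers in VARIANTS.values() if carriers == group)

def baseline_share(group, trials=1000, seed=0):
    """Return (observed count, share of random same-sized groups that match or beat it)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = count_exclusive(group)
    hits = sum(
        count_exclusive(rng.sample(ALL_MSS, len(group))) >= observed
        for _ in range(trials)
    )
    return observed, hits / trials

observed, share = baseline_share({"A", "B", "C"})
print(f"observed: {observed}, random groups matching it: {share:.1%}")
```

In this toy collation only about one random three-manuscript group in twenty does as well as A, B and C, which would make the grouping worth pursuing; deciding what it means remains the researcher's task.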
The following section of the case study looks at the place of computer-aided analysis in Chaucer studies, showing how the Canterbury Tales team has integrated technological and traditional methodologies to produce and justify its interpretations.

Invigorating Chaucer Studies

Many of the findings of the --Canterbury Tales Project-- team have been published in its --Occasional Papers--. Peter Robinson's lengthy article "A Stemmatic Analysis of the Fifteenth-Century Witnesses to --The Wife of Bath's Prologue--" tackles questions about the textual history of the Tales using the techniques outlined above.² Some of his conclusions reiterate those of Manly and Rickert, while others provide new insights. Like Manly and Rickert's work, Robinson's analysis using --PAUP-- suggests four families of Canterbury Tales manuscripts, each with characteristic variants (and therefore implying a common exemplar for each). However, Robinson was also able to identify two further sets of manuscripts with family resemblances. This hypothesis was first suggested by --PAUP-- and then subjected to further analysis with --Collate--. Robinson split --The Wife of Bath's Prologue-- into eight sections of around a hundred lines each and analysed each section with --PAUP--. Going through the results, he identified the manuscripts that were not securely rooted in one particular family. Such manuscripts tended to skip from branch to branch, an indicator of confused scribal parentage. These manuscripts were then removed from the analysis, and a final --PAUP-- analysis was run on the remaining manuscripts, based on the entire Wife's Prologue.

Figure 4 - A cladogram of fundamental witness groupings of the Wife of Bath's Prologue

As can be seen, the PAUP analysis suggests various families of manuscripts.
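Robinson's step of setting aside manuscripts that skip from branch to branch can be pictured as checking whether each manuscript falls into the same family in every section. A minimal Python sketch, assuming the family assignments have already been read off the eight per-section trees; the sigils, the assignments and the min_share threshold are hypothetical, and in practice the decision was a matter of scholarly judgement rather than a fixed rule.

```python
from collections import Counter

# Hypothetical family assignments read off eight per-section trees
# (one family label per section, in order).
ASSIGNMENTS = {
    "Ms1": ["A"] * 8,
    "Ms2": ["B"] * 8,
    "Ms3": ["A", "A", "B", "B", "A", "C", "B", "A"],
    "Ms4": ["C"] * 7 + ["B"],
}

def unstable(assignments, min_share=1.0):
    """Manuscripts whose most frequent family covers less than min_share
    of the sections (1.0 means they must agree in every section)."""
    flagged = []
    for ms, families in assignments.items():
        top_count = Counter(families).most_common(1)[0][1]
        if top_count / len(families) < min_share:
            flagged.append(ms)
    return sorted(flagged)

print(unstable(ASSIGNMENTS))                  # strict: any switch flags a manuscript
print(unstable(ASSIGNMENTS, min_share=0.75))  # tolerate an occasional switch
```

With the strict default both Ms3 and Ms4 are set aside; a tolerance of 0.75 keeps Ms4, whose single switch might be a local disturbance rather than mixed parentage. The manuscripts that survive would then go into a single whole-text analysis of the kind the case study describes.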
Manly and Rickert had already identified groups A, B, C and D, but had only briefly considered the possibility of groups E and F. (O consists of the other, probably older, manuscripts, which show a greater degree of textual independence.) Wishing to investigate further, Robinson tested the E and F hypothesis in --Collate--, where he found around 150 variants that he believed were particular to the manuscripts in group E: Bo1, Ph2, Gg and Si.

Figure 5 - Collate identifying the readings characteristic of group E

147 is a high number of variants, but this alone does not justify the belief that the manuscripts share a common lineage. Here Robinson draws on the analysis of one of his colleagues. Elizabeth Solopova's article "Chaucer's Metre and Scribal Editing in the Early Manuscripts of --The Canterbury Tales--" makes a close textual analysis of some of the variants that --Collate-- suggested were common to the manuscripts in the E group.³ Comparing these variants with the manuscript taken to be closest to the original (the Hengwrt manuscript), Solopova uncovered a clear pattern in how the variations came into existence. She used various qualitative techniques, part and parcel of the traditional school of literary study, including identifying subtle shifts in metre, the degree of grammatical correctness, and modifications in syntax. These techniques revealed "an unintentional policy aimed at introducing stylistic corrections", a significant editorial policy indicating the presence of a common exemplar (or common editor) which informed all the members of the E group (as well as some other manuscripts). For example, lines 371-2 of --The Wife of Bath's Prologue-- in the more poetic Hengwrt version read "Thow liknest eek wommanes loue to helle, / To bareyne lond, there water may nat dwell." In the manuscripts of the E family, however, the second line becomes "To bareyne lond, there no water may dwell."
This is a more prosaic version, flattening the livelier wording of the earlier text. Solopova finds several similar examples among the variants on Robinson's list. Pronouns are introduced, metre rearranged and colloquialisms removed, all further evidence of a coherent editorial policy in the E family. Taken together, Solopova's more qualitative essay and Robinson's more quantitative one make a convincing case that the similarities in the E family were not a coincidence. Computer-aided research on Chaucer is not simply a matter of responding to numbers on a screen; it is a sensitive integration with other styles of research, paying heed to the assumptions and the limits of knowledge in each.

Widening Access to Chaucer

In conclusion, it is worth pointing out that, despite its own specific research methods, the --Canterbury Tales Project-- is committed to creating digital editions in which researchers are free to follow their own lines of inquiry. As well as the CD-ROM of --The General Prologue--, the project is preparing two electronic editions of the Hengwrt manuscript of --The Canterbury Tales--. One contains introductory material and low-resolution images of the relevant manuscripts. A more expensive CD-ROM replaces these with images of considerably higher quality, as seen in Figure 1, and includes further tools for comparing Hengwrt with the Ellesmere manuscript. Even the scholar suspicious of computer-driven research can concentrate on studying the manuscripts without having to use the programs, and so without having to accept the methodology sketched above. One can always return to the primary sources. This increased accessibility should also attract a far wider range of researchers (historians, linguists, art historians and others), who no longer need to make the pilgrimage to the rarefied atmosphere of the manuscript archive but can investigate the documents from their own computers.
Moreover, undergraduates are no longer restricted to the compromises of modern editions, but can gain a much fuller historical sense of what --The Canterbury Tales-- actually are. The --Canterbury Tales Project-- is a prologue to much greater interest in Chaucer studies.

Footnotes

1. John Manly and Edith Rickert, --The Text of the Canterbury Tales, Studied on the Basis of All Known Manuscripts--, 1940.
2. Peter Robinson, "A Stemmatic Analysis of the Fifteenth-Century Witnesses to --The Wife of Bath's Prologue--", in --The Canterbury Tales Project Occasional Papers--, ed. Norman Blake and Peter Robinson, vol. 2, pp. 69-132.
3. Elizabeth Solopova, "Chaucer's Metre and Scribal Editing in the Early Manuscripts of --The Canterbury Tales--", in --The Canterbury Tales Project Occasional Papers--, vol. 2, pp. 143-65.

Many thanks to Peter Robinson and Claire Jones of the Canterbury Tales Project for their help in composing this case study. The home page of the Project is at [http://www.canterburytalesproject.org/]