Coded Letters: The Kircher Correspondence Project
Structuring and Classifying Records for a Database
Background
The seventeenth-century Jesuit philosopher Athanasius Kircher was one of the first scholars to build up a network of intellectuals corresponding from locations all over the globe. He sent and received epistles on an incredibly diverse range of subject matter - the trajectory of comets, inscriptions on obelisks, magnetic machines and artificial languages, for example - and this helped to establish him as one of the foremost thinkers of his time. Others sent him not only text but diagrams and drawings, including heraldic symbols, musical notation and sketches of mythical monsters, all of which fed his own intellectual investigations. The collected correspondence is now housed at the Pontifical Gregorian University in Rome, where two historians, Dr Michael John Gorman and Nick Wilding, have been leading a team to digitise the letters and make them available on the Internet.
The project, entitled The Correspondence of Athanasius Kircher: The World of A Seventeenth-Century Jesuit, ultimately aims to produce a complete scholarly edition of Kircher's archive of letters. At present (Summer 2000), the project has digitised images of all Kircher's incoming correspondence, and these are available online. For example, Figure 1 shows a letter written by Cardinal Leopoldo Medici to Kircher in 1667, explaining that the letter is accompanied by a book on anatomy printed in Florence. This is one of over 8,500 images available online. Transcriptions and translations (the letters arrived in well over twenty languages including Armenian, Coptic, Czech and Kircher's own artificial language) will be added at a later date. Via the project's website (http://archimede.imss.fi.it/kircher/) visitors can search for images of any of Kircher's letters.

Figure 1 - A letter sent to Kircher from Cardinal Leopoldo Medici in 1667
A particular problem in placing the images online was how to present users with the individual images once they had entered their search terms. To overcome this, a classification had to be created that would guide the user from the text string entered (a letter's sender, its title, or its date, for instance) to the digitised image itself. This entailed developing a database that would categorise the images and link them to possible text strings. This article looks at the issues involved in creating this database.
Using Pre-existing Databases
Many digitisation programmes in this position have to construct a database from scratch, but the Kircher team is involved in the larger Pinakes scheme. Pinakes aims to construct a digital library relevant to the history of scientific inquiry by providing its projects with a standard database template for their data. Other projects working under the Pinakes label included a catalogue of the papers once held by an eighteenth-century French scientist, and digitised manuscripts from the Biblioteca Nazionale Centrale in Florence. This meant that the Kircher team already had a model database with which to categorise the images.
The aim of Pinakes was to develop a systematic way of ordering these diverse projects, creating broad types of field relevant to every project. Its interface (shown in Figure 2) reflects the standard database template created for all the projects working under the Pinakes flag. The queries that the interface offers depend on the types of field in the database. The first box allows the user to search by title and, if need be, to restrict this search to certain types of object. Users can also reach the documents via shelfmarks, related names, the year in which the object in question was composed, or the general subject. A combined search is also available, allowing a user to cross-search over these various fields.

Figure 2 - The interface for the Kircher Correspondence Project
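The dependence of the queries on the fields can be sketched in a few lines of Python. This is a minimal illustration and not the Pinakes implementation: the Record class, its field names and the combined_search function are assumptions made for the example, and the shelfmarks and titles in the sample data are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """One catalogued object; the field names are illustrative only."""
    title: str
    object_type: str
    shelfmark: str
    year: Optional[int] = None
    names: list = field(default_factory=list)
    subjects: list = field(default_factory=list)


def combined_search(records, title=None, object_type=None, shelfmark=None,
                    name=None, year=None, subject=None):
    """Return the records that satisfy every criterion supplied.

    A criterion left as None is ignored, so one function covers both
    single-field and combined searches.
    """
    hits = []
    for rec in records:
        if title is not None and title.lower() not in rec.title.lower():
            continue
        if object_type is not None and rec.object_type != object_type:
            continue
        if shelfmark is not None and rec.shelfmark != shelfmark:
            continue
        if name is not None and name not in rec.names:
            continue
        if year is not None and rec.year != year:
            continue
        if subject is not None and subject not in rec.subjects:
            continue
        hits.append(rec)
    return hits


# Sample data: the 1667 letter is described in the article, but the
# shelfmarks and the wording of the titles are placeholders.
records = [
    Record("Letter from Leopoldo Medici to Kircher", "letter",
           "PLACEHOLDER-1", 1667, ["Leopoldo Medici"], ["anatomy", "Florence"]),
    Record("Cryptogram from the Duke of Braunschweig-Lüneburg", "cryptogram",
           "PLACEHOLDER-2", None, ["Duke of Braunschweig-Lüneburg"], []),
]

florence_1667 = combined_search(records, year=1667, subject="Florence")
```

Because each criterion is optional, adding a new searchable field means adding one attribute and one test, which is roughly the kind of change a shared template has to absorb for every project at once.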
Utilising a pre-existing database affords many advantages. In practical terms, it eases the financial burden; less money, and less time, is spent solving the teething troubles that accompany the construction of any database. Additionally, providing data to a common shell allows it to become interoperable with other datasets, thus providing the potential for greater dissemination of the data. Technical difficulties are also easier to overcome when problem-solving can be spread over a number of databases.
But, as we shall see later, the Kircher project has to confront a number of obstacles when placing its database in line with other projects. A generic shell is sometimes unresponsive to the particular needs of the Kircher database, and updating the shell requires much additional work. Such updating is nevertheless crucial if one is to provide a thorough tool that answers the complex needs of researchers.
Visibility: Reaching the Correspondence
Although the formula provided by Pinakes supplied the field headings, the Kircher team still had to develop the individual terms that would categorise each image. For some field-types, such as the year and shelfmark, this was straightforward. But for others, such as subject matter and object description, it was more difficult.
The project had to bear in mind the wide range of scholars who might be interested in the correspondence. Kircher's polymathy attracts scholars of every disciplinary hue - historians, philosophers, scientists, theologians and so on. Categorisation could not follow a pattern that would favour researchers in a particular subject; the sheer diversity of the Kircher collection guarded against this. Not only did the subject matter vary widely, but the existence of different types of object (a significant minority of objects do not come under the heading of written epistle) necessitated a very broad strategy of categorisation.
Another important factor in shaping these categories was to ensure that the documents remained 'visible'; that is, to ensure that as few images as possible could escape a user's search by being defined too rigidly. When an image could be defined in numerous ways, the Kircher team took the opportunity to do so. Take, for instance, a letter concerning the inscription to be placed on an obelisk above a sculpture designed by the great artist Bernini. This is filed under three different subjects - Egypt and Egyptology, Bernini and inscriptions - so the user can find the letter by going down any one of these three branches. This is particularly important when users from different subject areas approach the database. For the above example, an art historian may have thought of searching under Bernini while an ancient historian may have selected Egypt and Egyptology. If the image had been catalogued only under inscriptions, then neither of the researchers would have been able to locate it.
However, the Kircher team had to strike a balance in how many different ways subject matter was defined. While providing too few categorisations may hide an image, providing too many means that each image is attached to so many categories that the user is presented with a flood of hits on every search.
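The idea of filing one image under several subjects can be shown with a small Python sketch. It is an illustration under stated assumptions rather than the project's own code: the image identifiers and the function name build_subject_index are invented, while the subject headings are those mentioned in this article.

```python
from collections import defaultdict


def build_subject_index(catalogue):
    """Map each subject to the set of image ids filed under it."""
    index = defaultdict(set)
    for image_id, subjects in catalogue.items():
        for subject in subjects:
            index[subject].add(image_id)
    return index


# Image ids are invented; the subject headings come from the article.
catalogue = {
    "obelisk-letter": ["Egypt and Egyptology", "Bernini", "inscriptions"],
    "medici-letter": ["anatomy", "Florence"],
}

index = build_subject_index(catalogue)

# All three headings lead to the obelisk letter.
routes = [s for s, ids in index.items() if "obelisk-letter" in ids]

# Counting hits per subject helps to spot headings that have grown too broad.
hits_per_subject = {s: len(ids) for s, ids in index.items()}
```

The hit counts address the other side of the balance: a heading that returns a large share of the collection is a candidate for splitting into narrower ones.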
With this in mind, the Kircher project began filling the field headings supplied by the Pinakes database.
For the title field, the Kircher team either gave the title that came with the object in question (such as the coat-of-arms bearing the title 'Epigramma in Honoatum Joannium, Joanniae domus') or, if there was none, a title created by the project team (such as 'Notes concerning the measurement of the diameter of the earth by Jean Picard'). At the moment, an auto-generation process is being developed for the letters, so that the title will indicate the letter's writer and recipient, permitting users to search for particular letters through their correspondents.
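A title built from a letter's correspondents might look like the following Python sketch. The project's actual auto-generation process is not described here, so the function name make_title and the wording of the generated title are assumptions.

```python
def make_title(sender, recipient, year=None):
    """Build a letter title from its correspondents (format is an assumption)."""
    title = f"Letter from {sender} to {recipient}"
    if year is not None:
        title += f", {year}"
    return title


title = make_title("Cardinal Leopoldo Medici", "Athanasius Kircher", 1667)
# A title search for either correspondent's name would now match this letter.
```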
Inserting the names of senders and other connected figures into the names field was, like inserting titles, fairly straightforward, although it did require additional work to exploit one particular strength of the Pinakes model. The Pinakes system allows names in the database to be related to one another. So, for example, a user being guided to the letter sent to Kircher by Cardinal Medici (Figure 1) could then be directed to the letters of those figures cited in the letter, in this case the anatomist Niels Stensen. Names in the correspondence, and even in pre-existing inventories of the Kircher correspondence, varied wildly, especially if one name appeared in separate languages. The Kircher team therefore had to consult the Anglo-American Cataloguing Rules (AACR2) in order to develop a standardised set of names for the database - only with this standardised set of names would the Kircher database be fully able to demonstrate the connections between the various figures related to the correspondence. Additionally, the Pinakes template required that each name be categorised as a particular type - for instance, sender, recipient, translator, or name cited in the correspondence.
Letters were not the only objects sent to Kircher - the type of material varies greatly, and the team reflected this in the general object description field (beneath the title box at the top of the template). Figure 3 shows a cryptogram sent from Wolfenbüttel in Germany by the Duke of Braunschweig-Lüneburg; other text-types, as the various materials sent were called by the Kircher team, included mathematical calculations, genealogical trees and prints of hieroglyphics. There was obviously no pre-existing list (as there sometimes is with more commonly-used sources) that could be imported into a project with such novel material. The actual categories of text-type were therefore developed by the team. Their definition was a relatively common-sense task, using, for example, terms such as 'report', 'travel diary' or 'will / testament' for the separate types of manuscript. This stands in contrast to much larger databases with hundreds and thousands of records (for instance, nationwide archaeological databases) where, for practical reasons, specific terms cannot be chosen for each record. There, defining the list needs to be a process with well-defined rules, so that the process of naming becomes an automatic one. With a database that is relatively small in size, as with Kircher's correspondence, each record can be dealt with specifically.
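Why standardised names are a precondition for linking figures can be seen in a short Python sketch. Everything named in it is hypothetical: the variant list, the chosen forms of the names (which are not claimed to be the AACR2 headings the team adopted), the image identifiers and the second letter are invented for illustration. Only the four name types are taken from the description above.

```python
# Hypothetical authority list: variant spellings mapped to one chosen form.
VARIANTS = {
    "niels stensen": "Stensen, Niels",
    "nicolaus steno": "Stensen, Niels",
    "leopoldo medici": "Medici, Leopoldo",
    "leopoldo de' medici": "Medici, Leopoldo",
}

# The name types listed in the article.
ROLES = {"sender", "recipient", "translator", "cited"}


def standardise(name):
    """Return the chosen form of a name, or the cleaned input if unknown."""
    key = " ".join(name.lower().split())
    return VARIANTS.get(key, " ".join(name.split()))


def add_link(links, image_id, name, role):
    """Record that a standardised name plays a given role in an image."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    links.append((image_id, standardise(name), role))


def images_for(links, name):
    """List every image connected to a name, whatever spelling is used."""
    wanted = standardise(name)
    return sorted({image_id for image_id, linked, _ in links if linked == wanted})


links = []
add_link(links, "medici-letter", "Leopoldo Medici", "sender")
add_link(links, "medici-letter", "Niels Stensen", "cited")
add_link(links, "stensen-letter", "Nicolaus  Steno", "sender")  # hypothetical letter

related = images_for(links, "nicolaus steno")
```

Without the standardising step, the vernacular and Latinised spellings would be stored as two unrelated people, and the route from the Medici letter to Stensen's own letters would be lost.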

Figure 3 - A cryptogram sent to Kircher by the Duke of Braunschweig-Lüneburg
Again, what was important to the Kircher team in creating this list of text-types was to reflect the breadth of the material, and this was better achieved through many closely defined topics than through a few large catch-all subjects. Many of the manuscripts, of course, crossed the boundaries created for them. Users interested in either the text-type 'engravings' or 'portraits' would be led to an etched image of Joachim von Gravenegg, Prince-Abbot of Fulda, sent to Kircher by an unknown correspondent. Similarly, if there were disagreements in the team about how a piece of correspondence should be defined, it could simply be defined twice. Again, presenting multiple definitions of an object made the image more 'visible' for the searcher.
The other complex field concerned the correspondence subject matter. Initially, the Kircher team followed the philosophy that they had used for the text-types, developing their own common sense definitions, definitions that did not favour any particular area of research. This has created a very varied typology. Some subjects are intellectual disciplines (palaeography, philosophy, physics), others are on books Kircher wrote (as with the subject Fisiologia Nuova della Natura delle Comete), whilst others are geographical (France, Florence or Rome). Each of these subjects is visible from the pull-down menu, so the user does not need to guess at possible subjects. (This would not be feasible with a much larger database, where there would be a much longer list of categories.) As with the text-types, different subjects in the Kircher database can refer to the same image. The letter from Florence about anatomy featured in Figure 1 is filed under both 'anatomy' and 'Florence'. It is vital that the user has several routes to get to an image.
But, because of the structure of the Pinakes shell, the Kircher team had to do more than add general subjects. Once users have selected a general subject, they must also choose a topic and an argument. Figure 4 shows that once a user has chosen 'patronage', they have to choose a topic (in this case 'gifts'), and an argument (in this case 'medals'). (The same tripartite structure is required of the user when selecting a date - not only a year but also a month and a day must be selected.) Each image had to be precisely delineated within a general subject, a topic and an argument, a task that required much additional work. And while this does allow users to give a close specification of the topics they want to explore, it causes more difficulties for those wanting to make a speedier or more general search.

Figure 4 - Choosing a particular subject from the Kircher interface
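The three-level classification can be modelled as a sketch in Python. The interface described above requires all three levels to be chosen; the optional topic and argument parameters below are an assumption, included only to show how a more general search could be expressed over the same data. The function name find_images and the image identifiers are invented, and only the triple 'patronage', 'gifts', 'medals' comes from the project.

```python
# Each image carries one or more (subject, topic, argument) triples.
# Only 'patronage' > 'gifts' > 'medals' comes from the article; the
# other triples and the image ids are invented.
classifications = {
    "image-001": [("patronage", "gifts", "medals")],
    "image-002": [("patronage", "gifts", "books")],
    "image-003": [("patronage", "dedications", "books")],
}


def find_images(classifications, subject, topic=None, argument=None):
    """Return ids of images matching the levels supplied; None matches anything."""
    found = []
    for image_id, triples in classifications.items():
        for s, t, a in triples:
            if (s == subject
                    and (topic is None or t == topic)
                    and (argument is None or a == argument)):
                found.append(image_id)
                break  # one matching triple is enough for this image
    return found


exact = find_images(classifications, "patronage", "gifts", "medals")
broad = find_images(classifications, "patronage")
```

Here the exact query reaches a single image, while the subject-only query returns everything filed under 'patronage', which is the kind of general search that a mandatory three-step selection makes harder.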
This complex structure for subject matter is not the most efficient approach for the Kircher project, but is something it has inherited from its involvement with the Pinakes group. This highlights the main problem with sharing a database with other projects: the structure can become too rigid to fit the specific patterns of individual projects. Creating three layers of description can make it more difficult to reach the images, especially if users are not sure exactly which images they are searching for, or wish to perform a general search. Additionally, the tripartite structure is simply more inconvenient for the user - more clicks and more waiting before the images are reached.
The Pinakes model is a little restricting in other ways. Two particularly fruitful modes of research would be by the letters' city of delivery or by their language; at present there is only limited scope for searching by language and none for searching by city. Kircher received correspondence in over twenty languages, and it arrived from over 400 cities. Many researchers, interested in one geographical location, or in letters sent from one particular nation, would wish to restrict their searches in this fashion. Eventually, the interface will be extended, allowing users to analyse the correspondence in this way - the necessary fields have been included in the database in this expectation. But the additional work required in making these extensions to the Pinakes interface indicates the problem of inheriting a pre-existing model.
Despite the difficulties with the Pinakes model, the Kircher team appreciates the possibilities that become available in creating a digitisation project within an interoperable environment. Dr Gorman has suggested that the Kircher correspondence would interoperate better with other scientific correspondence, where there is a much clearer overlap between subjects and text-types. Many of Kircher's correspondents, for example the French mathematician Marin Mersenne, built up their own bodies of correspondence. Placed together with other letters in the genre, the Kircher correspondence would form part of a formidable dataset of primary material.
Conclusion
The Kircher project illustrates the pros and cons inherent in being part of a larger group - the advantage of building upon pre-existing work, but the disadvantage of being restrained by a generic structure. Creating a fully efficient search tool for Kircher's correspondence has been hampered by this. But intelligent categorisation alleviates some of this problem. By recognising the importance of keeping the images 'visible', the Kircher team has created categories for its images which allow users the best possible chance of finding them. Whichever framework such a categorisation is placed in, it is essential that this be kept in mind when inserting content in a database.
Many thanks to Michael John Gorman for his assistance in the preparation of this case study.
You can visit the Kircher Correspondence Project at http://archimede.imss.fi.it/kircher/
Footnotes
1. Archivio della Pontificia Università Gregoriana; APUG 555; 61r-62v.
2. APUG 558; 75r & APUG 565; 77r-77v.
3. APUG 563; 57r-57v.
4. Social historians often employ a standardised list for classifying occupations; for archaeologists one example of a standardised word-list for monuments is supplied by the Royal Commission on the Historical Monuments of England Thesaurus of Monument Types.
5. APUG 562; 177r-177v.