“Digital Records, Digital Documents, and Digital Archives: The Past, the Present and the Future”
Anne J. Gilliland, Department of Information Studies, University of California, Los Angeles
Archives Lecture given in Cotsen Hall on November 4, 2008.
Read the Text:
“I know that I do not need to explain to this audience the key roles that archives have played in societies since antiquity, nor how archives have historically and continuously been affected by changes in literacies, recording media, and legal and administrative structures. However, this is an exciting moment for archives and for archival users. An ever-widening array of digital technologies are both generating new forms of documentation of bureaucratic, scholarly and personal activities, and making it possible to collocate, disseminate and analyse traditional as well as these new documentary sources in unprecedented ways.
At the same time, archives are struggling to continue to process large backlogs of traditional materials; to judiciously select, digitize, curate and make available online certain archival collections; to figure out how to create, identify, acquire, describe and disseminate born-digital materials of archival value; to stay abreast of increasingly complex and multinational policy questions relating to ownership, provenance, privacy and cultural sensitivities; and to implement preservation regimes and infrastructures that meet archival and evidentiary requirements for reliability and authenticity of analog, digitized and born-digital materials.
Tonight, I would like to offer a few thoughts about what all of this means for the evolving role and conceptualization of records, documents, archives and special collections in a digital world; and particularly for archival ideas and practices.
I would like to start off with a quote:
Authors in the Nineteenth century who sought to give a more precise meaning to the word archives gave contradictory definitions of it because they followed preconceived ideas, like Ménage and Le Duchat before them. We have seen that in the predominant thought of the first of these etymologists his conception of archives was that of ancient documents; for the second, it was the idea of precious documents. Several modern authors attach to the word archives other conceptions just as little satisfactory in themselves, e.g., that of official documents, of historical documents, of authentic documents, of documents in substantiation of rights.
In actual practice, if archives are in fact composed above all of documents in great part in one or another of these categories and sometimes in all of these, it does not mean that there may not be found in the archives documents which are neither ancient nor precious nor official nor authentic nor such as substantiate rights…
This statement could have been made by one of any number of current scholars of Archival Science who are seeking to debunk preconceptualizations of the nature of records, documents and, ultimately, of the archive in light of rapid digital developments; or to reconceptualize, in post-modern and increasingly in post-colonial terms, the role and nature of the Archive in society. However, the statement was made in 1938 by Charles Samaran (at the time, Professor of Bibliography and Archivistics, École nationale des Chartes, Paris and future Director General of the National Archives of France). Samaran was writing on the eve of WWII, at another time when there was great ferment in archival ideas – rapidly developing office and audio-visual technologies were dramatically altering the nature and volume of records and archives; these new technologies made duplication and dissemination of records and documents easier, but the records they produced were inherently more difficult to preserve. There was also a keen awareness of the risk to archives of war and social unrest, and archivists and other heritage and documentation professionals from across Europe and North America had been coming together over the previous three decades to discuss what all of this meant locally and internationally for professional ideas and practices relating to appraisal, description, and preservation. What resulted was general international agreement about the professional paradigm that drives archival ideas and practices today.
Let me now give you an example of the state of affairs for archives and scholarship today so that we can understand better the opportunities and issues that now require us to consider how that paradigm might again need to be debated and augmented.
Last year, I attended a research symposium on strategies for studying contemporary sciences that was held at two Japanese research campuses – Sokendai and KEK, located in the “science cities” of Shonan Kokusai Mura and Tsukuba respectively. The symposium brought together scientists in high energy physics and related fields such as molecular science and fusion science, several of whom were among the most senior and distinguished in their fields. Also attending were historians and sociologists, and archival scholars and practitioners. The participants were there to discuss building archives to document the growth of these scientific fields in Japan. The scientists eloquently laid out milestones in their fields, discussing the careers and contributions of key figures and their influence on subsequent generations of scientists. They reflected on the impact of World War II and subsequent events on the development of their fields, including the post-War interactions with scientific education and research in the United States, national attempts to address gender inequities in science, the building of specialized science cities, and above all why they felt it was so important to capture all of this in an archive. They talked about how they had canvassed senior colleagues to gather the few remaining records and to organize and record interviews with them, and then to build digitized archives of those materials. They asked whether their archives should include only oral histories and personal papers, or whether they should also include administrative records. The archivists present pointed out the absence of university archives and institutional records management programs in most Japanese settings – the kinds of programs that in other environments might have been responsible for retaining important records relating to the administration and funding of scientific research. They postulated that the archives being developed might have to include these also.
There ensued discussions about whether, beyond capturing the stories of individual scientists and disciplinary divergences and convergences, the archive should be documenting the actual processes of science. They also wondered about whether they should be including the products of that science such as data sets and computer simulations, formulae, and models. The high energy physics community was an early adopter of information technology, and the scientists speculated on how digital aspects of modern science might be captured. They also mentioned the prolific use of web pages and electronic mail by scientists.
The historians and sociologists argued that the social milieu within which science is and has been conducted needed to be studied and the resulting documentation also incorporated into the archives. For example, ethnography could be used to examine the motivations behind the development of Japan’s science cities and the social ecologies at work today in and around those communities. They talked about the importance of archives for developing fields of memory and identity studies. Like the scientists, they worried about whether and how to archive the digital traces of the processes of science. They mentioned blogs, wikis, mash-ups, and Second Life and other trends in scientific communication, many of which now take place in public view and even with interaction on the part of the public, and pondered what the “record” status of such digital artifacts might be in terms of serving as reliable, accurate and authentic evidence of the genesis and discussion of scientific ideas, as well as the actual conduct of scientific research.
As both Samaran’s statement and the Japanese scientists’ example demonstrate, conceptualizations of the archive have been evolving over the past hundred years, as have those of the “record” it contains or might contain, and of the “archivist” who is responsible for its management over time. Today, entities that are referred to as archives may be physical or virtual, they may be institution- or community-based, or personal. They may be standalone repositories or housed within a special collections unit within a research library, they may be digitally distributed across many locations but collated as a virtual archive on the Internet, or they may reside only on a digital storage device and be disseminated through a personal web page. What they all have in common is that they can be found at those points where information, accountability, memory, and culture converge. While classically archives would contain primarily, if not exclusively, unique bureaucratic records, increasingly they include multiple versions and formats of those records, as well as a host of non-bureaucratic historical, documentary and other non-published or primary materials, many of which do not come in traditional textual forms, and whose status as records is widely debated among archivists. These may be oral and visual histories; scientific data such as satellite images, read-outs from digital instrumentation, digital field notes; or virtual recreations of architecture or performances, just to give a few examples. In the digital realm, even the word “archives” is subject to challenge. It can mean something different to computer scientists and systems administrators, and the online digital components of archives are often referred to as, or subsumed within, digital libraries.
Evolving archives
Today’s archivists not only need to address how to bring what are still predominantly analog holdings of manuscripts, prints, maps, photographs, and tape recordings to an online public – a massive digitization undertaking requiring careful selection, complex rights identification and clearance, re-description of collections and addition of item-level metadata, time-consuming digitization, and, of course, a lot of money as well as technological skill. They also need to be working to identify and preserve emergent digital bureaucratic records – that may not look like what they traditionally consider to be records – and research data now created in digital form using an ever-increasing and evolving array of often non-standard technologies, as well as other digital materials that are documenting aspects of society and everyday life that have previously rarely made it into an archive. All are being created in unprecedented volumes and ways by breathtakingly fast-changing technology.
This means increasing examination of the nature of legal, historical and cultural evidence and how best to select, preserve and present it, whether we are dealing with born-digital or even digitized materials. Many recent digital records initiatives have been driven by a re-stated emphasis on an evidence-based approach as a result of legal and bureaucratic needs to define the scope of the record, and imperatives to create and preserve digital records that will stand up in court. So far, this emphasis on legal and bureaucratic notions of evidential value has largely squeezed cultural, historical and other scholarly considerations out of the discussion about what born-digital material to preserve and make available, and how.
Building digital archives
Although the physical archive and the associated management activities will never realistically go away, because of the need to preserve original materials that were created in non-digital formats, archives now need to find ways to integrate, in an organic manner, born-digital as well as digitized archival content, and to re-think information retrieval and end-user services within a more comprehensive archival regime. In so doing, even practices for managing non-digital materials may, indeed should, be re-thought.
Moving ahead in these ways, however, raises a host of questions. For example, which archival approaches continue to work in the digital age, which need to be enhanced, and which might even be abandoned? How might archival notions of the record, evidence, permanence, uniqueness, authenticity, ownership, and custody be shifting? Who, indeed, should or will be the archivist? For example, does the academic scientist, archaeologist, anthropologist or ethnomusicologist, to name but a few, who gathers the documentation as part of his or her own research have to take on responsibility also for archiving it? What might such developments mean for the trust we place in the archived record? Do academic institutions have to invest in digital repositories as well as archives, or should one function be subsumed within the other? How much of our digital heritage should actually be preserved? If we were able to preserve it all, should we? On what basis should we make those decisions? What, if any, relationships should be developed between archives and other so-called memory or information institutions such as libraries and museums to address challenges and opportunities arising in the digital age? Are there profession-specific practices or approaches that libraries, museums and archives can learn from or share, especially in the areas of metadata creation, content curation, and programming for users? What economic and policy structures should support the development and ongoing management of digital archives (especially the long-term preservation of digital content)? As the resources required become more technical and more expensive, are we looking, perhaps, at a movement away from a culture of isolated institutions and toward more collaborative documentary and preservation arrangements?
Implications for archival research and practice
To respond to such questions, we need to understand more about the changing landscape in which the archive is operating. We need to develop analytical techniques to identify important changes in the records and records creation over time, in form, format and function. We need to develop and evaluate tools for automatically processing and preserving large volumes of digital materials. And we need to understand and design for heterogeneous user communities and their epistemes, by examining how they create, remember, seek, and use knowledge, and what they believe and how they trust. Even scholars in the humanities, who have traditionally been prized most highly by archivists as users and around whom they have designed many of their descriptive systems, vary profoundly in how they seek, analyze and present the evidence upon which their research is based.
Firmly and effectively situating the archive within an increasingly digital age necessitates a multi-pronged approach: a solid grounding and critical reflection on the strengths and weaknesses of those aspects of archival theory and practice that relate to enduring problems for managing records, and heritage and memory as more broadly construed, regardless of their form, format or function; capability and willingness to augment archival approaches with those drawn from other information and memory-based professions and disciplines when these might be relevant and useful; more use of research and development by archivists to support knowledge-based practice; and increased application of new tools and strategies made possible by digital technologies.
So let me mention a few areas where specific attention is needed:
Appraisal
Increasingly, technophiles argue that in a world where digital storage gets less and less expensive, the procedures required to select and extract digital content of archival value from high volumes of digital material will cost more than those required to save everything, and that what we really should concentrate on is keeping everything and developing better retrieval methods. There are two counter-arguments that archivists have employed: technically, storing everything does not mean that everything will also be easily retrievable. In fact, no matter what heuristic or algorithm is used, retrieval of relevant information becomes exponentially more difficult the more information is stored and also the more heterogeneous that information is. Moreover, the accessioning, description, and preservation to archival standards of so much material would be such an enormous endeavour that it could only be accomplished through end-to-end digital processing such as that being developed by the Electronic Records Archive of the U.S. National Archives.
More importantly, perhaps, and this is a very archival perspective, there are also a number of legal, social and emotional imperatives for certain kinds of information or documentation to be able to go away, and it is surprising just how hard it is to make something go away completely if it exists in digital form. It may not endure in a certifiably reliable and authentic state, but the gist of it may still be resurrectable from some accumulation of bits somewhere.
But, thinking radically, what if we were to abandon, even selectively, appraisal in the case of born-digital materials? There is no doubt that the abundance of digital documentation now being created does capture in various ways more aspects of contemporary society and institutional and disciplinary practices than previously were caught in any kind of more traditional media; and that retaining more would help archives counter the charge of being selective and elitist.
Perhaps the most fundamental challenges here for archivists are to their notions of life-cycle management. If archives are eventually to take control of any born-digital material, then they will need to get involved before the material is even created, in the design of the systems that will generate that material. This means working with software developers, systems designers and the actual creators of the material to ensure that the environment in which any documentation is created will be as little dependent upon proprietary software as possible; will have sufficient security controls and metadata to document the reliability and authenticity of that documentation in and over time; will capture and describe accurately its provenance, content and any rights considerations; and will ensure that materials that need to be preserved can indeed be “fixed” in an unalterable way and removed when appropriate from an active system to an archival system.
Preserving the digital
Archivists face two kinds of preservation issues in the digital realm. The first issue relates to the previous point of how to capture, preserve and make available, without depletion of their legal, cultural or historical evidential value, future born-digital archival materials that are being created today. While, as I previously mentioned, considerable thought has been given to the characteristics of legal evidence that need to be preserved in born-digital materials, less has been given to how to maintain or represent the evidence contained in the materiality of items selected to be digitized: for example, how an original object feels or smells, its aesthetic, its weight, its color rendering, or, if it is a moving image or audio item, its projection or playback speed.
The second issue relates to how to protect extensive investment in digitization by ensuring that digitized materials remain available and reliable over time and technological change. This is going to require that institutions also invest a priori in an ongoing digital preservation infrastructure. However, the sources of funding for digitization, such as grant funding, generally have not supported long-term sustainability of those digitized resources, and institutions need to find alternate means for establishing and maintaining digital preservation repositories. The answer may well lie in collaborative digital preservation or data repositories, either within a single institution where archival material might be stored together with bibliographic materials and research data sets, or through the development of multi-institutional shared digital archival repositories.
Research in recent years such as that done by the InterPARES Project tells us that preserving born-digital materials in evidentially sound ways is going to raise the bar considerably higher than what is currently being contemplated or implemented by most digital preservation repositories, and will likely move archivists and creators alike toward preservation assessment that is based upon both risk and opportunity management scenarios. That is, where there is a perceived high likelihood that certain born-digital materials will be challenged as to their authenticity and where failing to stand up to that challenge could put creators or an institution at risk, or where the materials might be able to be used or re-used in innovative or enterprising ways, more will be invested in high-end preservation. The rest of the born-digital materials may be preserved according to less scrupulous preservation regimes.
For all digital materials, whether digitized or born-digital, one major gap right now is the lack of forecasting methods and metrics for understanding the economic dimensions of retaining large quantities of digital “stuff” at varying degrees of rigor over extended periods of time. This makes it difficult to project exactly what costs might be incurred or how the development of automated preservation processes or changes in such things as copyright legislation might affect those costs.
Multi-tasking metadata
Metadata plays a critical role in ensuring the creation, management, preservation, discovery, and use of trustworthy records and is, therefore, one of the most important areas of development for archives in a digital age. Metadata can support not only description and resource discovery at the collection and item level, but it can also document archival administration and the various business and research contexts, legal and rights requirements, and technical specifications and functionality of the records themselves. With the aid of different types of metadata (and several schemes are currently in use, including RKMS and the specifications of ISO 23489 for recordkeeping, EAD, and various learning object metadata schemes), we can enhance archival user services in several ways. For example, we can scaffold content (an approach that has been successfully used in designing learning systems and digital libraries in educational settings, in which students interact in a disciplinarily-appropriate way with a system that becomes increasingly complex and sophisticated as they learn about its content and tools). We can also use highly granular metadata to support the development of highly curated educational or even entertainment modules. We can also build multi-lingual and multi-script interfaces and translators to assist in generating bi-lingual finding aids to archival holdings that are in other languages and scripts, although we still need to work on addressing semantic differences.
Cataloging or archival descriptions have been criticized for requiring users to have expert archival and discipline-specific vocabulary and an historian’s methodological approach and contextual knowledge. Online, however, where there may not be an archivist available to mediate reference access to materials, those descriptions can be augmented in a variety of ways, for example with pull-down lists of preferred or alternate search terms for the lay or student user, or for the scholar who approaches the material from a different disciplinary perspective. Social bookmarking and tagging, refereeing, and other Web 2.0 services and capabilities can be used, with appropriate indication of provenance and, by implication, of reliability, to allow users to annotate or tag resources with their own commentaries, often along with uploading their own related digitized or born-digital content, and to make these annotations and additional source material available to subsequent users.
Pointers can also be included that direct users to helpful secondary sources such as biographies and histories. Online finding aids can be searched, abstracted, reformatted and collated in multiple ways according to the information need of the online user. Top-down, collection-level provenancial access need no longer be the primary access point to archival materials. Users can do known-item searches and expand laterally and upward from there.
Metadata can also support the ability to collate digitized images, and even sounds, from multiple collections into browsable galleries or to generate visual simulations that include links to supporting documentary evidence. When alternate renderings and multiple resolutions are included, users can either order or download the version and resolution of an image that they desire. They can even go through an online rights clearance process if they wish to use an image in a lecture or a publication.
Impact of the digital on the traditional archive
What, then, is the impact of the digital archive on the traditional archive? We do not yet know all the answers to that. What we do know is the following: the need to address the implications of digital records for the archive has led to increased reflection on and examination of archival theory and practice, especially as these relate to what characteristics a digital record should exhibit and how best these can be preserved across software and hardware evolution. As archives have developed online finding aids and started to make available digitized versions of their holdings, their activities have increased in complexity and workflow has changed; decisions have had to be made about what and how much to select for digitization, and new descriptive standards have been adopted. Certainly there are some kinds of users who can be served better using digital resources and online delivery than they ever could be through in-person visits to the physical archive.
Many archivists will attest to changes in the patterns of users coming to the physical archive. Most of these archivists seem to be finding that archival use, both of the digital and the physical archive, is increasing, with the digital archive serving as an advertisement and something of a “tickler” for the more extensive physical archive. Some comment that archival researchers are coming to the physical archive with different expectations, already having explored the archive online.
Some of the other things we know less about are the implications for reference in the digital environment and what forms digital reference might take. For example, what role should archival mediation play in the digital archive in cases where users never interact with a physical repository or archivist? And how best should digital or digitized collections be promoted to users? Even though more are going online, research data indicate that their use by some of the targeted audiences, such as scholars, is still low.
Conclusion
In summary, then, digital developments allow the archive to span time and space in new ways. The digital archive can offer users, both local and remote, more granular and customized access, even if at present to less content, than does the physical archive. The Internet is a vehicle that the archive can use to capture more, and more candid, materials created through digital communications and the World Wide Web, as well as to support shared responsibility for documenting and collecting, federating collections online, and opening up the archive for a variety of user participations. The archive is in a perpetual state of evolution, but to thrive in the digital age, it must engage actively and enthusiastically with it.
And as to its content, to return both to Samaran’s quote, and to the questions raised in the example of the Japanese high energy physicists – the archive of tomorrow, and indeed of today, knows no media limitations, and is comprised of all of those things that they have raised. It holds the ancient, the precious, the official, the historical, the authentic. It also captures the scientific, the disciplinary, the social, the personal, the ethnographic, the testimonial and the documentation that substantiates rights, obligations and memory of all kinds.
Thank you.