"Alice in Wonderland, the Bible, Shakespeare, the Koran ... will be with us as long as civilisation. An operating system, a program, a mark-up system will not." [1]
Michael Hart, Project Gutenberg
Introduction: "Technological Quicksand" [2]
Very large amounts of money are currently being poured into digitisation projects, and more and more information is born digital and exists in no other form. In 1996, however, a special US Taskforce on Digital Archiving [3] drew the world's attention to the alarming fact that, owing to ongoing technological change and the rapidity with which technologies now become obsolete, a vast amount of digitally generated information is effectively vanishing.
Both software and hardware are in a continuous state of evolution. Text files frequently cease to be accessible after a few years. Whereas the timeframe for action in preserving traditional paper-based materials can be scores or even hundreds of years, with digital material it is likely to be between two and five years.
Computer developments tend to lack backward compatibility:
you may have a disc that will survive for a century, but what
if you no longer have the technological capability to read it?
And what if the data were vital or irreplaceable? While a range
of problems is associated with preserving digital material (much
online information is ephemeral, for example, and the average
lifespan of a website is said to be six weeks [4]),
perhaps the most serious difficulties follow from changes in formats,
coding, software, hardware and operating systems that can leave
the material unreadable.
A High Profile Case
Some commentators have talked about a digital black hole or even anticipated a digital Dark Age. Over the last year the UK news media have shown considerable interest in a case that dramatically illustrates some of the issues surrounding digital preservation and technological obsolescence: the BBC Domesday Project. The Project made its appearance on 25 November 1986, in celebration of the 900th anniversary of the UK's earliest and most famous public record, known as the Domesday Book.
In 1085, King William I of England (the Conqueror) ordered a survey
of the country's wealth, land rights, and tax and military service
obligations; historians have suggested that William may have been
spurred by a desire to tighten his grip on power, or by a threat
of invasion from the Danes. The resultant book, which first appeared
in 1086 and originally consisted of two volumes, records in extraordinary
detail England's land-holdings in the late 11th century. Put together
with astounding speed, it runs to 913 pages and some two million
hand-written words, describing over 13,000 places in England and
Wales. According to the Anglo-Saxon Chronicle,
"not even one ox, nor one cow, nor one pig...escaped notice" [6].
The book remains today a valid legal document, a basic source for historical researchers, and the foundation stone of the UK national archives. It was nicknamed Domesday after the final Day of Judgment; the author of Dialogue Concerning the Exchequer (1179) wrote:
... just as the sentence of that strict
and terrible Last Judgment cannot be evaded by any art and subterfuge,
so, when a dispute arises in this realm concerning facts which
are written down, and an appeal is made to the book itself, the
evidence it gives cannot be set at nought or evaded with impunity [7].
BBC Domesday Project
The Domesday Project [8] resulted from
a collaboration between the BBC, Acorn, Philips, and Logica; overall
it cost £2.5 million (around $3.75 million) and can now be seen
as a landmark in the imaginative and innovative use of information technology.
It was conceived in 1983 by an experienced BBC Television producer and filmmaker, Peter Armstrong. At this time multimedia was a tremendously exciting, much talked-about technology, pregnant with possibilities for education and the future of archiving. Armstrong envisaged a modern-day equivalent of the Domesday Book that would harness some of the potential of multimedia and provide a detailed snapshot or time capsule of British life in the mid-1980s, as seen by the people themselves. The target market would mainly be schools, universities, and libraries. The substantial presence of government-subsidised microcomputers in UK schools made it possible to ask the schools to conduct surveys of their local communities, and the results were to be combined with centrally-captured statistical, visual and written information.
The project involved about 60 BBC staff. As it developed, about one million people participated, including children (mainly of primary school age) at 14,000 schools, as well as journalists, academic researchers, cartographers, statisticians, and amateur and professional photographers. Schoolchildren investigated local land use and wrote on local occupations and activities, people, cultural and sporting facilities, and the built and natural environment, typing their reports into their BBC Microcomputers and sending them off to the BBC on floppy discs. The text was left largely unedited, with spelling mistakes uncorrected (the only alterations were made for legal reasons). A vast archive of material was collected, which included some 200,000 photographs, 24,000 maps, 8,000 data sets, and 60 minutes of moving pictures.
The hardware was developed over a period of two years. The material was encoded, primarily digitally, on two specially developed, two-sided 12-inch LaserVision discs, usable only with a new model of Philips LaserVision player, the VP 415 LV-ROM, called the Domesday player. The player was controlled by a BBC Master microcomputer or a Research Machines Nimbus. Highly innovative application software written by Logica provided a multimedia front end to the great store of data.
The creators said that to view all the data would take over seven years of working hours. The very first pair of discs was presented to the UK Public Record Office (PRO) at Kew, London, where they were placed beside the Domesday Book itself.
Fate of the Project
The original intention was to sell the product at around £1,100 ($1,650) but escalating costs led to the discs and hardware being made available as a package in 1986 at a price of over £4,000 ($6,000). While it was bought by higher education institutions, the price appears to have proved too high for much of the target market, and uptake was limited. Many of the contributors never saw the results of their work.
It seems fair to say that BBC Domesday was conceptually ahead of its time. When the project began, the possibility of using CD-ROM technology had to be ruled out because, aside from its relatively limited capacity for storing images, there was at this time no CD-ROM standard. IBM PCs and clones were used by businesses but were not present in UK schools. The Domesday multimedia application software was written in a language called BCPL, which did not come to be widely adopted; but in allowing for a high degree of interactivity, non-linear exploration and searching, and the overlaying of several kinds of information in a single view, the product was radical and prescient.
In the opening years of the new millennium, BBC Domesday stands on the brink of total obsolescence. The number of educational institutions and individuals who possess working systems is now dwindling rapidly, and in many cases the videodisc players are coming to the end of their working lives. After many years of use, the discs themselves are likely to be scratched and prone to error. Many parts of the complex hardware/software combination are incompatible with present-day computer systems.
It was this situation, brought to the media's attention by the work of the CAMiLEON project (see below), that led to a sudden rekindling of UK media interest in the Domesday Project. Some news sources took up the story that this unique, monumental and once seemingly leading-edge resource had become almost entirely inaccessible. They found a certain piquancy in the situation: after over nine centuries, the original Domesday Book can still be consulted (provided you can decipher the hand-written Latin); the modern multimedia digital equivalent was unreadable after a mere decade and a half.
The positive side to this story is that researchers have now succeeded in developing a software program called an emulator, which imitates the function of the Domesday hardware and runs the project on a present-day computer, retaining the look and feel of the original.
The emulator was developed by CAMiLEON [9] (Creative Archiving at Michigan and Leeds: Emulating the Old on the New). Since 1999 CAMiLEON has been researching and evaluating technical strategies for long-term digital preservation, with the support of the UK Joint Information Systems Committee (JISC) and the US National Science Foundation (NSF). The very scale and complexity of the challenge presented by BBC Domesday, together with its intrinsic historical value, led to its being adopted as a proof-of-concept test case for the innovative approach to preservation developed by the Michigan/Leeds project.
In the course of the rescue operation about 70 gigabytes of data were transferred from each side of the laser discs to bytestreams, which can be accessed on current hardware. CAMiLEON's approach, known as migration on request, keeps the data in its original abstract bytestream (thereby removing many problems of hardware compatibility) while maintaining a tool which allows the data to be migrated to a new platform at any future point. The project has found that restricting migration to a single step greatly reduces costs and the likelihood of error or loss of data associated with traditional migration from format to format through time. Techniques have also been developed to ensure the longevity of the emulation software tools themselves.
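The idea of migration on request can be illustrated with a minimal Python sketch. It is not CAMiLEON's actual tool: the class names, the format labels ("legacy-text", "utf8-text") and the toy converter are all hypothetical, chosen only to show the principle that the original bytestream is stored untouched and converted in a single step when someone asks for it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical converter signature: original bytes in, target-format bytes out.
Converter = Callable[[bytes], bytes]


@dataclass(frozen=True)
class ArchivedObject:
    """An object preserved as its original, unmodified bytestream."""
    name: str
    source_format: str
    data: bytes


class MigrationOnRequest:
    """Keeps originals untouched and converts them only when asked."""

    def __init__(self) -> None:
        # Maps (source_format, target_format) to a single-step converter.
        self._tools: Dict[Tuple[str, str], Converter] = {}

    def register(self, source: str, target: str, tool: Converter) -> None:
        self._tools[(source, target)] = tool

    def access(self, obj: ArchivedObject, target: str) -> bytes:
        # One step from the original; no intermediate formats are involved.
        tool = self._tools.get((obj.source_format, target))
        if tool is None:
            raise LookupError(f"no tool for {obj.source_format} -> {target}")
        return tool(obj.data)


# Illustrative only: a made-up legacy format that stores text in upper case.
archive = MigrationOnRequest()
archive.register(
    "legacy-text",
    "utf8-text",
    lambda raw: raw.decode("ascii").lower().encode("utf-8"),
)

report = ArchivedObject("school-report", "legacy-text", b"LOCAL LAND USE")
current = archive.access(report, "utf8-text")
```

When a new platform appears, only a new converter from the original format needs to be registered. The stored bytes never pass through a chain of intermediate formats, which is where the errors and losses of traditional migration tend to accumulate.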
The BBC Domesday emulation [10] was demonstrated at the University of Leeds in December 2002 at a meeting that included members of the original Domesday project team and some former schoolchildren who had contributed. A parallel project, supported by the PRO's Digital Preservation department, has very recently produced a digital video disc (DVD) version of the community-related information collected by the Domesday Project, which is now accessible to the public at Kew. However, it remains unclear at this stage whether or not BBC Domesday will again become generally available as a commercial product.
Towards Long Term Preservation?
Several years after the digital archiving Taskforce's warning, digital preservation
issues are no longer solely the specialised concern of heritage
organisations and libraries. In the United States, this fact is
perhaps reflected in the $99.8 million funding which Congress
allocated in December 2000 for the development of a National Digital Information
Infrastructure and Preservation Program (NDIIPP) [11].
This effort to develop a national strategy on born digital information
will involve collaboration between the Library of Congress and
a wide range of federal, research, and business organisations.
The UK has seen the launch of the Digital Preservation Coalition
(DPC) [12], a consortium of 19 major
information organisations, including JISC, the Public Record Office,
and the British Library, aimed at raising awareness, sharing knowledge
and expertise, and promoting concerted action on long-term digital
preservation. A range of new legislation in the UK, including
a Freedom of Information Act,
is placing a statutory obligation on organisations to archive
and manage electronic documentation.
BBC Domesday was conceived on a heroic scale,
and its recovery was a massive effort. Paul Wheatley [13],
project manager of CAMiLEON, has said that preservation work should
really have been undertaken at least ten years previously, while
the creators and vital documentation were still likely to have
been readily available. Coming late to digital preservation can
lead to the necessity for full-scale retrieval which will inevitably
be a costly exercise. Preservation Management of Digital Materials,
a handbook supported by the DPC and now available online, advises
that "the most cost-effective means of ensuring
continued access to important digital materials is to consider
the preservation implications as early as possible, preferably
at creation, and actively to plan for their management throughout
their life-cycle" [14].
Suzanne Keene, a specialist in museum management, has suggested
that the first principle of digital preservation is: "Decide at the
time when it is created how long the material is to last" [15].
A preservation policy should recognise that long-term digital assets
such as databases demand:
"careful attention to standards, metadata, their technological
basis, to maintaining off-site copies, and to strategic planning
for their future technological path."
Conclusion: Forever or Five Years
Digital preservation issues are now everybody's business. Perhaps the most immediate lesson to be drawn from the BBC Domesday Project is that, while digital information can be endlessly copied and in principle will never deteriorate, there is danger in assuming that once created it is eternal. Jeff Rothenberg, a powerful advocate of emulation techniques, likes to say: "Digital documents last forever or five years, whichever comes first."
References
- BBC News, Technology: Digital records "obscure the past"
- Rothenberg, Jeff. Avoiding Technological Quicksand. CLIR Reports, January 1999
- Taskforce on Digital Archiving, US
- Can you archive the Net? Times, 29 Apr 2002, Section 2, p.4-5
- Public Record Office: 11th Century, Domesday Book (Virtual Museum)
- Public Record Office: Virtual Museum, Document Icons, Domesday
- BBC History: Domesday Book, Norman Conquest
- CAMiLEON: Emulation and BBC Domesday
- The National Digital Information Infrastructure and Preservation Program and its implications for a research agenda for digital preservation. Library and Information Research News 26, Winter 2002, p.32-40
- An update on the Digital Preservation Coalition. D-Lib Magazine 8 (4), Apr 2002
- Migration: a CAMiLEON discussion paper. Ariadne (29), Oct 2002
- Preservation Management of Digital Materials (Handbook)