Library archivists face contractual, technical challenges in preserving digital materials

Oct. 30, 2014, 1:32 a.m.

Archivists at Stanford libraries face contractual and technical challenges in keeping an increasing amount of digital material, like eBooks and email, safe and accessible for future generations.

For one, words in paper books don’t spontaneously disappear, but words in eBooks can. Because eBooks and electronic journals are licensed, not owned, libraries may not be able to ensure long-term access to them. Depending on the contract between the publisher and the library, publishers can sometimes remove or alter content without the library’s consent.

According to Hannah C. Frost, services manager at the Stanford Digital Repository, this issue is a long-standing problem for research institutions like Stanford.

“Sometimes we’ve invested a considerable amount of money in those subscriptions and then…the publisher changes their mind, or goes out of business, and suddenly we don’t have access to content that we had paid for,” Frost said.

Peter Chan, digital archivist at Stanford’s Born-Digital/Forensics Lab, said libraries have become “smarter and smarter” at standing up for their rights. Now that they know what to expect when negotiating with publishers, they can insist on long-term access.

To assist in this negotiation, Stanford created LOCKSS, or Lots of Copies Keep Stuff Safe, which is the first and only mechanism to apply the traditional purchase-and-own library model to electronic materials.

Participating publishers agree to make their digital books available to libraries over the long term. Libraries can then place their electronic collections safely in online LOCKSS Boxes, digital bookshelves to which the libraries control access. Stanford started LOCKSS, but any library can join.

“Collaboration with other institutions is a really important part of all of this,” Frost said.

Of course, even if libraries can secure perpetual access to content, that content still has to be laboriously preserved. This is where technical challenges arise.

“Digital preservation is much more difficult than preserving paper or physical artifacts,” Chan said. “You can’t just keep hard drives in a climate-controlled box and expect them to stay readable.”

“Digital media can corrupt much more easily than paper,” he added.

Instead, Chan builds tools to extract the data from CDs, floppy disks, computers and the like and transfer it to more stable storage on Stanford’s servers.

There’s also the issue of ensuring that files are preserved in a stable format, a problem similar to the one eBooks pose. Because software is licensed, if the company that makes a particular program goes out of business, files built to be read with it can become unreadable, like records without a record player. So data extracted from disks and computers also has to be converted to a format that doesn’t depend on the consumer technology market.

The sheer volume of material archivists deal with can be overwhelming, too.

“There is sort of a backlog of legacy stuff that had come in…20, 25 years ago,” said Glynn Edwards, manager of the Born-Digital Program. “Nobody’s staffing levels are sufficient to keep up with…processing.”

But the longer you wait to extract data from a floppy disk, the more danger there is that the data will deteriorate. If data isn’t captured quickly enough, the means of capturing it may become obsolete.

Just a few years after Chan built two machines to capture data from 5.25-inch floppy disks, the company that manufactured the components such machines require stopped producing them.

However, extracting the information is just half of the battle. The information must be accessible in the future.

“You can take the data off a hard drive and stick it in a digital repository and say it’s preserved,” Edwards said. “[But] no one can access it, no one can discover it, no one can use it.”

Edwards and Chan are working together on ePADD, a tool that will allow individuals and repositories to interact with email archives. ePADD will enable users to search, for instance, poet Robert Creeley’s email correspondence for information regarding a particular person or place name.

“I would emphasize that it’s important not to take the ease with which you produce and use digital content…for granted,” Frost said. “As amazing as [technology] is, it’s also a vulnerability, because of how things can change. And things will change, that’s the one thing we know.”

Contact Abigail Schott-Rosenfield at aschott ‘at’ stanford ‘dot’ edu.

Abigail Schott-Rosenfield '18 is a news staff writer and graduate student outreach strategist at the Daily.
