March 9, 2016


Digital Libraries
By Richard Weyhrauch

Libraries have traditionally been thought of as repositories of physical books, with a mission to organize these collections and make them available to their audience. A lot is changing now that digital media are widely available and libraries are pressured, by both cost and availability, to make them a major part of what they offer their patrons. The issue addressed in this note is not how to make digital media more accessible but how current and future technologies will change the very mission of modern libraries.

If you take seriously the idea that a library should be a place designed to store knowledge, then the accumulation of books to meet that goal needs reexamining. This type of reconsideration has happened before. First there was only oral tradition, and you sought knowledge by seeking out teachers (think Plato and Aristotle). By the Middle Ages 'libraries' were at best repositories of manuscripts, the handwritten reports of thinking ... . The invention of movable type changed that forever. Books were born and facilitated both libraries and centralized corporate places (universities) where you could go to access knowledge, now guided by teachers. Scholars appeared. The important thing to note is that we live at a time when, as with movable type, a new modality has emerged: the computer and massive storage. This raises the question: “What form of learning will this new modality support?”

It is not books that people want; it's the content of the books.

The Availability of Databases

It is now commonplace for libraries to have online catalogues. This is a great step beyond 'card' catalogues, but it severely underutilizes the database as a resource. Both the amount of data that can be stored and the complexity that can be associated with this data are vastly underused. The storage of bibliographic data is, however, just a start. This has been recognized by many groups, which I shall mention below. One of the main points of this article is to point out that these efforts are 'in the right direction' but still lack a coherent vision, and to propose a direction that can focus this effort with a large payoff. Let's start with one of my favorite examples.

Many of these ideas are grounded in thirty years of experience, at IBUKI, building database systems for storing all kinds of information in a way that is computationally usable.

What is Information/Data

The idea that simple bibliographic data makes up all the information about books that a library can use is, to my mind, badly outdated. Even restricted to books, the information that the patrons of a library would benefit from goes far beyond what a library currently offers.

Example: Works and Editions

Current (2017) bibliographic practice, in particular library cataloguing, generally fails to distinguish between the works of an author and their editions. Library card catalogues (even computer-based ones) index an edition of a work. This is clearly because a library contains books, which are of course editions. However, if I say “'Call me Ishmael.' is the first sentence of Moby Dick”, it is clear that I am referring to the work Moby Dick, not to what appears in some particular book, i.e., some edition of the work. To account for this distinction the IBUKI database has entries for both [work]s and [edition]s of the [work]. A work can have more than one edition, and every edition will contain the text of some collection of works. Thus, the 'first appearance/edition' of the work 'On Shakespeare' by author John Milton appeared in the Shakespeare Second Folio (an edition containing some works of Shakespeare and some works of others). Without the distinction between a work and its editions, making sense of the preceding sentence (let alone extracting the information from text or a database) is hopeless. For this reason IBUKI distinguishes between an author's works and the editions of those works. In many cases, for example most academic journal articles, there is only one 'edition' of an article, so informally it is common to conflate the two, but to a computer this only leads to confusion.

To be able to make sense of the sentence about 'On Shakespeare', the IBUKI database needs to have an entry for a work even if there is only one edition of the work! This is particularly true of journal articles, which are usually never reprinted, and of minor works of scientists, which appear only once.
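The work/edition distinction can be sketched as a small data model. This is only an illustration in Python, not the IBUKI schema: the class names, field names, and identifiers such as `w-on-shakespeare` are assumptions made up for the example. The one rule it tries to capture is the one stated above: every edition points at entries for the works it contains, so a work must have its own entry even when it has a single edition.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Work:
    """An abstract work, independent of any printing (hypothetical schema)."""
    work_id: str
    title: str
    author: str


@dataclass
class Edition:
    """A concrete publication containing the text of one or more works."""
    edition_id: str
    title: str
    year: int
    work_ids: list[str] = field(default_factory=list)


class Catalogue:
    """Toy in-memory store linking works to the editions that contain them."""

    def __init__(self) -> None:
        self.works: dict[str, Work] = {}
        self.editions: dict[str, Edition] = {}

    def add_work(self, work: Work) -> None:
        self.works[work.work_id] = work

    def add_edition(self, edition: Edition) -> None:
        # Every work an edition contains must already have its own entry,
        # even if this is the only edition that work will ever have.
        for work_id in edition.work_ids:
            if work_id not in self.works:
                raise KeyError(f"unknown work: {work_id}")
        self.editions[edition.edition_id] = edition

    def editions_of(self, work_id: str) -> list[Edition]:
        """All editions containing the work, earliest first."""
        found = [e for e in self.editions.values() if work_id in e.work_ids]
        return sorted(found, key=lambda e: e.year)

    def first_edition_of(self, work_id: str) -> Optional[Edition]:
        found = self.editions_of(work_id)
        return found[0] if found else None


# Identifiers are placeholders; 1632 is the Second Folio's publication year.
catalogue = Catalogue()
catalogue.add_work(Work("w-on-shakespeare", "On Shakespeare", "John Milton"))
catalogue.add_edition(
    Edition("e-second-folio", "Shakespeare Second Folio", 1632, ["w-on-shakespeare"])
)
first = catalogue.first_edition_of("w-on-shakespeare")
print(first.title if first else "no edition recorded")
```

With both kinds of entry in place, "the first appearance of this work" becomes an ordinary query over editions rather than something a reader has to infer from catalogue cards.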

This brings up the issue of what names (unique IDs?) we should use for these things.

Naming Things

Here the situation gets interesting, and the good news is that some organizations have already made headway. For example, VIAF assigns a unique number to each author, independent of the way in which a library writes the author's name when it catalogues books by that author. This is exactly what is needed. The form of the ID does not really matter: VIAF uses integers; IBUKI used LISP symbols. What is important is that someone is taking the responsibility for making this unique assignment, which VIAF does on behalf of the many national libraries and other groups that have joined. This idea has apparently caught on, so Wikipedia now lists more than a dozen organizations trying to do the same. Although today's computers can handle this level of disarray, it is not the complexity that I alluded to above. I don't know how to accomplish the goal of a single authority organization, but I think it is a desirable goal. Another aspect of this is that VIAF (with regard to people) does this job for authors. The Getty, on the other hand, does it for artists. They are both doing a good job, but they overlap! Some artists also write. Arrrrgh.
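The idea of one identifier standing behind many written forms of a name can be sketched as follows. This is a toy illustration, not how VIAF or IBUKI is implemented: the `AuthorityFile` class, its methods, and the normalization rule are all assumptions invented for the example, and the integer IDs it hands out are arbitrary.

```python
from __future__ import annotations

import itertools
from typing import Optional


class AuthorityFile:
    """Toy authority file: one opaque integer ID per person, many name forms."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._id_by_form: dict[str, int] = {}
        self._forms_by_id: dict[int, set[str]] = {}

    @staticmethod
    def _normalize(name: str) -> str:
        # Assumption: case and spacing differences never distinguish people.
        return " ".join(name.lower().split())

    def register(self, *name_forms: str) -> int:
        """Assign a fresh ID to a person known under the given name forms."""
        person_id = next(self._next_id)
        self._forms_by_id[person_id] = set()
        for form in name_forms:
            self.add_form(person_id, form)
        return person_id

    def add_form(self, person_id: int, form: str) -> None:
        """Record another way of writing the name of an existing person."""
        key = self._normalize(form)
        owner = self._id_by_form.get(key)
        if owner is not None and owner != person_id:
            # Simplification: a real authority file must cope with two
            # different people who share exactly the same name form.
            raise ValueError(f"{form!r} already belongs to ID {owner}")
        self._id_by_form[key] = person_id
        self._forms_by_id[person_id].add(form)

    def lookup(self, name: str) -> Optional[int]:
        """Return the ID for a name form, or None if it is unknown."""
        return self._id_by_form.get(self._normalize(name))


authority = AuthorityFile()
milton = authority.register("Milton, John", "John Milton")
authority.add_form(milton, "Milton, John, 1608-1674")

# Two catalogues that write the name differently still resolve to one ID.
print(authority.lookup("john  milton") == authority.lookup("MILTON, JOHN"))
```

The form of the ID is irrelevant to the lookup; what matters is that a single party owns the table and decides which name forms belong together.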

Both VIAF and the Getty have branched out. VIAF identifies some 'organizations' and the Getty actively works on 'geography'. Both of these institutions have been driven to create these databases by the need to compute on the data they have stored and, for example, to know whether two publications are talking about the same person or refer to the same place.
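When two authorities overlap, as with an artist who also writes, one way to compute "are these two publications about the same person?" is to keep a crosswalk that groups identifiers from different sources. The sketch below uses a standard union-find structure for this. It is an assumption for illustration only: neither VIAF nor the Getty is known from this article to work this way, and the scheme names and identifiers are placeholders.

```python
from __future__ import annotations

# An identifier is a (scheme, local id) pair, e.g. ("authors-file", "A-0001").
Identifier = tuple[str, str]


class Crosswalk:
    """Groups identifiers from different authorities that denote one entity."""

    def __init__(self) -> None:
        self._parent: dict[Identifier, Identifier] = {}

    def _find(self, key: Identifier) -> Identifier:
        """Return the representative identifier of the group containing key."""
        self._parent.setdefault(key, key)
        root = key
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited identifier at the root.
        while self._parent[key] != root:
            next_key = self._parent[key]
            self._parent[key] = root
            key = next_key
        return root

    def link(self, a: Identifier, b: Identifier) -> None:
        """Declare that two identifiers refer to the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_entity(self, a: Identifier, b: Identifier) -> bool:
        return self._find(a) == self._find(b)


crosswalk = Crosswalk()
as_author = ("authors-file", "A-0001")  # placeholder ID in an author authority
as_artist = ("artists-file", "B-0042")  # placeholder ID in an artist authority
crosswalk.link(as_author, as_artist)

# Two placeholder publications catalogued against different authorities.
book = {"title": "A book of poems", "creator": as_author}
exhibition = {"title": "An exhibition catalogue", "creator": as_artist}
print(crosswalk.same_entity(book["creator"], exhibition["creator"]))
```

A crosswalk like this only papers over the lack of a single authority: someone still has to assert each link, which is exactly the responsibility discussed above.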

There are many other groups that are feeling the same need and are currently ... Perseus (ways of referencing classical works) and bird people/taxonomists (for animals) ...

