I continue to receive many requests for the slides from my ALA2013 presentation on modeling the ESTC21 as an agnostic data store.* The technological challenge of the project is to design a system that will not only manage the more than 500,000 records in the union catalogue but also serve as a data store that will: (1) facilitate long-term record maintenance; (2) capture an extensible set of information about the objects being described; and (3) allow low-overhead data input and output from and to an extensible set of cataloguing standards (MARC, BIBFRAME, TEI, etc.).
The presentation centers on the concept of what I call an agnostic data storage model. An agnostic data store is one that does not subscribe to any particular metadata or cataloguing standard. The conventional approach to cataloguing has been to store data in one of a variety of available standards and to crosswalk it to another whenever the data must be presented in a different format. This approach is flawed on two fronts. First, the effort involved in creating these crosswalks is considerable. Second, and most importantly, it severely limits the cataloguing effort, as the cataloguer can only capture, in a structured manner, information that the base cataloguing standard recognizes. Hence, for example, the ever-growing MARC 500 field, into which we have grown accustomed to throwing everything the MARC standard cannot otherwise account for, including the kitchen sink.
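The MARC 500 problem can be made concrete with a small sketch. Assume, hypothetically, that a record is held as a dictionary of named elements (`hasTitle`, `authoredBy`, `hasImprint` and `hasWatermark` are illustrative names, not a real vocabulary) and that a hand-written crosswalk maps some of them to MARC tags: 100 (main entry, personal name), 245 (title statement) and 260 (imprint). Anything the crosswalk cannot place is flattened into a 500 general note and loses its structure.

```python
# Hypothetical crosswalk from illustrative store element names to MARC tags.
MARC_CROSSWALK = {
    "authoredBy": "100",  # Main Entry - Personal Name
    "hasTitle": "245",    # Title Statement
    "hasImprint": "260",  # Publication, Distribution, etc. (Imprint)
}


def to_marc(record: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (tag, value) pairs sorted by tag; unmapped elements become 500 notes."""
    fields = []
    for element, values in record.items():
        tag = MARC_CROSSWALK.get(element)
        for value in values:
            if tag is None:
                # No structured MARC home: the element survives only as free text.
                fields.append(("500", f"{element}: {value}"))
            else:
                fields.append((tag, value))
    return sorted(fields)


# Illustrative record; the watermark has no place in this crosswalk.
record = {
    "hasTitle": ["Paradise lost"],
    "authoredBy": ["Milton, John"],
    "hasWatermark": ["Fleur-de-lis"],
}
for tag, value in to_marc(record):
    print(tag, value)
```

In an agnostic store the watermark would remain a structured element; only this particular MARC view flattens it, and an output to another standard could map the same element differently.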
An agnostic data store seeks to overcome this limitation by providing an extensible and flexible model of data storage that allows the cataloguer to add structured data elements to the base record as needed. As noted in the presentation, agnostic does not mean atheist. Utopian dreams of completely non-hierarchical data structures were the stuff of the SGML/HTML wars of the mid-1990s. A truly non-hierarchical text or catalogue record is a meaningless Bayesian bucket of words, or perhaps even of letters, depending upon how far down the rabbit hole of chaos one wishes to travel. An atheistic data store would, by extension, store all data on all objects in a single, useless BLOB field. The foundational cognitive equation of the human mind is “meaning = structure.”
An agnostic data store admits the necessity of an intelligent design of its own universe, but it strives to impose this design at the most basic semantic level, thereby allowing as much creativity as possible within the system. Rather than saying, for example, “All works have authors,” an agnostic store abstracts this to something like, “Things can be works; and things can be agents; and things can be acted upon by other things; and a type of acting is authoring.” This may seem like an unnecessarily complex statement; however, capturing the full semantic depth of what we are really saying when we say that a work is authored by someone allows a great deal more flexibility and extensibility in the root system. When we take the “Subject->Predicate” structure, rather than a predetermined set of information buckets, as the base of the data storage model, we create a system capable of growth and flexibility.
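To illustrate the abstraction, here is a minimal sketch, assuming an in-memory store in which every fact, including the fact that authoring is a type of acting, is recorded as a subject-predicate-object triple. The class and all identifiers (`TripleStore`, `subTypeOf`, `work:1`, `agent:1`, `printing`) are hypothetical and are not the ESTC21 implementation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


class TripleStore:
    """Hypothetical in-memory store: every fact is a subject-predicate-object triple."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add(Triple(subject, predicate, obj))

    def subtypes(self, kind: str) -> set[str]:
        """Return `kind` plus every predicate declared, transitively, as a subTypeOf it."""
        found = {kind}
        frontier = [kind]
        while frontier:
            parent = frontier.pop()
            for t in self.triples:
                if t.predicate == "subTypeOf" and t.object == parent and t.subject not in found:
                    found.add(t.subject)
                    frontier.append(t.subject)
        return found

    def actions_on(self, target: str, kind: str = "acting") -> set[tuple[str, str]]:
        """Return (agent, action) pairs for every type of `kind` performed on `target`."""
        actions = self.subtypes(kind)
        return {(t.subject, t.predicate) for t in self.triples
                if t.object == target and t.predicate in actions}


store = TripleStore()
store.add("work:1", "isA", "Work")                # things can be works
store.add("agent:1", "isA", "Agent")              # things can be agents
store.add("authoring", "subTypeOf", "acting")     # a type of acting is authoring
store.add("printing", "subTypeOf", "acting")      # a new relationship, added as data
store.add("agent:1", "authoring", "work:1")
store.add("agent:2", "printing", "work:1")
print(sorted(store.actions_on("work:1")))
```

Because predicates are themselves things in the store, a new type of acting such as printing is introduced simply by adding a triple, with no change to the underlying structure; a query for "acting" picks it up automatically.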
Click here to download the presentation PowerPoint.
Click here for more information on the Andrew W. Mellon funded ESTC21 project.
* Presentation given in collaboration with Brian Geiger, Director of the Center for Bibliographic Studies and Research, University of California, Riverside