Using the OAI Protocols with EAD

JCDL ’02
Portland, Oregon
July 16, 2002
Christopher J. Prom
Assistant University Archivist
University of Illinois at Urbana-Champaign
prom@uiuc.edu
Thomas G. Habing
Research Programmer
University of Illinois Engineering Library
thabing@uiuc.edu
OAI for Cultural Heritage
Many DL projects for archives, manuscripts, photos, artifacts, objects
Encoded in different metadata standards
Research difficulties of humanities scholars, students
OAI framework originally for pre-prints
An interoperability protocol based on metadata harvesting
May allow metasearches across projects and data types
Uncover Hidden Web
UIUC Mellon Project Goals
Test feasibility of harvesting, searching cultural heritage with OAI
Develop data provider tools that produce usable OAI records from disparate sources
Build service provider tools for storage and retrieval of cultural heritage metadata
Re EAD:
assess structural problems in mapping to OAI
develop an effective crossmapping
allow basic searching in an OAI environment
test effectiveness of the search
provide proof of concept
EAD Background
For finding aids, not individual documents (but can link to them)
Collective description
<eadheader>: metadata about finding aid
<archdesc>: metadata about materials described in finding aid
“top level” elements
EAD Background
Multilevel Description
<dsc>: description of subordinate components
hierarchical <c01>, <c02> ... <c12>
can include many tags in varied nesting; very flexible DTD
Wide range of possible tagging practices--encoding standards vary by institution
EAD/OAI Problems and Issues
Providing full context; Mining the <dsc>
Hierarchical inheritance in <dsc>
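The inheritance problem can be illustrated with a minimal sketch: a component's own title often means little until the titles of its ancestors are prepended. The sample finding aid, the `collect` helper, and the " / " separator below are assumptions made for illustration, not part of the project's tools.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical miniature finding aid; real EAD files are far larger.
SAMPLE = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Music Faculty Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Scores</unittitle></did>
        <c02 level="file">
          <did><unittitle>Various Composers</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""

# Matches <c> and the numbered components <c01> through <c12>.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")

def collect(element, inherited, out):
    """Append one 'ancestor / ... / own title' string per component."""
    for child in element:
        if child.tag == "dsc":
            # <dsc> is only a wrapper, so it adds nothing to the chain.
            collect(child, inherited, out)
        elif COMPONENT.match(child.tag):
            title = (child.findtext("did/unittitle") or "").strip()
            chain = inherited + [title]
            out.append(" / ".join(chain))
            collect(child, chain, out)
    return out

archdesc = ET.fromstring(SAMPLE).find("archdesc")
top_title = archdesc.findtext("did/unittitle").strip()
paths = collect(archdesc, [top_title], [])
for path in paths:
    print(path)
```

Read alone, "Various Composers" says little; with its inherited chain it reads "Music Faculty Papers / Scores / Various Composers".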
More Problems/Issues
Unclear what level of material is being described; inconsistent use of level attribute
Flexibility in DTD and encoding practices
Lack of standardization: little use of content standards such as APPM, LCSH, and LCNAF; inconsistent date styles and name conventions
Our Approach
Examine Encoding Standards
Mitigate inconsistent encoding practices
level attribute
<eadid>
Generate multiple OAI records for one EAD
<archdesc> top level record
mini records from <dsc>, with relation to top level
Preserve context for “hits” by linking user to finding aid in the search/retrieval mechanism
Recommended Crosswalk
To Simple Dublin Core
Top Level
flexible mapping, draws from <eadheader> and (mainly) <archdesc>
Key fields: identifier, title, date, type, description, subjects, relation
<dsc>
provides metadata for the box listing
OAI records separate but related to top level
use of XPointer to provide context
replicates hierarchical structure of EAD
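A top-level mapping of this kind might be sketched as a table of EAD paths per Dublin Core field. The paths in `TOP_LEVEL_MAP`, the sample document, and the function name are illustrative assumptions; they are not the project's published crosswalk. The two fixed `type` values are taken from the sample record shown in these slides.

```python
import xml.etree.ElementTree as ET

# Illustrative paths only (an assumption, not the actual crosswalk).
TOP_LEVEL_MAP = {
    "identifier": ["eadheader/eadid"],
    "title": ["archdesc/did/unittitle"],
    "date": ["archdesc/did/unitdate"],
    "description": ["archdesc/did/abstract", "archdesc/scopecontent/p"],
    "subject": ["archdesc/controlaccess/subject"],
}

# Hypothetical miniature finding aid.
SAMPLE = """
<ead>
  <eadheader><eadid>test.xml</eadid></eadheader>
  <archdesc level="collection">
    <did>
      <unittitle>Music Faculty Papers</unittitle>
      <unitdate>1950-1990</unitdate>
    </did>
    <controlaccess>
      <subject>Composers</subject>
      <subject>Music</subject>
    </controlaccess>
  </archdesc>
</ead>
"""

def top_level_record(ead_root):
    """Build a simple Dublin Core record (field -> list of values)."""
    record = {}
    for dc_field, paths in TOP_LEVEL_MAP.items():
        values = []
        for path in paths:
            for node in ead_root.findall(path):
                # Flatten mixed content and collapse whitespace.
                text = " ".join("".join(node.itertext()).split())
                if text:
                    values.append(text)
        if values:
            record[dc_field] = values
    types = ["text", "archives or manuscripts"]
    archdesc = ead_root.find("archdesc")
    level = archdesc.get("level") if archdesc is not None else None
    if level:
        types.append(level)
    record["type"] = types
    return record

record = top_level_record(ET.fromstring(SAMPLE))
for field, values in record.items():
    print(field, values)
```

Listing several candidate paths per field is one way to keep the mapping flexible when institutions encode the same information in different elements.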
XPointers
W3C Candidate Recommendation (2001-09-11)
Can identify XML fragments using a superset of the XPath syntax
#xpointer(//dsc[1]/c01[2]/c02[3]/c03[10])
When EADs are split into their subordinate components, XPointers are used to identify the individual parts and link them together
Server-side scripts use the XPointers for rendering and linking
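As a rough sketch of the splitting step, the pointer for each component can be built by counting same-named siblings while walking the <dsc>. The sketch assumes a single <dsc> per finding aid; the sample document and the helper names `component_pointers` and `parent_pointer` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches <c> and the numbered components <c01> through <c12>.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")

# Hypothetical miniature finding aid.
SAMPLE = """
<ead>
  <archdesc>
    <dsc>
      <c01><did><unittitle>Correspondence</unittitle></did></c01>
      <c01>
        <did><unittitle>Scores</unittitle></did>
        <c02><did><unittitle>Original Works</unittitle></did></c02>
        <c02><did><unittitle>Various Composers</unittitle></did></c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""

def component_pointers(root):
    """Return (pointer, element) pairs for every component under <dsc>."""
    pointers = []

    def walk(parent, path):
        counts = {}  # position among same-named siblings, as in XPath
        for child in parent:
            if not COMPONENT.match(child.tag):
                continue
            counts[child.tag] = counts.get(child.tag, 0) + 1
            step = f"{path}/{child.tag}[{counts[child.tag]}]"
            pointers.append((f"#xpointer({step})", child))
            walk(child, step)

    dsc = root.find(".//dsc")
    if dsc is not None:
        walk(dsc, "//dsc[1]")
    return pointers

def parent_pointer(pointer):
    """Pointer of the enclosing component, or None directly under <dsc>."""
    inner = pointer[len("#xpointer("):-1]
    head, _, _ = inner.rpartition("/")
    return f"#xpointer({head})" if head != "//dsc[1]" else None

pointer_strings = [p for p, _ in component_pointers(ET.fromstring(SAMPLE))]
for p in pointer_strings:
    print(p, "is part of", parent_pointer(p))
```

A component whose parent pointer is None sits directly under the <dsc>, so its relation would point to the top-level record instead.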
"<rdf:RDF>"
<rdf:RDF>
  <rdf:Description>
    <dc:identifier>
      http://…/…/test.xml#xpointer(//dsc[1]/c01[8]/c02[5]/c03[244])
    </dc:identifier>
    <dc:title>Toensing, Richard</dc:title>
    <dc:type>text</dc:type>
    <dc:type>archives or manuscripts</dc:type>
    <dc:type>file</dc:type>
    <dcterms:isPartOf>
      <rdf:Description>
        <dc:identifier>
          http://…/…/test.xml#xpointer(//dsc[1]/c01[8]/c02[5])
        </dc:identifier>
        <dc:title>Various Composers</dc:title>
      </rdf:Description>
    </dcterms:isPartOf>
  </rdf:Description>
</rdf:RDF>
Indexing and Retrieval Issues
Variations in consistency and quality of source EAD markup--currently being tested
Size of EAD finding aids: one EAD can result in thousands of DC records (many of marginal usefulness, but some very useful)
Frequently occurring search term can overwhelm results list with many parts of one EAD
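One possible way to keep a single finding aid from flooding the results list is to group hits by the document part of the identifier (everything before the `#`). The function name, the `max_per_aid` cutoff, and the example.org URLs are assumptions made for illustration.

```python
def group_hits_by_finding_aid(hits, max_per_aid=3):
    """Group hit identifiers by source EAD, keeping first-seen order.

    Returns (source, first few identifiers, total hit count) tuples.
    """
    groups = {}
    for identifier in hits:
        source, _, _ = identifier.partition("#")
        groups.setdefault(source, []).append(identifier)
    return [(source, ids[:max_per_aid], len(ids))
            for source, ids in groups.items()]

# Hypothetical identifiers in the style of the split records.
hits = [
    "http://example.org/ead/a.xml#xpointer(//dsc[1]/c01[1])",
    "http://example.org/ead/a.xml#xpointer(//dsc[1]/c01[2])",
    "http://example.org/ead/b.xml",
    "http://example.org/ead/a.xml#xpointer(//dsc[1]/c01[3])",
    "http://example.org/ead/a.xml#xpointer(//dsc[1]/c01[4])",
]
grouped = group_hits_by_finding_aid(hits)
for source, shown, total in grouped:
    print(f"{source}: showing {len(shown)} of {total} hits")
```

Each finding aid then takes one row in the list, with a count that tells the user how many more parts matched.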
Performance Issues
Splitting many EADs can be a time consuming batch process
Many marginally useful DC files can affect search performance; a rule is needed for discarding these records
Disk space requirements can be large
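The slides do not specify a discard rule. Purely as a hypothetical heuristic, a record could be treated as marginal when it has no description and its title is very short or consists only of numbers and dates. The threshold, the pattern, and the function name are all assumptions.

```python
import re

# Titles made up only of digits, whitespace, and date punctuation.
DATE_OR_NUMBER_ONLY = re.compile(r"^[\d\s.,/()\-]+$")

def is_marginal(record, min_title_chars=4):
    """Hypothetical filter: True if the record adds little to a search index."""
    title = " ".join(record.get("title", [])).strip()
    has_description = any(v.strip() for v in record.get("description", []))
    if has_description:
        return False          # a description makes the record worth keeping
    if len(title) < min_title_chars:
        return True           # empty or near-empty title
    return bool(DATE_OR_NUMBER_ONLY.match(title))

records = [
    {"title": ["Toensing, Richard"]},
    {"title": ["1950-1951"]},
    {"title": ["A"]},
    {"title": ["1950"], "description": ["Correspondence with publishers"]},
]
kept = [r for r in records if not is_marginal(r)]
print(len(kept), "of", len(records), "records kept")
```

Any such rule would need testing against real finding aids, since a folder titled only with a date can still be useful in the context of its parent.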
Simple Search
http://oai.grainger.uiuc.edu/oai/search
Advanced Search
Search Results
Full Record
Hit in Context of Finding Aid
Outstanding Issues
Improved display of multiple search hits within a single EAD
Summary and detail views, or a hierarchical display
Improve display of hit in context of finding aid
Filter out superfluous subordinate components
Normalization of various EAD data elements