Christopher J. Prom

Assistant University Archivist

University of Illinois at Urbana-Champaign

prom@uiuc.edu

Thomas G. Habing

Research Programmer

University of Illinois Engineering Library

thabing@uiuc.edu


Many digital library (DL) projects for archives, manuscripts, photos, artifacts, and objects

Encoded in different metadata standards

Difficult for humanities scholars and students to search across them

OAI framework originally developed for pre-prints

An interoperability protocol based on metadata harvesting

May allow metasearches across projects and data types

May help uncover the “hidden Web”
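The harvesting model behind OAI can be illustrated from the service provider's side. The sketch below assumes OAI-PMH 2.0 and simple Dublin Core (`oai_dc`), and uses only Python's standard library. It builds a `ListRecords` request and pulls the record identifiers and `resumptionToken` out of one response page. The base URL and the sample response are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of OAI-PMH 2.0 responses, in ElementTree's {uri}tag notation
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, token=None):
    """Build a ListRecords request; a resumptionToken replaces all other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumptionToken or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [h.findtext(OAI + "identifier") for h in root.iter(OAI + "header")]
    # An absent or empty resumptionToken means this was the last page
    token = root.findtext(".//" + OAI + "resumptionToken")
    return ids, (token or None)


# Hypothetical response page from a data provider
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:ead1</identifier></header></record>
    <record><header><identifier>oai:example.org:ead2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
print(ids, token)  # ['oai:example.org:ead1', 'oai:example.org:ead2'] page2
print(list_records_url("http://example.org/oai", token))
# http://example.org/oai?verb=ListRecords&resumptionToken=page2
```

A real harvester would repeat the request with each new token until `parse_list_records` returns `None` for it, storing the records from every page.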

Test the feasibility of harvesting and searching cultural heritage metadata with OAI

Develop data provider tools that produce usable OAI records from disparate sources

Build service provider tools for storage and retrieval of cultural heritage metadata

Regarding EAD:

assess structural problems in mapping to OAI

develop an effective crossmapping

allow basic searching in an OAI environment

test effectiveness of the search

provide proof of concept


Designed for finding aids, not individual documents (but can link to them)

Collective description

<eadheader>: metadata about the finding aid

<archdesc>: metadata about the materials described in the finding aid

“top level” elements


Multilevel description

<dsc>: description of subordinate components

hierarchical <c01>, <c02> … <c12>

can include many tags in varied nesting; very flexible DTD

Wide range of possible tagging practices; encoding standards vary by institution
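Because components nest from <c01> down to <c12>, a component's meaning often depends on its ancestors. The sketch below shows one way to collect that inherited context. It assumes an EAD file without XML namespaces, handles numbered components only (unnumbered <c> is ignored), and reads titles from <did>/<unittitle>; the sample fragment is made up.

```python
import re
import xml.etree.ElementTree as ET

# Numbered component tags <c01> through <c12>; unnumbered <c> is not handled here
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])$")


def unit_title(component):
    """Whitespace-normalized text of the component's <did>/<unittitle>, or ''."""
    el = component.find("did/unittitle")
    return " ".join("".join(el.itertext()).split()) if el is not None else ""


def component_contexts(dsc):
    """Yield (depth, 'Ancestor > ... > Component') for every numbered component."""
    def walk(elem, trail):
        for child in elem:
            if COMPONENT.match(child.tag):
                chain = trail + [unit_title(child)]
                yield len(chain), " > ".join(chain)
                yield from walk(child, chain)
    yield from walk(dsc, [])


EAD_SNIPPET = """<dsc>
  <c01 level="series"><did><unittitle>Correspondence</unittitle></did>
    <c02 level="file"><did><unittitle>Various Composers</unittitle></did></c02>
  </c01>
</dsc>"""

for depth, context in component_contexts(ET.fromstring(EAD_SNIPPET)):
    print(depth, context)
# 1 Correspondence
# 2 Correspondence > Various Composers
```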

Providing full context: mining the <dsc>

Hierarchical inheritance in <dsc>

What level of materials is being described? Inconsistent use of the level attribute

Flexibility in the DTD and in encoding practices

Lack of standardization: little use of content standards such as APPM, LCSH, and LCNAF; inconsistent date styles and name conventions
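Inconsistent date styles are one of the easier problems to mitigate mechanically. The following is a minimal sketch, not the project's method: it reduces a free-text date (for example the content of <unitdate>) to a start and end year, handling only a few common styles such as "1920-1950", "1920 - 50", and "ca. 1935". The function name and the styles covered are assumptions.

```python
import re

# A four-digit start year, a hyphen or en dash, then a four- or two-digit end year
RANGE = re.compile(r"(?<!\d)(\d{4})\s*[-\u2013]\s*(\d{4}|\d{2})(?!\d)")
YEAR = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def normalize_date_range(text):
    """Return (start_year, end_year) for a free-text date, or None if no year is found."""
    m = RANGE.search(text)
    if m:
        start, end = m.group(1), m.group(2)
        if len(end) == 2:  # expand abbreviated end years: 1920-50 -> 1950
            end = start[:2] + end
        return int(start), int(end)
    years = YEAR.findall(text)
    if years:  # single dates such as "ca. 1935" become a one-year range
        return int(years[0]), int(years[-1])
    return None


print(normalize_date_range("1920 - 50"))  # (1920, 1950)
print(normalize_date_range("ca. 1935"))   # (1935, 1935)
```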

Examine encoding standards

Mitigate inconsistent encoding practices

level attribute

<eadid>

Generate multiple OAI records for one EAD

<archdesc>: top-level record

mini records from <dsc>, each related to the top level

Preserve context for “hits” by linking the user to the finding aid in the search/retrieval mechanism
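Generating several records from one EAD can be sketched as a recursive walk over <dsc>. The sketch assumes an EAD without XML namespaces and uses a hypothetical finding-aid URL. Each component's identifier is that URL plus a positional XPointer in the style used for these records. Each record's `isPartOf` points to its parent component, and first-level components point to the top-level record. The record layout (plain dictionaries) is an illustration, not the project's data model.

```python
import re
import xml.etree.ElementTree as ET

# Numbered component tags <c01> through <c12>
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])$")


def split_ead(ead_url, root):
    """Return one top-level record followed by one mini record per <dsc> component."""
    records = [{"identifier": ead_url,
                "title": root.findtext("archdesc/did/unittitle", "").strip(),
                "isPartOf": None}]

    def walk(elem, path, parent_id):
        counts = {}  # position among same-named siblings, as in XPath c01[2]
        for child in elem:
            if not COMPONENT.match(child.tag):
                continue
            counts[child.tag] = counts.get(child.tag, 0) + 1
            child_path = f"{path}/{child.tag}[{counts[child.tag]}]"
            child_id = f"{ead_url}#xpointer({child_path})"
            records.append({"identifier": child_id,
                            "title": child.findtext("did/unittitle", "").strip(),
                            "isPartOf": parent_id})
            walk(child, child_path, child_id)

    dsc = root.find("archdesc/dsc")
    if dsc is not None:
        walk(dsc, "//dsc[1]", ead_url)
    return records


EAD = """<ead><archdesc level="collection">
  <did><unittitle>Music Papers</unittitle></did>
  <dsc>
    <c01><did><unittitle>Correspondence</unittitle></did>
      <c02><did><unittitle>Various Composers</unittitle></did></c02>
    </c01>
    <c01><did><unittitle>Programs</unittitle></did></c01>
  </dsc>
</archdesc></ead>"""

for rec in split_ead("http://example.org/test.xml", ET.fromstring(EAD)):
    print(rec["identifier"], "|", rec["title"])
```

Keeping the parent link on every mini record is what later lets a search interface send the user from a box-list “hit” back to its place in the finding aid.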

Mapping to Simple Dublin Core

Top level

flexible mapping; draws from <eadheader> and (mainly) <archdesc>

Key fields: identifier, title, date, type, description, subject, relation

<dsc>

provides metadata for the box listing

OAI records are separate from, but related to, the top-level record

use of XPointer to provide context

replicates the hierarchical structure of the EAD
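A top-level crosswalk can be expressed as a table of source paths per Dublin Core element. The paths below are assumptions chosen for illustration, not the crosswalk the project used. They assume an EAD without XML namespaces and ignore subjects nested in inner <controlaccess> elements. The type values follow the example record on these slides, except "collection", which is an assumed top-level counterpart to "file".

```python
import xml.etree.ElementTree as ET

# Illustrative crosswalk: (DC element, ElementTree path from the <ead> root)
CROSSWALK = [
    ("identifier", "eadheader/eadid"),
    ("title", "archdesc/did/unittitle"),
    ("date", "archdesc/did/unitdate"),
    ("description", "archdesc/scopecontent/p"),
    ("subject", "archdesc/controlaccess/subject"),
    ("subject", "archdesc/controlaccess/persname"),
]


def text_of(elem):
    """All text inside elem, with runs of whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def top_level_dc(root):
    """Map an EAD document to a dict of DC element -> list of values."""
    dc = {"type": ["text", "archives or manuscripts", "collection"]}
    for element, path in CROSSWALK:
        for match in root.findall(path):
            value = text_of(match)
            if value:
                dc.setdefault(element, []).append(value)
    return dc


EAD = """<ead>
  <eadheader><eadid>ead-0001</eadid></eadheader>
  <archdesc level="collection">
    <did><unittitle>Music Papers</unittitle><unitdate>1920-1950</unitdate></did>
    <scopecontent><p>Letters and   programs.</p></scopecontent>
    <controlaccess><subject>Music</subject></controlaccess>
  </archdesc>
</ead>"""

print(top_level_dc(ET.fromstring(EAD)))
```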

XPointer: W3C Candidate Recommendation (2001-09-11)

Can identify XML fragments using a superset of the XPath syntax, e.g.:

#xpointer(//dsc[1]/c01[2]/c02[3]/c03[10])

When EADs are split into their subordinate components, XPointers identify the individual parts and link them together

Server-side scripts use the XPointers for rendering and linking
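A server-side script only needs to resolve the narrow, purely positional form of XPointer shown above, not the full specification. The sketch below is an assumption about how such a resolver might work, not the project's script. It approximates the leading `//dsc[n]` step as the n-th <dsc> in document order, which agrees with XPath when a finding aid has a single <dsc>. Each later step `cNN[k]` selects the k-th child with that name. The sample document is made up.

```python
import re
import xml.etree.ElementTree as ET

# Only the restricted form #xpointer(//name[n]/name[n]/...) is recognized
XPTR = re.compile(r"#xpointer\(//(.+)\)$")
STEP = re.compile(r"^([A-Za-z][\w.-]*)\[(\d+)\]$")


def resolve(root, uri):
    """Return the element a positional XPointer URI points to, or None."""
    m = XPTR.search(uri)
    if not m:
        return None
    steps = m.group(1).split("/")
    first = STEP.match(steps[0])
    if not first:
        return None
    # Approximation: n-th descendant with this name, in document order
    matches = list(root.iter(first.group(1)))
    index = int(first.group(2))
    if index < 1 or len(matches) < index:
        return None
    node = matches[index - 1]
    for step in steps[1:]:
        s = STEP.match(step)
        if not s:
            return None
        tag, pos = s.group(1), int(s.group(2))
        same = [c for c in node if c.tag == tag]  # k-th child named tag
        if pos < 1 or len(same) < pos:
            return None
        node = same[pos - 1]
    return node


DOC = ET.fromstring("""<ead><archdesc><dsc>
  <c01><did><unittitle>A</unittitle></did></c01>
  <c01><did><unittitle>B</unittitle></did>
    <c02><did><unittitle>B.1</unittitle></did></c02>
  </c01>
</dsc></archdesc></ead>""")

hit = resolve(DOC, "test.xml#xpointer(//dsc[1]/c01[2]/c02[1])")
print(hit.findtext("did/unittitle"))  # B.1
```

Note that an unbalanced pointer such as `#xpointer(//dsc[1]/c01[2]))` fails to resolve here, because its last step no longer matches the `name[n]` pattern.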

<rdf:RDF>
  <rdf:Description>
    <dc:identifier>http://…/…/test.xml#xpointer(//dsc[1]/c01[8]/c02[5]/c03[244])</dc:identifier>
    <dc:title>Toensing, Richard</dc:title>
    <dc:type>text</dc:type>
    <dc:type>archives or manuscripts</dc:type>
    <dc:type>file</dc:type>
    <dcterms:isPartOf>
      <rdf:Description>
        <dc:identifier>http://…/…/test.xml#xpointer(//dsc[1]/c01[8]/c02[5])</dc:identifier>
        <dc:title>Various Composers</dc:title>
      </rdf:Description>
    </dcterms:isPartOf>
  </rdf:Description>
</rdf:RDF>
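A record shaped like the one above can be produced with Python's standard ElementTree. This sketch assumes the usual namespace URIs for the `rdf`, `dc`, and `dcterms` prefixes, which the slide example leaves undeclared. The `example.org` URLs stand in for the elided paths and are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # serialize with rdf:, dc:, dcterms: prefixes


def q(prefix, name):
    """Qualified name in ElementTree's {uri}local notation."""
    return f"{{{NS[prefix]}}}{name}"


def component_rdf(identifier, title, types, parent_id, parent_title):
    """Serialize one component record, nesting its parent under dcterms:isPartOf."""
    rdf = ET.Element(q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"))
    ET.SubElement(desc, q("dc", "identifier")).text = identifier
    ET.SubElement(desc, q("dc", "title")).text = title
    for t in types:
        ET.SubElement(desc, q("dc", "type")).text = t
    part_of = ET.SubElement(desc, q("dcterms", "isPartOf"))
    parent = ET.SubElement(part_of, q("rdf", "Description"))
    ET.SubElement(parent, q("dc", "identifier")).text = parent_id
    ET.SubElement(parent, q("dc", "title")).text = parent_title
    return ET.tostring(rdf, encoding="unicode")


rdf_xml = component_rdf(
    "http://example.org/test.xml#xpointer(//dsc[1]/c01[8]/c02[5]/c03[244])",
    "Toensing, Richard",
    ["text", "archives or manuscripts", "file"],
    "http://example.org/test.xml#xpointer(//dsc[1]/c01[8]/c02[5])",
    "Various Composers",
)
print(rdf_xml)
```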

Variations in the consistency and quality of source EAD markup; currently being tested

Size of EAD finding aids: one EAD can result in thousands of DC records (many of marginal usefulness, but some very useful)

A frequently occurring search term can overwhelm the results list with many parts of one EAD
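One way to keep a single finding aid from flooding the results list is to collapse hits by their source EAD. The sketch assumes record identifiers follow the pattern in the example record above: the EAD's URL, optionally followed by an `#xpointer(...)` fragment. The `max_per_ead` cut-off is a hypothetical display parameter.

```python
def group_hits(hit_ids, max_per_ead=3):
    """Collapse search hits by source finding aid.

    Returns (ead_url, shown_hits, hidden_count) tuples, ordered by each
    finding aid's first appearance in the ranked hit list.
    """
    groups = {}  # dicts keep insertion order, so ranking order is preserved
    for hit in hit_ids:
        ead_url = hit.split("#", 1)[0]  # drop the #xpointer(...) fragment
        groups.setdefault(ead_url, []).append(hit)
    return [(url, hits[:max_per_ead], max(0, len(hits) - max_per_ead))
            for url, hits in groups.items()]


hits = [
    "a.xml#xpointer(//dsc[1]/c01[1])",
    "b.xml",
    "a.xml#xpointer(//dsc[1]/c01[2])",
    "a.xml#xpointer(//dsc[1]/c01[3])",
]
for url, shown, hidden in group_hits(hits, max_per_ead=2):
    print(url, len(shown), "shown,", hidden, "more")
# a.xml 2 shown, 1 more
# b.xml 1 shown, 0 more
```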

Splitting many EADs can be a time-consuming batch process

Many marginally useful DC files can affect search performance; logic is needed to discard these records

Disk space requirements can be large
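The “logic to discard these records” is left open here. As one hypothetical starting point, a heuristic filter could drop component records whose titles are empty, very short, or generic, unless they carry a description. The stop list, the length threshold, and the record fields below are all assumptions. A real rule would need tuning against an institution's own finding aids.

```python
# Hypothetical stop list of generic folder titles
GENERIC_TITLES = {"miscellaneous", "misc.", "general", "untitled", "various"}


def is_marginal(record, min_title_chars=4):
    """Heuristically flag a component record as too thin to index on its own."""
    title = (record.get("title") or "").strip().lower()
    if len(title) < min_title_chars or title in GENERIC_TITLES:
        # A thin title is tolerated only if the record has a description
        return not record.get("description")
    return False


def keep_useful(records):
    """Drop records flagged as marginal before they are indexed."""
    return [r for r in records if not is_marginal(r)]


records = [
    {"title": "Toensing, Richard"},
    {"title": "Misc."},
    {"title": "General", "description": "Letters about concert tours"},
    {"title": ""},
]
print([r["title"] for r in keep_useful(records)])  # ['Toensing, Richard', 'General']
```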

Improved display of multiple search hits within a single EAD

Summary and detail views, or a hierarchical display

Improved display of each hit in the context of the finding aid

Filter out superfluous subordinate components

Normalization of various EAD data elements
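The level attribute is a natural first target for the normalization listed above. The sketch maps raw values onto the list EAD 2002 permits for `level` and sends anything unrecognized to `otherlevel`. The synonym table is hypothetical and would have to be built from the variants that actually appear in local finding aids.

```python
# Values permitted for the level attribute in EAD 2002
EAD_LEVELS = {"class", "collection", "file", "fonds", "item", "otherlevel",
              "recordgrp", "series", "subfonds", "subgrp", "subseries"}

# Hypothetical local variants mapped to EAD 2002 values
SYNONYMS = {"folder": "file", "recordgroup": "recordgrp",
            "subrecordgroup": "subgrp", "document": "item"}


def normalize_level(raw):
    """Map a raw level value onto EAD 2002; unknown values become 'otherlevel'."""
    if not raw:
        return None  # missing level: leave for other inference
    key = raw.strip().lower().replace("-", "").replace(" ", "")
    key = SYNONYMS.get(key, key)
    return key if key in EAD_LEVELS else "otherlevel"


for raw in ["Series", "Sub-Series", "Folder", "Record Group", "Box"]:
    print(raw, "->", normalize_level(raw))
```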