Building the Interspace: Digital Library Infrastructure
for a University Engineering Community
Bruce R. Schatz, Principal Investigator
University of Illinois Digital Library project
thabing@uiuc.edu
DLI Project-Wide Workshop
April 24, 1995 Urbana, IL
Goals
Construct large-scale testbed
digital library of structured documents
- SGML collections: engineering/science journals/magazines
- professional users: faculty/students at Big Ten universities
- production library for effective search and display
- production software for propagation to the Internet
Perform underlying research for
effective interaction across networks
- technology development and sociology evaluation
- Net interaction (Interspace architecture)
Organizations
Testbed
- Grainger Engineering Library Information Center (part of University Library, UL)
Infrastructure
- NCSA Mosaic (part of National Center for Supercomputing Applications, NCSA)
Evaluation
- Graduate School of Library & Information Science (GSLIS) + Sociology, Economics
Technology
- GSLIS + Computer Science (CS), NCSA
Principal Investigators
- Bill Mischo, UL, Testbed Lead
- Tim Cole, UL, testbed collection
- Joseph Hardin, NCSA, Infrastructure Lead
- Larry Jackson, NCSA, infrastructure software
- Ann Bishop, GSLIS, Testbed Evaluation
- Bruce Schatz, GSLIS, Technology Research
- executive committee: UL, NCSA, GSLIS, CS
Partners
journal/magazine publishers:
- IEEE Computer Society
- Institute of Electrical and Electronics Engineers (IEEE)
- American Society of Civil Engineers (ASCE)
- American Institute Aeronautics& Astronautics (AIAA)
- American Physical Society (APS)
- American Institute of Physics (AIP)
- American Astronomical Association (AAS)
- John Wiley & Sons
testbed: SoftQuad, Dataware, Hewlett- Packard
infrastructure: Spyglass, CNRI, CNIDR, Microsoft
evaluation: CMU (NetBill economics)
technology: Univ Arizona, Univ Wisconsin
Central Library Model
USER--------LIBRARY
request-------repository
CLIENT------SERVER
books in a physical library
Distributed Library Model
USER----LIBRARY----PUBLISHER
request----reference------repository
CLIENT------GATEWAY------SERVER
documents in a digital library
Publishing Cycle
USER..........................request
LIBRARY................reference
INDEXER...................classify
PUBLISHER............edit/filter
AUTHOR..................generate
difficult even for documents due to
multiple types (DTD for structure)
heterogeneous styles (DSSL for display)
different classification schemes (for search)
Community Model
USER----------MEDIATOR
repository-----------metadata
peer-peer not client-server
users are authors (publishing cycle joins)
mediators are library indexer publisher
who provide search classify quality
Testbed Status
Initial Tests Period 1 (9/94-2/95)
- collection, software, users, evaluation
Initial Production Period 2 (3/95-2/96)
- goal is 1000 documents & 1000 users
- Grainger Library (public terminals, usability studies)
- Beckman Institute (physics), Computer Science Department
Period 3: University of Illinois
Period 4: CIC Universities (midwest Big Ten)
- goal is 100,000 documents & 100,000 users
Testbed Collection
journal/magazines in engineering/science
text/figures/tables/equations/sidebars
- IEEE Computer Society
- APS (American Physical Society)
- AIP (American Instititute Physics)
- ASCE (Civil Engineers)
- AIAA (Aerospace Engineers)
- AAS (American Astronomical Society)
engineering bibliographic databases
distributed document repositories
scientific image databases with links
Testbed Software
Front end Display
- Mosaic: Client interfaces (CCI), Net fetch
- SGML viewer from SoftQuad (Panorama)
Back end Search
- stateful gateway (CGI) for search history
- full-text engine from Dataware (BRS)
- modify standard DTDs for search functions
Testbed Evaluation
Interviews (Qualitative Data)
100 in Focus Groups [first wave done]
Surveys (Quantitative Data)
1000 in Periods 3 and 5
Instrumentation (Transaction Logs)
100,000 record structured access
different methodologies in granularity & scale
towards nethnography: quanti-> qualitative
Technology Research
towards the Interspace: Net interaction
Semantic Retrieval
- manual thesaurus versus automatic relations
System Scaling
- user/usage measurements, URNs/URCs
New Architectures
- design prototype for peer-peer interaction
- design/implement next generation system
Semantic Retrieval
- human indexer (manual classification)
- hierarchical terms (classify) correct but general
- machine indexer (automatic classification)
- related terms (co-occur)--specific but incorrect
-
- manual thesaurus via collaborations
- INSPEC thesaurus, Dewey subject classification
- automatic thesaurus via supercomputing
- concept space for INSPEC took 1 day on SGI Challenge
- multiple displays for different classifications
System Scaling
- user/usage system measurements
- structure, retrieval, search, display
- instrument all components of system
-
- document classification in the Net (A& I)
-
- URNs: unique invariant document names
- NCSA in collaboration with CNRI (handles)
- URCs: author indexing for all documents
- NCSA in collaboration with OCLC (metadata)
New Architectures
- from library model to community model
- design/implement Interspace environment
-
- support Net interaction and analysis
- cross-correlation from multiple sources
- peer-peer information systems architecture
-
- incorporate research and testbed efforts
- apply to DLI, CAN (NASA datasets), WCS
Interspace Protocols
- structured objects form information space
- linked information spaces form Interspace
-
- interaction interoperability (type coercion)
- analysis navigation (path recording)
-
- communication for people interaction (notify)
- wrappers for program interaction (invoke)
Go Back to the Home Page