Building the Interspace: Digital Library Infrastructure
for a University Engineering Community
Bruce Schatz, Principal Investigator
University of Illinois DLI project
thabing@uiuc.edu
DLI Project-Wide Workshop
November 9, 1995 Santa Barbara, CA
Research on the Net
The Past: Access
The Net fetches documents
The Present: Organization
The Net searches repositories
The Future: Analysis
The Net correlates information
From the Internet (data transmission)
to the Interspace (information manipulation)
Project Goals
- Semantic Federation (research)
- Distributed Repositories (infrastructure)
- Scientific Literature (testbed)
- evaluate large testbed
- perform technology research
Organizations
Grainger Engineering Library Information Center
(part of University Library, UL)
NCSA Software Development Group
(National Center for Supercomputing Applications)
Graduate School of Library & Information Science (GSLIS) + Sociology, Economics
GSLIS + Computer Science (CS), NCSA
Principal Investigators
- Bill Mischo, UL, Testbed Lead
- Tim Cole, UL, testbed collection
- Joseph Hardin, NCSA, Infrastructure Lead
- Beth Frank, NCSA, infrastructure software
- Ann Bishop, GSLIS, Testbed Evaluation
- Bruce Schatz, GSLIS, Technology Research
- executive committee: UL, NCSA, GSLIS, CS
Primary Partners
- journal/magazine publishers:
IEEE Computer Society
Institute of Electrical and Electronics Engineers (IEEE)
American Society of Civil Engineers (ASCE)
American Society of Agricultural Engineers (ASAE)
American Physical Society (APS)
American Institute of Physics (AIP)
American Astronomical Society (AAS)
John Wiley & Sons
INSPEC, Compendex (Engineering Index)
- testbed: SoftQuad, EBT, OpenText, Hewlett-Packard
- infrastructure: OCLC, CNRI, Spyglass, Microsoft
DLI Collaborators
- Stanford interoperability experiment
- Michigan search interface, OpenText SGML
- CNRI/Cornell secure object store
- Carnegie-Mellon NetBill charging trial
- UC Santa Barbara GIS correlations
- UC Berkeley image processing
- Carnegie-Mellon network video
Illinois Project Groups
process, index, search, display SGML collection
multiview interface to Web distributed repositories
usage and users of testbed with information context
semantic retrieval, both manual & automatic
analysis environments with objects & semantics
Testbed Goals
Large Organized Collection
- SGML pipeline direct from the publishers (deposit)
- complete articles fully tagged and indexed (search)
Large Number of Users
- faculty/students around UIUC then Big Ten
- Internet interface with multiple views (display)
Careful Sociological Evaluation
- needs assessment, usability studies
- surveys, instrumentation
Testbed Status
Year 2 (Sep 95 -> )
- production pipeline for a few journals
- testing production database and components
- Grainger Library (public terminals, usability studies)
- Beckman Institute (physics), Computer Science Dept
- implement full Web version for deployment
Year 3: University of Illinois
Year 4: CIC Universities (Midwest Big Ten)
goal is 100,000 documents & 100,000 users
Testbed Collection
- full SGML from Jan 1995 forward
- production on AIP (2000), APS, ASCE
- multi-year archive from IEEE, IEEE CS
- ISO 12083 DTD with figures, tables, equations
- publisher hands-on workshop Nov 16-17
- plans for publisher-maintained repositories
- problems with rendering scientific literature
Testbed Components
- Gathering: SGML from publishers
- Processing: normalize (federate) tags
- Indexing: store term/tag lists
- Control: Visual Basic search interface
- Searching: OpenText fulltext engine
- Displaying: Panorama SGML viewer
- Fetching: Mosaic gets SGML/DTD files
federated SGML repository across the Net
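The Processing and Indexing steps above (normalize tags, then store term/tag lists) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: TAG_MAP holds invented publisher tag names rather than real AIP, APS, or ASCE markup, and normalize_tags(), index_terms(), and search() are hypothetical helpers, not the testbed's code. The index records (document, canonical tag) pairs per term, so a search can be limited to, say, titles only.

```python
import re
from collections import defaultdict

# Hypothetical mapping from publisher-specific tag names to one canonical
# tag set. The testbed normalized toward ISO 12083; these names are invented.
TAG_MAP = {
    "atl": "title",        # one publisher's article-title tag (assumed)
    "arttitle": "title",   # another publisher's name for the same field (assumed)
    "abs": "abstract",
    "abstract": "abstract",
    "p": "para",
    "para": "para",
}

TAG_RE = re.compile(r"<(/?)([A-Za-z][\w.-]*)([^>]*)>")

def normalize_tags(sgml):
    """Rename known start/end tags to canonical names; leave unknown tags untouched."""
    def rename(m):
        slash, name, rest = m.groups()
        return f"<{slash}{TAG_MAP.get(name.lower(), name)}{rest}>"
    return TAG_RE.sub(rename, sgml)

def index_terms(doc_id, sgml, index):
    """Add a (doc_id, tag) posting for every lowercase term inside a tagged element."""
    # Simplification: only flat, non-nested elements are handled.
    for m in re.finditer(r"<(\w+)[^>]*>([^<]*)</\1>", sgml):
        tag, text = m.group(1), m.group(2)
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add((doc_id, tag))

def search(index, term, tag=None):
    """Return sorted doc ids containing term, optionally restricted to one canonical tag."""
    postings = index.get(term.lower(), set())
    return sorted({d for d, t in postings if tag is None or t == tag})

index = defaultdict(set)
doc = normalize_tags("<atl>Superconducting Films</atl><abs>Thin films at low temperature</abs>")
index_terms("doc1", doc, index)
```

Here normalize_tags() turns the invented `<atl>` and `<abs>` tags into `<title>` and `<abstract>`, so a title-only query for "films" finds doc1 while a title-only query for "thin" does not.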
Production Web Version
Client Interface
- multiple view interface (Java)
- session control across repositories
Network Gateway
- repository protocols (SQL, Z39.50)
- maintains state of search history
Server Search
- deposit canonicalized SGML documents
- index using DTDs for full-text retrieval
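A gateway that "maintains state of search history" lets the client stay simple while each query builds on earlier ones. The Python sketch below shows one way this could work, under stated assumptions: the Gateway, Session, and SearchStep classes are hypothetical, and each repository is stubbed as a plain function over query terms instead of a real SQL or Z39.50 connection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical backend: takes query terms, returns matching doc ids.
# A real gateway would speak SQL or Z39.50 to the repository instead.
Backend = Callable[[List[str]], List[str]]

@dataclass
class SearchStep:
    repository: str
    terms: List[str]
    hits: List[str]

@dataclass
class Session:
    history: List[SearchStep] = field(default_factory=list)

class Gateway:
    """Keeps per-session search history on the gateway, not the client."""

    def __init__(self, backends: Dict[str, Backend]):
        self.backends = backends
        self.sessions: Dict[str, Session] = {}

    def search(self, session_id: str, repository: str, terms: List[str]) -> int:
        """Run a query, record it in the session history, return its 1-based step number."""
        session = self.sessions.setdefault(session_id, Session())
        hits = self.backends[repository](terms)
        session.history.append(SearchStep(repository, list(terms), hits))
        return len(session.history)

    def refine(self, session_id: str, step: int, extra_terms: List[str]) -> int:
        """Re-run an earlier step with extra terms against the same repository."""
        prior = self.sessions[session_id].history[step - 1]
        return self.search(session_id, prior.repository, prior.terms + extra_terms)

    def hits(self, session_id: str, step: int) -> List[str]:
        return self.sessions[session_id].history[step - 1].hits

# Toy repository (invented data) that returns docs containing all query terms.
DOCS = {"d1": {"plasma", "physics"}, "d2": {"plasma", "fusion"}, "d3": {"bridge", "steel"}}

def repo_a(terms: List[str]) -> List[str]:
    return sorted(d for d, words in DOCS.items() if all(t in words for t in terms))

gw = Gateway({"repo_a": repo_a})
```

With this stub, `gw.search("user42", "repo_a", ["plasma"])` records step 1 with hits d1 and d2, and `gw.refine("user42", 1, ["fusion"])` records step 2 with only d2, without the client resending its earlier query.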
Multiple Views
- Different Levels of Search Interface
- Drag-and-Drop between views
- Integrates A&I (Abstracting and Indexing)
- Term Suggestion followed by Text Search
- Subject Thesaurus for coarse-grain suggestions
- Concept Space for fine-grain suggestions
- Visual Basic prototype, Java for multiplatform
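The "term suggestion followed by text search" flow above can be sketched in Python. This is a minimal sketch under stated assumptions: THESAURUS and CONCEPT_SPACE are hypothetical toy tables (not real INSPEC or Compendex data), suggest() and build_query() are illustrative names, and the quoted-OR query string is an invented syntax, not the OpenText query language.

```python
from typing import Dict, List

# Hypothetical toy data. The thesaurus gives a subject heading's narrower terms
# (coarse); the concept space gives weighted co-occurring terms (fine).
THESAURUS: Dict[str, List[str]] = {
    "superconductivity": ["superconducting materials", "josephson effect", "critical temperature"],
}
CONCEPT_SPACE: Dict[str, Dict[str, float]] = {
    "superconductivity": {"ybco": 0.62, "critical current": 0.48, "cuprate": 0.41, "magnet": 0.12},
}

def suggest(term: str, fine: bool, k: int = 3, min_weight: float = 0.2) -> List[str]:
    """Coarse suggestions come from the thesaurus; fine ones from the concept space."""
    term = term.lower()
    if not fine:
        return THESAURUS.get(term, [])[:k]
    related = CONCEPT_SPACE.get(term, {})
    ranked = sorted(related.items(), key=lambda kv: kv[1], reverse=True)
    return [t for t, w in ranked if w >= min_weight][:k]

def build_query(term: str, accepted: List[str]) -> str:
    """Combine the user's term and accepted suggestions into one OR'd full-text query."""
    return " OR ".join(f'"{p}"' for p in [term] + accepted)
```

A user might start coarse (`suggest("superconductivity", fine=False)`), switch to the concept space for sharper terms, drag the ones they want into the search view, and run `build_query("superconductivity", ["ybco", "cuprate"])`. The weight threshold is an assumed knob that drops weakly related terms such as "magnet".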
New Web Servers
- HTTP servers evolve into Object Repositories
- NCSA Web server 2.0 released December
- Modular Steps towards Repositories
- Multiple Protocols (HTTP, Keep-Open)
- Security, Metadata Checking, Link Maintenance
- Stateful Gateways support Distributed Sites
- Towards Sessions in Later Versions
Sociological Evaluation
different methodologies in granularity & scale
Needs Assessment
- ethnographic observations in libraries and labs
- focus groups and user interviews
Testbed Evaluation
- conceptual framework for evaluation
- planning for usability tests of pre-production versions
- development of system instrumentation
Community for Social Studies of DLs
- Allerton Institute Oct 1995 on concepts & methods
- sessions at DL95 and DL96 on user research
Technology Research
towards the Interspace: Net correlation
- Scalable Semantic Retrieval
concept spaces and vocabulary switching
- Distributed Object Stores
secure object infrastructure (w/CNRI, Cornell)
- Analysis Environment Systems
correlation of information across repositories
Semantic Retrieval
automatic indexing of concepts
- find context of phrases within documents
- generates a concept space based on term frequency
useful for interactive searching
- given a term, can suggest other terms
- merging concept spaces supports vocabulary switching
concepts require supercomputing
- concept space for INSPEC took 1 day on SGI Challenge
- co-occurrence matrix for 400K abstracts
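The concept-space idea on this slide (term co-occurrence across abstracts, suggesting related terms, and merging spaces for vocabulary switching) can be illustrated with a small Python sketch. The weighting is an assumption: weight(a to b) = (documents containing both a and b) / (documents containing a), a simplified asymmetric co-occurrence measure and not necessarily the project's formula. merge_spaces() uses one simple rule (keep the larger weight) chosen for illustration. The nested dictionaries store only nonzero pairs, which matters at the scale of 400K abstracts.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Iterable, List, Set

# Sparse concept space: term -> {related term -> weight}.
ConceptSpace = Dict[str, Dict[str, float]]

def build_concept_space(docs: Iterable[Set[str]]) -> ConceptSpace:
    """Weight a->b as (#docs with both a and b) / (#docs with a); assumed, simplified."""
    df: Counter = Counter()   # document frequency of each term
    co: Counter = Counter()   # ordered co-occurrence counts (a, b), a != b
    for terms in docs:
        df.update(terms)
        co.update(permutations(terms, 2))
    space: ConceptSpace = {}
    for (a, b), n in co.items():
        space.setdefault(a, {})[b] = n / df[a]
    return space

def related(space: ConceptSpace, term: str, k: int = 3) -> List[str]:
    """Given a term, suggest the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(space.get(term, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]

def merge_spaces(a: ConceptSpace, b: ConceptSpace) -> ConceptSpace:
    """Merge two spaces, keeping the larger weight where a term pair appears in both."""
    merged: ConceptSpace = {t: dict(nbrs) for t, nbrs in a.items()}
    for t, nbrs in b.items():
        row = merged.setdefault(t, {})
        for u, w in nbrs.items():
            row[u] = max(w, row.get(u, 0.0))
    return merged

# Toy "abstracts" from two communities, each reduced to a set of terms (invented).
physics = build_concept_space([{"laser", "optics", "fiber"}, {"laser", "optics"}, {"fiber", "optics"}])
engineering = build_concept_space([{"fiber", "cable", "telecom"}, {"cable", "telecom"}])
merged = merge_spaces(physics, engineering)
```

In the physics space alone, "fiber" suggests "optics" and "laser". In the merged space, the same term also suggests the engineering terms "cable" and "telecom", which is the kind of crossing between two communities' vocabularies that vocabulary switching aims at.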
Analysis Environment
- objects: fine-grain manipulation
- navigation & grouping: path recording
- retrieval & classification: concept spaces
- correlations: path matching via concept spaces
- prototype in Smalltalk, ObjectStore, ILU
- application in personal info, DLI, GIS
The 21st Century: Analysis
- Beyond Search to Analysis
- Cross-Correlating Information from many sources across the Net
- The Net solves problems
- Every community has its own special library
- Every community & every person does A&I!!