$4 Million grant over four years.
PIs: Bruce Schatz (GSLIS/NCSA), Ann Bishop (GSLIS), Bill Mischo (Library), Joseph Hardin (NCSA). Multidisciplinary team that also includes the Sociology, Computer Science, and Economics Departments.
Testbed: SGML Full-Text of major Engineering Journals at the Grainger Engineering Library Information Center.
Collection: Engineering, Computer Science, and Physics Journal Articles.
Software: Custom Windows Client, NCSA Mosaic, OpenText, Microsoft SQL Server, EBT, Dataware BRS, OCLC Newton.
Users: UIUC (Illinois), CIC (Midwest).
a national digital model
Construct a large-scale digital library testbed using SGML text documents; investigate indexing, retrieval, and display of SGML documents.
Perform research needed to scale this infrastructure (technology/sociology)
100K documents and users
Computer Science, Engineering, Physics journals
Engineering faculty and students
NCSA metacenter Production
University Library Development
scaling up in the real world
Computer Science: system integration, information retrieval, interface design, large-scale internet retrieval
Information Science: SGML processing, indexing, retrieval
Architectures: information systems
User evaluation: surveys/sociology
System economics: access charges
Provide SGML for this project
Provide software or resources
Client Software
Server Software
Commercial Systems
Intellectual Property
Industrial Applications
Common language of open document systems.
Permits documents to be treated as objects to be viewed, manipulated, and output.
Reveals Content and Structure of a document.
Viewer and converter support for:
Loading, Converting, Processing, and Displaying SGML.
Extensions to ISO 12083 Article DTD.
Experience gained with variety of publisher SGML formats and DTDs.
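To show how explicit markup exposes a document's content and structure for field-level indexing, the sketch below pulls leaf elements out of a small tagged article. Assumptions: the element names (`article`, `front`, `title`, `p`, and so on) are hypothetical and only loosely in the spirit of an article DTD, and the sample is written with every end tag present so Python's standard XML parser can read it; real SGML may omit end tags and would need a true SGML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup; all end tags are present so the standard
# library's XML parser can read it (real SGML may omit some end tags).
SAMPLE = """<article>
  <front>
    <title>Scaling digital libraries</title>
    <author>A. Author</author>
  </front>
  <body>
    <section>
      <head>Introduction</head>
      <p>Markup reveals the structure of a document.</p>
      <p>Structure supports field-limited retrieval.</p>
    </section>
  </body>
</article>"""


def extract_fields(markup):
    """Collect the text of every leaf element, grouped by element name."""
    root = ET.fromstring(markup)
    fields = {}
    for elem in root.iter():
        # Leaf elements carry the content; container elements carry structure.
        if len(elem) == 0 and elem.text and elem.text.strip():
            fields.setdefault(elem.tag, []).append(elem.text.strip())
    return fields


fields = extract_fields(SAMPLE)
print(fields["title"])   # ['Scaling digital libraries']
print(len(fields["p"]))  # 2
```

Each element name becomes a field that a retrieval system could index or limit searches to.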
Enhanced Document Annotation, Workgroup Mechanisms.
Dynamic Links between Documents.
Linking of Periodical Index Databases with Full-Text Availability.
Customizable databases that allow the creation of Virtual Magazines based on user search interests.
Knowbot Agents that find Information.
Linking UIUC Digital Library with other NSF DLI Projects.
Bob Kahn: The greatest weaknesses of Internet documents are indexing, retrieval, and display.
Develop a Model for Full-Text Retrieval of SGML Documents.
Client-Server Database Structure and Indexing Techniques.
Demonstration Interface:
Interface Design for Full-Text Retrieval.
Full-Text Retrieval
Enhanced Searching and Browsing Capability of SGML Documents.
Database Server Technology
Database Structure for Full-Text Searching and Browsing.
Links between Controlled Vocabulary and SGML Documents.
No prescriptive models for an optimum interface design. (Grudin)
No complete HCI theories (Fischer)
Stable and complete guidelines for interface design are decades away. (Shneiderman)
Interface design rules sometimes conflict and one has to make appropriate trade-offs...that best satisfies the needs. (Nielsen)
Leaving the (interface) design to the users is the ultimate abdication of the designer's responsibility...it is absolutely essential for the design to give the users a carefully thought out set of defaults. (Nielsen)
Understand user needs and searching behavior.
Database design is key for effective retrieval.
Test interface on real users.
Focus on iterative design cycle, relying on rapid prototyping. (Gould and Lewis)
Full Text Searching
Studies have compared full text retrieval with titles, abstracts and controlled vocabularies.
Potential Problems as well as strengths: use caution! (Tenopir)
Evaluative studies on the effectiveness of full text retrieval argue for sophisticated search techniques.
Early studies show the value of natural-language searching and of the larger volume of full-text documents that can be searched with natural language.
Current thinking concludes that bibliographic information retrieval systems should allow both natural-language and controlled vocabulary searching.
Robertson/Walker: Thesaurus-based query expansion questioned.
User Needs and Behavior
Boolean operators & strategy formulations are difficult for users to learn.
Subject and known-item searches result in high user failure rates.
Reported user satisfaction does not imply true success.
Users prefer easy-to-use, quickly learned, and inexpensive search systems.
Users prefer computer-aided instruction or one-to-one instruction from library staff or peers vs. printed or formal training instructions.
Users generally search infrequently; even frequent users require retraining or refamiliarization with system features and services.
DLI Search and Retrieval Required Feature Set (Part 1)
Browse dictionary of database words (by Field)
Browse lists of Headings, Author Entries, Controlled Vocabulary terms.
Term proximity searching (same Section, Paragraph, Sentence and Within n words operator).
Limit to selected, multiple SGML defined Fields.
Stemming of terms.
Search modification mechanism that does not require users to enter explicit Boolean and Proximity operators.
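The proximity and field-limit features above can be illustrated with a toy positional inverted index. This is a minimal sketch under stated assumptions, not the project's implementation: each document is assumed to be already split into SGML-defined fields (here the hypothetical `title` and `body`) holding plain text, and the names `PositionalIndex` and `within` are illustrative. Same-paragraph or same-sentence scopes could be modeled the same way by indexing each paragraph or sentence as its own field.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PositionalIndex:
    """Toy positional inverted index keyed by (field, term)."""

    def __init__(self):
        # (field, term) -> {doc_id: [word positions within that field]}
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, fields):
        """Index a document given as {field_name: plain_text}."""
        for field, text in fields.items():
            for pos, term in enumerate(tokenize(text)):
                self.postings[(field, term)][doc_id].append(pos)

    def within(self, a, b, n, fields):
        """Ids of documents in which terms a and b occur within n words
        of each other (either order) inside one of the given fields."""
        a, b = a.lower(), b.lower()
        hits = set()
        for field in fields:
            pa = self.postings.get((field, a), {})
            pb = self.postings.get((field, b), {})
            for doc_id in pa.keys() & pb.keys():
                if any(abs(i - j) <= n for i in pa[doc_id] for j in pb[doc_id]):
                    hits.add(doc_id)
        return hits


idx = PositionalIndex()
idx.add("d1", {"title": "Full-text retrieval of SGML documents",
               "body": "Indexing and retrieval of engineering journals."})
idx.add("d2", {"title": "Interface design",
               "body": "Retrieval interfaces rarely index SGML markup."})
print(idx.within("retrieval", "sgml", 3, ["title"]))  # {'d1'}
print(idx.within("retrieval", "sgml", 4, ["body"]))   # {'d2'}
```

Storing word positions rather than bare document ids is what makes a "within n words" operator possible without rescanning the text.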
DLI Search and Retrieval Required Feature Set (Part 2)
Quorum search mechanisms (partial match algorithms).
Search expansion algorithms and search limit methods, each with suggestive prompts.
Relevance ranking (ranked Boolean, high and low frequency words, cosine function).
Dynamic, context-sensitive, context-specific suggestive help.
Multimedia demonstrations of search types.
Show me more like this one.
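Quorum (partial-match) searching and cosine relevance ranking can be combined in a few lines. The sketch below is an illustration under assumptions: it weights terms with a smoothed tf-idf formula (one common choice, not necessarily the project's), keeps only documents containing at least `quorum` distinct query terms, and orders them by cosine similarity. The function name `rank` and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs, quorum=1):
    """Return [(doc_id, cosine score)] best first, keeping only documents
    that contain at least `quorum` distinct query terms (minimum 1)."""
    tfs = {d: Counter(tokenize(text)) for d, text in docs.items()}
    n_docs = len(docs)
    df = Counter(term for tf in tfs.values() for term in tf)
    # Smoothed idf: one common weighting choice, assumed here for illustration.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}

    def weights(tf):
        return {t: f * idf.get(t, 0.0) for t, f in tf.items()}

    def length(vec):
        return math.sqrt(sum(w * w for w in vec.values()))

    q = weights(Counter(tokenize(query)))
    q_len = length(q)
    results = []
    for d, tf in tfs.items():
        matched = sum(1 for t in q if t in tf)  # distinct query terms present
        if q_len == 0 or matched < max(quorum, 1):
            continue
        v = weights(tf)
        dot = sum(w * v.get(t, 0.0) for t, w in q.items())
        results.append((d, dot / (q_len * length(v))))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


docs = {"d1": "SGML full text retrieval",
        "d2": "SGML markup and DTD design",
        "d3": "Interface design for users"}
print([d for d, _ in rank("SGML retrieval", docs, quorum=2)])  # ['d1']
print([d for d, _ in rank("SGML retrieval", docs, quorum=1)])  # ['d1', 'd2']
```

Passing the text of an already retrieved document as the query gives a crude version of "Show me more like this one."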
Without a well-planned and useful database design and retrieval system, the best interface will not increase user satisfaction.
Need to accommodate the feature sets.
The interface works at the client level to perform automatic truncation, etc., and then communicates with the search database at the server level.
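A client-side step of this kind might look like the sketch below: the client expands right-truncated stems (such as `retriev*`) against a sorted dictionary of database words and assembles the Boolean query itself, so the user never types Boolean operators. The `*` truncation symbol, the `OR`/`AND` query syntax, and the function names are hypothetical; a real search server would define its own query language.

```python
import bisect


def expand_truncation(term, dictionary):
    """Expand a right-truncated stem such as "retriev*" into every word in
    the sorted `dictionary` that begins with it; other terms pass through."""
    if not term.endswith("*"):
        return [term]
    stem = term[:-1]
    matches = []
    # Binary search finds the first candidate; matches are contiguous.
    i = bisect.bisect_left(dictionary, stem)
    while i < len(dictionary) and dictionary[i].startswith(stem):
        matches.append(dictionary[i])
        i += 1
    return matches


def build_server_query(user_input, dictionary):
    """Turn plain user input into a hypothetical Boolean query string:
    expansions of one word are ORed, and the words are ANDed together."""
    groups = []
    for word in user_input.lower().split():
        terms = expand_truncation(word, dictionary)
        if terms:  # a stem with no dictionary matches is dropped
            groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


words = sorted(["retrieval", "retrieve", "retrieved", "retrieving",
                "return", "search", "sgml"])
print(build_server_query("SGML retriev*", words))
# (sgml) AND (retrieval OR retrieve OR retrieved OR retrieving)
```

Keeping the dictionary sorted is what lets the same structure serve both dictionary browsing and fast prefix expansion.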
Red Sage: Univ of California-San Francisco; Springer-Verlag.
OCLC Electronic Journals On-line with Guidon Interface.
Cornell CORE (Chemistry On-line Retrieval Experiment) in cooperation with American Chemical Society.
Elsevier TULIP Project; Materials Science Journals at nine universities.
WILLOW Medical Journals with BRS COLLEAGUE.
IBM/Institute for Scientific Information.
Project Envision: Virginia Tech.
Three identifiable types of interfaces:
Folio, Lotus Notes, Mac Rosebud, Willow, ASCE, UMLS, XWAIS, MV Mac, Mercury, ELSA, Envision, Guidon.
Enhanced Web Clients
Multimedia capable
Invoke other programs (CCI)
Distributed Object Environment
Web Server Technologies
Sophisticated Retrieval Tools
SGML Standard.
Repository System.
Distributed Model
Search, Retrieval, and Display.
Database Design Issues.
Future of the Scientific Journal.
Roles of Authors, Publishers, A & I Services, Libraries.