ILLINOIS DIGITAL LIBRARY INITIATIVE

NSF/ARPA/NASA Digital Library Grant

William H. Mischo


Digital Library Grant

$4 Million grant over four years.

PIs: Bruce Schatz (GSLIS/NCSA), Ann Bishop (GSLIS), Bill Mischo (Library), Joseph Hardin (NCSA). Multidisciplinary team. Also includes Sociology, Computer Science, Economics Departments.

Testbed: SGML Full-Text of major Engineering Journals, housed at Grainger Engineering Library Information Center.

Collection: Engineering, Computer Science, and Physics Journal Articles.

Software: Custom Windows Client, NCSA Mosaic, OpenText, Microsoft SQL Server, EBT, Dataware BRS, OCLC Newton.

Users: UIUC (Illinois), CIC (Midwest).


Illinois DLI Goals

a national digital model

Construct large-scale digital library testbed using SGML text documents; investigate indexing, retrieval, and display of SGML

Perform research needed to scale this infrastructure (technology/sociology)


Illinois DLI Testbed

100K documents and users

Computer Science, Engineering, Physics journals

Engineering faculty and students

NCSA metacenter Production

University Library Development


Illinois DLI Research

scaling up in the real world

Computer Science: system integration, information retrieval, interface design, large-scale internet retrieval

Information Science: SGML processing, indexing, retrieval

Architectures: information systems

User evaluation: surveys/sociology

System economics: access charges


Publisher Partners

provide SGML for this project


Developer Partners

Provide software or resources

Client Software

Server Software

Commercial Systems

Intellectual Property

Industrial Applications


SGML

Common language of open document systems.

Permits documents to be treated as objects to be viewed, manipulated, and output.

Reveals Content and Structure of a document.

Viewer and converter support by:

Loading, Converting, Processing, Displaying SGML.

Extensions to ISO 12083 Article DTD.

Experience gained with variety of publisher SGML formats and DTDs.
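
As a rough illustration of how markup reveals content and structure, the sketch below pulls the title, authors, and section paragraphs out of a tagged article. It assumes a well-formed fragment that Python's XML parser can read (real SGML permits tag omission and needs an SGML-aware parser). The element names are hypothetical, not taken from ISO 12083 or any publisher DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed article fragment; element names are illustrative only.
SAMPLE = """
<article>
  <front>
    <title>Indexing Structured Documents</title>
    <author>A. Author</author>
    <author>B. Author</author>
  </front>
  <body>
    <section><heading>Introduction</heading>
      <para>Markup exposes document structure.</para>
    </section>
    <section><heading>Methods</heading>
      <para>Fields are indexed separately.</para>
      <para>Paragraphs keep their section context.</para>
    </section>
  </body>
</article>
"""

def extract_fields(markup):
    """Return the title, author list, and (section heading, paragraph) pairs."""
    root = ET.fromstring(markup.strip())
    title = root.findtext("front/title")
    authors = [a.text for a in root.findall("front/author")]
    paragraphs = []
    for section in root.findall("body/section"):
        heading = section.findtext("heading")
        for para in section.findall("para"):
            paragraphs.append((heading, para.text))
    return {"title": title, "authors": authors, "paragraphs": paragraphs}

fields = extract_fields(SAMPLE)
print(fields["title"])
print(fields["authors"])
```

Because each paragraph keeps its section heading, an indexer built on this kind of output could restrict a search to a chosen field or section.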


VALUE-ADDED ELEMENTS

Enhanced Document Annotation, Workgroup Mechanisms.

Dynamic Links between Documents.

Linking of Periodical Index Databases with Full-Text Availability.

Creation of Customizable databases that allow the Creation of “Virtual Magazines” based on User Search Interests.

Knowbot Agents that find Information.

Linking UIUC Digital Library with other NSF DLI Projects.


Testbed Issues: Search & Retrieval

Bob Kahn: Greatest Weaknesses of the Internet are document indexing, retrieval, and display.

Develop a Model for Full-Text Retrieval of SGML Documents.

Client-Server Database Structure and Indexing Techniques.

Demonstration Interface:


TECHNICAL ISSUES

Interface Design for Full-Text Retrieval.

Full-Text Retrieval

Enhanced Searching and Browsing Capability of SGML Documents.

Database Server Technology

Database Structure for Full-Text Searching and Browsing.

Links between Controlled Vocabulary and SGML Documents.


Interface Design Principles (Part 1)

No prescriptive models for an optimum interface design. (Grudin)

No complete HCI theories (Fischer)

Stable and complete guidelines for interface design are decades away. (Shneiderman)

Interface design rules sometimes conflict and one has to make appropriate trade-offs...that best satisfies the needs. (Nielsen)

Leaving the (interface) design to the users is the ultimate abdication of the designer’s responsibility...it is absolutely essential for the design to give the users a carefully thought out set of defaults. (Nielsen)


Interface Design Principles (Part 2)

Understand user needs and searching behavior.

Database design is key for effective retrieval.

Test interface on real users.

Focus on iterative design cycle, relying on rapid prototyping. (Gould and Lewis)


User Needs and Behavior:

Full Text Searching

Studies have compared full text retrieval with titles, abstracts and controlled vocabularies.

Potential Problems as well as strengths: use caution! (Tenopir)

Evaluative studies on the effectiveness of full text retrieval argue for sophisticated search techniques.

Early studies show the value of natural-language searching and a growing volume of full text documents searched that way.

Current thinking concludes that bibliographic information retrieval systems should allow both natural-language and controlled vocabulary searching.

Robertson/Walker: Thesaurus based query expansion questioned.


User Mental Models

User Needs and Behavior

Boolean operators & strategy formulations are difficult for users to learn.

Subject and known-item searches result in high user failure rates.

Reported user satisfaction does not imply true success.

Users prefer easy to use, quickly learned, and inexpensive search systems.

Users prefer computer-aided instruction or one-to-one instruction from library staff or peers vs. printed or formal training instructions.

Users generally search infrequently; even frequent users require retraining or refamiliarization with system features and services.


User Needs and Behavior:

DLI Search and Retrieval Required Feature Set (Part 1)

Browse dictionary of database words (by Field)

Browse lists of Headings, Author Entries, Controlled Vocabulary terms.

Term proximity searching (same Section, Paragraph, Sentence and Within ‘n’ words operator).

Limit to selected, multiple SGML defined Fields.

Stemming of terms.

Search modification mechanism that does not require users to enter explicit Boolean and Proximity operators.
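
The proximity operators in the list above can be sketched as follows. This is a minimal illustration, not the testbed's implementation: it assumes lowercase query terms, a naive tokenizer, and sentence boundaries marked only by ".", "!" or "?". The function names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def within_n_words(text, term_a, term_b, n):
    """True if term_a and term_b occur at most n word positions apart."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

def same_sentence(text, term_a, term_b):
    """True if both terms occur in one sentence (naive split on . ! ?)."""
    for sentence in re.split(r"[.!?]+", text):
        tokens = set(tokenize(sentence))
        if term_a in tokens and term_b in tokens:
            return True
    return False

doc = ("Full text retrieval is hard. "
       "Structured markup can improve retrieval of engineering articles.")
print(within_n_words(doc, "markup", "retrieval", 3))
print(same_sentence(doc, "text", "markup"))
```

Same-paragraph and same-section operators would follow the same pattern, with the text split on the corresponding SGML elements instead of on punctuation.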


User Needs and Behavior:

DLI Search and Retrieval Required Feature Set (Part 2)

Quorum search mechanisms (partial match algorithms).

Search expansion algorithms and search limit methods, each with suggestive prompts.

Relevance ranking (ranked Boolean, high and low frequency words, cosine function).

Dynamic, context-sensitive, context-specific suggestive help.

Multimedia demonstrations of search types.

Show me more Like this One.
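
Two of the features above, quorum search and cosine-based relevance ranking, can be sketched together. This is one possible reading, assuming raw term-frequency vectors with no weighting for high or low frequency words. The tiny collection and all names are made up for illustration.

```python
import math
from collections import Counter

def quorum_match(query_terms, doc_terms, quorum):
    """True if at least `quorum` distinct query terms appear in the document."""
    return len(set(query_terms) & set(doc_terms)) >= quorum

def cosine(query_terms, doc_terms):
    """Cosine of the angle between raw term-frequency vectors."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

def rank(query_terms, docs, quorum=1):
    """Return (doc_id, score) pairs passing the quorum, best score first."""
    scored = [(doc_id, cosine(query_terms, terms))
              for doc_id, terms in docs.items()
              if quorum_match(query_terms, terms, quorum)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up documents, already tokenized.
docs = {
    "d1": "sgml full text retrieval".split(),
    "d2": "sgml document display".split(),
    "d3": "journal access charges".split(),
}
query = "sgml retrieval indexing".split()
for doc_id, score in rank(query, docs, quorum=1):
    print(doc_id, round(score, 3))
```

Passing a document's own terms as the query gives a simple form of "show me more like this one".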


Database Design

Without a well-planned and useful database design and retrieval system, the best interface will not increase user satisfaction.

Need to accommodate the feature sets.

Interface works at client level to do automatic truncation, etc., and then communicates with search database at server level.
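
The client-side step can be sketched as a small query rewriter that adds truncation before anything is sent to the server. The `*` wildcard, the `AND` default, and the minimum term length are assumptions for illustration. They do not reflect the syntax of any search engine named in this presentation.

```python
def add_truncation(term, min_length=4, wildcard="*"):
    """Append a wildcard to terms long enough to truncate safely."""
    if term.endswith(wildcard) or len(term) < min_length:
        return term
    return term + wildcard

def build_server_query(user_input, operator="AND"):
    """Turn plain user words into a Boolean query string with truncation."""
    terms = [add_truncation(t.lower()) for t in user_input.split()]
    return f" {operator} ".join(terms)

print(build_server_query("Digital library SGML"))
```

The user types plain words, and the operators and truncation are supplied by the client, which is in line with the goal of searching without user-entered explicit operators.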


Other Full Text Retrieval Projects...

Red Sage: Univ of California-San Francisco; Springer-Verlag.

OCLC Electronic Journals On-line with Guidon Interface.

Cornell CORE (Chemistry On-line Retrieval Experiment) in cooperation with American Chemical Society.

Elsevier TULIP Project; Materials Science Journals at nine universities.

WILLOW Medical Journals with BRS COLLEAGUE

IBM/Institute for Scientific Information.

Project Envision: Virginia Tech.


Screen Examples

Three identifiable types of interfaces:

Folio, Lotus Notes, Mac Rosebud, Willow, ASCE, UMLS, XWAIS, MV Mac, Mercury, ELSA, Envision, Guidon.


Internet Operating Systems

Enhanced Web Clients

Multimedia capable

Invoke other programs (CCI)

Distributed Object Environment

Web Server Technologies

Sophisticated Retrieval Tools


Model for Publishing Scholarly Articles

SGML Standard.

Repository System.

Distributed Model

Search, Retrieval, and Display.

Database Design Issues.

Future of the Scientific Journal.

Roles of Authors, Publishers, A & I Services, Libraries.

