Interface Design and Database Structure for an SGML Document Retrieval
System
Presentation at the American Society for Information Science, October
10, 1995, Chicago, IL
William H. Mischo, Grainger Engineering Library Information Center,
University of Illinois at Urbana-Champaign
Digital Library Grant
$4 Million grant over four years
PIs: Bruce Schatz (GSLIS/NCSA), Ann Bishop (GSLIS), Bill Mischo (Library),
Joseph Hardin (NCSA).
Multidisciplinary team includes Sociology, Computer Science, Economics
Departments.
Testbed: SGML full text of major engineering journals, housed at the
Grainger Engineering Library Information Center
Collection: Engineering, Computer Science, and Physics Journal Articles.
Software: Custom Windows Client, NCSA Mosaic, OpenText, Microsoft SQL
Server, Electronic Book Technologies (EBT), Dataware BRS, OCLC Newton.
Users: UIUC (Illinois), CIC (Midwest).
Illinois DLI Goals
A national digital model
- Construct large-scale digital library testbed using SGML text
documents; investigate indexing, retrieval, and display of SGML
- Perform research needed to scale this infrastructure (technology/sociology)
Year 1: 9/94--2/95 Tests on Sample Testbed
Year 2: 3/95--2/96 Initial Production; 1,000 Documents, 1,000 Users
Year 3-5: 3/96--8/98 UIUC, CIC, 10K Documents, 10K--100K Users.
Incorporation of Research
SGML
Common language of open document systems
Permits documents to be treated as objects to be viewed, manipulated,
and output. Reveals Content and Structure of a document.
Viewer and converter support by:
Loading, Converting, Processing, Displaying SGML. Extensions to ISO 12083
Article DTD.
Experience gained with variety of publisher SGML formats and DTDs.
Testbed Issues: Search & Retrieval
- Bob Kahn: Greatest weaknesses of the Internet are document indexing,
retrieval, and display.
- Develop a Model for Full-Text Retrieval of SGML Documents.
- Client-Server Database Structure and Indexing Techniques.
- SQL
- Dataware/BRS
- OpenText
- EBT
- Demonstration Interface:
- Multimedia
- User Search Strategy Formulation
- Implicit Boolean Operators
- Power of SGML
Technical Issues
- Interface Design for Full-Text Retrieval.
- Full-Text Retrieval
- Enhanced Searching and Browsing Capability of SGML Documents.
- Database Server Technology
- BRS Dataware (Z39.50)
- OpenText (PAT)
- SQL Relational Databases (TCP/IP)
- EBT (DynaWeb)
- Database Structure for Full-Text Searching and Browsing.
- Links between Controlled Vocabulary and SGML Documents.
Internet Operating Systems
- Enhanced Web Clients
- Multimedia capable
- Invoke other programs (CCI)
- Distributed Object Environment
- Web Server Technologies
- Sophisticated Retrieval Tools
- Spell Checking
- Search Trees
- Ranked Searching
- Expert Systems
Model for Publishing Scholarly Articles
- SGML Standard
- Repository System
- Distributed Model
- Search, Retrieval, and Display
- Database Design Issues
- Future of the Scientific Journal
- Roles of Authors, Publishers, A&I Services, Libraries, Computer
Centers
Interface Design Principles
- No prescriptive models for an optimum interface design. (Grudin)
- No complete HCI theories (Fischer)
- Stable and complete guidelines for interface design are decades
away. (Shneiderman)
- Interface design rules sometimes conflict and one has to make
appropriate trade-offs...that best satisfies the needs. (Nielsen)
- Leaving the (interface) design to the users is the ultimate abdication
of the designer's responsibility...it is absolutely essential for
the design to give the users a carefully thought out set of defaults.
(Nielsen)
Interface Design Principles (Part 2)
- Understand user needs and searching behavior.
- Database design is key for effective retrieval
- Test interface on real users
- Focus on iterative design cycle, relying on rapid prototyping.
(Gould and Lewis)
User Needs and Behavior: Full Text Searching
- Studies have compared full text retrieval with titles, abstracts and
controlled vocabularies.
- Potential problems as well as strengths: use caution! (Tenopir)
- Evaluative studies on the effectiveness of full text retrieval argue
for sophisticated search techniques.
- Proximity searching
- Ranking algorithms
User Needs and Behavior: Controlled Vocabulary Searching
Controlled Vocabulary
- Early studies show value of natural language and increased volume of
full text documents searched via natural language.
- Current thinking concludes that bibliographic information retrieval
systems should allow both natural language and controlled vocabulary searching.
- Robertson/Walker: Thesaurus-based query expansion questioned.
User Mental Models
User Needs and Behavior
- Boolean operators & strategy formulations are difficult for users
to learn and apply.
- Subject and known-item searches result in high user failure rates.
- Users make spelling mistakes and form-of-entry errors.
- Reported user satisfaction does not imply true success.
- Users prefer computer-aided instruction or one-to-one instruction from
library staff or peers over printed or formal training instructions.
- Users generally search infrequently; even frequent users
require retraining or refamiliarization with system features and services.
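One way a client might respond to the spelling mistakes noted above is to suggest close matches from the database dictionary. The following is a minimal sketch, not the project's method: the `suggest` function and the tiny `vocabulary` list are hypothetical, and matching uses Python's standard `difflib` module.

```python
import difflib

def suggest(term, vocabulary, limit=3):
    """Return up to `limit` vocabulary words that closely resemble `term`."""
    # get_close_matches returns candidates ordered from most to least similar.
    return difflib.get_close_matches(term.lower(), vocabulary, n=limit)

# Hypothetical dictionary of indexed words (an assumption for illustration).
vocabulary = ["retrieval", "retrieve", "interval", "interface", "database"]

print(suggest("retreival", vocabulary))
```

A real system would draw the vocabulary from the browsable dictionary of database words rather than from a hard-coded list.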
User Needs and Behavior: DLI Search and Retrieval Required
Feature Set (Part 1)
- Browse dictionary of database words (by Field)
- Browse lists of Headings, Author Entries, Controlled Vocabulary terms.
- Term proximity searching (same section, paragraph, or sentence, plus a
within-n-words operator).
- Limit to selected, multiple SGML defined Fields.
- Stemming of terms
- Search modification mechanism that does not require users to enter
explicit Boolean and proximity operators.
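The "within n words" proximity operator listed above can be illustrated with a small sketch. It assumes a simple lowercase word tokenizer and exact term matching; the function names are hypothetical and do not describe how BRS, OpenText, or the other servers implement proximity.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def within_n_words(text, term_a, term_b, n):
    """Return True if term_a and term_b occur within n words of each other."""
    tokens = tokenize(text)
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b]
    # Compare every pair of positions; order of the two terms does not matter.
    return any(abs(i - j) <= n for i in positions_a for j in positions_b)

sentence = "Full-text retrieval of SGML documents requires careful database design."
print(within_n_words(sentence, "sgml", "retrieval", 3))
print(within_n_words(sentence, "sgml", "design", 3))
```

Same-sentence or same-paragraph proximity could follow the same pattern by first splitting the text on the corresponding SGML element boundaries.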
User Needs and Behavior: DLI Search and Retrieval Required
Feature Set (Part 2)
- Quorum search mechanisms (partial match algorithms).
- Search expansion algorithms and search limit methods, each offered
through suggestive prompts.
- Relevance ranking (ranked Boolean, high and low frequency words, cosine
function).
- Dynamic, context-sensitive, context-specific suggestive help.
- Multimedia demonstrations of search types.
- Show me more like this one.
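Two of the features above, quorum (partial match) searching and cosine-based relevance ranking, can be sketched together. This is an illustration under simplifying assumptions: documents are plain lists of terms, weights are raw term frequencies rather than any particular weighting scheme, and all names and sample data are hypothetical.

```python
import math
from collections import Counter

def quorum_match(query_terms, doc_terms, quorum):
    """True if the document contains at least `quorum` distinct query terms."""
    return len(set(query_terms) & set(doc_terms)) >= quorum

def cosine(query_terms, doc_terms):
    """Cosine similarity between raw term-frequency vectors."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * \
           math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def rank(query_terms, docs, quorum):
    """Keep documents that meet the quorum, ordered by descending cosine score."""
    hits = [(cosine(query_terms, terms), doc_id)
            for doc_id, terms in docs.items()
            if quorum_match(query_terms, terms, quorum)]
    return [doc_id for _, doc_id in sorted(hits, reverse=True)]

# Hypothetical sample documents, already reduced to index terms.
docs = {
    "a": ["sgml", "document", "retrieval"],
    "b": ["sgml", "interface", "design", "interface"],
    "c": ["database", "server"],
}
query = ["sgml", "document", "retrieval", "interface"]
print(rank(query, docs, quorum=2))
```

Lowering the quorum broadens the result set without requiring the user to rewrite the query, which is the appeal of partial-match searching for users who struggle with Boolean operators.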
Database Design
- Without a well-planned and useful database design and retrieval system,
the best interface will not increase user satisfaction.
- Need to accommodate the feature sets.
- Interface works at the client level (automatic truncation, etc.) and
then communicates with the search database at the server level.
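The division of labor in the last point can be sketched as a client-side step that turns plain user input into a server query, adding truncation and implicit Boolean operators so the user never types them. The `$` truncation symbol, the `AND` keyword, and the `min_length` threshold are assumptions chosen for illustration; each server (SQL, BRS, OpenText, EBT) would need its own syntax.

```python
def build_server_query(user_input, truncation="$", min_length=4):
    """Convert plain user input into a truncated, implicitly ANDed query string."""
    terms = user_input.lower().split()
    # Append the truncation symbol only to longer terms, so that very short
    # words do not expand into large numbers of unrelated matches.
    prepared = [t + truncation if len(t) >= min_length else t for t in terms]
    # Join with an implicit Boolean AND so users need not enter operators.
    return " AND ".join(prepared)

print(build_server_query("sgml retrieval"))
```

Keeping this logic in the client lets the same interface front several database servers by swapping only the query-building step.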
Other Full Text Retrieval Projects......
- Red Sage: Univ. of California-San Francisco; Springer-Verlag
- OCLC Electronic Journals On-line with Guidon Interface
- Cornell CORE (Chemistry On-line Retrieval Experiment) in cooperation
with the American Chemical Society.
- Elsevier TULIP Project; Materials Science Journals at nine universities.
- WILLOW Medical Journals with BRS COLLEAGUE
- IBM/Institute for Scientific Information
- Project Envision: Virginia Tech.
A Short Quiz.......
- User surveys show that functionality is more important than ease of
use? True
- Users find a Start button useful? True
- Controlled vocabulary thesauri provide useful entry term vocabulary
items? False?
- Users will enter a URL like
calliope.ncsa.uiuc.edu:800/hypernews/get/talk_1.html? True