The Testbed Team has put considerable effort into evaluating the available SGML full-text database management and retrieval software packages. After careful study, the team chose the Open Text Corporation's Open Text Index search engine for indexing and accessing the DLI Project documents. The Open Text engine, originally developed at the University of Waterloo, is an extremely robust and expandable system that allows phrase, Boolean, and proximity searching and is tailored to SGML processing and retrieval. The Open Text engine provides us with:
Following from points 5), 6) and 7) above, the Open Text software provides us with a framework for modeling a federated system of distributed publisher document repositories, in which individual publishers can mount and maintain their separate document repositories, each based on its own DTD. In this model, any designated subset of these discrete repositories can be accessed, and documents then retrieved, from an intelligent client based on expressed user needs and information requirements. Our expectation is that the client gateway function, which will link user information needs, via comprehensive indexing stores, to documents in designated repositories, will be an important component in retrieval across the Web. The retrieval enhancement techniques, such as vocabulary switching and co-occurrence table lists, would be placed in the context of the client-indexing stores side of the process.
The specifications for metadata generation and normalizing the seven heterogeneous publisher DTDs have been written, tested, and deployed. Our metadata objects reflect what we regard as critical retrieval and display needs, but will be carefully tested in an end-user searching environment. Recent studies by the American Physical Society, Elsevier's Project TULIP review team, and our DLI Evaluation group support the usefulness of the metadata elements we have selected. We will continue to modify our metadata specification based on user feedback and comments from the professional community.
All of the Open Text software modules have been tested, including the parallel execution monitor software used to search across multiple databases. The first journals are available for public access, and we expect to see all publishers in production within the next several months.
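The idea of searching several databases in parallel can be illustrated with a minimal sketch. This is not the Open Text parallel execution monitor, whose interface is not described here. The in-memory `DATABASES` dictionary and the `search_one` and `search_all` functions are hypothetical stand-ins chosen only to show the fan-out and merge pattern.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for separately indexed publisher databases.
DATABASES = {
    "publisher_a": {"a1": "digital library testbed", "a2": "sgml retrieval engine"},
    "publisher_b": {"b1": "full-text sgml journals", "b2": "physics letters"},
}


def search_one(db_name, term):
    """Return (db_name, doc_id) pairs for documents containing the term."""
    docs = DATABASES[db_name]
    return [
        (db_name, doc_id)
        for doc_id, text in sorted(docs.items())
        if term in text.split()
    ]


def search_all(term, db_names):
    """Run the same query against each database concurrently and merge hits."""
    # max(1, ...) avoids an invalid worker count when no databases are named.
    with ThreadPoolExecutor(max_workers=max(1, len(db_names))) as pool:
        per_db = pool.map(lambda name: search_one(name, term), db_names)
        # pool.map yields results in the order the databases were listed.
        return [hit for hits in per_db for hit in hits]


if __name__ == "__main__":
    print(search_all("sgml", ["publisher_a", "publisher_b"]))
```

Because each database is queried independently, a designated subset can be searched simply by passing a shorter list of names.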
To study access techniques and retrieval effectiveness over these full-text journals, we have designed and implemented a prototype client written in Visual Basic 3.0 operating in a Microsoft Windows environment. This custom front-end has been designed, from the outset, as a demonstration system to study full-text retrieval and explore functions that pose problems in a Web environment. The present limitations in Web search capability include the inability to maintain state or hold open the connection to the database, the difficulties in dynamically updating forms, and bandwidth limitations that preclude dynamically updating word wheels and the like. In addition, the prototype DLI client is part of an overarching gateway client that provides access to remote and local information resources from a public workstation. Indeed, the integration of A&I service databases, online catalogs, locally mounted and remote periodical index databases, campus and Library maintained databases, and the full-text DLI Project data is an integral part of the UIUC comprehensive digital library system. Users typically desire relevant information from multiple, sometimes disparate resources in what are often multiple formats. In particular, the linking of A&I service databases with full-text document stores is an area that demands attention and will be investigated in this project.
The custom client uses a TCP/IP connection to our Open Text and Ovid servers and employs both OLE (Object Linking and Embedding) and DDE (Dynamic Data Exchange) methods to communicate with the SoftQuad Panorama and Netscape Navigator display and rendering applications. The actual SGML documents are stored on an HTTP server, and the client uses the Panorama/Netscape CCI (Command Interface) linkage to retrieve and display them.
The custom client has been designed to serve as a rapid prototyping platform for exploring techniques to facilitate full-text document retrieval. The client employs intelligent multimedia interface techniques, such as voice synthesis, demonstration searches using successive screen capture and voice-over instruction, and full-motion video, to provide context-specific help and assistance. The interface is designed to assist end-users with search strategy formulation and navigation through the search process. The features implemented in the initial version of the custom client include:
One of the primary functions of the prototype client is to demonstrate interface and retrieval technologies that can be ported to the Web environment.
The Open Text engine does provide a word index generation capability, but because of its underlying phrase indexing structure, it does not provide the browsing and display of headings common to systems employing an inverted file indexing structure. To get around this problem, we have exported the word tables generated within Open Text into a Microsoft SQL Server database structure. These "word wheels" can then be used to show users a letter-by-letter match against their search arguments as they are entered. This commonly used word wheel approach has been shown to reduce user spelling errors and to suggest alternate word forms that enhance retrieval.
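The letter-by-letter matching behind a word wheel can be sketched as a prefix lookup over a sorted word list. The sketch below is illustrative only: the in-memory `WORDS` list is a hypothetical stand-in for the word table exported to SQL Server, the `word_wheel` function and its `limit` parameter are assumptions, and all words are assumed to be lowercased.

```python
import bisect


def word_wheel(sorted_words, prefix, limit=5):
    """Return up to `limit` words starting with `prefix`, in index order."""
    # Binary search for the first word that sorts at or after the prefix.
    start = bisect.bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:start + limit]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can match either
        matches.append(word)
    return matches


# Hypothetical sample of an exported word table.
WORDS = sorted(["index", "indexing", "inverted", "retrieval", "retrieve", "sgml"])

if __name__ == "__main__":
    # Simulate a user typing "retr" one letter at a time.
    typed = ""
    for letter in "retr":
        typed += letter
        print(typed, word_wheel(WORDS, typed))
```

Each keystroke narrows the displayed window of index terms, which is how a word wheel exposes nearby spellings and alternate word forms before the search is submitted.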