Annual Progress Report for 4th Quarter 1996
Program Plan for Period 4 (3/97 to 2/98)
DLI Project, University of Illinois

Federating Repositories of Scientific Literature

Bruce Schatz, PI, schatz@uiuc.edu

Research Plans

Testbed

Now in formal production for 5 publishers and 50 journals, the Testbed will increase both the depth and breadth of its digital collection. Coverage of the Institution of Electrical Engineers (IEE) journals will be extended to span 1993 to the present, and the American Society of Civil Engineers (ASCE) database will contain more than two full years of articles. Breadth will be increased by adding several more Institute of Electrical and Electronics Engineers Computer Society (IEEE CS) titles, the Journal of Applied Physics and Review of Scientific Instruments from the American Institute of Physics (AIP), and the AIAA Journal from the American Institute of Aeronautics and Astronautics. Negotiations with several additional scientific publishers are also underway. This year we expect to have a full SGML collection of 30,000 articles from 60 journals from 8 publishers.

This increased coverage should help raise usage as the number of users grows around the University of Illinois campus. All articles are received directly from publishers as the print journals are published, and are stored in fully federated, canonical SGML-tagged form. The federated search and the OpenText search engine will thus get their first significant workout during this next period. The link we have built to the Ovid search engine will provide text search of bibliographic abstracts from Inspec and Compendex, greatly extending the range of coverage beyond the subset of full-text articles. Now that our publisher consortium has concluded its negotiations with SoftQuad, the resulting mathematics displayer will be incorporated into our Testbed when ready. This will provide the first adequate interactive display of mathematics for scientific journals containing SGML markup. Finally, the first distributed SGML repository will be installed at the American Institute of Physics home base, as a clone of our setup at Illinois. This was funded by a contract to us from AIP.
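The idea of searching across publishers through a canonical tagged form can be illustrated with a minimal sketch. It is not the Testbed implementation: the publisher names, tag names, sample records, and the functions to_canonical and federated_search are all hypothetical, chosen only to show how publisher-specific tags might be mapped to shared field names so that one query runs over every repository.

```python
# Hypothetical mapping from each publisher's own tag names to canonical fields.
CANONICAL_TAGS = {
    "publisher_a": {"atl": "title", "au": "author"},
    "publisher_b": {"arttitle": "title", "authname": "author"},
}


def to_canonical(publisher, record):
    """Rename a record's publisher-specific tags to canonical field names."""
    mapping = CANONICAL_TAGS[publisher]
    return {mapping[tag]: value for tag, value in record.items() if tag in mapping}


def federated_search(repositories, field, word):
    """Run one case-insensitive substring query over every repository."""
    hits = []
    for publisher, records in repositories.items():
        for record in records:
            canonical = to_canonical(publisher, record)
            if word.lower() in canonical.get(field, "").lower():
                hits.append((publisher, canonical))
    return hits


# Invented sample data, for illustration only.
repositories = {
    "publisher_a": [{"atl": "Texture analysis of aerial images", "au": "Doe"}],
    "publisher_b": [
        {"arttitle": "Parallel texture matching", "authname": "Roe"},
        {"arttitle": "Bridge load testing", "authname": "Poe"},
    ],
}
hits = federated_search(repositories, "title", "texture")
```

Here the query names only the canonical field "title", and the per-publisher differences are confined to the mapping table.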

Leads: Bill Mischo, Tim Cole

Internet

The IODYNE multiple view client will begin running with connections to the production databases and will be deployed around campus during this period. Because of limitations in the SGML displayers, and to avoid spending research effort on tracking rapidly evolving Web infrastructure, this will be a PC-Windows Visual Basic version. Drag and drop will be supported transparently across term suggestion and full-text search engines. The client will support stateful search sessions, with both term suggestion and full-text search for both SGML articles and bibliographic abstracts.

The Inspec subject thesaurus provides term suggestion from human indexers. Inspec abstracts are provided for full-text search via a Z39.50 connection to a local Ovid server and for term suggestion via a concept space for 400K abstracts computed on a supercomputer. The full-text SGML articles from the Testbed are provided for text search via a socket connection to the local OpenText server and for term suggestion via a concept space for the Testbed collection computed on a supercomputer using our new parallel algorithm. This will be the first large concept space generated from full-text materials. The Internet client will thus be an integration of our Research results into our Testbed collections.
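As a rough illustration of how a concept space can drive term suggestion, the sketch below builds term-to-term weights from co-occurrence within documents. The report does not specify the project's weighting or its parallel algorithm, so the simple asymmetric weight used here is an assumption, and the function names and sample abstracts are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_concept_space(documents):
    """Map each term to the terms it co-occurs with, weighted asymmetrically."""
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing both terms of a pair
    for terms in documents:
        unique = sorted(set(terms))
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))

    space = defaultdict(dict)
    for (a, b), together in pair_freq.items():
        # Assumed weight(a -> b): share of a's documents that also contain b.
        space[a][b] = together / doc_freq[a]
        space[b][a] = together / doc_freq[b]
    return space


def suggest_terms(space, term, limit=3):
    """Return the strongest related terms, ties broken alphabetically."""
    related = space.get(term, {})
    ranked = sorted(related.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:limit]]


# Invented index terms for four abstracts, for illustration only.
abstracts = [
    ["sgml", "markup", "journal"],
    ["sgml", "markup", "display"],
    ["sgml", "search"],
    ["search", "index", "journal"],
]
space = build_concept_space(abstracts)
suggestions = suggest_terms(space, "markup")  # ['sgml', 'display', 'journal']
```

The weights are directional: "markup" always appears with "sgml" in this sample, while "sgml" appears with "markup" in only two of its three documents, so the suggestion strength differs by direction.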

Leads: Eric Johnson, Kevin Gamiel

Research

The complete engineering concept spaces, utilizing 10,000,000 abstracts from Compendex and Inspec within 1000 community repositories across all of engineering, will be incorporated into the Interspace environment prototype. This prototype is a Smalltalk interface to multiple search indexes across CORBA servers for concept spaces, full-text search, category maps, and vocabulary switching. Implementations of concept spaces on parallel supercomputers will be tested for real-time query refinement. We will also begin experimenting with continual updating of the concept spaces for the Testbed collection with these new algorithms.

The collaboration with the UC Santa Barbara DLI project will come to fruition with concept spaces for 5000 images of aerial photographs, with similarity matching of texture units, interlinked via a gazetteer to 500,000 text abstracts on geography subjects. User experiments will be done on these integrated collections to evaluate the capabilities of concept spaces for multimedia semantic retrieval.

Leads: Hsinchun Chen, Bruce Schatz

Evaluation

Large-scale usage testing (especially transaction log instrumentation) will be done in this period to complement the small-scale testing (interviews, sessions) done this year. This is linked to the deployment around the University of Illinois campus of a low-end Web interface, based on the OpenText Latitude/LiveLink server, which runs within a Web browser. This client will enable users to perform simple federated search on our SGML Testbed collection. The Internet multiple view client, which integrates the concept spaces from the Research efforts with the structure search from the Testbed efforts, will also be deployed on a more limited basis and evaluated in special focus group interview sessions. Both clients have already undergone usability testing.
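Transaction log instrumentation of this kind typically yields per-action counts and per-session histories. The sketch below shows one way such a summary might be computed. The tab-separated log format, the sample entries, and the summarize function are assumptions made for illustration and do not describe the project's actual logs.

```python
from collections import Counter, defaultdict

# Hypothetical log format: timestamp <TAB> session id <TAB> action <TAB> detail
SAMPLE_LOG = """\
1997-03-01T10:00:00\ts1\tsearch\tsgml display
1997-03-01T10:00:40\ts1\tview\tarticle-17
1997-03-01T10:02:10\ts2\tsearch\tconcept space
1997-03-01T10:03:05\ts1\tsearch\tsgml mathematics
"""


def summarize(log_text):
    """Count actions overall and group log entries by session."""
    actions = Counter()
    per_session = defaultdict(list)
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        timestamp, session, action, detail = line.split("\t", 3)
        actions[action] += 1
        per_session[session].append((timestamp, action, detail))
    return actions, dict(per_session)


actions, sessions = summarize(SAMPLE_LOG)
```

Grouping by session keeps the order of each user's searches and views, which is the kind of sequence that small-scale interviews cannot capture at volume.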

Ethnographic studies of all the information sources used by engineers will be carried out, to understand the context of journal usage within the information flow. These will follow up on our previous studies of a laboratory of physicists with studies of other laboratory populations in engineering disciplines who are using our Testbed clients. Evaluation of the range of effectiveness of different methodologies will begin, in preparation for a complete study of how to evaluate digital library users and usage.

Leads: Ann Bishop, Leigh Star

Financial Report

See attached budget and itemizations for the main grant to the University of Illinois and for the subcontract to the University of Arizona.

Management Report

Organization Chart

Principal Investigator: Bruce Schatz (DLRP)

Testbed:
Search Indexing: Bill Mischo (Grainger Library)
Collection Development: Tim Cole (Grainger Library)

Internet:
Multiview Client: Eric Johnson (DLRP)
Search Gateway: Kevin Gamiel (NCSA)

Research:
Semantic Retrieval: Hsinchun Chen (Univ of Arizona)
Interspace Environment: Bruce Schatz (DLRP)
Performance Evaluation: Roy Campbell (CS)

Evaluation:
User Studies: Ann Bishop (GSLIS)
Context Studies: Leigh Star (GSLIS)

Testbed:
Programmer (Collection): Bob Ferrer (Grainger)
Programmer (Search): Maria Pflaum (Grainger)
Programmer (Display): Donal O'Connor (Grainger)
Students: Han Wen Hsiao (Grainger)

Internet:
Programmer (Client): Eric Johnson (DLRP)
Programmer (Gateway): Bill Wentling (NCSA)
SGML Standards: Tom Magliery (NCSA)

Research:
Programmer (Algorithms): Dorbin Ng (Arizona)
Programmer (Environments): Kevin Powell (DLRP)
Students (Information Science): Bill Pottenger, Conrad Chang (DLRP)
Students (Computer Science): Yongchen Li (CS), Bob McGrath (NCSA)

Evaluation:
Students: Bob Sandusky, Laura Neumann, Cecelia Merkel, Emily Ignacio (all GSLIS)

Coordination:
Partners: Susan Harum (DLRP)
NSF: Ben Gross (DLRP)

DLRP = Digital Library Research Program, University Library
Grainger = Grainger Engineering Library Information Center, University Library
NCSA = National Center for Supercomputing Applications
GSLIS = Graduate School of Library and Information Science
CS = Department of Computer Science, University of Illinois at Urbana-Champaign

Partners List

Publishers:

AIP American Institute of Physics (Applied Physics)
APS American Physical Society (Theoretical Physics)
AAS American Astronomical Society
ASCE American Society of Civil Engineers
ASAE American Society of Agricultural Engineers
ASME American Society of Mechanical Engineers
AIAA American Institute of Aeronautics & Astronautics
IEEE Institute of Electrical and Electronics Engineers
IEEE CS IEEE Computer Society
IEE Institution of Electrical Engineers (British)
EI Engineering Information (Compendex)
OSA Optical Society of America
John Wiley
Elsevier Science
Academic Press
AAAS American Association for the Advancement of Science

Software and Hardware:

SoftQuad
OpenText
Hewlett-Packard
Microsoft
OCLC

DLI Projects:

Santa Barbara (GIS semantic retrieval)
Carnegie-Mellon (NetBill charging software)
Stanford (Search Interoperability)
Michigan (User Interfaces)


University of Illinois at Urbana-Champaign Digital Libraries Initiative
Comments to: External Relations Coordinator, Tom Habing
2/12/96