Project Summary
DATE PREPARED: March 5, 1998
ORGANIZATION: University of Illinois at Urbana-Champaign
PRINCIPAL INVESTIGATOR: Bruce Schatz
CANIS, 704 S. Sixth St., Champaign IL 61801
schatz@uiuc.edu (217) 244-0651 fax (217) 333-6869
TITLE OF EFFORT: Federating Repositories of Scientific Literature
ACCESS INFORMATION: http://dli.grainger.uiuc.edu
OBJECTIVE:
We are developing the technology to effectively handle digital libraries of scientific literature on the Net. Particular attention is being paid to federating repositories of structured documents. The technology includes a Testbed of searchable journals obtained in SGML format direct from publishers, Internet software for effective search of this Testbed, and Research to enable semantic federation across repositories in different subject areas. Results will include digital library software and sociological evaluation of its use with hundreds of users searching tens of thousands of documents.
APPROACH:
The Testbed efforts include obtaining SGML of journal articles in a direct pipeline from major publishers in engineering and science, then indexing these into a single federated collection, which can be searched and displayed using the structure of the documents. The large-scale Testbed is operational on the campus of the University of Illinois at Urbana-Champaign, within the context of a major engineering library and integrated with its other online services as a production facility.
The Internet efforts are developing the Gazebo search gateway. Gazebo allows clients to search collections of remote data sources simultaneously, without having to reformulate queries for each data source's protocols or configuration. Currently, Z39.50 data sources are supported and Opentext support is on the way. Gazebo translates abstract queries into these underlying protocols, submits them, and returns results to its client asynchronously. It will also enable IODyne to dynamically discover the attribute space that Gazebo is configured for, so IODyne will be able to perform searches on previously unknown data sources.
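As a rough illustration of the gateway pattern described above, the following Python sketch translates one abstract query into two back-end syntaxes, submits both concurrently, and collects results as they arrive. It is not Gazebo's code. The names AbstractQuery, Source, federated_search and SOURCES are hypothetical, the Z39.50 form imitates a prefix-style query using Bib-1 use attributes, the second syntax is invented for illustration and is not the Opentext query language, and the network round trip is simulated with a short sleep.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AbstractQuery:
    """Protocol-neutral query: one field and one search term (hypothetical)."""
    field: str
    term: str


def to_z3950(query: AbstractQuery) -> str:
    # Bib-1 use attributes: 4 = title, 1003 = author, 1016 = any.
    use_attr = {"title": 4, "author": 1003, "any": 1016}
    return f'@attr 1={use_attr[query.field]} "{query.term}"'


def to_fulltext(query: AbstractQuery) -> str:
    # Invented syntax standing in for a second back end (assumption).
    return f'{query.field}:"{query.term}"'


@dataclass
class Source:
    name: str
    translate: Callable[[AbstractQuery], str]
    delay: float  # simulated network latency in seconds

    async def search(self, query: AbstractQuery) -> tuple[str, str]:
        native = self.translate(query)   # rewrite for this back end
        await asyncio.sleep(self.delay)  # stands in for the remote call
        return self.name, native         # a real source would return hits


async def federated_search(query: AbstractQuery, sources: list[Source]):
    """Submit the query to every source at once; gather results as they finish."""
    tasks = [asyncio.create_task(s.search(query)) for s in sources]
    results = []
    for finished in asyncio.as_completed(tasks):
        results.append(await finished)
    return results


SOURCES = [
    Source("z3950-demo", to_z3950, delay=0.02),
    Source("fulltext-demo", to_fulltext, delay=0.01),
]

if __name__ == "__main__":
    query = AbstractQuery(field="title", term="concrete")
    for name, native in asyncio.run(federated_search(query, SOURCES)):
        print(name, "received", native)
```

The client states its query once, and each source carries its own translation function, so supporting a further protocol in this sketch means adding one more Source entry rather than changing the client.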
The Research efforts are developing automatic indexing technology for the content of documents instead of the structure. In particular, they are investigating vocabulary switching across subject domains in engineering and science, by computing concept spaces on large document collections using supercomputers. The protocols for concept spaces are also being embedded as the fundamental infrastructure of a new network information system, called the Interspace environment, which supports information analysis by correlation across repositories.
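The idea of a concept space can be illustrated with a small co-occurrence sketch in Python. This is a deliberately simplified stand-in, not the project's algorithm: it assumes each document has already been reduced to a list of terms, and it weights the link from term a to term b as the fraction of documents containing a that also contain b. The function names build_concept_space and suggest, and the sample documents, are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_concept_space(docs):
    """Return {term: {related_term: weight}} from per-document term lists.

    Assumed weighting: weight(a -> b) = docs containing both a and b,
    divided by docs containing a. The relation is asymmetric.
    """
    doc_freq = Counter()
    pair_freq = Counter()
    for terms in docs:
        unique = set(terms)
        doc_freq.update(unique)
        # Count every ordered pair of distinct terms once per document.
        pair_freq.update(permutations(sorted(unique), 2))
    space = defaultdict(dict)
    for (a, b), both in pair_freq.items():
        space[a][b] = both / doc_freq[a]
    return space


def suggest(space, term, k=3):
    """Suggest up to k related terms, strongest first, ties alphabetical."""
    related = space.get(term, {})
    return sorted(related, key=lambda t: (-related[t], t))[:k]


if __name__ == "__main__":
    sample_docs = [
        ["bridge", "load", "steel"],
        ["bridge", "load"],
        ["steel", "alloy"],
    ]
    space = build_concept_space(sample_docs)
    print(suggest(space, "bridge"))
```

Term suggestion of this kind is what lets a searcher who knows the vocabulary of one subject area find candidate terms in another, given a concept space computed over both collections.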
The Evaluation efforts study the users and use of the technology developed above, with a variety of methodological techniques. Usability studies using the developed clients are done on sample populations. Contextual investigations study the total information usage of the user population beyond the text documents supported in the Testbed. Large-scale user studies, when the clients are propagated around university campuses, utilize instrumentation and surveys.
PROGRESS:
The Testbed with production facilities is now accessible via the WWW to UIUC faculty, staff and students, with continual streams from 5 publishers, currently comprising 40,000 complete SGML articles. The Internet client is operational and connected to the production sources. The Research has computed concept spaces for 10,000,000 abstracts in 1000 subfields across all of engineering. Research is proceeding on large image collections. The Evaluation has performed usability studies on several versions of several clients and interviewed several contextual focus groups.
RECENT ACCOMPLISHMENTS:
* Testbed: production pipeline from multiple publishers in physics (AIP, APS), civil engineering (ASCE), and electrical engineering (IEE, IEEE CS) available via the WWW. All materials processed into canonical SGML and placed into a federated repository with currently 40,000 articles. Testbed personnel are in the process of providing forward and backward links from bibliographic citations to other items in the testbed and providing links from testbed record bibliographic citations to the INSPEC and Compendex databases.
* Internet: development of the Gazebo search gateway. Gazebo allows clients to search collections of remote data sources simultaneously, without having to reformulate queries for each data source's protocols or configuration. Currently, Z39.50 data sources are supported and Opentext support is on the way. Gazebo translates abstract queries into these underlying protocols, submits them, and returns results to its client asynchronously. It will also enable IODyne to dynamically discover the attribute space that Gazebo is configured for, so IODyne will be able to perform searches on previously unknown data sources.
* Research: extension of concept space algorithms into images. Testbed on collection of aerial photographs via collaboration with the University of California at Santa Barbara DLI project. GIS interoperability demonstration produced across image, text, and number.
* Evaluation: collaborative design of DeLIver (testbed) interface with the Testbed team; research on various aspects of work practices in the changing digital environment; implemented mechanisms for capturing online data related to testbed use; developing methods for producing summative reports of testbed use; building a research community focusing on social aspects of DL design and evaluation.
* Publishers: annual Partner Workshop held to build collaborations.
PLANS:
* Testbed: continue to increase both the depth and breadth of the digital collection, providing forward and backward links from bibliographic citations to other items in the testbed and links from testbed record bibliographic citations to the INSPEC and Compendex databases. Continue to investigate indexing, searching, and displaying of the heterogeneous full-text repositories in testbed. Extend testbed access to Big 10 university consortium.
* Internet: work on the prototype multiple-view client will continue with repository configuration objects (RCOs), which will allow clients to work with distributed resources in a unified way. Digital librarians will be able to assemble collections of RCOs of interest to the research communities they serve, and deliver these collections dynamically to clients over the Internet.
* Research: GIS interoperability prototype extended to first large collection to test interoperability across text and image. User evaluations on term suggestion via subject thesauri and concept spaces as search boost to bibliographic search and SGML full-text.
* Evaluation: collection of summative evaluation data from DLI testbed users through surveys, focus groups and observations of use in context. Continued capture and analysis of records of online user activities. Final reports describing and discussing use of the DLI testbed will be available.
* Publishers: annual Partner Workshop to discuss continuation of the testbed after end of DLI initiative through an industrial partners program and funding from DARPA.
TECHNOLOGY TRANSITION:
The Testbed repository will become an educational resource for the University of Illinois and other universities in the Big Ten. The University Library at Illinois is committed to continuing this resource as a campus facility beyond the DLI grant. The other CIC universities in the Big Ten have committed to using the Testbed on an experimental basis.
We are collaborating with our publishing partners to help them set up their own SGML repositories. In particular, the AIP (American Institute of Physics) is paying us to clone our software and hardware setup at their home site so they can maintain their own repositories of their own journals. Our publishers have requested much continuing support that we are unable to provide within the confines of our research project on the DLI grant. Thus we will establish an industrial partners program to extend the research related to the testbed after the DLI initiative grant ends.
Significant Event
This past grant period saw the completion of the final phase of the design and implementation of the Web client (DeLIver), with UIUC campus-wide distribution on October 15, 1997. Prior to distribution, DeLIver was tested by select user groups to ensure the usability of the interface and the performance of the system. After we actively promoted the system and solicited responses from users, usage of DeLIver (the testbed) increased dramatically, with 200 new users within six weeks of the initial distribution, which indicates that the Web client has had a significant impact. There are currently around 300 regular users across campus.
The testbed team is also in the process of implementing an Ovid proxy system that provides links to full-text articles in DeLIver from Ovid bibliographic databases (INSPEC and Compendex). This will give access to over 4 million citations from 4,000 journal and conference titles. Items available in full-text format from DeLIver can be retrieved and displayed from an Ovid citation short record.
The UIUC DLI will receive funding to continue the testbed: DARPA will provide $100K per year for three years. This funding will supplement funding from the UIUC Digital Library Industrial Partners Program, which is in its start-up phase at this time.