INTRODUCTION

The University of Illinois at Urbana-Champaign (UIUC) Testbed collection, composed of full-text SGML supplied by five professional society publishers, has grown to approximately 60,000 articles as of July 1, 1999. The five publishers providing the Testbed with full text are the American Institute of Physics, the American Physical Society, the American Society of Civil Engineers, the Institution of Electrical Engineers, and the Computer Society of the Institute of Electrical and Electronics Engineers. Negotiations are underway with several commercial publishers to add a number of new titles to the Testbed corpus. Studies of how the Testbed is being used by end-users continue at UIUC. Researchers at the Department of Computing (DIKU), Copenhagen University, and at Oregon State University are also currently conducting studies involving the Testbed.

In the second quarter of the D-Lib Test Suite Program, the UIUC Testbed Team has focused its work on several activities that incorporate new Web-based technologies and, in particular, reflect emerging trends in document representation and dissemination. The Team’s development work has centered on 1) improvements to the rendering of mathematics in Web-accessible XML documents, 2) dynamic conversion of stored XML documents to HTML and XML schemas that can be rendered by current releases of Netscape Navigator and Microsoft Internet Explorer, 3) management of links between digital objects, 4) continued development and testing of Testbed search interface enhancements, and 5) term co-occurrence analysis of Testbed documents (the results of which will be used in the next quarter to create term suggestion tools). Members of the Team are also involved in the metadata sub-group of the Working Group on Digital Library Metrics.

The recent development efforts described here will be incorporated in the next release of the testbed interface, expected in August 1999.

MATHEMATICS RENDERING

One of the primary foci of the Testbed Team has been improving the rendering of display mathematics in a Web-based environment. Several approaches are being explored. In one approach, software has been developed to convert mathematics marked up in ISO 12083 SGML to TeX, which is then translated to bit-mapped images (.GIF files) for rendering within a browser. This provides a mechanism for the accurate rendering of mathematics in Web browsers that support HTML (and XML), all of which display bit-mapped images natively. The tradeoff is the time and bandwidth involved in passing these images across the Internet and the inability to search and manipulate the bit-mapped mathematics.
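
The markup-to-TeX step of this first approach can be illustrated with a minimal sketch. It assumes the mathematics is available as a well-formed, XML-like fragment, and the element names (formula, fraction, numerator, denominator, sup, sub, sqrt) are hypothetical stand-ins rather than the actual ISO 12083 element names. It is not the Team's conversion software, and the later TeX-to-GIF step is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the real ISO 12083 math DTD uses its own names.
WRAPPERS = {"sup": "^{%s}", "sub": "_{%s}", "sqrt": r"\sqrt{%s}"}


def to_tex(node):
    """Recursively translate one math element into a TeX string."""
    if node.tag == "fraction":
        # A fraction has exactly two structural children.
        return r"\frac{%s}{%s}" % (
            to_tex(node.find("numerator")),
            to_tex(node.find("denominator")),
        )
    # Mixed content: leading text, then each child followed by its tail text.
    body = (node.text or "") + "".join(
        to_tex(child) + (child.tail or "") for child in node
    )
    # Unknown elements (formula, numerator, ...) contribute only their content.
    return WRAPPERS.get(node.tag, "%s") % body


SAMPLE = (
    "<formula>x<sub>1</sub>=<fraction>"
    "<numerator>a<sup>2</sup></numerator>"
    "<denominator><sqrt>b+c</sqrt></denominator>"
    "</fraction></formula>"
)

if __name__ == "__main__":
    print(to_tex(ET.fromstring(SAMPLE)))
```

The resulting TeX string would then be typeset and rasterized by external tools to produce the image delivered to the browser.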

The second and more exciting approach being implemented involves using CSS and DHTML scripts to style the mathematics for HTML and XML displays. While the CSS style sheets provide the framework for the mathematics displays, the actual logic for the accurate rendering lies in the script (.asp) pages that convert the XML marked-up mathematics to HTML or XML. These script pages contain the algorithms for kerning and sizing of integrals, fences, square roots, and matrix equations. In demonstrations to and by our publisher partners, there has been a great deal of enthusiasm and support for this approach. This approach allows for searching of equation elements, re-sizing of display mathematics within the browser, and manipulation of equations by end-users.

A separate Web-based mathematics search module has been implemented which provides pull-down lists of operations, named constants, and laws. Alternatively, the user can enter any desired element of an equation into the entry fields.

In addition, the Testbed Team is continuing to explore the use of the MathML (Mathematical Markup Language) and XSL (Extensible Stylesheet Language) standards being developed by the W3C.

DYNAMIC CONVERSION TO HTML AND OTHER XML FORMATS

Much work has been done in dynamically converting the well-formed XML into formats that can be displayed in the current popular web browsers (Netscape Navigator and Microsoft Internet Explorer). We have successfully implemented three different approaches.

One approach is to convert the XML back to SGML for display with the Panorama plugin. This is the least interesting approach and was done primarily for backward compatibility for our existing user base.

A second approach has been to convert the XML into HTML with appropriate Cascading Style Sheets (CSS). For the present, this approach is the most viable, since HTML is supported by every common web browser.
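
As a rough illustration of this second approach, the sketch below maps each XML element to an HTML element that carries a CSS class, so that a style sheet can control the final appearance. The element names and the TAG_MAP table are assumptions made for the example; the Testbed's own conversion runs in server-side script pages and is not reproduced here.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from source XML elements to (HTML tag, CSS class).
TAG_MAP = {
    "article": ("div", "article"),
    "title": ("h1", "title"),
    "para": ("p", "para"),
    "emph": ("em", "emph"),
}


def to_html(node):
    """Convert an XML element and its descendants to an HTML string."""
    # Unmapped elements fall back to a <span> classed by the source tag name.
    html_tag, css_class = TAG_MAP.get(node.tag, ("span", node.tag))
    inner = escape(node.text or "")
    for child in node:
        inner += to_html(child) + escape(child.tail or "")
    return '<%s class="%s">%s</%s>' % (html_tag, css_class, inner, html_tag)


SAMPLE = (
    "<article><title>Wave Motion</title>"
    "<para>Energy is <emph>conserved</emph> here.</para></article>"
)

if __name__ == "__main__":
    print(to_html(ET.fromstring(SAMPLE)))
```

Keeping the source element name as the CSS class means presentation decisions stay in the style sheet rather than in the conversion code.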

A third approach has been to convert the well-formed XML into an XML schema that, along with appropriate Cascading Style Sheets, can be rendered in an XML-aware browser (currently only Internet Explorer 5.0). This is the approach with the greatest long-term appeal, but is currently complicated by missing or incomplete support for various XML standards by the browser. This necessitates the use of HTML embedded within the XML (via the XML namespace mechanism) for most interesting cases, such as hyperlinks, images, tables, math, and others.

One of the key advantages of dynamic conversion, common to all three of the above approaches, is metamerge. Metamerge is the on-the-fly combining of the full-text article with its metadata. This allows the rendered article to contain much value-added information that was not contained in the original article, such as links to cited and citing articles, links to A & I (abstracting and indexing) services, etc.
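
The idea behind metamerge can be sketched as follows. The article structure (ref elements with id attributes), the shape of the metadata dictionary, the added citedby element, and the example.org URLs are all hypothetical; the sketch only shows the general pattern of enriching the full text with separately stored metadata just before rendering.

```python
import xml.etree.ElementTree as ET


def metamerge(article_xml, metadata):
    """Return the parsed article enriched with link metadata.

    metadata is assumed to look like:
      {"cited": {ref_id: url, ...}, "citing": [url, ...]}
    """
    article = ET.fromstring(article_xml)
    # Attach a link target to every reference the metadata knows about.
    for ref in article.iter("ref"):
        target = metadata.get("cited", {}).get(ref.get("id"))
        if target:
            ref.set("href", target)
    # Append links to citing articles, which the original text cannot contain.
    cited_by = ET.SubElement(article, "citedby")
    for url in metadata.get("citing", []):
        ET.SubElement(cited_by, "link", {"href": url})
    return article


ARTICLE = (
    "<article><title>Sample</title>"
    '<ref id="r1">First cited work</ref>'
    '<ref id="r2">Second cited work</ref></article>'
)
METADATA = {
    "cited": {"r1": "http://example.org/articles/r1"},
    "citing": ["http://example.org/articles/later-paper"],
}

if __name__ == "__main__":
    merged = metamerge(ARTICLE, METADATA)
    print(ET.tostring(merged, encoding="unicode"))
```

Because the merge happens at request time, newly discovered citing articles would appear in the rendered article without any change to the stored full text.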

DIGITAL OBJECT LINK MANAGEMENT

The Testbed Team is working with AIP on the DOI-X prototype (Metadata Exchange for Reference Linking - Phase 1 - Journal Article Reference Linking Project). For this project, we will be creating and registering Digital Object Identifiers (DOIs) for our full collection of AIP articles, and we will begin exploring the use of those identifiers for cross-article linking within the Testbed.

In addition to the DOI, we are also exploring other linking technologies such as Scholarly Link Specification Framework (SLinkS), as well as publisher-specific technologies such as AIP’s Reference Resolution Engine or APS’s Link Manager, which we currently use for linking to the PDF versions of APS articles.

A proposal was submitted for a cross-repository D-Lib Test Suite metadata index.

SEARCH INTERFACE ENHANCEMENTS

The Testbed Team is continuing to explore and evaluate various enhancements to the search interface, such as author searching and browsing capabilities, simultaneous searching across multiple resources, linking to and from other resources such as A & I services, and term suggestion via co-occurrence analysis. Much of the above was described in the previous quarterly report.

TERM CO-OCCURRENCE ANALYSIS

Bill Pottenger of NCSA has completed the generation of co-occurrence tables over the entire Testbed collection. The Testbed Team is currently investigating use of these tables for term suggestion in the search interface. We expect to have working prototypes of different term suggestion ideas by the end of the third quarter.
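
To make the idea concrete, here is a minimal sketch of how a co-occurrence table could drive term suggestion. It assumes document-level co-occurrence (two terms co-occur when they appear in the same article) and simple frequency ranking; the method actually used to generate the Testbed tables is not described in this report, and the function names and sample terms are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(documents):
    """Count, for each pair of terms, how many documents contain both."""
    table = defaultdict(Counter)
    for terms in documents:
        # Deduplicate so a term repeated in one document is counted once.
        for a, b in combinations(sorted(set(terms)), 2):
            table[a][b] += 1
            table[b][a] += 1
    return table


def suggest(table, term, limit=3):
    """Return up to `limit` terms that most often co-occur with `term`."""
    return [t for t, _ in table.get(term, Counter()).most_common(limit)]


DOCS = [
    ["laser", "optics", "plasma"],
    ["laser", "optics"],
    ["laser", "plasma", "fusion"],
    ["optics", "lens"],
    ["laser", "optics", "lens"],
]

if __name__ == "__main__":
    table = build_cooccurrence(DOCS)
    print(suggest(table, "laser", limit=2))
```

In a search interface, the suggested terms would be offered to the user as candidates for broadening or refining the original query.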

VISITORS AND PRESENTATIONS DURING 2nd QUARTER OF PROJECT

In the second quarter, Tim Ingoldsby from the American Institute of Physics (AIP) visited the Testbed Team. A teleconference was also held with Bob Kelly and other staff members of the American Physical Society (APS).

Testbed Team members Timothy Cole, Tom Habing, and William Mischo presented a workshop on XML use in the Testbed at the CESSE (Council of Engineering and Scientific Society Executives) Conference on July 19-20, 1999 in Cleveland. Testbed Team member Robert Ferrer also presented the Testbed to a Sun Microsystems/Chinese Delegation visiting from The National Library of China, and Tom Habing presented the Testbed to a group of Mortenson Center for International Library Programs Associates visiting from Russia.

CONCLUSION

The Testbed Team has enjoyed an exciting and productive quarter, highlighted by continuing work on XML conversion and display, improvements in mathematics rendering, enhanced search capabilities within DeLIverNG, and document linking strategies. The Team looks forward to introducing co-occurrence term search capabilities, and exploring the Agent and Handles technologies provided by CNRI.