University of Illinois DLib Test Suite Project Update and Status Report for July-December 1999

1. Collection Growth

Approximately 5,000 full-content articles were processed, indexed, and added to the testbed. This represents growth of approximately 12% in testbed size during the last half of 1999.
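As a rough consistency check, the two figures above imply an approximate overall testbed size. The sketch below back-computes it. Both inputs are the rounded figures quoted in this report, so the results (roughly 42,000 articles before and 47,000 after) are estimates only, not reported totals.

```python
# Figures quoted in this report; both are approximate.
added_articles = 5000
growth_fraction = 0.12

# growth = added / size_before, so size_before = added / growth.
size_before = added_articles / growth_fraction
size_after = size_before + added_articles

print(f"Implied size before: about {size_before:,.0f} articles")
print(f"Implied size after:  about {size_after:,.0f} articles")
```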

2. Mathematics Rendering

The mathematics for all publishers now renders reasonably well under MS IE 5, using a combination of real-time server-side scripting, Cascading Style Sheets (CSS), and client-side JavaScript.

One of the refinements in approach developed in the 3rd quarter was to migrate more of the dynamic positioning calculations from the server to the client. This should allow finer control over math display while at the same time reducing the burden on the server, improving the scalability of the solution. During the fourth quarter this solution was fully implemented and validated using the testbed materials that have been provided by ASCE. It will be migrated to the other publisher collections in the testbed during the 1st quarter of 2000.

During the 3rd quarter we developed an algorithm for capturing, as discrete bitmaps, block equations and complex inline equations as rendered by MS IE 5. These bitmaps can then be supplied to clients connecting to the testbed with MS IE 4 or Netscape 4 in lieu of the marked-up block equations, which these browsers cannot display well because of limitations in their earlier implementations of the Document Object Model and Cascading Style Sheets. This algorithm will be implemented in early 2000 to give the less capable browsers more nearly equal rendering quality for testbed materials.
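The fallback described above can be sketched as follows. This is a minimal illustration, not the testbed's implementation: the equation markup (a div with class "eq"), the bitmap directory, the GIF file naming, and the function name are all hypothetical, and the bitmaps are assumed to have been captured already.

```python
import re

# Hypothetical markup: each block equation is a <div class="eq" id="...">
# containing no nested <div> elements.
EQUATION_BLOCK = re.compile(r'<div class="eq" id="([^"]+)">.*?</div>', re.DOTALL)


def render_equations(article_html: str, browser_handles_markup: bool,
                     bitmap_dir: str = "/eqn-bitmaps") -> str:
    """Return the article unchanged for capable browsers; otherwise replace
    each block equation with an <img> that points at its captured bitmap."""
    if browser_handles_markup:
        return article_html

    def to_image(match: re.Match) -> str:
        eq_id = match.group(1)
        return f'<img src="{bitmap_dir}/{eq_id}.gif" alt="Equation {eq_id}">'

    return EQUATION_BLOCK.sub(to_image, article_html)


sample = '<p>Text</p><div class="eq" id="e1"><span>x</span><sup>2</sup></div>'
print(render_equations(sample, browser_handles_markup=False))
```

Keeping the capability decision as a plain boolean argument separates the substitution step from browser detection, so either can change without affecting the other.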

We continue to investigate MathML as a future solution for rendering mathematics.

3. Metadata Enhancements

As part of the DOI-X effort (see also below) and in anticipation of Test Suite interoperability requirements, a major redesign of our metadata tagging structure and nomenclature was begun in the 3rd quarter. The redesign is an attempt to more closely conform to the RDF syntax and the latest Dublin Core enhancements, and to improve the readability, internal consistency, extensibility, and portability of our metadata format. The semantic updates are mostly complete. Scripts and documentation will be updated in early 2000, and existing metadata files will be rebuilt to conform to the new schema.
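To give a sense of what RDF syntax combined with Dublin Core elements looks like, the sketch below builds a small record with Python's standard library. It is not our schema: the choice of elements (title, creator), the identifier, and the function name are assumptions made for illustration, and the namespace URIs are the public RDF and Dublin Core element set ones.

```python
import xml.etree.ElementTree as ET
from typing import Sequence

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def build_record(about: str, title: str, creators: Sequence[str]) -> ET.Element:
    """Build an rdf:RDF element holding one rdf:Description of an article."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": about})
    ET.SubElement(desc, f"{{{DC_NS}}}title").text = title
    # Repeating dc:creator once per author keeps each value separately addressable.
    for name in creators:
        ET.SubElement(desc, f"{{{DC_NS}}}creator").text = name
    return root


# Hypothetical identifier and values, for illustration only.
record = build_record("urn:example:article-1", "A Sample Article",
                      ["A. Author", "B. Author"])
print(ET.tostring(record, encoding="unicode"))
```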

Also as part of the Test Suite interoperability requirements, we have documented our new metadata schema using an XML DTD conforming to the ISO 11179-3 standard. The documentation can be viewed at http://dli.grainger.uiuc.edu/iso11179/idli.asp.

4. XSL, DOM, and other W3C Standards

We continue to explore and adopt additional open standards where feasible.

We are investigating the incorporation of the transformative capabilities of the Extensible Stylesheet Language (XSL) into more of our processing steps. Since XSL is a W3C standard, this should increase the portability of our solutions.

On our 'Next Generation' site, we currently use XSL to render extended citations from the metadata files for all of the publishers. This will be moved to our production stream early in 2000. For Internet Explorer 5, the raw RDF metadata file is transmitted to the browser, where the styling then occurs. For other browsers, which do not yet support XSL, the styling occurs on the server, using the same XSL file, before the results are sent to the browser.
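The client-versus-server decision described above can be sketched as follows. This is an illustration under stated assumptions rather than our production code: the user-agent test is deliberately crude, the function and class names are hypothetical, the metadata is assumed to carry no XML declaration, and `transform` is a caller-supplied stand-in for whatever server-side XSL processor is available (Python's standard library does not include one).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    content_type: str
    body: str


def supports_client_xsl(user_agent: str) -> bool:
    """Crude check, assumed for illustration: only MS IE 5 applies XSL itself."""
    return "MSIE 5" in user_agent


def serve_citation(user_agent: str, rdf_xml: str, stylesheet_url: str,
                   transform: Callable[[str, str], str]) -> Response:
    """Send raw metadata plus a stylesheet reference to XSL-capable browsers;
    otherwise apply the same stylesheet on the server and send the result."""
    if supports_client_xsl(user_agent):
        # The xml-stylesheet processing instruction tells the browser which
        # stylesheet to apply. Assumes rdf_xml has no XML declaration.
        pi = f'<?xml-stylesheet type="text/xsl" href="{stylesheet_url}"?>\n'
        return Response("text/xml", pi + rdf_xml)
    return Response("text/html", transform(rdf_xml, stylesheet_url))


# Usage with a stub transform standing in for a real XSL processor.
def stub(xml: str, xsl: str) -> str:
    return "<html><body>styled citation</body></html>"


print(serve_citation("Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)",
                     "<record/>", "/citation.xsl", stub).content_type)
print(serve_citation("Mozilla/4.7 [en] (Win98; I)",
                     "<record/>", "/citation.xsl", stub).content_type)
```

Because both branches reference the same stylesheet, the citation layout is maintained in one place regardless of where the styling runs.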

We will also be making extensive use of XSL and the XML Document Object Model (DOM) during our new metadata generation process.

In addition we are exploring the use of the DOM and XSL for our dynamic processing that converts the raw article XML files into renderable form.

5. DOI-X

In conjunction with AIP, we participated in the DOI-X project to explore the viability of Digital Object Identifiers for linking between online articles published by major sci-tech publishers. We developed processes and programs to generate DOI metadata for our entire collection of AIP journal articles, which was then submitted for registration in the DOI-X repository. We contributed suggestions for improvements to the XML metadata DTD used to register DOIs and to the batch lookup system used to find DOI links, and we did preliminary proof-of-concept work on how to incorporate DOIs into the testbed metadata schema.

To support DOI linking into our testbed, we also created and implemented an XSL stylesheet to transform and render object metadata. The XSL stylesheet, which is used in conjunction with a CSS stylesheet for final formatting, is applied client-side if the user accesses our testbed with MS IE 5 and server-side if the user accesses our testbed with Netscape 4 or MS IE 4. Based on the success of this approach in the DOI-X project, the same approach will now be implemented for the rest of the metadata in our collection.

The DOI-X project work concluded in the 4th quarter, and a decision was made to continue the project in production mode under a consortium of technical publishers. See http://meta.doi.org/doi-x-reflinks_v1-0.PDF and http://www.doi.org/ref-link-press-release-11-99.html. A final report from the project committee is forthcoming.

6. Presentations, Visitors, and Outside Researchers

Partners' Workshop

The annual partners' workshop was held in Champaign on August 19-20. It was well attended by representatives from the American Institute of Physics (AIP), American Physical Society (APS), American Society of Civil Engineers (ASCE), ASM International, Association for Computing Machinery (ACM), Institution of Electrical Engineers (IEE), Naval Research Laboratory, Online Computer Library Center (OCLC), University of Chicago Press, Seagoat Consulting, and University of Illinois at Urbana-Champaign (UIUC).

We reported on the latest results of our research and development efforts, plus our ideas for other avenues of investigation. Everyone was very enthusiastic regarding our progress and the direction of our research. We also received invaluable feedback from our partners regarding directions they would like to see the research take.

Visitors

Engineers from AIP and APS

Several engineers and managers from both AIP and APS visited prior to the partners' workshop for some in-depth technical discussions regarding our work.

Chicago State University

A group from the IT department of the CSU library visited to learn more about Grainger Library's IT infrastructure and special projects, such as DLI.

Delegation from Singapore

We presented a brief overview and demonstration of the project to two visiting researchers from Singapore.

NTT Learning Systems Corporation from Japan

A group of five managers and staff from their Internet Department visited us for three days for presentations and discussions regarding potential participation in our partners program. NTT Learning Systems Corporation is developing an online technical journal system, similar to our DLI system, for the government of Japan.

Outside Researchers

Temporary access to the testbed has been granted to a researcher at the Graduate School of Library and Information Science at Keio University, Japan, for his research on a theme called "Implications of the non-semantic attributes of documents for IR".

Additional Partners

We are in negotiations with ASM International to add them to our partnership program and possibly include some of their Handbook material in the collection.

We are also in negotiations to include NTT Learning Systems in our partnership program.


Most of what we have described above is still in development and can be previewed on our 'Next Generation' web site. However, we plan to roll many of the enhancements into our production system early in 2000, namely the XSL-styled extended citations, the HTML or XML (depending on the browser) versions of the articles, and the new metadata schema. We will keep everyone informed as we migrate these technologies to our production system.

We hope everyone had a safe and happy New Year, and we look forward to continued collaboration in 2000. Please let us know if you have any questions, concerns, or ideas for further investigation.