The Illinois Testbed of full-text scientific and engineering journal articles was originally developed as part of a four-year Digital Library Initiative I (DLI-I) grant awarded to the University of Illinois at Urbana-Champaign (UIUC) in 1994. The DLI-I grant work was carried out by a multi-departmental research team made up of individuals from the university's Graduate School of Library and Information Science, the University Library, the National Center for Supercomputing Applications (NCSA), and the Department of Computer Science. The DLI Testbed, housed in the Grainger Engineering Library Information Center, was constructed from source text journal articles supplied in SGML format by a number of professional society publishers.
The Testbed team implemented a Web-based retrieval system, called DeLIver (Desktop Link to Virtual Engineering Resources), featuring enhanced access and display capabilities. The DeLIver system has been in operation since September 1997 and has been used by over 3,000 registered UIUC students and faculty, as well as designated outside researchers. Sample pages from a DeLIver search session are shown in Figure 1 below. Detailed transaction log data from user search sessions (gathered from both the database and Web servers and then merged) have been collected, and user search patterns from some 4,200 search sessions have been analyzed.
In addition to enhancing the DeLIver system, in the course of this project Testbed staff developed a metadata-based retrieval and full-text display system based on a relational database model, a cross-repository retrieval system over several of the D-Lib Test Suite collections based on metadata normalizing procedures, a handbook-based full-text system, a Local Resolver server for processing and handling Digital Object Identifiers (DOIs), and a simultaneous search portal which operates over major A & I Services and provides reference linking to publisher full-text using OpenURLs and DOIs. These systems are described below.

The Illinois Testbed comprises full text in XML format (converted from SGML), with RDF-encoded qualified Dublin Core metadata and bit-mapped images of figures, for 63 journal titles containing over 100,000 articles from eight scholarly professional societies in physics and engineering. The full-text journal articles for the Testbed have been contributed by:
American Institute of Physics (AIP),
American Physical Society (APS),
American Society of Civil Engineers (ASCE),
Association for Computing Machinery (ACM),
Elsevier Science, Ltd.,
Institute of Electrical and Electronics Engineers (IEEE), and
Institution of Electrical Engineers (IEE).
In addition, the Testbed contains full-text handbook data from ASM International.
The CNRI D-Lib Test Suite grant provided the Testbed Team with the resources to extend the functionality of the DeLIver system and develop data conversion, dynamic linking, enhanced rendering, extended metadata, and improved retrieval capabilities. In addition, a Collaborating Publishing Partners program was instituted to provide additional support for the Testbed. The Testbed Collaborating Partners have supplied both full-text content and monetary support. A total of eleven organizations--professional societies and commercial entities--have been members of the Partners program during the last three years.
The overarching focus of the Illinois full-text Testbed has been on the design, development, and evaluation of mechanisms that can provide effective access to full-text engineering and physics journal articles within an Internet environment. The primary goals of the Illinois Testbed within the D-Lib Test Suite have been:
1. The construction and testing of a multi-publisher XML-based full-text Testbed employing emerging document representation techniques and flexible search and rendering capabilities that offer rich links to internal and external resources;
2. Development of processing, rendering, metadata, linking, and search technologies and best practices that can be transferred to the Testbed publisher partners for use within their full-text repository systems;
3. Development of integrated retrieval systems employing full-text repositories, A & I services, and metadata resources within the continuum of information resources offered to end-users by library portals.
One of the cornerstones of the Testbed, in terms of retrieval capabilities, is the effective integration of the article content and structure revealed by XML with the associated article-level metadata. Across publisher repositories, the metadata serves to normalize the heterogeneous XML and to provide short-entry display and cross-article linking capabilities. The metadata contains links to internal and external data in the form of forward and backward links to other Testbed articles, links to A & I Service databases (including the Ovid INSPEC and Compendex databases), and links to other full-text repositories, such as those of the American Institute of Physics, the American Physical Society, and Elsevier.
An important concern of the Testbed Team has been exploring effective retrieval models for a Web-based electronic journal publishing system. The retrieval and display of full-text journal literature in an Internet environment poses a number of issues for both publishers and libraries. It has now become commonplace for both major and small-scale publishers to provide Web-based access to full-text journal issues and articles. The work performed under this grant has provided important information, in the form of design insights and specific technologies, to members of the scientific publishing community seeking to establish or improve their full-text repositories.
Probably the most significant contribution of the project has been the transfer of technology to our publishing partners and others. This is evidenced by the evolution of many sci-tech journal publishers’ Web sites. When this project commenced in 1994, almost none of the scientific or technical publishers provided online access to full-text journal articles. Indeed, even at the outset of the D-Lib Test Suite in 1998, full-text journal article content was provided as static, proprietary Adobe Acrobat files or even as scanned page images. Today, however, nearly all sci-tech publishers provide online access to full-text articles, many with features that closely follow those originally developed within the Illinois Testbed project. These features include full-text display using HTML and CSS, internal linking between citations and footnotes, hyperlinks to cited articles using DOIs and OpenURLs, and the display of complex mathematics and special Unicode characters directly in the HTML full text. Examples of these and other technologies developed for this project are described below.
In addition to our publishing partners, we have also served as consultants for numerous other libraries and researchers, both domestically and internationally, on issues associated with the development of digital libraries of scientific literature. These include the NTT Learning Systems J-Stage Project group, which, under the sponsorship of the Japanese government, has developed an online sci-tech journal system for a number of Japanese professional societies; the U.S. Naval Research Laboratory; and numerous digital library researchers on a less formal basis. In addition, Testbed Team members are working on the Open Archives Initiative (OAI) protocols for metadata harvesting and on the Appropriate Copy local reference linking initiatives sponsored by DLF and CrossRef. We have also hosted several visiting international scholars, including a Fulbright Scholar from India, and have granted temporary access to the collection to several researchers at other institutions in order to facilitate their own digital library research. Since 1999 we have hosted two multi-day workshops for our Collaborating Partners and other interested parties. These workshops reported the results of our work and solicited feedback on future research directions. During the second quarter of 2000, the Grainger Engineering Library hosted the “MathML and Math on the Web” conference here in Urbana. Testbed Team conference presentations and published papers are listed in a bibliography section below.
Much work has been done in dynamically converting well-formed XML into formats that can be displayed in the current popular web browsers (Netscape Navigator and Microsoft Internet Explorer). We have successfully implemented several different approaches.
Early in the project, prior to the adoption of several key W3C standards, we implemented our own XML parser (similar to current SAX parsers), which we used to transform the XML dynamically. As the W3C standards evolved, we adopted more of the standard technologies and tools for working with XML, namely the XML Document Object Model (DOM) and Extensible Stylesheet Language Transformations (XSLT). The rendering techniques we experimented with were, however, very similar whether we used our custom tools or the W3C standard tools.
The approach that ultimately proved to be the most flexible was to dynamically convert the XML into HTML on the server with references to the appropriate Cascading Style Sheets (CSS) and downloadable fonts. We have developed several sophisticated server and client-side scripts that use XSLT and CSS stylesheets for rendering the full-text XML articles as HTML.
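The core of this server-side approach is a recursive mapping from article elements to XHTML elements that carry CSS class names, wrapped in a page that links a stylesheet. The sketch below uses Python's `xml.etree.ElementTree` as a stand-in for the XSLT stylesheets; the element names in `TAG_MAP`, the class names, and the stylesheet path are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: article element -> (XHTML tag, CSS class or None).
TAG_MAP = {
    "article": ("div", "article"),
    "title": ("h1", "art-title"),
    "sec": ("div", "section"),
    "p": ("p", "para"),
    "it": ("i", None),
}

def to_html(node):
    """Recursively convert one article element into an XHTML element."""
    # Unknown elements become spans whose class is the source element name.
    tag, css_class = TAG_MAP.get(node.tag, ("span", node.tag))
    out = ET.Element(tag)
    if css_class:
        out.set("class", css_class)
    out.text = node.text
    for child in node:
        html_child = to_html(child)
        html_child.tail = child.tail  # keep mixed-content text after the child
        out.append(html_child)
    return out

def render_page(xml_text, css_href="/styles/article.css"):
    """Wrap the converted article in an XHTML page that links a CSS stylesheet."""
    article = ET.fromstring(xml_text)
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "link", rel="stylesheet", type="text/css", href=css_href)
    body = ET.SubElement(html, "body")
    body.append(to_html(article))
    return ET.tostring(html, encoding="unicode")
```

Because the CSS sees only the generated XHTML, the class names emitted here are the hooks the stylesheet would target.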
We also experimented with rendering the XML natively in the web browser. This approach, however, was hampered by the lack of native XML support in commercial web browsers and by the fact that the browsers that did support XML did so imperfectly.
One important technique common to all of the above approaches is metamerge: the on-the-fly combination of the full-text article with its metadata. This allows the rendered article to contain much value-added information that was not in the original article, such as links to cited and citing articles and links to A & I services. This is the primary reason for converting the XML dynamically at request time rather than preconverting it to the appropriate formats.
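A minimal sketch of the metamerge idea, assuming the metadata record has already been read into a dictionary: the stored article is parsed per request and the link lists from the metadata are appended as an extra element before rendering. The metadata keys (`citing`, `ai_records`) and the `valueadded` element name are hypothetical, not the Testbed's actual schema.

```python
import xml.etree.ElementTree as ET

def metamerge(article_xml, metadata):
    """Return a new article tree carrying value-added links from its metadata.

    The stored article text is parsed fresh on every call, so the source
    document itself is never modified.
    """
    article = ET.fromstring(article_xml)
    extras = ET.SubElement(article, "valueadded")
    # Hypothetical metadata keys mapped to the link labels shown to readers.
    for key, label in (("citing", "Cited by"), ("ai_records", "A & I record")):
        for url in metadata.get(key, []):
            ET.SubElement(extras, "link", href=url).text = label
    return article
```

The merged tree would then go through the same rendering step as any other article, which is why the merge has to happen at request time when the metadata changes.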
The most challenging aspect of natively rendering sci-tech articles in a web browser was the mathematics. Several different approaches to rendering mathematics have been explored during the course of the project.
In one approach, software was developed to convert ISO 12083 SGML marked-up mathematics to TeX for translation to bit-mapped images (.GIF or .PNG files) for rendering within a browser. This provides a mechanism for accurately rendering mathematics in any HTML-capable Web browser (all of which display bit-mapped images natively). The tradeoffs are the time and bandwidth involved in passing these images across the Internet and the inability to search or manipulate bit-mapped mathematics.
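The markup-to-TeX step can be pictured as a recursive walk that emits a TeX fragment for each math element. The sketch below handles a small, simplified subset whose element names (`fraction`, `num`, `den`, `sup`, `inf`, `radical`) are modeled on ISO 12083-style math markup and should be treated as illustrative; the later typesetting and rasterizing to GIF/PNG would be done by external tools and is not shown.

```python
import xml.etree.ElementTree as ET

def to_tex(node):
    """Convert a simplified ISO 12083-style math element to a TeX string."""
    if node.tag == "fraction":
        return r"\frac{%s}{%s}" % (to_tex(node.find("num")), to_tex(node.find("den")))
    # Content of this element: its text plus each converted child and its tail.
    inner = (node.text or "") + "".join(to_tex(c) + (c.tail or "") for c in node)
    if node.tag == "sup":
        return "^{%s}" % inner
    if node.tag == "inf":  # "inferior", i.e. a subscript
        return "_{%s}" % inner
    if node.tag == "radical":
        return r"\sqrt{%s}" % inner
    return inner  # formula, num, den and unrecognized wrappers pass content through
```

For example, `x<sup>2</sup> + <fraction><num>a</num><den>b<inf>0</inf></den></fraction>` inside a `formula` element becomes `x^{2} + \frac{a}{b_{0}}`.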
The second approach that was implemented involved using CSS and DHTML scripts to style the mathematics for HTML and XML displays. While the CSS style sheets provide the framework for the mathematics displays, the actual logic for the accurate rendering lies in the script pages or XSLT that convert the XML marked-up mathematics to HTML. These scripts contain the algorithms for kerning and sizing of integrals, fences, square roots, and matrix equations. This approach allows for searching of equation elements, re-sizing of display mathematics within the browser, and manipulation of equations by end-users. A considerable amount of effort went into this approach with some success; however, it was ultimately abandoned in favor of the MathML approach.
The third approach involved converting the legacy (mostly ISO 12083) mathematics into MathML. We had considerable success with this approach, developing several XSLT stylesheets for transforming the different publishers’ math markup into MathML. Toward the end of the project we had nearly completed converting our entire collection to MathML. Even with MathML, however, rendering issues remain. We initially experimented with applying our second approach (above) to the rendering of MathML, with similar results: the majority of the math could be legibly rendered, but some equations remained problematic. As the MathML standard matured, however, more standard rendering options became available, such as plug-ins for the most popular web browsers, or even native web browser support, as is being developed for the Mozilla project (Netscape 6).
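One detail that makes this conversion more than a one-to-one element rename is that ISO 12083-style superscripts and subscripts follow their base, while MathML's `msup`/`msub` must wrap the base. The Python sketch below (a stand-in for the XSLT stylesheets, using the same hypothetical element subset as the TeX example) handles this by popping the previous output node as the script's base.

```python
import re
import xml.etree.ElementTree as ET

TOKEN = re.compile(r"[A-Za-z]|\d+(?:\.\d+)?|\S")

def tokens(text):
    """Turn plain character data into MathML token elements (mi, mn, mo)."""
    out = []
    for tok in TOKEN.findall(text or ""):
        tag = "mi" if tok.isalpha() else "mn" if tok[0].isdigit() else "mo"
        el = ET.Element(tag)
        el.text = tok
        out.append(el)
    return out

def wrap(nodes):
    """Return a single node, grouping several nodes in an mrow."""
    if len(nodes) == 1:
        return nodes[0]
    mrow = ET.Element("mrow")
    mrow.extend(nodes)
    return mrow

def row(node):
    """Convert the content of a math element into a list of MathML nodes."""
    out = tokens(node.text)
    for child in node:
        if child.tag in ("sup", "inf"):
            # The script follows its base in the source; MathML wraps the base.
            base = out.pop() if out else ET.Element("mrow")
            script = ET.Element("msup" if child.tag == "sup" else "msub")
            script.append(base)
            script.append(wrap(row(child)))
            out.append(script)
        elif child.tag == "fraction":
            frac = ET.Element("mfrac")
            frac.append(wrap(row(child.find("num"))))
            frac.append(wrap(row(child.find("den"))))
            out.append(frac)
        elif child.tag == "radical":
            sqrt = ET.Element("msqrt")
            sqrt.extend(row(child))
            out.append(sqrt)
        else:
            out.extend(row(child))
        out.extend(tokens(child.tail))
    return out

def to_mathml(xml_text):
    """Convert one simplified ISO 12083-style formula to a MathML string."""
    math = ET.Element("math", xmlns="http://www.w3.org/1998/Math/MathML")
    math.extend(row(ET.fromstring(xml_text)))
    return ET.tostring(math, encoding="unicode")
```

The single-character tokenizer is a simplification; real publisher markup also carries entity references, fences, and operators that need explicit mapping.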
Even with the improving support for mathematics in web browsers and MathML plug-ins, we still need to generate bit-mapped images of the more complex equations in order to support older web browsers. For the second and third approaches described above, we have therefore also developed techniques for generating bitmap versions of all the mathematics in an article.
In addition to journal articles, we have also experimented with the marked-up text of several volumes of one of the leading reference tools in metallurgy and materials science, the ASM Metals Handbook. Because handbook chapters and articles are substantially different from journal articles, they offer some new and interesting research challenges. One issue we addressed with these materials was how to partition the sections, some of which run to hundreds of printed pages, into appropriately sized chunks for delivery to users over the web. Other issues include how to implement each volume's page-number-based index for use over the web, and how to deliver useful and meaningful search results for such a large handbook.
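One simple way to partition a long section is to group consecutive paragraphs greedily until a size budget is reached, never splitting a paragraph. The sketch below assumes paragraphs are already extracted as strings; the character budget is an arbitrary placeholder, not a value used in the project.

```python
def chunk_section(paragraphs, max_chars=20000):
    """Group consecutive paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own chunk rather
    than being split mid-paragraph.
    """
    chunks, current, size = [], [], 0
    for para in paragraphs:
        # Start a new chunk if adding this paragraph would exceed the budget.
        if current and size + len(para) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(para)
        size += len(para)
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph (or subsection) boundaries keeps each chunk readable on its own, at the cost of uneven chunk sizes.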
The Testbed Team was a key participant, on behalf of AIP, in the DOI-X prototype (Metadata Exchange for Reference Linking, Phase 1: Journal Article Reference Linking Project). We developed XSLT stylesheets and scripts to generate DOI-X-compliant metadata for our collection of AIP journal articles. We also registered these articles with the prototype CrossRef database and conducted numerous experiments with the CrossRef database and with using DOIs for reference linking within our Testbed.
Project personnel were involved in planning and executing a test of “localizing” DOI linking in order to address the “appropriate copy” issue. The appropriate copy issue involves providing users with links to a locally loaded version of a publisher's full-text resource, rather than directing them to the publisher's site via the DOI. As part of the test, a “doi.org” cookie referring to our Local Resolver service was added to the Grainger Library main public menu page. The Local Resolver used the CrossRef reverse lookup function and, when appropriate, used the returned metadata to build a Web page of local resources, including links to locally mounted AIP and APS journals. Project staff worked with CNRI, CrossRef, Harvard University, OhioLink, Los Alamos, and Ex Libris on this proof-of-concept project. The Local Resolver work contributed directly to the development of a system providing simultaneous search of A & I Services with reference linking to publisher full-text repositories via OpenURLs and DOIs. This system is described more fully below.
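The appropriate-copy decision can be sketched as: look up the metadata for a DOI, and if the journal is mounted locally, send the user to a local OpenURL-style link; otherwise fall back to the global DOI resolver. In the sketch below, the holdings table, ISSN, local base URL, and the shape of the metadata dictionary are hypothetical, and the CrossRef lookup itself is assumed to have happened already.

```python
from urllib.parse import quote, urlencode

# Hypothetical holdings: ISSNs of locally mounted journals -> local base URL.
LOCAL_HOLDINGS = {
    "1234-5678": "https://testbed.example.edu/aip/",
}

def resolve(doi, metadata):
    """Return the 'appropriate copy' URL for a DOI.

    metadata stands in for what a CrossRef-style lookup might return for the
    DOI: a dict with 'issn', 'volume', 'spage' and 'year'.
    """
    base = LOCAL_HOLDINGS.get(metadata.get("issn"))
    if base:
        # OpenURL-style key/value query aimed at the local copy.
        query = urlencode({
            "genre": "article",
            "issn": metadata["issn"],
            "volume": metadata["volume"],
            "spage": metadata["spage"],
            "date": metadata["year"],
            "id": "doi:" + doi,
        })
        return base + "?" + query
    # Not held locally: defer to the publisher's copy via the DOI resolver.
    return "https://doi.org/" + quote(doi, safe="/")
```

For instance, `resolve("10.1000/xyz123", {"issn": "1234-5678", "volume": "88", "spage": "101", "year": "2000"})` yields a local link, while an unknown ISSN yields `https://doi.org/10.1000/xyz123`.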
Some early experiments were done in implementing a web-based mathematics search module. The search interface was based on search-by-example techniques and provided pull-down lists of operations, named constants, and laws. Alternatively, the user could enter any desired element of an equation into the entry fields.
A highly interactive Java applet was developed that allowed a searcher to quickly browse the collection of all journal authors or cited authors maintained in the Testbed. Similar techniques could be used to develop browsers for other key search fields as well.
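Fast browsing of a large author list comes down to a sorted index plus a binary search for the typed prefix. The sketch below shows that core in Python rather than as a Java applet; case-folding the sort key is an assumption about how names should be compared.

```python
import bisect

class AuthorIndex:
    """Sorted author list supporting fast prefix browsing."""

    def __init__(self, names):
        # Sort on a case-folded key so names differing only in case sort together.
        self.entries = sorted((n.casefold(), n) for n in set(names))
        self.keys = [k for k, _ in self.entries]

    def browse(self, prefix, limit=10):
        """Return up to `limit` names whose case-folded form starts with prefix."""
        p = prefix.casefold()
        start = bisect.bisect_left(self.keys, p)  # first key >= prefix
        out = []
        for key, name in self.entries[start:start + limit]:
            if not key.startswith(p):
                break
            out.append(name)
        return out
```

Each keystroke then costs one binary search, which is what keeps this kind of browser responsive over a collection-wide author list.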
Researchers at the NCSA completed the generation of co-occurrence tables over the entire Testbed collection. Using a Java applet similar to the Author Browse applet, the Testbed team used these tables to build an experimental search interface that would suggest alternate search terms based on a previously entered term or phrase.
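As a rough illustration of how co-occurrence tables support term suggestion, the sketch below counts how many documents contain each pair of terms and suggests the most frequent partners of a query term. It uses raw document-level counts as a simplification; the actual NCSA tables and any weighting they used are not reproduced here.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_cooccurrence(documents):
    """Count, for each pair of terms, the documents in which both occur."""
    table = defaultdict(Counter)
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            table[a][b] += 1
            table[b][a] += 1
    return table

def suggest(table, term, n=3):
    """Suggest the n terms that most often co-occur with `term`."""
    return [t for t, _ in table.get(term, Counter()).most_common(n)]
```

Given documents about lasers and optics, a query for "laser" would suggest "optics" first if the two appear together most often.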
Several experimental systems were developed to integrate the D-Lib Testbed with other online library systems. The first of these was a simultaneous search interface within DeLIver. This allowed a single search of the Testbed to be simultaneously broadcast to other online search systems, such as the library online catalog and one or more abstracting and indexing (A & I) databases, such as INSPEC or Compendex. The user could then easily navigate to the results of any one of the searches.
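Broadcasting one query to several services is naturally done concurrently so that the slowest service does not serialize the others. The sketch below assumes each back end is wrapped as a Python callable returning a list of hits; the back-end names are hypothetical, and real connectors (catalog, INSPEC, Compendex) are not shown.

```python
from concurrent.futures import ThreadPoolExecutor

def broadcast(query, backends, timeout=10.0):
    """Send one query to several search services concurrently.

    backends maps a service name to a callable taking the query and
    returning a list of hits. A failing or slow service yields [].
    """
    results = {}
    with ThreadPoolExecutor(max_workers=len(backends) or 1) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
        for name, future in futures.items():
            try:
                # Stops waiting after `timeout`, but leaving the with-block
                # still waits for that service's thread to finish.
                results[name] = future.result(timeout=timeout)
            except Exception:
                results[name] = []
    return results
```

Returning results keyed by service name matches the interface described above, where the user picks which service's result list to view.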
A second part of the integration was dynamically adding full-text links to articles in the Testbed. This was accomplished by building proxy servers for various other databases. The proxy server would mediate all interaction with the other database, parsing and interpreting the search results and adding full-text links as appropriate to the search results before returning them to the user. Proxies were developed for several of the locally mounted A & I databases.
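The link-adding step of such a proxy can be sketched as matching each parsed A & I result against an index of Testbed holdings and attaching a full-text URL on a hit. Matching on (ISSN, volume, first page) is an assumption chosen for illustration, as are the record field names and the base URL; parsing the remote database's output is not shown.

```python
def add_fulltext_links(records, testbed_index, base_url):
    """Annotate parsed A & I search results with links to Testbed full text.

    testbed_index maps (issn, volume, first_page) to a Testbed article id.
    Records with no match are returned unchanged.
    """
    for rec in records:
        key = (rec.get("issn"), rec.get("volume"), rec.get("spage"))
        article_id = testbed_index.get(key)
        if article_id is not None:
            rec["fulltext_url"] = base_url + article_id
    return records
```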
As noted above, the Grainger Library is presently rolling out a simultaneous search tool that provides broadcast search over A & I Service databases, the online catalog, and the Google search engine, and that, for the A & I Service databases, provides links to article full text using OpenURLs and DOI resolution.
Metadata has always been an important part of the project, and the metadata schemas used have gone through several iterations. The project started with a custom XML metadata schema and ultimately evolved to a completely standards-based XML metadata schema built on the Resource Description Framework (RDF) and Qualified Dublin Core (DCQ). Maintaining quality metadata has benefited the project in several ways, from the metamerge process described previously to the normalization of key data elements for improved searchability.
We assisted in developing an XML DTD to be used by all the D-Lib projects for documenting their various metadata formats according to the ISO 11179-3 standard. We have documented our metadata using this DTD and have also developed an XSL stylesheet for displaying these files.
Development of a D-Lib cross-repository metadata database was completed. This involved harvesting metadata from four of the D-Lib testbeds: Berkeley, UIUC, D-Lib Magazine, and Netlib at Tennessee. The metadata was then organized according to the Dublin Core (DC) standard, loaded into a relational database, and indexed for searching. Scripts and web forms were then developed for searching this collection over the web.
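The normalize-then-store pattern can be sketched with an in-memory SQLite table holding one row per DC element value. The per-repository field names in `FIELD_MAPS` are hypothetical; the point is that each source's fields are mapped onto a shared DC element before insertion, so a single query can search across all repositories.

```python
import sqlite3

# Hypothetical source field names for two repositories, mapped to DC elements.
FIELD_MAPS = {
    "uiuc": {"ArticleTitle": "title", "Author": "creator", "PubDate": "date"},
    "netlib": {"name": "title", "contributor": "creator", "released": "date"},
}

def open_db():
    """Create an in-memory table holding one row per DC element value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dc (repository TEXT, record TEXT, element TEXT, value TEXT)")
    return conn

def load(conn, repository, records):
    """Normalize (record_id, fields) pairs to DC elements and store them."""
    mapping = FIELD_MAPS[repository]
    for record_id, fields in records:
        for field, value in fields.items():
            element = mapping.get(field)
            if element:  # fields with no DC equivalent are dropped
                conn.execute("INSERT INTO dc VALUES (?, ?, ?, ?)",
                             (repository, record_id, element, value))

def search(conn, element, word):
    """Search one DC element across all repositories (LIKE is case-insensitive for ASCII)."""
    return conn.execute(
        "SELECT repository, record, value FROM dc "
        "WHERE element = ? AND value LIKE ? ORDER BY repository, record",
        (element, f"%{word}%")).fetchall()
```

A production version would add full-text indexes rather than relying on `LIKE`, but the mapping step is the part that makes cross-repository search possible.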
Using the metadata we had created for the American Society of Civil Engineers' articles, and our expertise with XML transformations, we created Searchable Physics Information Notices (SPIN) metadata records for use with the American Institute of Physics' Online Journal Publishing Service (OJPS). This allowed ASCE's articles to be hosted on AIP's OJPS.
Project personnel participated in the alpha test of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). They developed several prototype data provider interfaces that made selected metadata about the local D-Lib collection harvestable. Metadata from other UIUC digital collections was also made available. Several reference implementations of the provider protocol were made available for free download.
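From the harvester's side, an OAI-PMH exchange is an HTTP GET with a `verb` parameter and an XML response. The sketch below builds a `ListRecords` request and extracts identifiers and Dublin Core titles from a response; it uses the namespace URIs of OAI-PMH 2.0, which differ from the earlier protocol version exercised in the alpha test, and the base URL in the usage note is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

def parse_titles(response_xml):
    """Extract (identifier, dc:title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    out = []
    for record in root.iterfind(".//oai:record", NS):
        ident = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        out.append((ident, title))
    return out
```

For example, `list_records_url("https://example.edu/oai")` gives `https://example.edu/oai?verb=ListRecords&metadataPrefix=oai_dc`; a harvester repeats the request with each returned resumption token until none remains.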
Leveraging this work using the D-Lib Test Suite collection with the OAI Metadata Harvesting Protocol, and in partnership with the University of Michigan, the project team authored and submitted a grant proposal ‘Proposal to Implement a Scholarly Information Portal Using OAI Metadata Harvesting Protocols’ to The Andrew W. Mellon Foundation. In June the proposal was accepted. The proposal includes developing Open Source software tools for harvesting metadata, using those tools to harvest metadata related to cultural heritage, and developing a search portal for the harvested data.
Research conducted during this project shed light on a number of issues relating to digital document processing and use. In particular, issues relating to document representation technologies, metadata schemas, and document presentation to the end-user were investigated. The results of these investigations are reflected in the accomplishments listed above. Summarized here are several of the most important issues along with rationales and conclusions developed during the course of this project. This work helped to determine the direction of research over the life of the project and also has implications for future work at Illinois and elsewhere.
The types and functionality of formats available for the online representation of text are continuing to evolve. ASCII and bit-mapped image representations of text objects (e.g., journal articles) have been superseded by more functional representations, such as application-specific word processing formats and proprietary and non-proprietary information interchange formats (e.g., Adobe PDF, Microsoft RTF, TeX, SGML, XML). The Illinois Testbed project required a standard document representation format that would provide robust, platform-independent search, retrieval, and rendering within a Web environment. The selected format needed to be capable of full-text indexing and high-granularity, field-specific search and retrieval. During the precursor DLI-I project, SGML, HTML, TeX, and PDF were initially considered as candidate formats for Testbed documents. At the start of the D-Lib Test Suite project, Illinois Testbed content was maintained in SGML, with copies of selected items also in PDF.
While our hopes for improvements in SGML usability and rendering engines did not materialize directly, the DLI-I Project Team's selection of SGML as the preferred encoding format for Testbed documents has been largely validated by the development and rapid widespread adoption of XML and its related technologies. XML, established as a W3 Consortium Recommendation in February 1998, strives to make the best features of SGML more accessible to Web authors and publishers. The major Web publishers and vendors have trumpeted XML as central to their industry’s future (Weil, 2000).
However, the XML specification itself does not explicitly deal with the presentation of content, nor does it address document object transformation. On its own it would not have been sufficient for the application dealt with here. These issues must be addressed through the use of CSS (Cascading Style Sheets versions 1 and 2) and XSLT, both more recent W3 Consortium Recommendations. Used in concert, these three technologies have been sufficient to support the Illinois Testbed application.
One of the major changes from SGML to XML involved the introduction of the concept of "well-formed" document instances. This change has enabled better separation of the authoring process from the presentation process. SGML requires that each document instance be validated against a Document Type Definition (DTD), which defines the content model (i.e., the intellectual structure) for a document or a class of documents. A single DTD can be used to author and validate an unlimited number of document instances, but the requirements of SGML still mean that all SGML applications must include the entire overhead necessary to read and apply DTDs. It also means that each document instance transmitted must be accompanied by a DTD or must make proper and unambiguous reference to an appropriate publicly available DTD. While most authoring systems need DTDs (or an equivalent) in order to ensure that a document conforms to the desired intellectual structure as it is being written, validation is generally superfluous for applications that simply want to render a document. Maintaining and tracking DTDs for use by rendering agents in a Web environment can be an unnecessary complexity.
By contrast, XML allows document instances to be either "valid" (that is, validated against a DTD or an equivalent set of rules expressed in the form of an XML Schema) or merely "well-formed." Allowing for well-formedness lowered the bar for XML rendering systems. In conjunction with other limitations and simplifications incorporated into the XML specification, this has made possible the development of lightweight, non-validating XML parsers that can more easily be incorporated into current Web browsers. Additionally, the inclusion of well-formedness in the XML specification greatly facilitated the transformation of legacy SGML to well-formed XML, because only the document instances needed to be transformed; it was not necessary to transform the SGML DTDs. In fact, XML well-formedness rules mean that it is possible (and often desirable, given the difficulties of converting SGML DTDs into XML DTDs) to use SGML authoring tools to generate well-formed XML.
Based on these considerations, the entire collection of articles and metadata in the Illinois Testbed was retrospectively converted from SGML to well-formed XML in 1999, early in the D-Lib Test Suite project. As it turned out, this conversion was not difficult, because publishers were using very few of the SGML features not carried forward to XML. Empty content models (i.e., elements with no closing tag) and tag minimization (i.e., the practice of omitting close tags that can be implicitly deduced from DTD requirements) were the primary issues we had to address in our retrospective document instance conversions. Since 1999, several of our collaborating publisher partners have made additional changes to bring their SGML output even closer to well-formed XML. For instance, a publisher previously using SGML tag minimization no longer does so. Though source article data continues to be supplied to us in SGML, only minimal changes (handled by a simple, project-specific C++ application) are required to convert the SGML now being received into well-formed XML. Generic tools for converting many classes of SGML document instances to well-formed XML document instances are now widely available (Clark, undated).
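The two fixes named above, closing EMPTY elements and restoring minimized end tags, can be illustrated with a small stack-based pass over the tags. This Python sketch is not the project's C++ application: it assumes the caller supplies, from the DTD, which elements are EMPTY and which may have omitted end tags, and it ignores comments, marked sections, unquoted or `>`-containing attribute values, and SGML's case-insensitive names.

```python
import re
import xml.etree.ElementTree as ET

TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)([^>]*)>")

def sgml_to_xml(sgml, empty=frozenset(), auto_close=frozenset()):
    """Convert a minimized SGML instance to well-formed XML (simplified).

    empty: elements declared EMPTY in the DTD (written without an end tag).
    auto_close: elements whose end tag may be omitted; one is closed when a
    start tag of the same name, or an enclosing end tag, arrives.
    """
    out, stack, pos = [], [], 0
    for m in TAG.finditer(sgml):
        out.append(sgml[pos:m.start()])
        pos = m.end()
        closing, name, attrs = m.group(1), m.group(2), m.group(3)
        if closing:
            # Close any minimized elements still open inside this one.
            while stack and stack[-1] != name and stack[-1] in auto_close:
                out.append("</%s>" % stack.pop())
            if stack and stack[-1] == name:
                stack.pop()
            out.append(m.group(0))
        elif name in empty:
            out.append("<%s%s/>" % (name, attrs.rstrip().rstrip("/")))
        else:
            if name in auto_close and stack and stack[-1] == name:
                out.append("</%s>" % stack.pop())  # implied end of previous sibling
            stack.append(name)
            out.append(m.group(0))
    out.append(sgml[pos:])
    while stack and stack[-1] in auto_close:
        out.append("</%s>" % stack.pop())
    return "".join(out)
```

For example, `<sec><p>One<p>Two<graphic file="f1.gif"></sec>` with `graphic` declared EMPTY and `p` end-tag-omissible becomes `<sec><p>One</p><p>Two<graphic file="f1.gif"/></p></sec>`, which a non-validating parser such as `ET.fromstring` accepts without any DTD.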
When used in concert with XSLT and CSS, XML proved a durable and robust document representation technology for the purposes of the Illinois D-Lib Test Suite project. The approach has been adopted for other University of Illinois text-based digital library projects.
Over the course of the Illinois D-Lib Testbed project, item-level metadata records were used to facilitate searching by normalizing key fields from different publisher document markup schemas, to provide common and easily displayable intermediate search results, and to offer value-added information in the form of links to cited or citing articles. Links included in the metadata records point directly to the full-text version(s) of the article, to separate article components (e.g., figures and tables), to related full-text articles in the Testbed, and to related article references in external abstracting & indexing database services.
Metadata records also were used as part of the Testbed’s link management scheme. URLs to a document’s full-text and associated external figures and tables were stored in metadata records. By keeping separate URLs for each figure, users could retrieve figures independently of document full text. In addition to providing document and figure linkages, the metadata also provided linkages to external related resources. These included other articles cited in the document that might be electronically available. The metadata used in the Illinois D-Lib Test Suite Testbed provided links to relevant INSPEC and Compendex records, as well as to bibliographic records maintained by various publishers. As new items were added to the Testbed each document was checked for links to articles already in the collection. When such links were found, the metadata record for the earlier item also was updated to show the relationship to the newer document. (E.g., addition of an errata article results in a link being added to the original article’s metadata showing linkage to the newly added errata.)
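The back-linking step described here can be sketched as: when a new record is added, every relation it declares to an item already in the collection triggers an inverse relation on that earlier item's record. The relation names and the dictionary-based store below are hypothetical stand-ins for the Testbed's RDF/DC metadata files.

```python
# Hypothetical relation names and their inverses.
INVERSE = {"erratum_of": "has_erratum", "cites": "cited_by"}

def add_article(metadata_store, new_id, record):
    """Add a metadata record and update earlier records it refers to.

    record["relations"] lists (relation, target_id) pairs. For each target
    already in the store, the inverse relation is appended to its record;
    targets not (yet) in the collection are skipped.
    """
    metadata_store[new_id] = record
    for relation, target in record.get("relations", []):
        inverse = INVERSE.get(relation)
        if inverse and target in metadata_store:
            metadata_store[target].setdefault("relations", []).append((inverse, new_id))
```

Only the metadata records change; the stored article instances stay untouched, which is consistent with keeping metadata dynamic and documents static.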
We also relied on metadata to serve as document surrogates in the display of intermediate results. While the full-text of metadata records and journal articles were indexed together, metadata records were maintained separate from the full-text document files themselves. As noted, the metadata records were dynamic while the document instances themselves were static. This helped to protect the integrity of the original source documents. This arrangement also facilitated incremental indexing.
Both SGML and XML are well suited to storing this kind of metadata and making it easy to use. XML Namespaces, however, published as a W3C Recommendation in January 1999, shortly after XML 1.0, give XML additional flexibility and potential functionality as a metadata format. XML Namespaces allow semantics from different XML DTDs and/or Schemas to be mixed in a single document. We maintained our metadata in XML using a combination of Dublin Core (DC) semantics, the Resource Description Framework (RDF) model and syntax, DC Qualified semantics, and early draft (1999) DC Agent semantics. The RDF model and syntax provide a standard way of representing named properties and property values of resources, and of using XML to serialize the metadata record. The Dublin Core model (and the DC extensions mentioned) provides the semantics necessary to achieve additional standardization and interoperability when describing the bibliographic characteristics of a document.
The Illinois Testbed metadata schema was verbose in that DC element tags were repeated for each occurrence of an intellectual element (e.g., each subject heading assigned an article is contained within a separate DC Subject tag) except where RDF containers (e.g., Seq for DC Creator) were used to describe relations between repeated DC elements and other metadata structures.
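The structure described in the last two paragraphs (namespaced DC elements inside an RDF description, an `rdf:Seq` preserving author order, and one `dc:subject` per heading) can be sketched with `ElementTree`. The resource URI in the usage note is hypothetical, and DC qualifiers and DC Agent semantics are omitted for brevity.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)

def dc_record(about, title, creators, subjects):
    """Build an RDF/XML Dublin Core description for one article.

    Creators go in an rdf:Seq so author order is preserved; each subject
    heading gets its own dc:subject element.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": about})
    ET.SubElement(desc, f"{{{DC}}}title").text = title
    seq = ET.SubElement(ET.SubElement(desc, f"{{{DC}}}creator"), f"{{{RDF}}}Seq")
    for name in creators:
        ET.SubElement(seq, f"{{{RDF}}}li").text = name
    for subject in subjects:
        ET.SubElement(desc, f"{{{DC}}}subject").text = subject
    return ET.tostring(root, encoding="unicode")
```

Calling `dc_record("urn:example:a1", "Heat flow", ["Lee, K.", "Ng, P."], ["Thermal conduction", "Fluids"])` produces an `rdf:RDF` document in which the two authors appear in order under `rdf:Seq`.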
Metadata records were generated from the full XML document instance of the article described, using an XSLT stylesheet. Section titles from the entire article were extracted and transformed by the XSLT stylesheet into an RDF / DC Qualified metadata structure. In addition to extracting metadata from the article itself, and in many cases normalizing that extracted metadata, the XSLT stylesheets were used to incorporate into the metadata value-added content that was either hard-coded or taken from other sources. Most of the added data concerned links and relations to other articles inside and outside the Testbed. Additional information (e.g., publisher name, copyright statement) not included in the document instance as originally generated by the publishers for internal use was also added as necessary. Calls made to various databases determined which links to include in the metadata. To normalize across different XML source document schemas, distinct (though similar) XSLT stylesheets were developed for each individual publisher.
The use of XML and XSLT technologies to create, encode, and manipulate Testbed metadata proved effective over the course of the Illinois D-Lib Testbed project. Qualified Dublin Core was found to be a sufficient metadata schema to use for encoding metadata required for the Illinois D-Lib Test Suite Testbed project. Techniques developed during this project are now being applied in metadata interoperability projects including a study of the utility of the Open Archives Initiative Protocol for Metadata Harvesting.
As alluded to above, effective presentation of a collection of online, full-text articles in an open, Web-dominated environment can be problematic. What is desired is a combination of readability, including high-quality rendering of mathematics, and flexibility, including the ability to present multiple views of an article and its full text (e.g., short citation, long citation, full text, citations only, figures and tables only). To address maintenance and scalability issues, this flexibility should be accomplished using a single instantiation of the source article and its metadata. Finally, for maximum utility, it should be possible to render all views of interest (to acceptable fidelity) using currently available Web browsers. Plug-ins should be optional rather than required.
XSLT used in combination with CSS provides an effective and highly flexible way to present XML document instances and associated metadata. Full-text alone, metadata alone, or both in combination can be presented in part or in full. Content can be transformed into XHTML or other presentation-oriented XML Schemas using XSLT. (Using XSL Formatting Objects it's even possible to transform XML into PDF, but that technique was not utilized during the Illinois D-Lib Test Suite project.) Content from multiple documents (e.g., metadata record and primary source document) can be merged and presented in a composite view.
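A composite view can be sketched the same way. Below, a separate metadata record supplies the heading and author line, the primary full-text instance supplies the body, and both are merged into one XHTML tree. Python's ElementTree stands in for XSLT here (an XSLT stylesheet can read a second document through its `document()` function); the element names in both inputs and the `authors` class name are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def composite_view(metadata_xml, fulltext_xml):
    """Merge a metadata record and a full-text instance into one XHTML page."""
    meta = ET.fromstring(metadata_xml)
    text = ET.fromstring(fulltext_xml)
    html = ET.Element("html", {"xmlns": XHTML_NS})
    body = ET.SubElement(html, "body")
    # Title and author line come from the metadata record...
    ET.SubElement(body, "h1").text = meta.findtext("title")
    authors = ET.SubElement(body, "p", {"class": "authors"})
    authors.text = "; ".join(c.text for c in meta.iter("creator"))
    # ...while section headings and paragraphs come from the full text.
    for sec in text.iter("section"):
        ET.SubElement(body, "h2").text = sec.findtext("heading")
        for para in sec.findall("p"):
            ET.SubElement(body, "p").text = para.text
    return ET.tostring(html, encoding="unicode")
```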
In the Illinois D-Lib Test Suite project we provided three views of each article in our Testbed. All views were generated from the article metadata and/or the full-text document instance using XSLT and were output as XHTML for native rendering by conventional Web browsers. The XSLT stylesheets added generated content (e.g., labels, punctuation) as appropriate for styling and presentation. CSS was used to manage margins, fonts, and the absolute and relative positioning of objects, including simple mathematics. (Note that CSS stylesheets used in concert with XSLT stylesheets operate on the output of the XSLT transformation, not on the original XML document instance. To facilitate construction of our CSS stylesheets, we added class and id attributes to several of the XHTML elements created during transformation from the XML storage format.) UTF-8 encoding was used to support non-Latin characters and diacritics (including non-spacing diacritics). Complex mathematics was rendered using the IBM Techexplorer plug-in, which can render embedded MathML natively. For clients without IBM Techexplorer installed, complex mathematics was rendered as PNG (Portable Network Graphics) images. (These PNG files, each depicting a complete mathematical equation, were generated as items were added to the Testbed by an automated process that captured views of the MathML markup as rendered in IBM Techexplorer.)
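The MathML-or-PNG choice and the class/id hooks for CSS can be sketched as follows. How the Testbed detected the plug-in is not described here, so the `plugin_available` flag, the `equation` class name, and the `<id>.png` file-naming scheme are assumptions for illustration only. Serializing with `encoding="utf-8"` keeps non-Latin characters as literal UTF-8 bytes rather than character references.

```python
import xml.etree.ElementTree as ET

# Namespace the embedded MathML markup is expected to use.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def render_equation(eq_id, mathml, plugin_available, alt=None):
    """Build an XHTML <div> for one display equation (hypothetical scheme)."""
    # class and id attributes give the CSS stylesheet something to select on.
    div = ET.Element("div", {"class": "equation", "id": eq_id})
    if plugin_available:
        # Embed the MathML markup itself for a MathML-capable plug-in.
        div.append(ET.fromstring(mathml))
    else:
        # Otherwise point at a pre-rendered PNG image of the whole equation.
        ET.SubElement(div, "img", {"src": f"{eq_id}.png",
                                   "alt": alt or f"Equation {eq_id}"})
    return div


def serialize(elem):
    """Serialize to UTF-8 bytes; non-Latin characters stay as literal UTF-8."""
    return ET.tostring(elem, encoding="utf-8")
```

A matching CSS rule such as `div.equation { margin: 1em 0; }` would then style the XSLT output, not the stored XML, consistent with the note above.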
Our results indicate that presenting XML in conventional Web browsers is viable today. With the exception of a few documents containing extremely complex, cutting-edge mathematics, documents were readable and understandable. Presentation quality still lags PDF by a small margin, but the potential of XML not only as a storage and indexing format but also as a transport and presentation format is high.
The work performed by the UIUC Testbed Team under the auspices of the CNRI D-Lib Test Suite project resulted in significant advances in full-text document representation and rendering and in the deployment of linking technologies. In particular, approaches to using metadata standards built around RDF and Qualified Dublin Core were developed and tested. Various technologies for the effective rendering of mathematical equations were explored and demonstrated. And the efficacy of XSLT as a dynamic tool for document transformation, storage, and rendering was clearly shown. To the credit of the D-Lib Test Suite project, many of the technologies developed within the Illinois Testbed have become standard features of publisher full-text repositories.
As full-text repositories have evolved and standards for linking and document representation have matured, the Illinois Testbed staff has become actively involved in projects that integrate these technologies. Examples of this include the OAI metadata harvesting project and the simultaneous search system utilizing OpenURLs and DOIs for reference linking from A & I Service records. The need for maintaining a current Illinois Testbed collection of full-text journal articles has now ended. However, Testbed staff will continue to leverage the work performed under DLI-I and the D-Lib Test Suite to advance the development of document representation, linking, and retrieval middleware tools in order to further the development of digital library collections and services.
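Reference linking of the kind described above can be sketched as building an OpenURL 0.1 query from citation fields, or a direct DOI link when a DOI is known. The key names (`sid`, `id`, `genre`, `aulast`, `title`, `atitle`, `volume`, `issue`, `spage`, `epage`, `date`) follow the OpenURL 0.1 convention; the resolver base URL and the `sid` value are placeholders, `10.1000/182` is used only as a sample DOI, and this is not the project's portal code. `https://doi.org/` is the current public DOI proxy address.

```python
from urllib.parse import quote, urlencode

# OpenURL 0.1 citation keys, emitted in this order when present.
OPENURL_KEYS = ("genre", "aulast", "aufirst", "title", "atitle",
                "volume", "issue", "spage", "epage", "date")


def openurl(resolver_base, citation, sid="example:search", doi=None):
    """Build an OpenURL 0.1 link to a (hypothetical) link resolver."""
    params = {"sid": sid}  # identifies the referring source
    if doi:
        params["id"] = f"doi:{doi}"
    params.update({k: citation[k] for k in OPENURL_KEYS if citation.get(k)})
    return f"{resolver_base}?{urlencode(params)}"


def doi_link(doi):
    """Resolve a DOI through the public DOI proxy."""
    return "https://doi.org/" + quote(doi, safe="/")
```

Sending citations to a local resolver, rather than linking to publishers directly, lets the resolver choose the copy a given user is entitled to; the DOI link is the fallback when no resolver is configured.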
Clark, James (undated) SX: An SGML System Conforming to International Standard ISO 8879 -- Standard Generalized Markup Language. Online. Available: http://www.jclark.com/sp/sx.htm [10 April 2001].
Weil, N. “Web Publishers Hinge Their Future on XML,” InfoWorld, February 11, 2000. Online. Available: http://www.infoworld.com/articles/hn/xml/00/02/14/000214hnseybold.xml [10 April 2001].
Cole, Timothy W., William H. Mischo, and Thomas G. Habing. “XML Publishing Applications,” full-day pre-conference workshop given in conjunction with the Annual Meeting of the Council of Engineering and Scientific Society Executives, Cleveland, OH, July 23, 1999.
Cole, Timothy W., William H. Mischo, Robert Ferrer, and Thomas G. Habing. “Using XML, XSLT, and CSS in a Digital Library,” at ASIS 2000: Knowledge Innovations -- 63rd American Society for Information Science Annual Meeting, Chicago, IL, November 15, 2000.
Cole, Timothy W., Thomas G. Habing, and William H. Mischo. “Illinois D-Lib Testbed: Technologies for Converting Legacy Mathematics for Display on the Web,” at MathML and Math on the Web, Urbana, IL, October 21, 2000.
Cole, Timothy W. and Thomas G. Habing. “University of Illinois at Urbana-Champaign: OAI Alpha Experiences,” at Open Archives Initiative: Open Day for the U.S., Washington, D.C., January 23, 2001.
Cole, Timothy W. Participant in “The Open Archives Initiative: Perspectives on Metadata Harvesting,” a panel at ACM/IEEE Joint Conference on Digital Libraries, Roanoke, VA, USA, June 24-28, 2001.
Cole, Timothy W., William H. Mischo, Thomas G. Habing, and Robert H. Ferrer. “Using XML and XSLT to Process and Render Online Journals,” Library Hi Tech 19, no. 3 (2001): 210-222.
Cole, Timothy W. “Plans for an OAI metadata harvesting service,” at Digital Library Federation Fall 2001 Forum, Pittsburgh, PA, 18 November 2001.
Cole, Timothy W. “Qualified Dublin Core Metadata for Online Journal Articles,” OCLC Systems & Services (in press).
Cole, Timothy W. “Publishing Mathematics on the Web,” Science and Technology Libraries 20, no. 3-4 (2001) (in press).
Cole, Timothy W., William H. Mischo, and Mary C. Schlembach. “Changing Collaborations to Deliver Information in New Ways: Lessons Learned in the Illinois Digital Library Initiative Project.” In Racing Toward Tomorrow: Proceedings of the Ninth National Conference of the Association of College and Research Libraries, ed. by Hugh A. Thompson, pp. 53-59. Chicago: Association of College & Research Libraries, 1999.
Habing, Thomas G. and Timothy W. Cole. “Experiences Implementing OAI Provider Services,” presented at Open Archives: Communities, Interoperability and Services Workshop for ACM SIGIR 2001, 9-13 September 2001, New Orleans.
Habing, Thomas G., Timothy W. Cole, and William H. Mischo. “Qualified Dublin Core using RDF for Sci-Tech Journal Articles,” DC-2001, Proceedings of the International Conference on Dublin Core and Metadata Applications 2001, National Institute of Informatics, 24-26 October 2001, Tokyo, Japan, http://www.nii.ac.jp/dc2001/proceedings/product/paper-36.pdf.
Mischo, William H. “The Digital Library: Current Technologies and Challenges,” at The SLA Global 2000 Worldwide Conference on Special Librarianship, Brighton, UK, October 18, 2000.
Mischo, William H. “XML Technologies, Metadata, and Linking,” at the 6th Meeting of the Committee on XML for J-STAGE, Tokyo, Japan, March 7, 2001.
Mischo, William H. “XML Technologies and Scholarly Communication,” at The XML Workshop for Electronic Journals sponsored by Japan Science and Technology Corporation, Tokyo, Japan, March 9, 2001.
…Mischo, William H., et al. “Linking to the Appropriate Copy: Report of a DOI-Based Prototype,” D-Lib Magazine 7:9, September 2001, http://www.dlib.org/dlib/september01/caplan/09caplan.html.
Mischo, William H. “Library Portals, Simultaneous Search, and Full-Text Linking Technologies,” Science and Technology Libraries 20, no. 3-4 (2001) (in press).
Mischo, William H., Thomas G. Habing, and Timothy W. Cole. “Integration of Simultaneous Searching and Reference Linking across Bibliographic Resources on the Web,” JCDL 2002, Joint Conference on Digital Libraries, Portland, OR, 14-18 July 2002 (accepted).
Arms, William Y., Greg Janee, William H. Mischo, Carl Lagoze, Ginger Ogle, and Scott Stevens. “The D-Lib Test Suite: Testbeds for Digital Libraries Research.” D-Lib Magazine 5, no. 2 (February 1999).
Mischo, William H. and Timothy W. Cole. “XML Technologies, Digital Libraries, & Scholarly Communication” presented to LIS 450 EP (Electronic Publishing), 27 September 2001.
Mischo, William H. and Timothy W. Cole. “Processing and Access Issues for Full-Text Journals,” in Successes and Failures of Digital Libraries: Papers Presented at the 35th Annual Clinic on Library Applications of Data Processing, ed. by Susan Harum and Michael Twidale, pp. 21-40. Champaign: Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 2000.
Schatz, Bruce, William Mischo, Timothy Cole, Ann Bishop, Susan Harum, Eric Johnson, Laura Neumann, Hsinchun Chen, and Dorbin Ng. “Federated Search of Scientific Literature: A Retrospective on the Illinois Digital Library Project,” in Successes and Failures of Digital Libraries: Papers Presented at the 35th Annual Clinic on Library Applications of Data Processing, ed. by Susan Harum and Michael Twidale, pp. 41-57. Champaign: Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 2000.
Schatz, Bruce, William H. Mischo, Timothy W. Cole, Ann Bishop, Susan Harum, Eric Johnson, Laura Neumann, and Hsinchun Chen. “Federated Search of Scientific Literature: A Retrospective on the Illinois Digital Library Project.” IEEE Computer 32, no. 2 (February 1999): 51-60.
UIUC D-Lib Test Suite Workshop, August 19-20, 1999, Grainger Engineering Library, University of Illinois at Urbana-Champaign.
UIUC D-Lib Test Suite Workshop, September 7-8, 2000, Grainger Engineering Library, University of Illinois at Urbana-Champaign.