Using XML, XSLT, and CSS in a Digital Library
Timothy W. Cole, William H. Mischo, Robert Ferrer, and Thomas G. Habing
Grainger Engineering Library Information Center
University of Illinois at Urbana-Champaign
Abstract
This paper describes the evolving information technologies employed in the Digital Library Initiative (DLI-I) Testbed of full-text journal articles at the University of Illinois at Urbana-Champaign. Specifically, the paper will examine how XML, XSLT, and CSS can be used in a digital library application. The Illinois Testbed, originally established in 1994 and consisting of over 50,000 SGML-formatted articles from more than 44 sci-tech journal titles, has been converted from SGML to XML and employs item-level metadata in XML format utilizing RDF and Dublin Core syntax and semantics extended with project-specific XML tagging. A comprehensive index of article full-text and metadata allows full-text searching across the entire repository. XSLT and CSS stylesheets are used to present metadata and the full-text articles to end-users accessing the Testbed in a Web-based environment. This paper focuses on the techniques used to transform the SGML collection into well-formed XML, the XML metadata structures adopted for the project, and the XSLT and CSS features employed in the Illinois Testbed. Special attention is paid to techniques for rendering mathematics and for transforming real-time between XML and HTML formats.
INTRODUCTION
The University of Illinois at Urbana-Champaign was one of six sites awarded a four-year, federally funded grant in 1994 under the first phase of the Digital Library Initiative (DLI-I). The principal focus of the Illinois DLI-I project was on developing techniques for the representation and delivery of full-text engineering and physics journal articles in an Internet environment. The Illinois project included research, Testbed, and evaluation components and is described in detail elsewhere (Mischo & Cole, 2000; Schatz et al., 1999; Schatz et al., 1998).
The Illinois Testbed was constructed from source text journal articles in SGML (Standard Generalized Markup Language) format contributed by five professional society publishers. The full-text articles for the Testbed have been contributed by: the American Institute of Physics (AIP), the American Physical Society (APS), the American Society of Civil Engineers (ASCE), the Institution of Electrical Engineers (IEE), and, initially, the Institute of Electrical and Electronics Engineers Computer Society (IEEE CS).
The Testbed currently comprises some 55,000 articles from 44 journals in a base XML (Extensible Markup Language) format, along with accompanying images of figures and illustrations. The Testbed Team has implemented a Web-based retrieval and rendering system, called DeLIver (Desktop Link to Virtual Engineering Resources), which provides broad access to the full-text material. The Testbed is maintained and administered in the Grainger Engineering Library Information Center, a $22 million facility that opened in 1994.
In 1998, the Testbed Team received a follow-on three-year grant from the Corporation for National Research Initiatives (CNRI) to expand and enhance the Illinois DLI-I Testbed. The CNRI development work is being carried out as part of its D-Lib Test Suite program. Additional support for the Illinois Testbed has come from the establishment of a Collaborating Partners program that provides monetary and in-kind support for the Testbed technologies.
TESTBED GOALS AND ISSUES
When work on the UIUC DLI project began in 1994, the World Wide Web (WWW) was in a nascent stage. At that time, NCSA's Mosaic 2.0 beta was the browser of choice, the HTML 2.0 standard was still under development, Netscape had yet to release its first web browser, and Microsoft Windows 3.1 was the standard PC operating system.
The initial task of the Testbed project team was to identify technologies that were both of sufficient maturity to be usable at once and of sufficient potential to evolve over the life of the project. As the project evolved, two clear trends emerged: first, the WWW became the standard distributed information system used for text retrieval and display; second, as a direct corollary, publishers took advantage of emerging Web technologies to establish their own full-text repositories.
The focus of the Illinois Testbed Team has been on the design, development, and evaluation of mechanisms that provide effective access to full-text engineering, physics, and computer science journal articles within a Web-based environment. The primary goals of the Illinois Testbed are:
On an overarching level, the Illinois Testbed project has addressed the issues connected with the migration from a print-based journal environment to an Internet-based model, with particular focus on defining retrieval and rendering mechanisms that can optimize user access to full-text journals. As the project has evolved, the Testbed team has developed conversion, normalization, indexing, and rendering mechanisms and tools.
The publisher distributed repository model has developed quickly over the last five years. The number of full-text electronic journals continues to grow, with a total of 5,000 titles presently available (Ketcham-Van Orsdel & Born, 1999). The types and functionality of formats available for the online representation of text continue to evolve. ASCII and bit-mapped image representations of text objects (e.g., journal articles) have been superseded by more functional representations such as application-specific word processing formats and proprietary and non-proprietary information interchange formats (e.g., Adobe PDF, Microsoft RTF, TeX, SGML). While PDF is presently the most common full-text format, the use of a standard mark-up language, such as SGML and now XML, which supports the representation of a text as an "ordered hierarchy of content objects," is arguably the best and most sophisticated model available for representing text objects (DeRose et al., 1997).
SGML, first ratified as an international standard in 1986, has become a well-established approach for encoding text. However, SGML's complexity and its requirement for specialized and expensive tools to implement SGML-based systems have limited its scope as an information interchange standard, particularly in today's Web-dominated environment. XML, established as a W3 Consortium Recommendation in February 1998, strives to make the best features of SGML more accessible to Web authors and publishers. The major Web publishers and vendors have trumpeted XML as central to their industry's future (Weil, 2000). However, the XML specification itself does not explicitly deal with the presentation of content, nor does it address document object transformation. These issues must be addressed through the use of CSS and XSLT (both more recent W3 Consortium Recommendations). It is necessary to use these three technologies in concert to create powerful and robust text applications on the Web.
The cornerstones of the Testbed, in terms of its retrieval capabilities, are the effective utilization of the exposed article content and structure revealed by XML and the associated article-level metadata, which serves to normalize the heterogeneous XML and provide short-entry display capability. The metadata also contains links to internal and external data, such as forward and backward links to other Testbed articles and links to Abstracting & Indexing Service databases and other full-text repositories, such as the American Institute of Physics and the American Physical Society sites. An important feature of the Testbed design is the separation of the metadata/index files from the full-text. This allows the metadata/index--containing pointers to the full-text--to be logically and physically separated from the full-text records.
This paper describes the processing and rendering tools and techniques developed by the Testbed Team during the course of the Testbed implementation. In particular, it discusses the relative merits of SGML and XML for fine-granularity markup of documents, the development of dynamic, value-added metadata structures, and the role within the Testbed of recently introduced information technologies such as CSS (Cascading Style Sheets) and XSLT (Extensible Stylesheet Language Transformations).
DOCUMENT REPRESENTATION AND TRANSMISSION
The Testbed project required a standard document representation format that would provide robust, platform-independent search, retrieval, and rendering within a Web environment. The selected format needed to be capable of full-text indexing and high-granularity, field-specific search and retrieval. Initially considered as candidates for the document format were SGML, HTML, TeX, and PDF.
TeX and LaTeX are well established in the mathematical sciences academic community and support extremely robust rendering of mathematics, but the available authoring and display tools are typically not Web-enabled. In addition, exposure of document structure in TeX as used in real-world applications is limited. The PDF format lacked flexible hyperlink functionality and vital cross-collection indexing capabilities. In addition, both TeX and PDF require Web browser plug-in viewers for document display.
SGML supports powerful indexing, search, and retrieval, but rendering engines for SGML are limited and require separate executables or plug-ins. In addition, SGML is generally regarded as difficult to use, and along with the client, delivery, and rendering issues remains a 'Web-unfriendly' technology. HTML is natively Web-enabled, but at the time HTML 2.0 fell far short of desired structure and rendering functionality. Beginning with HTML 4.0 and the introduction of CSS, more robust rendering capabilities are available. However, HTML remains a presentation-oriented language with inadequate semantic tools for the effective indexing and fine-granularity searching needed for academic journals.
SGML was initially chosen for the Testbed document format because it was a non-proprietary international standard and was inherently best for indexing, search, and retrieval. This decision was consistent with the publishing world's identification of SGML as the emerging standard for document representation and transmission. While the publishers contributing source materials for the project had experience with all three formats then under consideration, it was clear that SGML or SGML with embedded TeX for mathematical equations was the preferred format. It was the hope of the Testbed team that, as the technologies evolved, there would be advances in SGML rendering engines, but these improvements never materialized. To compensate for immediate SGML rendering limitations, several of the publishers provided PDF versions of articles in addition to SGML versions. One of the most problematic aspects of SGML full-text rendering has been accurately rendering display mathematics marked up in SGML.
It was only with the introduction of XML and associated technologies that the rendering issues could be adequately addressed. XML is a distinguished subset of SGML that retains the key features of SGML, including semantic-based tagging. In addition, XML data can be rendered natively in a Web browser and/or converted to HTML to be rendered using technologies such as XSLT, CSS and Dynamic HTML (DHTML).
All Testbed data and metadata have been converted from SGML to XML. The source article data continues to be supplied to us in SGML by our publisher partners. Publishers have a great deal of time and resources invested in SGML authoring and publishing tools. Typically, the publisher's contracted typesetter or in-house typesetting shop will supply the publisher with an SGML version of the full-text article. Currently, equivalent XML publishing tools lag in terms of functionality, in part because XML DTDs (Document Type Definitions) cannot be as rigorous as SGML DTDs (see further discussion below). The SGML feature set used by our publishers when creating their SGML is relatively congruent with the feature set presently available through XML. In creating the algorithms and tools to transform from SGML to XML, we addressed several issues.
TRANSFORMING SGML INTO WELL-FORMED XML
In converting from SGML to XML, our goal was for the transformed material to be, as far as possible, both well-formed XML and valid SGML. The SGML material also needed to be renderable by a commercially available web browser plug-in (e.g., Interleaf's, formerly SoftQuad's, Panorama viewer). Several specific differences in tagging conventions between SGML and XML were addressed. Two representative examples are given here.
In SGML, nodes defined with an EMPTY content model are encoded with a start tag only:
<Empty>
In XML, nodes defined with an EMPTY content model are encoded as:
<Empty />
However, both SGML and XML allow elements defined with parsed character data content models (i.e., #PCDATA content models) to contain no data between start and end tags. Thus the following syntax is allowed in both SGML and XML:
<Empty></Empty>
By converting from EMPTY to #PCDATA content models in our SGML DTDs and then adding the necessary close tags in the document instances, we were able to make the document instances in our collection at once SGML and XML compliant in this regard.
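The instance-level half of this conversion can be sketched in a few lines. The following is a minimal illustration only, not the Testbed's conversion tool: it is written in Python, the element names in EMPTY_ELEMENTS are hypothetical, and it assumes the input uses SGML-style start tags without the XML `/>` form.

```python
import re

# Hypothetical names of elements that the original SGML DTD declared EMPTY.
EMPTY_ELEMENTS = {"Empty", "pagebreak"}

def close_empty_elements(text, names=EMPTY_ELEMENTS):
    """Append an explicit end tag after each start tag of a formerly EMPTY element.

    Assumes SGML-style input (no '<name/>' tags). Start tags that are already
    followed directly by their end tag are left alone, so the function can be
    run more than once on the same text.
    """
    alternation = "|".join(sorted(re.escape(n) for n in names))
    pattern = re.compile(r"<(%s)(\s[^<>]*)?>(?!</\1>)" % alternation)
    return pattern.sub(lambda m: m.group(0) + "</" + m.group(1) + ">", text)

sample = 'Text<Empty> more <pagebreak n="2"> end'
print(close_empty_elements(sample))
# Text<Empty></Empty> more <pagebreak n="2"></pagebreak> end
```

A regular expression is adequate here only because the affected elements have no content; anything more involved would call for a real SGML parser.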
XML requires unparsed character data (CDATA) to be marked explicitly in the document instance:
<Tag><![CDATA[...unparsed character data ...]]></Tag>
SGML identifies CDATA elements in the DTD without requiring any special markup in the document instance. Currently, CDATA elements are left in their SGML form; CDATA content is identified and processed dynamically at the time of rendering.
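One way such render-time processing might look is sketched below in Python. This is an assumption-laden illustration rather than the Testbed's code: the names in CDATA_ELEMENTS are hypothetical, the elements are assumed to carry no attributes, and nesting of the same element is not handled.

```python
import re

# Hypothetical names of elements whose content the SGML DTD declares as CDATA.
CDATA_ELEMENTS = ("Tag",)

def wrap_cdata(text, names=CDATA_ELEMENTS):
    """Wrap the content of CDATA-declared elements in explicit XML CDATA sections."""
    alternation = "|".join(re.escape(n) for n in names)
    pattern = re.compile(r"<(%s)>(.*?)</\1>" % alternation, re.DOTALL)

    def repl(match):
        name, content = match.group(1), match.group(2)
        if content.startswith("<![CDATA["):
            return match.group(0)  # already marked up for XML
        # "]]>" may not appear inside a CDATA section, so split it across two.
        safe = content.replace("]]>", "]]]]><![CDATA[>")
        return "<%s><![CDATA[%s]]></%s>" % (name, safe, name)

    return pattern.sub(repl, text)

print(wrap_cdata("<Tag>a < b & c</Tag>"))
# <Tag><![CDATA[a < b & c]]></Tag>
```

After this step an XML parser reads the element content as literal text, so characters such as `<` and `&` no longer need escaping.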
WELL-FORMED VS. VALID XML
There are several functional limitations that impede converting SGML DTDs into XML DTDs. For example, the XML DTD specification does not permit inclusions and exclusions. This complicates the inclusion of common elements that can appear anywhere in a document such as figures and formulas. Floating elements can be represented in SGML as:
<!ELEMENT Article - - (front, body) +(%i.float;)>
Front must precede body. Floating elements may occur anywhere within front and body. An XML DTD would require specifying float elements within the content model of each sub-element within front and body. This can be a tedious and complicated process. An option would be to use a less restrictive content model where front no longer must precede body, as shown by:
<!ELEMENT Article (front | body | %i.float;)*>
However, this approach reduces the usefulness of the DTD as a quality and consistency control mechanism during the authoring and publishing process. Order of elements is no longer enforced, nor is it even required by the DTD that both child nodes appear.
Conversion of an SGML DTD into an XML DTD is further complicated by the fact that '&' connectors are not permitted in XML content models. In SGML, the '&' connector requires all elements to occur, but in any order. In XML, the content model must instead be expanded to take into account all possible sequence orders. Another functional limitation is that XML mixed content models do not permit constraining the order or number of occurrences of individual elements. The following content model is not allowed in an XML DTD:
<!ELEMENT Other ((author, journal) | (#PCDATA))>
The result of such limitations is that it is difficult to make XML DTDs as restrictive as SGML DTDs. Furthermore, SGML authoring tools are still much more mature than XML authoring tools. The course of action we've suggested to our publishers is to continue to create document instances in SGML. However, our research has shown that it is possible to modify SGML DTDs and current authoring methodology in small ways in order to generate document instances that are both valid SGML and well-formed XML in all or nearly all respects. It is hoped that the proposed XML Schema Language standards will overcome the current limitations of the XML DTD.
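The expansion of an '&' group into explicit sequence orders, mentioned above, can be automated. The sketch below is our own illustration in Python (the function name is hypothetical). It emits the alternatives in a factored form, branching on the first element, so that the resulting content model stays deterministic as the XML specification expects.

```python
def expand_and_group(names):
    """Return an XML content model equivalent to the SGML group (a & b & ...).

    Every ordering of the elements is allowed. The alternatives are factored
    by their first element so that no two branches start with the same name.
    """
    names = list(names)
    if len(names) == 1:
        return names[0]
    branches = []
    for i, first in enumerate(names):
        rest = names[:i] + names[i + 1:]
        branches.append("(%s, %s)" % (first, expand_and_group(rest)))
    return "(" + " | ".join(branches) + ")"

print(expand_and_group(["author", "journal"]))
# ((author, journal) | (journal, author))
```

The number of orderings grows factorially with the number of elements, which is one reason hand-converting a large SGML DTD is tedious.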
METADATA SCHEMA
Testbed item-level metadata records are used to facilitate searching by normalizing key fields from different publisher DTDs, to provide common and easily displayable intermediate search results, and to offer value-added information in the form of links to cited or citing articles. Links included in the metadata records point directly to the full-text version of the article, to separate article components (e.g., figures and tables), to related full-text articles in the Testbed, and to related article references in external abstracting & indexing database services. Linking from metadata to the full text of related articles not in the Testbed has also been investigated using Digital Object Identifiers (DOIs) and similar schemes. (A portion of the articles in the Testbed was included in the 1999 DOI-X pilot project investigating linking between online full-text articles in different repositories.)
Experience with documents from various publishers has shown that there are many variations in the way documents are tagged, even for those that claim to use a DTD of a similar type. For example, one DTD points to the author of the article with the tag '<author>'. Another publisher points to equivalent information using the tag '<auth>'. Sometimes relevant information must be culled from multiple nodes. In other cases where the granularity of markup is large, metadata information must be inferred.
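The normalization of equivalent tags can be pictured as a per-publisher lookup table. The following Python sketch is illustrative only: the publisher keys, the 'ti' tag, and the normalized field names are assumptions, and only the '<author>' and '<auth>' tags come from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-publisher mapping from source tag to normalized field name.
FIELD_MAP = {
    "publisher_a": {"author": "creator", "title": "title"},
    "publisher_b": {"auth": "creator", "ti": "title"},
}

def extract_fields(xml_text, publisher):
    """Collect normalized metadata fields from one publisher's article markup."""
    mapping = FIELD_MAP[publisher]
    fields = {}
    for element in ET.fromstring(xml_text).iter():
        field = mapping.get(element.tag)
        if field is not None:
            # Flatten nested markup and collapse runs of whitespace.
            value = " ".join("".join(element.itertext()).split())
            fields.setdefault(field, []).append(value)
    return fields

article = "<art><auth>Giust, G. K.</auth><auth>Sigmon, T. W.</auth><ti>A title</ti></art>"
print(extract_fields(article, "publisher_b"))
# {'creator': ['Giust, G. K.', 'Sigmon, T. W.'], 'title': ['A title']}
```

Cases where information must be culled from several nodes or inferred from coarse markup need publisher-specific logic beyond a simple table like this.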
Metadata records also are used as part of the Testbed's link management scheme. URLs to document full-text and associated external figures and tables are stored in metadata records. For most of the Testbed collection, document full-text is available in PDF as well as in SGML, XML, and HTML. Also, by keeping separate URLs for each figure, users can retrieve figures independently of document full text. In addition to providing document and figure linkages, the metadata also provide linkages to external related resources. These include other articles cited in the document that might be electronically available. The metadata used in the Testbed provide links to relevant INSPEC and Compendex records, as well as bibliographic records maintained by various publishers. As new items are added to the Testbed each document is checked for links to articles already in the collection. When such links are found, the metadata record for the earlier item is also updated to show the relationship to the newer document. (E.g., addition of an errata article results in a link being added to the original article's metadata showing linkage to the newly added errata.)
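The reciprocal update described here, in which adding a new item also touches the metadata of earlier items, can be sketched with an in-memory dictionary standing in for the metadata store. The record layout ('cites', 'cited_by') and the identifiers are hypothetical.

```python
def add_item(collection, item_id, cited_ids):
    """Add a metadata record and update earlier records that the new item cites.

    `collection` maps item identifiers to metadata records. Only citations
    that resolve to items already in the collection produce links.
    """
    record = {"cites": [], "cited_by": []}
    collection[item_id] = record
    for cited in cited_ids:
        if cited in collection and cited != item_id:
            record["cites"].append(cited)                  # forward link
            collection[cited]["cited_by"].append(item_id)  # backward link
    return record

testbed = {}
add_item(testbed, "article-1", [])
add_item(testbed, "erratum-1", ["article-1", "not-in-testbed"])
print(testbed["article-1"]["cited_by"])
# ['erratum-1']
```

Keeping links in the metadata rather than in the article files is what allows the earlier record to be updated without touching the original document instance.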
The Testbed relies on metadata to serve as document surrogates in the display of intermediate results (as discussed further below). While the full-text of metadata records and journal articles are indexed together, metadata records are maintained separate from the full-text document files themselves. The metadata records are dynamic (e.g., links can be added) while the document instances themselves are static. This helps protect the integrity of the original source documents. This arrangement also facilitates incremental indexing.
We currently maintain our metadata as XML files using the Resource Description Framework (RDF) syntax. RDF syntax provides a standard way for using XML to represent metadata. The semantics of the metadata conform to the Dublin Core (DC) model, with a further level of granularity offered by the inclusion of an extensive set of project-specific, customized elements. Search clients familiar with the corpus of idli tags can achieve results with much greater precision, while index and search systems only familiar with DC semantics can still be used. For example, several idli tags partition the content within the dc:Source tag:
<dc:Source>
<idli:publication type="journal article">
<idli:journal_title>Applied Physics Letters</idli:journal_title>
<idli:journal_title_abbreviation>Appl. Phys. Lett.</idli:journal_title_abbreviation>
<idli:volume>70</idli:volume>
<idli:issue>11</idli:issue>
<idli:first_page>1372</idli:first_page>
<idli:pagination>1372 - 1374</idli:pagination>
</idli:publication>
</dc:Source>
Our metadata schema is verbose in that DC elements are repeated for each occurrence of an intellectual element (e.g., author information for each article author is contained within a separate DC Creator tag). RDF containers (e.g., seq, alt, & bag) are used to describe relations between repeated DC elements. Thus the two authors of an article could be represented in the metadata record for that article as shown below:
<rdf:seq>
<rdf:li>
<dc:Creator>
<idli:author_info>
<idli:author_name>Giust, G. K.</idli:author_name>
<idli:organization_name>Department of Electrical Engineering, Arizona State University, Tempe, Arizona, 85287-5706</idli:organization_name>
</idli:author_info>
</dc:Creator>
</rdf:li>
<rdf:li>
<dc:Creator>
<idli:author_info>
<idli:author_name>Sigmon, T. W.</idli:author_name>
<idli:organization_name>Department of Electrical Engineering, Arizona State University, Tempe, Arizona, 85287-5706</idli:organization_name>
</idli:author_info>
</dc:Creator>
</rdf:li>
</rdf:seq>
A parser written in C++ was created to extract metadata from the XML-formatted full-text. The parser uses the document structure to extract and normalize key elements. Currently, the parser treats the document as a text stream and loads the parsed information of interest into data structures that are then used to construct XML RDF metadata records. Value-added content is incorporated into the metadata from other sources. Most of the added data concern links and relations to other articles inside and outside the Testbed. Additional information (e.g., publisher name, copyright statement) not included in the document instance as originally generated for internal use by the publishers is also added as necessary. Calls made to various database repositories determine which links to include in the metadata.
A version of the parser used to generate the metadata is being developed that relies on the Document Object Model (DOM) and the transformative components of XSL (XSLT) to extract the metadata information. XSL stylesheets can be used to selectively expose segments of the original document. This should ensure a greater degree of parser flexibility and portability.
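The final step, constructing an RDF record from the extracted data structures, might look roughly like the following. This is a Python sketch and not the C++ parser itself. The idli namespace URI is a placeholder, the Dublin Core URI shown is the 1.1 element set (the project may have used a different version), and the lowercase rdf:seq element name follows the sample record above.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"   # assumed Dublin Core version
IDLI = "urn:example:idli"                 # placeholder, not the project's URI

for prefix, uri in (("rdf", RDF), ("dc", DC), ("idli", IDLI)):
    ET.register_namespace(prefix, uri)

def creators_to_rdf(authors):
    """Build an ordered container of dc:Creator entries.

    `authors` is a list of (name, organization) pairs in byline order.
    """
    seq = ET.Element("{%s}seq" % RDF)
    for name, organization in authors:
        li = ET.SubElement(seq, "{%s}li" % RDF)
        creator = ET.SubElement(li, "{%s}Creator" % DC)
        info = ET.SubElement(creator, "{%s}author_info" % IDLI)
        ET.SubElement(info, "{%s}author_name" % IDLI).text = name
        ET.SubElement(info, "{%s}organization_name" % IDLI).text = organization
    return seq

seq = creators_to_rdf([
    ("Giust, G. K.", "Arizona State University"),
    ("Sigmon, T. W.", "Arizona State University"),
])
print(ET.tostring(seq, encoding="unicode"))
```

Building the record through an element tree rather than by string concatenation leaves escaping and namespace declarations to the library.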
USING XSLT TO GENERATE METADATA VIEWS
As described above, the item-level metadata in our digital library Testbed is designed both to assist in item discovery and to describe the items in the Testbed. Enough descriptive information about each item is included in its metadata record to support both summary and extended citation views. Metadata records include descriptive information about the text (e.g., author, title, abstract), about other discrete article components (e.g., figures and tables stored as separate files), and about the document object's relationships with other document objects. Figure 1 illustrates an extended citation view as generated from the XML RDF metadata record for a representative article in the Testbed.
Figure 1 - Extended citation view (selected text deleted for brevity)
XSLT template structures are used within XSL stylesheets to generate both brief and extended XHTML citation views in real time. The XSLT templates are used to selectively present, reorganize, and transform content and markup contained in our metadata records. To fine-tune presentation, CSS instructions are then applied to the transformed output document generated by the XSLT processor.
It is important to note that CSS stylesheets created to work in concert with XSL stylesheets work on the output generated by applying the XSLT stylesheet, not on the original XML document instance. To facilitate construction of our CSS stylesheets we added class and id attributes to several of the XHTML output elements used. For our metadata views our CSS stylesheets were brief. Browser-specific versions were not required.
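The shape of this transformation, selecting fields from the metadata record and emitting XHTML elements that carry class attributes for CSS, is illustrated below. Note the substitution: Python's standard library has no XSLT processor, so the sketch imitates the effect of the templates with ElementTree instead of using XSLT. The namespace URIs, the class names, and the choice of fields for each view are assumptions.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",  # assumed Dublin Core version
    "idli": "urn:example:idli",                # placeholder, not the project's URI
}

def citation_view(record, extended=False):
    """Build an XHTML fragment from a metadata record.

    The class attributes exist only as hooks for a CSS stylesheet, which
    styles this output and never sees the original XML record.
    """
    div = ET.Element("div", {"class": "citation"})
    title = record.findtext(".//dc:Title", default="", namespaces=NS)
    ET.SubElement(div, "span", {"class": "title"}).text = title
    authors = [a.text or "" for a in record.iterfind(".//idli:author_name", NS)]
    ET.SubElement(div, "span", {"class": "authors"}).text = "; ".join(authors)
    if extended:
        # The extended view adds source details that the brief view omits.
        journal = record.findtext(".//idli:journal_title", default="", namespaces=NS)
        volume = record.findtext(".//idli:volume", default="", namespaces=NS)
        source = (journal + " " + volume).strip()
        ET.SubElement(div, "span", {"class": "source"}).text = source
    return div
```

A CSS rule such as `span.title { font-weight: bold }` would then apply to the generated span elements.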
XSLT facilitates XML to XML transformations, XML to XHTML transformations, and selective extraction of data from XML files. In combination with test and pattern matching rules, the basic XSL elements (e.g., <apply-templates>, <attribute>, <choose>, <copy>, <for-each>, <if>, <template>, <value-of>) provide most functionality required. However, in some cases it is desirable to do direct manipulation of XML element content and/or access external objects (e.g., databases) using XSL. The current W3C specifications do not yet address these matters fully. Microsoft has unilaterally provided methods for invoking scripts and COM objects from within XSL, and we're exploiting these methods pending release of newer standards.
We use XSL stylesheets to assist in creating metadata records, to transform those metadata records to XHTML for display by standard Web browsers, and to transform our base XML files as necessary to support rendering of Testbed articles. XSL templates can be applied in real time at the client (by XSL-aware browsers such as 5.x versions of Microsoft's Internet Explorer) or at the server (as content is requested). Though implementations of XSL are still limited and immature, we have already been able to make excellent use of the partial implementations available.
XSLT IMPLEMENTATION ISSUES
Current W3 Consortium Recommendations and Working Drafts describe how XSLT templates are to be included within XSL stylesheets and how such stylesheets can be attached to XML documents using XML processing instructions. However, implementation of these guidelines in production systems is incomplete. Current XSL and XSLT specifications anticipate the need for multiple XSL stylesheets to be attached to a single XML document (e.g., to support multiple views of the data contained within that document), but the mechanism for determining which of the referenced stylesheets to apply under which circumstances is unclear. At present Microsoft's Internet Explorer web browser always applies the first-named style sheet.
One workaround approach is to implement server-side XSLT processing to handle certain situations. For example, because Netscape web browsers do not yet support XSLT processing themselves, processing XSLT instructions on the webserver and sending only the resultant output to the web browser allows the use of XSLT in Web applications designed to be used by both Netscape users and Microsoft Internet Explorer users. Server-side implementation of XSLT also provides a means to deal with the multiple XSL style sheet issue.
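The server's choice between sending XML with an attached stylesheet and sending pre-transformed output can be reduced to a test on the browser's User-Agent header. The sketch below is a deliberate simplification in Python: the rule "MSIE 5 or later transforms at the client, everything else at the server" is our assumption, and the function name is hypothetical.

```python
import re

def choose_delivery(user_agent):
    """Return 'client' if the browser is assumed to apply XSL itself, else 'server'."""
    match = re.search(r"MSIE (\d+)", user_agent)
    if match and int(match.group(1)) >= 5:
        return "client"
    # Unknown and non-XSL-aware browsers receive server-transformed output.
    return "server"

print(choose_delivery("Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"))  # client
print(choose_delivery("Mozilla/4.7 [en] (Win98; I)"))                     # server
```

Defaulting to server-side transformation is the conservative choice, since output that is already transformed can be displayed by any browser.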
RENDERING MATHEMATICS
One of the major deterrents to scientific or technical publishing on the web has been the difficulty of rendering mathematics precisely as desired. Until recently most web browsers simply did not support the level of formatting, positioning, and special characters needed for anything beyond the simplest of mathematics. The situation has improved with the implementation of several standard technologies in the latest web browsers (e.g., Unicode, CSS, downloadable fonts, JavaScript, the DOM), though there remain problems. Depending on the effects desired and your expectations of which web browsers and which web browser versions will be used to access your site, it may be necessary to create multiple CSS stylesheets to handle browser variations. While both Netscape Communicator and Microsoft's Internet Explorer support the CSS Version 1 specification well, neither supports all of CSS Version 2. Exactly which CSS Version 2 features are supported is different for each browser, and in several cases there are variations in how the specification has been interpreted and implemented.
We have been most successful in utilizing CSS to improve native rendering of complex mathematics using version 5 of the Microsoft Internet Explorer web browser (IE5). IE5 is currently the only widely available browser that supports all of the above technologies to the extent needed for native rendering of complex mathematics. We have had less success with Netscape web browsers to date, though Netscape browsers are capable of rendering some complex mathematics acceptably.
Consider the mathematics clause shown in Figure 2.
Figure 2 - Example Mathematics Clause
The as-provided mark-up for this clause is:
<formula>
<f>
<fr>
<nu>1</nu>
<de>3</de>
</fr>
T<inf>F</inf>
</f>
</formula>
XML to XML transformation algorithms (optimized for more complex equations) transform this to the following for rendering in IE5. Note the use of HTML namespace (for tags with inherent functionality).
<formula>
<html:nobr>
<f>
<fr>
 
<html:span id='dli10' class='fr'>
<nu>
<norm>1</norm>
</nu>
<de>
<hidden_sup>|</hidden_sup>
<norm>3</norm>
<hidden_inf>|</hidden_inf>
</de>
</html:span>
 
<html:script language='JavaScript'>SetWidth("dli10");</html:script>
</fr>
T<inf>F</inf>
</f>
</html:nobr>
</formula>
To ensure proper rendering, considerable presentational markup is added. An HTML no-break tag is added to force the browser to render the entire mathematical clause on one line. Non-breaking spaces (character U+00A0) and hidden (i.e., not rendered) superscripted and subscripted vertical bars (|) are added to ensure proper horizontal and vertical placement. An HTML span with a unique id attribute and a class attribute equal to 'fr' is added within the <fr> element to allow the fraction width to be set by a JavaScript function (SetWidth). This function calculates an appropriate width for the fraction based on the number of characters present in the numerator and denominator. Numerals are enclosed within a <norm> tag so they can be rendered in a non-italic style. An associated stylesheet (CSS) then provides additional rendering instructions, adjusting font sizes and styles and displaying <nu> and <de> elements as stacked blocks within the width-constrained <fr> element. The bottom border of the <nu> becomes the horizontal line of the fraction.
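Two of these steps, wrapping numerals in <norm> and deriving a width from character counts, are simple enough to sketch. The Python below is illustrative only: the Testbed performs the width calculation in JavaScript (SetWidth), its actual formula is not given here, and the EM_PER_CHAR constant and function names are our assumptions.

```python
import xml.etree.ElementTree as ET

EM_PER_CHAR = 0.6  # assumed average glyph width in em, not a Testbed value

def fraction_width_em(fr):
    """Estimate a fraction's width from the longer of numerator and denominator."""
    nu = "".join(fr.find("nu").itertext()).strip()
    de = "".join(fr.find("de").itertext()).strip()
    return round(max(len(nu), len(de)) * EM_PER_CHAR, 2)

def mark_numerals(fr):
    """Wrap purely numeric <nu>/<de> content in <norm> so CSS can set it upright."""
    for part in fr:
        text = (part.text or "").strip()
        if part.tag in ("nu", "de") and len(part) == 0 and text.isdigit():
            part.text = None
            ET.SubElement(part, "norm").text = text
    return fr

fr = ET.fromstring("<fr><nu>1</nu><de>3</de></fr>")
print(fraction_width_em(fr))
# 0.6
print(ET.tostring(mark_numerals(fr), encoding="unicode"))
# <fr><nu><norm>1</norm></nu><de><norm>3</norm></de></fr>
```

A character count is only a rough proxy for rendered width, which is consistent with the alignment problems noted in the next section.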
REMAINING MATHEMATICS RENDERING ISSUES
While the above approach is powerful, works well for many mathematical constructs, and can create HTML that renders well in browsers that support most features of CSS ver. 2 (e.g., IE5), there are limitations. The quality of the result, while understandable, does not approach the quality of a dedicated mathematics typesetting system or the quality of printed mathematics. Fraction bars do not always align perfectly, radical symbols can become detached from associated overbars, etc. Font issues also remain. In some cases public-domain glyphs aren't yet available for some symbols. In other cases downloadable font mechanisms fail when used in conjunction with certain CSS techniques (most notably on Netscape 4.x web browsers).
Some of these problems can be overcome using existing browser capabilities, given enough time and energy, but the preferred approach would be to attack the problem from both directions - that is both to improve stylesheet techniques and browser functionality and consistency. That is only likely when and if the community converges on more consistent ways of marking up complex mathematics. That of course is the goal of the MathML effort, and also of efforts such as STIX, a project by the Scientific and Technical Information Publisher's Group to formulate (and include within Unicode) a comprehensive collection of characters needed in the course of scientific and technical publishing.
CONCLUSION
Our experience to date shows the potential of these new technologies. XML is an excellent format for search and retrieval of textual data. It supports fine-granularity markup and is useful not only for the markup of primary source material, but also for the markup of metadata about other objects. XSLT and CSS enhance the usefulness of XML in a Web environment. Though still relatively new, both are mature enough and robust enough to be used for large text collections immediately. However, the current technologies do have limits. Current implementations are not robust enough to adequately render very complex mathematics, and major implementation variations from vendor to vendor remain. The communities involved need to continue the process of developing and implementing standards that improve the reliability and functionality of digital library systems.
REFERENCES
DeRose, S.J., D.G. Durand, E. Mylonas, & A.H. Renear (1997). What is Text, Really? The Journal of Computer Documentation (ACM SIGDOC) 21 (August 1997), 1-25. [Reprinted from Journal of Computing in Higher Education 1 (Winter 1990), 3-26.] See also commentaries that follow on pages 26-44.
Ketcham-Van Orsdel, L. & K. Born (1999). Serials Publishing in Flux. Library Journal 124 (April 15, 1999), 48-53.
Mischo, W.H., & Cole, T.W. (2000). "Processing and Access Issues for Full-Text Journals." In Successes and Failures of the Digital Library Initiative, Proceedings of the 35th Annual Graduate School of Library and Information Studies, University of Illinois at Urbana-Champaign (in press).
Schatz, B., Mischo, W.H., Cole, T.W., Bishop, A., Harum, S., Johnson, E., Neumann, L., & Chen, H. (1999). Federated Search of Scientific Literature: A Retrospective on the Illinois Digital Library Project. IEEE Computer, 32, 51-60.
Schatz, B., Mischo, W.H., Cole, T.W., Hardin, J., Bishop, A., & Chen, H. (1998). Federating Diverse Collections of Scientific Literature. IEEE Computer, 29, 28-37.
Weil, N. (2000). Web Publishers Hinge their Future on XML. Infoworld, February 11, 2000. Online. URL <http://www.infoworld.com/articles/hn/xml/00/02/14/000214hnseybold.xml> [24 April 2000].