Digital Libraries Initiative Fall 1995 Partners' Workshop
November 16-17, 1995

Minutes

I. SUMMARY

A working version of the testbed will be available on 25 terminals in the Grainger Engineering Library, the library of the Beckman Institute for Advanced Study, the Computer Science library, and selected research project offices in Computer Science and Physics by January 12, 1996. Initially the DLI testbed will include materials from AIP, ASCE, and the IEEE Computer Society. Materials from AIAA, APS, IEE, and IEEE will be added as the spring semester progresses.

Currently, embedded math within SGML document instances is provided by DLI publishing partners in a variety of formats: either in SGML markup; as embedded TeX; or in a combination of SGML markup, TeX and/or links to graphic format files (most commonly GIF). Strategies were discussed regarding how to render embedded math: rely on a commercial viewer to render math provided in SGML (currently only SoftQuad's Panorama is available); translate TeX math embedded in SGML document instances into graphic formats (e.g., GIF) which the viewer used can render in line; search SGML but retrieve documents in another format (e.g., PDF) for printing and/or on-screen viewing.

The UIUC DLI project team expressed their preference for SGML only, with the hope that the rendering problems will be solved in the future. Publishing partners were split: several expressed a preference for some type of hybrid SGML/PDF approach (e.g., index SGML and retrieve PDF); others wanted to stay with the embedding of TeX in SGML; and others favored an SGML-only solution. It was agreed to work on a short-term solution to better support embedded TeX within the DLI testbed materials. The testbed will also accommodate PDF, PS, or DVI copies of documents for retrieval, as long as SGML versions of all documents entered in the testbed are also provided.

DLI committed to hosting a conference on SGML math rendering in the Spring of 1996. We ask that our partners give us suggestions as to who should attend this workshop to make it effective. SoftQuad committed to improving rendering of SGML mathematics. Their current strategy calls for working with Design Science (developers of MathType for Windows) to improve math rendering within Panorama. EBT indicated that they are working on a web version of their SGML viewer which will accept 'raw' SGML transferred across the web. It was implied that this version of their SGML viewer would probably be able to handle embedded TeX (or DVI?) much as it currently does when retrieving EBT binary versions of SGML documents.

The UIUC DLI project will undertake to provide the initial Panorama style sheet for materials given to us; long-term it is hoped that publishing partners will take over editing, maintenance, etc. of these style sheets. Fonts remain an issue. A new font set for better rendering of entities in DLI testbed material is needed and will be solicited from OCLC. SoftQuad also indicated that they are working to provide a mechanism by which special font characters can be sent with (or referenced within?) SGML document instances.

The Illinois DLI project continues to move toward a federated repository system. While we are working on the technology for this, we encourage our partners to begin seriously thinking about setting up their own repositories for their materials, based on our technology and assistance whenever appropriate. By the end of the project (August 1998), we hope that all of our main partners will be hosting their own repository servers and that we can demonstrate, from our Illinois distributed software, a live federation directly to partner repositories (single searches with uniform displays, etc.). We expect the first demonstration to be with AAS (American Astronomical Society).

II. ACTION ITEMS

1. Ensuring feedback to partners

-Every two weeks Susan Harum will email our partners the status of the material and other pertinent information, including interesting URLs, articles, etc.
-Susan Harum will also send out a notice when the quarterly report has been posted on the Web.

2. DLI will host a specialty workshop on Mathematics

-A small group of people who know how to render math (including any outside experts) will be invited to participate in a Mathematics workshop 1 or 2 days before the next Publisher Workshop. A white paper written by Tim Cole and Tom Magliery detailing problems/progress will be distributed one month before. Tom Magliery (NCSA) will represent DLI at SGML '95 and report on where the SGML community stands on solving math rendering problems and on DSSSL.

3. Scheduling next publisher workshop

-We would like to schedule the next workshop in late March or early April. Please let me know which times are NOT good.

4. Upcoming DLI milestones:

-Jan 12, 1996 - limited testbed with the following publications:

Applied Physics Letters, January 1995 to present, published by the American Institute of Physics (AIP)
Journal of Computers in Civil Engineering, January 1995 to present, published by the American Society of Civil Engineers (ASCE)
Computer, January 1995 to present, published by IEEE Computer Society

We have material from the following publishers that will be added to the DLI testbed over the course of the spring semester as we work out the necessary processing procedures, and following appropriate review by publishing partners:

American Institute of Aeronautics and Astronautics (AIAA):
AIAA Journal
American Institute of Physics (AIP):
Journal of Applied Physics
American Physical Society (APS):
Physical Review Letters
American Society of Civil Engineers (ASCE):
Journal of Aerospace Engineering
Journal of Transportation
Journal of Materials in Civil Engineering
Journal of Construction Engineering and Management
Journal of Performance of Constructed Facilities
Institution of Electrical Engineers (IEE):
IEE Letters
Institute of Electrical and Electronics Engineers (IEEE):
IEEE Transactions
IEEE Letters
IEEE Computer Society:
IEEE Software
IEEE Design and Test
IEEE Computational Science and Engineering
IEEE Graphics
IEEE Expert: Intelligent Systems and their Applications
IEEE Micro: Chips, Systems, Software and Applications
IEEE Parallel and Distributed Technologies

- Feb./March 96 - site visit from our project sponsors (NSF, ARPA, NASA)
- August 96 - we will produce the first Web version (based on the Visual Basic program tested during the spring 1996 semester). This version will include thesaurus features, word co-occurrence, and enhanced full-text searching features. The Web version will incorporate Java and embedded OCX technologies; it will go across the Web to our search engine (OpenText), then go out to an HTTP server (probably NCSA) and retrieve SGML. The database, collection, and server will be hosted in Grainger; the client will be distributed campus-wide.
-1997 - further client enhancements, and we will be testing/implementing distributed server models for all publishers that are ready. The UIUC DLI User Evaluation group will conduct extensive surveys.
-August 98 - user population will expand to CIC (Big Ten universities) and publisher repositories (distributed repositories, federations).

5. Alternate Software Packages:

The UIUC DLI project team is currently evaluating SGML database management and rendering software available from EBT. EBT has also updated us on their near-term plans for updates and extensions to their current product line. As yet, the available EBT software doesn't provide the functionality for moving SGML around the Web that we need for the DLI. We will continue to use Panorama by SoftQuad for display and OpenText for searching.

6. Different formats

-AIP has no objection to putting up PDF versions if an SGML version is also available (e.g., read SGML, print PDF). Publishers are invited to start delivering PDF, PS, or DVI on a regular basis.

7. Distribution of custom software to publishers:

-Source code is always available free.
-We will let our publishing partners know of any advances, and keep you up to date. We can also distribute our client and give DLLs.

8. Contract agreement

-For those publishers who have still not signed and sent in the Joint Partnership agreement, please do so.

III. SYNOPSIS OF MAJOR DISCUSSION POINTS

The pros and cons of using the ISO 12083 math DTD vs. the AAP math DTD, and the issue of whether additions are essential or desirable from a functional standpoint, were discussed. It was agreed that to better support rendering of math marked up in SGML it would be desirable to tighten up the content models of these math DTD fragments with as few additions as possible. We hope to accomplish this through further interactions with the SGML community and with the DLI-sponsored spring meeting on math rendering.

Several of the publishers voiced their concern over not getting enough feedback on the materials they're supplying to the testbed and on the role of the testbed in technology transfer. In particular, publishers expressed some concern with how long it has taken the DLI project to process and incorporate materials into the testbed. The project team members indicated that resolving rendering problems was taking more resources and time than had been anticipated. Publishers identified pressures from society members and readers to move more quickly on these and related issues. The DLI project team responded by describing the tension throughout the project between the short-term goal of getting the testbed up and running and long-term infrastructure research. The immediate goal of working together to solve rendering problems was also reiterated, and it was suggested that DLI partners could serve as an arbitrator between a public and a private solution. Representatives from AIP added that the problem isn't with SGML, it's with the functionality, and that the only way to conceptualize whether the rendering is being done correctly is to try these different solutions (such as TeX). They also added that issues such as fonts need to be dealt with.

The issue of fonts was discussed further. SoftQuad is looking at a two-part scheme in which the font travels with the document and a local application looks at the fonts and displays the document. This ensures that the fonts needed by the document are there. SoftQuad needs to hear from the marketplace whether this is a viable idea. The possibility of a font server was also discussed. This would be a service on the web offering the ability to go and get fonts as needed. A big problem with this is that every single user of the document has to pay for fonts, and users will get weary of paying if fonts change often. Font embedding was also discussed (buy a license and put fonts in documents). Pat Walker, of IEEE, stated that publishers would shy away from yet another licensing agreement. Murray Maloney, of SoftQuad, stressed that it would protect the integrity of the document. Marvin Sirbu brought up the model for caching; that is, during the first two months the font server would get a lot of hits, which would then taper off.

The publishers present also brought to the attendees' attention the adoption of a new article identifier, distinct from the ISSN, which will begin appearing in material next year. It is a unique identifier for each article, intended to provide a document delivery number (and to serve other future uses) that is not unbearably long.

IV. CHRONOLOGY OF MEETING

Demonstration of customized PC interface for the DLI testbed, January - May 1996; Bill Mischo
The interface for the testbed, which supports search and document retrieval via HTTP, includes the following features:

-Table of contents with names of participating societies.
-New Articles
-Benchmark articles (recommended articles from professors)
-Demo searches (video will walk user through sample search)
-Search Wizard (walks user through search process)
-Search History
-Search forms with general and detailed search choices
-A hyperlink to the OpenText indexing service on the Web
-A hyperlink to the University of Illinois' OVID Compendex (and possibly INSPEC) databases and the Grainger Library reference database and journal list
-A hyperlink to the local University of Illinois faculty database
-Spell checker (Microsoft)

When an article is selected, it is fetched over the Web by a standard Web browser using CCI protocols, with Panorama fetching the associated files (DTD, style sheet, etc.). Panorama then displays the document in a Web environment and highlights key words.

PROBLEMS WITH SEARCHING SGML:

-Term word proximity and lack of a 'word wheel' for dynamic spelling context: the OpenText indexing mechanism uses Patricia trees to store pointers to document streams in lieu of indexing at the word level. Although this is a highly optimized system for retrieving streams of characters, the user is unable to ask for positions of words (words within X words of another) and instead must ask for characters within X characters of each other.
-Heterogeneous DTDs need normalization. Not all of the search forms are usable when searching across different DTDs.
-Inconsistent entry constructs (e.g., author forenames and surnames)
-Limitations in OpenText in an HTTP environment, which result in a lack of state and search connections. It is currently difficult to combine previous sets, or to retrieve a set and modify it. One of the programmers on the DLI project (Eric Johnson) is designing a client interface with Java, which we hope will address these problems.
-Move toward a federated model: We expect to provide our users with a model for processing, indexing, searching, retrieving, and displaying documents from distributed repositories, hopefully with SGML as a standard, using next-generation operating systems which will allow for a distributed object environment.
-An outline of a talk similar to the one presented by Mr. Mischo can be found at: http://www.grainger.uiuc.edu/dli/asistoo.htm
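
The word-proximity limitation listed above can be illustrated with a minimal sketch. It does not use OpenText or its query language; the function names and sample texts are hypothetical and only contrast a character-offset test with a word-position test.

```python
import re


def char_proximity(text, a, b, max_chars):
    """Character-stream test: do a and b start within max_chars characters?

    Terms are matched as raw substrings, so a term may also match inside
    a longer word. No word boundaries are known at this level.
    """
    text = text.lower()
    starts_a = [m.start() for m in re.finditer(re.escape(a), text)]
    starts_b = [m.start() for m in re.finditer(re.escape(b), text)]
    return any(abs(i - j) <= max_chars for i in starts_a for j in starts_b)


def word_proximity(text, a, b, max_words):
    """Word-level test: are a and b within max_words word positions?"""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= max_words for i in pos_a for j in pos_b)


# Two words apart in both texts, but very different character distances.
long_words = "semiconductor heterostructures exhibit photoluminescence"
short_words = "thin film growth"
```

With these samples, a window of two words matches both texts, while a single 20-character window matches only the short one. A user who can only specify character distances therefore has to guess a window that depends on word length.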
Marvin Sirbu, CMU - NETBILL.
-An overview of the NETBILL project can be found at: http://www.ini.cmu.edu/NETBILL/publications/CompCon_TOC.html
Although the DLI project will not be ready to incorporate NETBILL into the testbed until the Fall of 1996, Dr. Sirbu invited attendees to test the project immediately.
-SERVER SECURITY; Beth Frank. See Bruce Schatz's overview talk "Semantic Federation from Distributed Repositories of Scientific Literature"; http://www.grainger.uiuc.edu/dli/semantic.htm.
-SUMMARY OF TESTBED ISSUES, Tim Cole:

The testbed is currently behind schedule due to the following: SGML is not as homogeneous as we had envisioned, technology is not as advanced as we had planned (e.g., the problems we've had with rendering), and the condition of material from publishers varies. Our expectation is that material from all of the journals in the testbed will run from January 1995 forward.

PUBLISHERS:

-In full production: AIP (2000+ articles)
-Approaching full production: ASCE, IEEE Computer Society, APS
-Samples (UIUC still to incorporate): IEEE, AIAA

TESTBED MATERIALS: DTDs

-All DTDs include the following UIUC modifications: links for category information; UIUC nethead (contains metadata model that is supported by web server); some characters will be replaced with ASCII
-We've been getting a variety of DTDs and have found that ISO 12083 shows the most promise. ArborText has significant structural differences.
-Variants of ISO 12083 Article DTD: APS, AIP, IEEE CS
-Variant of AAP/Online Computer DTD: ASCE
-Variant of ArborText Book: IEEE
-AIAA Book/SoftQuad Canonical Table DTD: AIAA

ENTITY SETS

-Panorama Default Entity Sets (ISO-8879)
-CALS ISO (overlaps with above): IEEE Computer Society
-Publisher specific: APS, AIP, ASCE, IEEE (ArborText Equation); AIAA (incorporated within DTD)

STYLE SHEETS

We are using the Lucida Bright Microsoft add on font pack. Our approach is to do a preliminary style sheet for our partners, with our partners then taking over the responsibility. We try to follow the paper copy and would like feedback from everyone.
Style sheet issues: There is a point at which an entity has to be converted by the client. The client will go and look at sdata.map to see how to render the character. Sdata.map has to be there when Panorama is launched. There is potential for conflict between sdata.map files: there is no good way to edit one partially, and the way one file resolves a character can differ from another's sdata.map.
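
A minimal sketch of how such conflicts between two entity maps might be detected follows. The actual sdata.map syntax is not described in these minutes; the two-column "entity-name rendering" format, the function names, and the sample values are all assumptions made for illustration.

```python
def parse_entity_map(text):
    """Parse a hypothetical two-column map: entity name, then its rendering."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            name, rendering = parts
            mapping[name] = rendering.strip()
    return mapping


def find_conflicts(map_a, map_b):
    """Return entities that both maps define but resolve differently."""
    shared = map_a.keys() & map_b.keys()
    return {
        name: (map_a[name], map_b[name])
        for name in shared
        if map_a[name] != map_b[name]
    }


# Hypothetical maps from two publishers; only 'beta' disagrees.
publisher_a = parse_entity_map("alpha symbol:97\nbeta symbol:98")
publisher_b = parse_entity_map("alpha symbol:97\nbeta lucida:12\ngamma symbol:103")
conflicts = find_conflicts(publisher_a, publisher_b)
```

Running a check like this before merging publisher-specific maps would at least flag the entities whose renderings need to be reconciled by hand.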

FONT ISSUES:

-Where are we going to get the right fonts? Are we going to create them within this project? Is it the responsibility of the publishers? New entities will come up inevitably and the occasional character will need to be created.

FIGURES

-JPEG; AIP, APS

-TIFF; ASCE, IEEE CS, IEEE

-We are currently using the Ulead viewer, a free viewer that uncompresses TIFF and JPEG files and handles all current variants. Ulead can be found at:

http://www.seed.ret.tw/~ulead

(ftp://ftp.ulead.com.tw/pub/goodies/ulview11.zip)

MATH

-ISO-12083 SGML Math Variants: AIP, APS, IEEE CS, and IEE all have some additional tags.

-AAP SGML Math Variants: ASCE, APS

-Embedded TeX: IEEE, IEEE CS, AIAA. Currently, we translate the TeX into GIFs. When Panorama encounters math in a document, it brings the GIFs over as the document is scrolled through.

-Not all of the problems are due to the renderer (Panorama). Some processing can be done to help get the item looking like the print version.
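
The TeX-to-GIF approach described above can be sketched roughly as follows. This is not the project's actual processing code: the `<tex>` and `<graphic>` element names, the file naming scheme, and the function name are hypothetical, and the step that actually renders each TeX fragment to a GIF is left out.

```python
import re

# Hypothetical element that wraps embedded TeX in a document instance.
TEX_ELEMENT = re.compile(r"<tex>(.*?)</tex>", re.DOTALL)


def replace_tex_with_graphics(sgml, prefix="eq"):
    """Swap each embedded TeX fragment for a reference to a GIF file.

    Returns the rewritten instance plus a list of (filename, tex_source)
    jobs for a separate rendering step that is not shown here.
    """
    jobs = []

    def substitute(match):
        filename = f"{prefix}{len(jobs) + 1:04d}.gif"
        jobs.append((filename, match.group(1).strip()))
        return f'<graphic file="{filename}">'

    rewritten = TEX_ELEMENT.sub(substitute, sgml)
    return rewritten, jobs


sample = "<p>Energy <tex>$E = mc^2$</tex> and area <tex>$a^2$</tex>.</p>"
rewritten, jobs = replace_tex_with_graphics(sample)
```

Keeping the original TeX source in the job list means the fragments remain available if a viewer later gains the ability to render them directly.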

One solution is to look at TeX. Some experimentation has been done with MathType:

-MathType DLL formats for separate onscreen view

-support for ISO 12083 DTD fragment

-Further support for EuroMath

-Local caching for performance

-Launch helper applications for other notations

-Design Science needs our analysis of the 12083 DTD. We need to tell them which semantics need to be included so that they can devise a new DTD that handles all situations/specifications.

ALTERNATE PRESENTATION/DELIVERY FORMATS

-PostScript: APS

-PDF: APS (PRL distilled by UIUC), AIP

-Xyascii: APS (Physical Review C)

The DLI will continue to focus on SGML which is, clearly, the most effective tool for indexing and retrieval of full-text elements. However, until rendering problems are worked out we will offer alternate formats for viewing equations and for printing.

SOFTWARE

Index Servers

Delivery Servers:

-HTTP (Windows NT, IBM RS6000)

-Future: EBT DynaText & DynaWeb

SGML CREATION ISSUES

Processing: Declarations, Public Identifiers, etc.

-We create a structure appropriate for transferring DTDs, style sheets, entity sets, figures, etc. over the Web

-OpenText prefers DTDs without external file references, HyTime, etc.; files without DOCTYPE declarations, etc.

Processing: Document Instance

DTD Errors/Issues:

-Nesting, incomplete Element Declarations, tag minimization, etc.

-Added tags for math, etc.

DTD-Document Instance Inconsistencies:

-Structural hierarchy, Elements not in DTD

Document Instance Errors:

-Typos, bad internal links, etc.

Other Authoring Issues:

-Addition of information by UIUC: URLs, Publishing Detail, etc.

-Procedures for adding links, classification data, category data, etc.

-Multiple Article Instances in Single Files

PANORAMA UPDATE; Murray Maloney, Product Manager, SoftQuad

FORMATTING CAPABILITIES

-Panorama being re-built (alpha version out next month)

-More powerful formatting

-Greater precision of placement

-DSSSL planning underway (waiting for DSSSL to settle down)

GRAPHICS EDITING and DISPLAY

-Agreement with Group 42

-Author/Editor integration

-Panorama integration

-Support for GIF, TIFF, CGM, WMF, BMP, JPEG, etc.

INTERPROCESS COMMUNICATION

-mechanisms to "talk" to Panorama

-NCSA Mosaic CCI

-Netscape client API

-Spyglass Software Developer's API

-others to come

WWW CAPABILITIES TODAY

-Panorama relies on WWW browsers

-resolves URLs

-CCI aware

WWW CAPABILITIES

-Native WWW Support

-HTTP Capability

-HTML 2.0 and beyond (tables, forms, frames, etc.)

-Local file Update (styles, icons, graphics, maps)

SEARCH INTERFACE

-OpenText Latitude Project

-form-based Search front end

-SGML-sensitive (SGML element knowledge, SGML architectural form knowledge)

PANORAMA PLATFORM IN 1996

-will incorporate DSSSL

-pluggable language system localization kit for extensive browser

-shrink-wrapped "language packs"

http://www.math.psu.edu/dna/publications.html

Lance demonstrated a PC DVI viewer. In conjunction with Panorama, this viewer could be called on to display mathematics or DVI fragments embedded in an SGML document. In stand-alone mode, the DVI viewer had several valuable features, including the ability to look at more than one document at a time, magnify formulas and equations, and see what font set was used in specific areas of a document.

University of Illinois at Urbana-Champaign Digital Libraries Initiative
Comments to: External Relations Coordinator, Tom Habing
10/15/96