I. SUMMARY
A working version of the testbed will be available on 25 terminals
in the Grainger Engineering Library, the library of the Beckman Institute
for Advanced Study, the Computer Science library, and selected research
project offices in Computer Science and Physics by January 12, 1996. Initially
the DLI testbed will include materials from AIP, ASCE, and the IEEE Computer
Society. Materials from AIAA, APS, IEE, and IEEE will be added as the spring
semester progresses.
Currently, embedded math within SGML document instances is provided by DLI
publishing partners in a variety of formats: in SGML markup; as embedded
TeX; or in a combination of SGML markup, TeX and/or links to graphic format
files (most commonly GIF). Strategies were discussed regarding how to render
embedded math: rely on a commercial viewer to render math provided in SGML
(currently only SoftQuad's Panorama is available); translate TeX math embedded
in SGML document instances into graphic formats (e.g., GIF) which the viewer
used can render in line; search SGML but retrieve documents in another format
(e.g., PDF) for printing and/or on-screen viewing.
The UIUC DLI project team expressed a preference for an SGML-only approach, with
the hope that the rendering problems will be solved in the future. Publishing
partners were split: several expressed a preference for some type of hybrid
SGML/PDF approach (e.g., index SGML and retrieve PDF); others wanted to
stay with embedding TeX in SGML; and others favored an SGML-only
solution. It was agreed to work on a short-term solution to better support
embedded TeX within the DLI testbed materials. The testbed will also accommodate
PDF, PS, or DVI copies of documents for retrieval, as long as SGML versions
of all documents entered in the testbed are also provided.
DLI committed to hosting a workshop on SGML math rendering in the spring
of 1996. We ask that our partners suggest who should
attend this workshop to make it effective. SoftQuad committed to improving
rendering of SGML mathematics. Their current strategy calls for working
with Design Science (developers of MathType for Windows) to improve math
rendering within Panorama. EBT indicated that they are working on a web
version of their SGML viewer which will accept 'raw' SGML transferred across
the web. It was implied that this version of their SGML viewer would probably
be able to handle embedded TeX (or DVI?) much as it currently does when
retrieving EBT binary versions of SGML documents.
The UIUC DLI project will provide the initial Panorama style sheets
for materials given to us; in the long term, it is hoped that publishing partners
will take over editing, maintenance, etc. of these style sheets. Fonts remain
an issue. A new font set for better rendering of entities in DLI testbed material
is needed and will be solicited from OCLC. SoftQuad also indicated that
they are working to provide a mechanism by which special font characters
can be sent with (or referenced within?) SGML document instances.
The Illinois DLI project continues to move towards a federated
repository system. While we are working on the technology for this, we encourage
our partners to begin seriously thinking about setting up their own repositories
for their materials, based on our technology and with our assistance whenever appropriate.
By the end of the project (August 1998), we hope that all of our main partners
will be hosting their own repository servers and that we can demonstrate,
using our Illinois distributed software, a live federation directly to your
repositories (single searches with uniform displays, etc.). We expect
the first demonstration to be with the AAS (American Astronomical Society).
II. ACTION ITEMS
1. Ensuring feedback to partners
-every two weeks Susan Harum will email our partners the status of the material,
and other pertinent information, including interesting URLs, articles, etc.
-Susan Harum will also send out a notice when the quarterly report has been
posted on the WEB.
2. DLI will host a specialty workshop on Mathematics
-A small group of people who know how to render math (including outside experts)
will be invited to participate in a Mathematics workshop 1 or 2 days before
the next Publisher Workshop. A white paper written by Tim Cole and Tom Magliery
detailing problems/progress will be distributed one month before the workshop. Tom Magliery
(NCSA) will represent DLI at SGML '95 and report on where the SGML community
stands on solving math rendering problems and where the SGML community
is on DSSSL.
3. Scheduling next publisher workshop
-We would like to schedule the next workshop in late March or early April.
Please let me know which times are NOT good.
4. Upcoming DLI milestones:
-Jan 12, 1996 - limited testbed with the following publications:
Applied Physics Letters, January 1995 to present, published by the American Institute of Physics (AIP)
Journal of Computing in Civil Engineering, January 1995 to present, published by the American Society of Civil Engineers (ASCE)
Computer, January 1995 to present, published by the IEEE Computer Society
We have material from the following publishers that will be added to the DLI testbed over the course of the spring semester as we work out the necessary processing procedures, and following appropriate review by publishing partners:
American Institute of Aeronautics and Astronautics (AIAA):
AIAA Journal
American Institute of Physics (AIP):
Journal of Applied Physics
American Physical Society (APS):
Physical Review Letters
American Society of Civil Engineers (ASCE):
Journal of Aerospace Engineering
Journal of Transportation
Journal of Materials in Civil Engineering
Journal of Construction Engineering and Management
Journal of Performance of Constructed Facilities
Institution of Electrical Engineers (IEE):
IEE Letters
Institute of Electrical and Electronics Engineers (IEEE):
IEEE Transactions
IEEE Letters
IEEE Computer Society:
IEEE Software
IEEE Design and Test
IEEE Computational Science and Engineering
IEEE Graphics
IEEE Expert: Intelligent Systems and their Applications
IEEE Micro: Chips, Systems, Software and Applications
IEEE Parallel and Distributed Technologies
- Feb./March 96 - site visit from our project sponsors (NSF, ARPA, NASA)
- August 96 - will produce the first Web version (based on the Visual Basic program
tested during the spring 1996 semester). This version will include thesaurus
and word co-occurrence features and enhanced full-text searching.
The Web version will incorporate JAVA and embedded OCX technologies; it
will query our search engine (OpenText) across the Web and retrieve SGML from an HTTP server,
probably NCSA. The database, collection, and server will be
hosted in Grainger, and the client will be distributed campus-wide.
-1997 - further client enhancements, and we will be testing/implementing
distributed server models for all publishers that are ready. The UIUC DLI
User Evaluation group will conduct extensive surveys.
-August 98 - user population will expand to the CIC (big ten universities) and
to publisher repositories (distributed repositories, federations).
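The August 96 milestone mentions thesaurus and word co-occurrence features without saying how they would be computed. As a rough illustration only, the Python sketch below counts how many documents contain each pair of terms and suggests related terms from those counts. Whole-document co-occurrence, lowercased whitespace tokenization, and the function names are assumptions made for the example, not the DLI project's algorithm.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each pair of distinct terms, how many documents contain both.

    Assumption: terms are lowercased whitespace-separated tokens, and the
    co-occurrence window is the whole document.
    """
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        # Sorting gives every pair one canonical (a, b) order.
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts


def related_terms(counts, term, k=3):
    """Return up to k terms that most often co-occur with `term`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    return [t for t, _ in scores.most_common(k)]


docs = [
    "sgml math rendering",
    "math rendering fonts",
    "sgml fonts",
]
counts = cooccurrence_counts(docs)
print(counts[("math", "rendering")])       # 2
print(related_terms(counts, "math", k=1))  # ['rendering']
```

A thesaurus built this way would suggest "rendering" to a user who searches for "math", because the two terms appear together more often than any other pair involving "math".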
5. Alternate Software Packages:
The UIUC DLI project team is currently evaluating SGML database management
and rendering software available from EBT. EBT has also updated us on their
near-term plans for updates and extensions to their current product line.
As of yet, the available EBT software doesn't provide the functionality
for moving SGML around the Web that we need for the DLI. We will continue
to use Panorama by SoftQuad for display and OpenText for searching.
6. Different formats
-AIP has no objection to putting up PDF versions if an SGML version is also available
(e.g., read SGML, print PDF). Publishers are invited to start delivering
PDF, PS, or DVI on a regular basis.
7. Distribution of custom software to publishers:
-Source code is always available free.
-We will let our publishing partners know of any advances and keep you up to date.
We can also distribute our client and provide DLLs.
8. Contract agreement
-For those publishers who have still not signed and sent in the Joint Partnership
agreement, please do so.
III. SYNOPSIS OF MAJOR DISCUSSION POINTS
The pros and cons of using the ISO 12083 math DTD fragment versus the AAP math fragment,
and whether additions are essential or merely desirable from a functional standpoint,
were discussed. It was agreed that, to better support rendering of math marked
up in SGML, it would be desirable to tighten up the content models of these
math DTD fragments with as few additions as possible. We hope to accomplish
this through further interactions with the SGML community and through the DLI-sponsored
spring meeting on math rendering.
Several of the publishers voiced their concern over not getting enough feedback
on the materials they're supplying to the testbed and over the role of
the testbed in technology transfer. In particular, publishers expressed
some concern with how long it has taken the DLI project to process and incorporate
materials into the testbed. The project team members indicated that resolving
rendering problems was taking more resources and time than had been anticipated.
Publishers identified pressures by society members and readers to move more
quickly on these and related issues. DLI project team responded by describing
the tension throughout the project between the short term goal of getting
the testbed up and running and long term infrastructure research. The immediate
goal of working together to solve rendering problems was also reiterated,
and it was suggested that DLI partners could serve as an arbitrator between
a public and a private solution. Representatives from AIP added that the
problem isn't with SGML, it's with the functionality, and
that the only way to determine whether the rendering is being done correctly
is to try these different solutions (such as TeX). They also added that
issues such as fonts need to be dealt with.
The issue of fonts was discussed further. SoftQuad is looking at a two-part
scheme in which the font travels with the document and a local application
reads the fonts and displays the document. This ensures that the fonts
needed for the document are present. SoftQuad needs to hear from the marketplace
whether this is a viable idea. The possibility of a font server was also
discussed: a service on the Web that would fetch
fonts as needed. A big problem with this is that every single user of
the document has to pay for fonts, and users will grow weary of paying if
fonts change often. Font embedding was also discussed (buy a license and
put fonts in documents). Pat Walker, of IEEE, stated that publishers would
shy away from yet another licensing agreement. Murray Maloney, of SoftQuad,
stressed that it would protect the integrity of the document. Marvin Sirbu brought
up a caching model: during the first two months the font
server would get a lot of hits, which would then taper off.
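Sirbu's caching model can be illustrated with a minimal client-side cache, assuming each client keeps a font once it has been fetched: the server is contacted only the first time a given font is needed, so hits concentrate early and fall off as caches fill. The `FontCache` class, the font names, and `fake_server` (a stand-in for a network request to a font server) are hypothetical.

```python
from typing import Callable, Dict


class FontCache:
    """Client-side cache: contact the (hypothetical) font server only on first use."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch
        self._fonts: Dict[str, bytes] = {}
        self.server_hits = 0

    def get(self, name: str) -> bytes:
        if name not in self._fonts:
            # Cache miss: this is the only case that reaches the server.
            self._fonts[name] = self._fetch(name)
            self.server_hits += 1
        return self._fonts[name]


def fake_server(name: str) -> bytes:
    """Stand-in for a font server request; returns placeholder bytes."""
    return f"<font data for {name}>".encode()


cache = FontCache(fake_server)
# A reader opens three documents; two of them need the same symbol font.
for doc_fonts in (["MathSymbols"], ["MathSymbols", "Greek"], ["Greek"]):
    for font in doc_fonts:
        cache.get(font)
print(cache.server_hits)  # 2
```

Four font requests produce only two server hits; across many readers, the same effect would make server load highest while caches are still empty.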
The publishers present also brought to the attendees' attention the adoption
of a new article identifier, distinct from the ISSN, which will
begin appearing in material next year. A unique identifier for each article,
its purpose is to provide a document delivery number (among other uses)
that is not unbearably long.
IV. CHRONOLOGY OF MEETING
Demonstration of customized PC interface for the DLI testbed, January -
May 1996; Bill Mischo
The interface for the testbed, which retrieves documents via HTTP and supports
searching, includes the following features:
-Table of contents with names of participating societies.
-New Articles
-Benchmark articles (recommended articles from professors)
-Demo searches (video will walk user through sample search)
-Search Wizard (walks user through search process)
-Search History
-Search forms with general and detailed search choices
-A hyperlink to the OpenText indexing service on the WEB
-A hyperlink to the University of Illinois' OVID Compendex (and possibly INSPEC databases) and the Grainger Library reference database and journal list
-A hyperlink to the local University of Illinois faculty database
-Spell checker (Microsoft)
When an article is selected, it is fetched over the Web by a standard Web browser using CCI protocols, with Panorama fetching the associated files (DTD, style sheet, etc.). Panorama then displays the document in a Web environment and highlights key words.
PROBLEMS WITH SEARCHING SGML:
-Term (word) proximity and lack of a 'word wheel' for dynamic spelling context:
the OpenText indexing mechanism uses Patricia trees to store pointers into
document streams instead of indexing at the word level. Although this is
a highly optimized system for retrieving streams of characters, the user
cannot ask for word positions (words within X words of another)
and must instead ask for characters within X characters of each other.
-Heterogeneous DTDs need normalization. Not all of the search forms are
usable when searching across different DTDs.
-Inconsistent entry constructs (e.g., author forenames and surnames)
-Limitations of OpenText in the HTTP environment, which result in a lack of state
and of persistent search connections. It is currently difficult to combine previous sets,
or to retrieve a set and modify it. One of the programmers on the DLI project
(Eric Johnson) is designing a client interface with JAVA, which we hope
will address these problems.
-Move toward a federated model: We expect to provide for our users a model
for processing, indexing, searching, retrieving, and displaying documents from
distributed repositories, hopefully with SGML as a standard, using next-generation
operating systems that will allow for a distributed object environment.
-An outline of a talk similar to the one presented by Mr. Mischo can be
found at: http://www.grainger.uiuc.edu/dli/asistoo.htm
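To illustrate the proximity limitation, the sketch below uses a suffix array, a simpler relative of the Patricia (PAT) tree, as a stand-in for a character-level index: a lookup returns character offsets, so "within N characters" falls out directly, while "within N words" has to be reconstructed afterwards by counting words. The naive construction, the function names, and the sample text are assumptions for illustration; this is not OpenText's implementation or query language.

```python
import bisect


def build_suffix_array(text):
    """Return every suffix start offset, sorted by the suffix it starts.

    Naive construction: fine for a demo, far too slow for a real collection.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_offsets(text, sa, term):
    """Character offsets at which `term` occurs, located by binary search."""
    # Truncating sorted suffixes to len(term) keeps them sorted, so all
    # matches form one contiguous run (the prefix list is rebuilt per query
    # for simplicity).
    prefixes = [text[i:i + len(term)] for i in sa]
    lo = bisect.bisect_left(prefixes, term)
    hi = bisect.bisect_right(prefixes, term)
    return sorted(sa[lo:hi])


def within_chars(text, sa, a, b, n):
    """True if some occurrence of a starts within n characters of one of b."""
    return any(abs(i - j) <= n
               for i in find_offsets(text, sa, a)
               for j in find_offsets(text, sa, b))


def word_index(text, offset):
    """Convert a character offset to a word position by counting preceding words."""
    return len(text[:offset].split())


def within_words(text, sa, a, b, n):
    """Word-level proximity, derived after the fact from character offsets."""
    return any(abs(word_index(text, i) - word_index(text, j)) <= n
               for i in find_offsets(text, sa, a)
               for j in find_offsets(text, sa, b))


text = "sgml math rendering needs better math fonts"
sa = build_suffix_array(text)
print(find_offsets(text, sa, "math"))              # [5, 33]
print(find_offsets(text, sa, "in"))                # [16]
print(within_chars(text, sa, "math", "fonts", 5))  # True
print(within_words(text, sa, "math", "fonts", 1))  # True
```

Note that the lookup for "in" succeeds at offset 16, inside "rendering": a character-level index has no notion of word boundaries, which is why word proximity needs the extra `word_index` step.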
Marvin Sirbu, CMU - NETBILL.
-An overview of the NETBILL project can be found at: http://www.ini.cmu.edu/NETBILL/publications/CompCon_TOC.html
Although the DLI project will not be ready to incorporate NETBILL into the
testbed until the Fall of 1996, Dr. Sirbu invited attendees to test the
project immediately.
-SERVER SECURITY; Beth Frank. See Bruce Schatz's overview talk "Semantic
Federation from Distributed Repositories of Scientific Literature";
http://www.grainger.uiuc.edu/dli/semantic.htm.
-SUMMARY OF TESTBED ISSUES, Tim Cole:
The testbed is currently behind schedule for the following reasons: SGML is not
as homogeneous as we had envisioned, the technology is not as advanced as we
had planned (e.g., the problems we've had with rendering), and the
condition of material from publishers varies. We expect the material
from all of the journals in the testbed to cover January
1995 forward.
PUBLISHERS:
-In full production: AIP (2000+ articles)
-Approaching full production: ASCE, IEEE Computer Society, APS
-Samples (UIUC still to incorporate): IEEE, AIAA
TESTBED MATERIALS: DTDs
-All DTDs include the following UIUC modifications: links for category information;
UIUC nethead (contains metadata model that is supported by web server);
some characters will be replaced with ASCII
-We've been getting a variety of DTDs and have found that ISO12083 shows
the most promise. Arbor Text has significant structural differences.
-Variants of ISO_12083 Article DTD: APS, AIP, IEEE CS
-Variant of AAP/Online Computer DTD: ASCE
-Variant of Arbor-Text Book: IEEE
-AIAA Book/SoftQuad Canonical Table DTD: AIAA
ENTITY SETS
-Panorama Default Entity Sets (ISO-8879)
-CALS ISO (overlaps with above): IEEE Computer Society
-Publisher specific: APS, AIP, ASCE, IEEE (Arbor Text Equation); AIAA (incorporated
within DTD)
STYLE SHEETS
We are using the Lucida Bright Microsoft add-on font pack. Our approach
is to do a preliminary style sheet for our partners, with our partners then
taking over the responsibility. We try to follow the paper copy and would
like feedback from everyone.
Style sheet issues: At some point an entity has to be converted
by the client. The client looks in sdata.map to see how to render
the character, and sdata.map has to be present when Panorama is launched. This creates potential
conflicts between sdata.map files: there is no good way to edit one partially, and the resolution
of characters in one sdata.map can differ from another's.
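The sdata.map conflict can be made concrete with a small sketch that merges two publishers' entity mappings and reports entities mapped differently. It assumes a hypothetical one-entry-per-line format ("entity replacement"), which is not Panorama's actual sdata.map syntax; the function names and sample entries are illustrative, and keeping the first map's entry on conflict is only one possible policy.

```python
def parse_sdata_map(text):
    """Parse 'entity replacement' lines into a dict.

    Hypothetical simplified format: an entity name, whitespace, then the
    replacement text. Blank lines and lines starting with '#' are skipped.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping


def merge_maps(base, incoming):
    """Merge two maps, keeping base's entry on conflict and reporting conflicts."""
    merged = dict(base)
    conflicts = {}
    for entity, replacement in incoming.items():
        if entity in merged and merged[entity] != replacement:
            conflicts[entity] = (merged[entity], replacement)
        else:
            merged[entity] = replacement
    return merged, conflicts


publisher_a = parse_sdata_map("# publisher A\nalpha a\ndeg deg\n")
publisher_b = parse_sdata_map("alpha alpha\nmicro u\n")
merged, conflicts = merge_maps(publisher_a, publisher_b)
print(merged)     # {'alpha': 'a', 'deg': 'deg', 'micro': 'u'}
print(conflicts)  # {'alpha': ('a', 'alpha')}
```

Reporting conflicts instead of silently overwriting them would let the project and publishers decide which rendering of a shared entity should win.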
FONT ISSUES:
-Where are we going to get the right fonts? Are we going to create them
within this project? Is it the responsibility of the publishers? New entities
will come up inevitably and the occasional character will need to be created.
FIGURES
-JPEG; AIP, APS
-TIFF; ASCE, IEEE CS, IEEE
-We are currently using Ulead viewer. Ulead is a free viewer that uncompresses
TIF and JPEG files and handles all current variants. ULEAD can be found
at:
http://www.seed.ret.tw/~ulead
(ftp://ftp.ulead.com.tw/pub/goodies/ulview11.zip)
MATH
-ISO-12083 SGML Math Variants: AIP, APS, IEEE CS, and IEE all have some
additional tags.
-AAP SGML Math Variants: ASCE, APS
-Embedded TeX: IEEE, IEEE CS, AIAA. Currently, we translate the TeX
into GIFs. When Panorama encounters math in a document, it brings the GIFs
over as the document is scrolled through.
-Not all of the problems are due to the renderer (Panorama). Some processing
can be done to help get the item looking like the print version.
One solution is to look at TeX. Some experimentation has been done with
MathType:
-MathType DLL formats for separate onscreen view
-support for ISO 12083 DTD fragment
-Further support for EuroMath
-Local caching for performance
-Launch helper applications for other notations
-Design Science needs our analysis of the 12083 DTD. We need to tell them which
semantics must be included so that they can devise a new DTD that handles
all situations/specifications.
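The TeX-to-GIF translation noted under MATH can be sketched as a preprocessing pass that swaps each embedded TeX fragment for a reference to a GIF file and records which formulas still need rendering. The `<tex>` and `<graphic file=...>` markup, the hash-based file naming, and the function name are hypothetical; the actual rendering step (running TeX and a DVI-to-GIF converter) is not shown.

```python
import hashlib
import re

# Hypothetical markup: embedded TeX wrapped in <tex>...</tex>, replaced by
# a <graphic file="..."> reference. Real publisher DTDs differ.
TEX_RE = re.compile(r"<tex>(.*?)</tex>", re.DOTALL)


def replace_tex_with_gifs(sgml):
    """Swap each TeX fragment for a GIF reference; return (new_sgml, jobs).

    `jobs` maps GIF file names to the TeX source that still has to be
    rendered offline (that rendering step is not shown here).
    """
    jobs = {}

    def to_graphic(match):
        tex = match.group(1)
        # Name the GIF after a hash of its TeX so identical formulas share one image.
        name = "m" + hashlib.md5(tex.encode("utf-8")).hexdigest()[:12] + ".gif"
        jobs[name] = tex
        return f'<graphic file="{name}">'

    return TEX_RE.sub(to_graphic, sgml), jobs


doc = "<p>Energy <tex>E=mc^2</tex>, restated: <tex>E=mc^2</tex>.</p>"
new_doc, jobs = replace_tex_with_gifs(doc)
print(len(jobs))                  # 1
print(new_doc.count("<graphic"))  # 2
```

Naming images by a hash of their TeX means a formula repeated in a document is rendered and transferred once, which matters when the viewer brings GIFs over as the reader scrolls.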
ALTERNATE PRESENTATION/DELIVERY FORMATS
-Postscript: APS
-PDF: APS (PRL distilled by UIUC), AIP
-Xyascii; APS (Physical Review C)
The DLI will continue to focus on SGML which is, clearly, the most effective
tool for indexing and retrieval of full-text elements. However, until rendering
problems are worked out we will offer alternate formats for viewing equations
and for printing.
SOFTWARE
Index Servers
Delivery Servers:
-HTTP (Windows NT, IBM RS6000)
-Future: EBT DynaTex & DynaWeb
SGML CREATION ISSUES
Processing: Declarations, Public Identifiers, etc.
-We create structure appropriate for transferring DTDs, style sheets, entity sets, figures, etc. over WEB
-OpenText prefers DTDs without external file references, HyTime, etc., and files without DOCTYPE declarations, etc.
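A minimal preprocessing sketch for the DOCTYPE point, assuming the declaration (with any internal subset) can simply be removed from the copy that is sent for indexing, while the original file is kept for display. It uses a regular expression rather than an SGML parser, so it handles only straightforward declarations; the function name and sample document are hypothetical.

```python
import re

# Matches <!DOCTYPE ...> with an optional internal subset [ ... ].
# Sketch only: an internal subset that itself contains ']' (e.g. marked
# sections) would need a real SGML parser.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\b[^\[>]*(?:\[.*?\]\s*)?>\s*",
                        re.IGNORECASE | re.DOTALL)


def strip_doctype(sgml):
    """Remove the first DOCTYPE declaration so the instance can be indexed alone."""
    return DOCTYPE_RE.sub("", sgml, count=1)


with_subset = ('<!DOCTYPE article PUBLIC "-//Example//DTD Article//EN" [\n'
               '<!ENTITY co "Company">\n]>\n'
               '<article><title>Test</title></article>')
plain = '<!doctype article system "article.dtd"><article></article>'

print(strip_doctype(with_subset))  # <article><title>Test</title></article>
print(strip_doctype(plain))        # <article></article>
```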
Processing: Document Instance
DTD Errors/Issues:
-Nesting, incomplete Element Declarations, tag minimization, etc.
-Added tags for math, etc.
DTD -Document Instance Inconsistencies:
-Structural hierarchy, Elements not in DTD
Document Instance Errors:
-Typos, bad internal links, etc.
Other Authoring Issues:
-Addition of information by UIUC: URLs, Publishing Detail, etc.
-Procedures for adding links, classification data, category data, etc.
-Multiple Article Instances in Single Files
PANORAMA UPDATE; Murray Maloney, Product Manager, SoftQuad
FORMATTING CAPABILITIES
-Panorama being re-built (alpha version out next month)
-More powerful formatting
-Greater precision of placement
-DSSSL planning underway (waiting for DSSSL to settle down)
GRAPHICS EDITING and DISPLAY
-Agreement with Group 42
-Author/Editor integration
-Panorama integration
-Support for GIF, TIFF, CGM, WMF, BMP, JPEG, etc.
INTERPROCESS COMMUNICATION
-mechanisms to "talk" to Panorama
-NCSA Mosaic CCI
-Netscape client API
-Spyglass Software Developer's API
-others to come
WWW CAPABILITIES TODAY
-Panorama relies on WWW browsers
--resolves URLs
-CCI aware
WWW CAPABILITIES
-Native WWW Support
-HTTP Capability
-HTML 2.0 and beyond (tables, forms, frames, etc.)
-Local file Update (styles, icons, graphics, maps)
SEARCH INTERFACE
-Opentext Latitude Project
-form-based Search front end
-SGML-sensitive (SGML element knowledge, SGML architectural form knowledge)
PANORAMA PLATFORM IN 1996
-will incorporate DSSSL
-pluggable language system localization kit for extensive browser
-shrink-wrapped "language packs"
http://www.math.psu.edu/dna/publications.html
Lance demonstrated a PC DVI viewer. In conjunction with Panorama, this viewer
could be called on to display mathematics or DVI fragments embedded in an
SGML document. In stand-alone mode, the DVI viewer had several valuable
features, including the ability to look at more than one document at a time,
magnify formulas and equations, and see what font set was used in specific
areas of a document.
