Digital Libraries Initiative Spring 1996 SGML Mathematics Workshop
May 1, 1996
Minutes

Minutes courtesy of Tom Magliery, NCSA mag@ncsa.uiuc.edu

After the workshop, Eugene Miya (Information Systems Program, NASA) spoke on behalf of the funding agencies and expressed his appreciation for the participation of the various attendees. NASA in particular was highly interested in the workshop, as it deals extensively with math every day, often in complicated forms.

Presentations

Stephen Wolfram, Mathematica (minutes on Stephen Wolfram's presentation courtesy of Evan Owens, University of Chicago Press http://mish161.cern.ch/sc4wg6/math/dli.htm)

Wolfram Research has spent 5 years and millions of dollars developing the typesetting capabilities of Mathematica. They believe that it can now typeset math to the level of the best commercial system; its page layout capabilities are comparable to WordPerfect or Word and not quite as good as PageMaker. Unlike TeX, Mathematica can gracefully break formulas into lines without operator intervention because it understands the structure of the math; if it didn't understand the structure, it couldn't calculate. It also knows how tightly the operators bind.

Input is by palette, keyboard, and strings; Mathematica has full internal Unicode support and an open architecture for customizing. All formatting styles come in pairs: a screen style and a paper style. It uses monospaced fonts on screen for legibility.

To interpret math, Mathematica needs to know unambiguously the role of every element in the formula. For example, is "e" the exponential constant or a variable named "e"? Mathematica keeps additional information in its internal representation of the math.

Traditional textbook math cannot always be made unambiguous. Mathematica converts from traditional form notation to its internal format using a large collection of heuristic rules; this doesn't always work. When editing in traditional notation, one can easily make something that can't be converted back; their editing environment will prompt about this. It can also handle abstract notation, though it needs to be given processing rules.

Inside Mathematica, everything is a symbolic expression. Traditional form math is a set of transformation rules. Mathematica's spacing and display are much better than TeX's. It can export math to GIF, EPS, PICT, TeX, HTML, speech, or to ASCII that can be reloaded into Mathematica. Export to/from TeX is done using transformation rules. They have a version that takes math input and sends it to a math server that renders it and returns GIFs; they also have an inline app to render math in ActiveX and Netscape inline add-ins.
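
As a rough illustration of how an expression tree plus per-head transformation rules can drive an export such as TeX, here is a minimal Python sketch. It is not Mathematica's implementation: the nested-tuple encoding, the function name to_tex, and the head names "Plus", "Power", and "Divide" are assumptions chosen for the example, and the sketch makes no attempt to insert parentheses where precedence would require them.

```python
# Hypothetical sketch, not Mathematica's internals.
# An expression is either a leaf (a string) or a tuple (head, arg1, arg2, ...).

def to_tex(expr):
    """Render an expression tree as TeX by applying one rule per head."""
    if isinstance(expr, str):              # leaf: a symbol or a number
        return expr
    head, *args = expr
    parts = [to_tex(arg) for arg in args]  # transform the children first
    if head == "Plus":
        return " + ".join(parts)
    if head == "Power":
        return f"{parts[0]}^{{{parts[1]}}}"
    if head == "Divide":
        return f"\\frac{{{parts[0]}}}{{{parts[1]}}}"
    raise ValueError(f"no transformation rule for head {head!r}")


# (a + b) / x^2 as a tree; note the sketch adds no parentheses on its own.
example = ("Divide", ("Plus", "a", "b"), ("Power", "x", "2"))
print(to_tex(example))
```

Adding another export format under this scheme would mean writing a second set of rules over the same tree, which is the appeal of keeping the structure separate from the presentation.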

Wolfram Research would like to work with the SGML math community. Mathematica notebooks are a markup language; one could map the structure of the entire notebook (not just the math) into an SGML DTD.

Dave Raggett, W3C (World Wide Web Consortium)
Discussion of HTML/Math work of W3C

The purpose of the W3C HTML math work is to develop an open specification. W3C is working with vendors on adding extensions to HTML. Goals of the work include having the language be suitable for both teaching and publishing; work with symbolic and numeric math; be easy to learn and hand-editable; be suitable for speech synthesis; be formattable for plain text displays; and be useful with filters.

Described an experiment using Prolog. The goal is to specify mappings from input strings to semantic forms. Mapping rules define, for example, how "a*b" would map. The notation must be supplemented with a knowledge base describing mappings, precedence, etc. The knowledge base would be available over the net.
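
To make the idea of mapping an input string to a semantic form more concrete, here is a minimal sketch in Python rather than Prolog. It is not the W3C experiment: the KNOWLEDGE_BASE table, its semantic names ("plus", "times", and so on), and the tuple output format are all assumptions made for the illustration. The table stands in for the knowledge base of mappings and precedence mentioned above, and a small precedence-climbing parser consults it.

```python
import re

# Hypothetical knowledge base: operator -> (semantic name, precedence).
# In the experiment described, data of this kind would be fetched over the net.
KNOWLEDGE_BASE = {
    "+": ("plus", 1),
    "-": ("minus", 1),
    "*": ("times", 2),
    "/": ("divide", 2),
}


def tokenize(text):
    """Split input into identifiers, integers, operators, and parentheses."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)


def parse_atom(tokens):
    """Parse an identifier, a number, or a parenthesized sub-expression."""
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse(tokens)
        tokens.pop(0)                      # discard the closing ")"
        return inner
    return tok


def parse(tokens, min_prec=1):
    """Precedence climbing for left-associative binary operators."""
    left = parse_atom(tokens)
    while tokens and tokens[0] in KNOWLEDGE_BASE:
        name, prec = KNOWLEDGE_BASE[tokens[0]]
        if prec < min_prec:
            break
        tokens.pop(0)                      # consume the operator
        right = parse(tokens, prec + 1)    # tighter operators bind first
        left = (name, left, right)
    return left


def to_semantic(text):
    return parse(tokenize(text))


print(to_semantic("a*b"))      # ('times', 'a', 'b')
print(to_semantic("a+b*c"))    # ('plus', 'a', ('times', 'b', 'c'))
```

Because the operator meanings and precedences live in data rather than in the parser, a different knowledge base could map the same "a*b" to a different semantic form, which is also why the question "What if the mappings disappear?" matters.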

Many questions exist: How will it work? What will the specs look like? Is it too abstract? What if the mappings disappear? At what level should the format semantics be?

Roy Pike, Stilo Associates
12083 SGML math DTD proposal

Discussed a proposal for replacing the 12083 math DTD with one that has a semantic representation of mathematics. Paper to be submitted to (some group) at SGML Europe '96.

Applications take math expressions in input forms (e.g., Mathematica) and translate them into a unique standard internal representation. We need an international standard for this internal representation. Do we agree on the structured representation of mathematics? We must.

Problems exist: scalability (many functions, symbols, etc.), extensibility (more new ones invented all the time), and ambiguity (the same symbol used for different meanings in different fields).

Toward solving ambiguity problem: Showed part of proposed DTD; used parameter entities to represent different fields of mathematics. Element names similar to Mathematica's "full form" representation.
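
The idea of selecting a field of mathematics to settle what a symbol means can be sketched outside SGML as well. The Python fragment below is only an analogy to the parameter-entity approach, not the proposed DTD: the table FIELD_TABLES, the field names, and the meaning names are all hypothetical.

```python
# Hypothetical per-field symbol tables, loosely analogous to switching
# parameter entities in a DTD. None of these names come from the proposal.
FIELD_TABLES = {
    "vector-algebra": {"\u00d7": "cross-product"},         # multiplication sign
    "set-theory": {"\u00d7": "cartesian-product"},
    "logic": {"\u2227": "and"},                             # wedge sign
    "differential-geometry": {"\u2227": "wedge-product"},
}


def resolve(symbol, field):
    """Return the meaning of a symbol within one declared field."""
    try:
        return FIELD_TABLES[field][symbol]
    except KeyError:
        raise LookupError(
            f"{symbol!r} has no declared meaning in {field!r}"
        ) from None


print(resolve("\u00d7", "vector-algebra"))   # cross-product
print(resolve("\u00d7", "set-theory"))       # cartesian-product
```

The same character resolves differently depending on the field in force, and a symbol with no entry for that field is reported rather than guessed at.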

The primary task is agreeing on the representation of math in full form; the way you get your input to that point is secondary.

There is a difference between the proprietary Wolfram/Mathematica solution and an international standard. Convincing the community to adopt a standard is a problem, but a standard has advantages, e.g., the ability to drag and drop equations into programs.

Must distinguish between the task of making an application and the task of making standards.

Evan Owens, University of Chicago Press (AAS)

AAS gets articles from authors in LaTeX, converts them to SGML with Omnimark scripts, and edits them in ArborText. They use the AAP math DTD with some corrections. Currently they do presentation by converting the SGML back to TeX; one typesetter uses Xyvision, the other a proprietary system.

Editing and authoring tools are most important, but presentation is important to users. Uniformly interchangeable SGML is not going to happen at the production level. Additional tags are needed in processing and must be stripped out for the public archive or presentation.

In general, the rendering system that does the paging supplies the formatting of math. Don't usually care if math is off somewhat, as it's usually not complicated. Don't go back into the SGML to edit it. Math is closer to TeX than LaTeX. Key is to do it before copy editing so it can be cleaned up by the copy editors.

Tim Ingoldsby, American Institute of Physics (AIP)

Need to do something with mathematics right now. Electronic documents may survive as long as print has survived. Mathematica can provide added value, but 60% of authors don't use electronic composition tools that send results. Generation gap before that occurs. HTML is not a short term solution. Must solve fundamental problems, but it will be a decade before the changes discussed earlier in the day will take effect. Need parallel effort that will handle users from English majors to non-educated keyboarders. Must focus on here and now needs.

AIP uses SGML, but uses Xyvision (XSF) as its primary publication system. ACS, APS, and Elsevier all use Xyvision. Xyvision is the most efficient method to produce complex typography for scientific journal information on paper. Very rich markup is available in Xyvision, and it can be translated to SGML. AIP is moving in the same direction as UC -- they want to do upfront SGML, but the Xyvision publication system is going to be the centerpiece for some time. SGML is still central to their vision for the future.

Panorama is a good tool, but with a few drawbacks. It wants you to map diacritics et al. to a single character, which would require coming up with 4K or 5K characters. PDF can embed links, including links from references to other parts of a database. The functionality of a PDF viewer can be the same as that of an HTML product. Should abandon the extensions to 12083.

On converting 12083 to TeX: It is difficult to get feedback when you're just doing tagging. There is a problem with floating accents -- they want to create a separate entity for each accent. For bibliographic materials, they added features to the reference sections to uniquely define some elements. They are adding different styles, various multimedia, and types of reference which can link to other references.

Advantages of using 12083 if it is so similar to TeX: 12083 supposedly covers some of the gaps.

Paul Grosso, ArborText; also Chief Technical Officer of SGML Open

ArborText math editor uses tagging scheme in AAP math, in menus and palettes. Generates SGML according to AAP DTD.

ArborText hasn't done anything with math in the last several years. For much of its constituency, the interface is just an implementation issue. What matters, as distinct from the interface, is a markup format that protects the information and preserves the format across changes in OS, etc. This is possible with SGML. Evaluating equations, printing, etc. would be good capabilities to maintain over time.

The kind of work the SGML Open consortium did with the table model can perhaps be done with math: take a subset of existing practice that works all over and agree to all support it. Optional attributes that can support semantics could be added to 12083; publishing systems can ignore them and others can use them, which helps suggest how 12083 can be used in both worlds.

Search engines working on same DTD can display in different formats. Semantics are fine but want good printing. SGML represents greatest variety. Users don't care about semantics.

Paul Topping, Design Science

Makers of MathType, an equation editor add-on to MS Word. MathType has two components: a user interface to enter mathematics and formatting for object-oriented graphics. Working on version 4.0 with lots of user interface enhancements to better handle character sets. Also going to use a table of drop-in translators to convert between DTDs and MathType and back. MathType can be used as part of authoring tools, and its formatting engine as part of a browser. A translator could take 12083 and give back a graphic, GIF, etc. Trying to work with SoftQuad to make something to convert to 12083. Trying to be DTD-independent.

MathType only works as an add-on, not standalone. It relies on, e.g., Word or Ventura to produce documents. Perfect for the SGML community. Plan to make the formatting engine and translators available.

12083 to MathType conversion has not progressed very far. They are implementing the translator architecture. MathType's internal structure is a hierarchy with characters as the leaves of the tree. Fractions and other mathematical constructs are internal nodes.
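
A tree of this general shape, with characters at the leaves and constructs such as fractions as internal nodes, can be sketched in a few lines of Python. This is only an illustration of the data structure as described in the talk, not the product's actual internal format: the class names Char and Template, the "fraction" and "row" kinds, and the helper leaves are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Char:
    """Leaf node: a single character."""
    value: str


@dataclass
class Template:
    """Internal node: a construct such as a fraction or a row of items."""
    kind: str
    children: List["Node"] = field(default_factory=list)


Node = Union[Char, Template]


def leaves(node):
    """Collect the characters of the tree in left-to-right order."""
    if isinstance(node, Char):
        return [node.value]
    collected = []
    for child in node.children:
        collected.extend(leaves(child))
    return collected


# (a + b) / c: a fraction node whose two children are rows of characters.
fraction = Template("fraction", [
    Template("row", [Char("a"), Char("+"), Char("b")]),
    Template("row", [Char("c")]),
])
print(leaves(fraction))   # ['a', '+', 'b', 'c']
```

A translator to another format would walk the same tree and emit output per node kind, which is presumably why a hierarchy like this fits a drop-in translator architecture.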

Murray Maloney, SoftQuad

Need to address more the notion of transformation between data formats. DSSSL model of conversion from 12083 to other DTDs. DSSSL model of translation identifies one or more solutions with the encoding of mathematics within documents. Working with 12083.

SoftQuad wants to know what publishers want.

University of Illinois at Urbana-Champaign Digital Libraries Initiative
Comments to: External Relations Coordinator, Tom Habing
11/20/96