ATTACHMENT 3 -- TAG NORMALIZATION / ALIASING (AS OF 3/12/96)

Tag (Region) Content                 | DLI Standard | AIP        | APS        | ASCE      | IEE       | IEEE CS | IEEE
-------------------------------------+--------------+------------+------------+-----------+-----------+---------+-----------------
Publication Front Matter             | <FRONT>      | <FRONT>    | <FRONT>    | <FM>      | <FM>      | <FM>    | note 9
Article Title & Subtitle             | <TITLEGRP>   | <TITLEGRP> | <TITLEGRP> | <ATL>     | <TIG>     | <TIG>   | <ARTICLETITLE>
Article Abstract                     | <ABSTRACT>   | <ABSTRACT> | <ABSTRACT> | <ABS>     | <ABS>     | <ABS>   | <ABSTRACT>
Article Body                         | <BODY>       | <BODY>     | <BODY>     | <ARTTEXT> | <BDY>     | <BDY>   | <BODY>, note 10
Section Title within Body            | <TITLEX>     | <TITLE>    | <TITLE>    | <ST>      | <ST>      | <ST>    | <H1>
Paragraph within Body                | <P>          | <P>        | <P>        | <P>       | <P>       | <P>     | <PAR>
Figure                               | <FIGGRP>     | <FIGGRP>   | <FIGGRP>   | <FIGG>    | <FIG>     | <FIG>   | <FIGURE-BLOCK>
Figure Caption                       | <TITLEX>     | <TITLE>    | <TITLE>    | <LEGEND>  | <FGC>     | <FGC>   | <CAPTION>
Table                                | <TBLHEAD>    | <TBLHEAD>  | <TBLGRP>   | <TBL>     | <TBL>     | <TBL>   | note 11
Table Caption                        | <TITLEX>     | note 1     | <TITLE>    | n/a       | <TI>      | <TI>    | <TABLE-CAPTION>
Citation List                        | <BIBLIST>    | <BIBLIST>  | <BIBLIST>  | note 4    | <BIBL>    | <BIBL>  | <REFERENCE-LIST>
Individual Citation                  | <CITATION>   | <CITATION> | <CITATION> | n/a       | <BB>      | <BB>    | <REF>
Title of Cited Article               | <TITLEX>     | note 2     | n/a        | n/a       | <BRFTI>   | <ATL>   | n/a, note 12
Title of Cited Publication           | <SERTITLE>   | <SERTITLE> | <SERTITLE> | n/a       | <BRFSRTI> | <TI>    | n/a, note 12
Organization Name (free-floating)    | <ORGNAME>    | <ORGNAME>  | <ORGNAME>  | note 5    | note 7    | <ONM>   | n/a
Mathematical Formula (free-floating) | <FORMULAX>   | note 3     | note 3     | note 6    | <FORM>    | note 8  | n/a, note 12
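
As a rough illustration of how an aliasing table like this could be consulted in software, the sketch below encodes a few of its rows as a Python dictionary and looks up the publisher-specific tag for a given content type. The names ALIASES and source_tag are hypothetical and are not part of the testbed; cells that hold "n/a" or only a note reference are represented here as None, which is an assumption of this sketch.

```python
# Hypothetical encoding of a few rows of the aliasing table.
# Keys are content types; values map each DTD column to its tag name.
# None stands for "n/a" or a note-only cell (an assumption of this sketch).
ALIASES = {
    "Article Abstract": {
        "DLI": "ABSTRACT", "AIP": "ABSTRACT", "APS": "ABSTRACT",
        "ASCE": "ABS", "IEE": "ABS", "IEEE CS": "ABS", "IEEE": "ABSTRACT",
    },
    "Article Body": {
        "DLI": "BODY", "AIP": "BODY", "APS": "BODY",
        "ASCE": "ARTTEXT", "IEE": "BDY", "IEEE CS": "BDY", "IEEE": "BODY",
    },
    "Figure": {
        "DLI": "FIGGRP", "AIP": "FIGGRP", "APS": "FIGGRP",
        "ASCE": "FIGG", "IEE": "FIG", "IEEE CS": "FIG", "IEEE": "FIGURE-BLOCK",
    },
    "Individual Citation": {
        "DLI": "CITATION", "AIP": "CITATION", "APS": "CITATION",
        "ASCE": None, "IEE": "BB", "IEEE CS": "BB", "IEEE": "REF",
    },
}


def source_tag(content, publisher):
    """Return the publisher's tag name for a content type, or None."""
    row = ALIASES.get(content)
    if row is None:
        return None
    return row.get(publisher)
```

For example, source_tag("Article Body", "ASCE") yields the ASCE tag name for the article body, while a content type with no ASCE equivalent, such as "Individual Citation", yields None.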

NOTES:

n/a - indicates that no equivalent or normalized tag can be created or is available.
note 1 - An artificial table caption region is created from the first cell of each table and then combined with the <TITLE> regions to create a <TITLEX> region.
note 2 - The AIP DTD requires a <TITLE> region within each <CITATION> region; however, it is consistently left empty for APL articles.
note 3 - Inline math <FORMULA> regions are combined with display math <DFORMULA> regions to create a <FORMULAX> region.
note 4 - Keying off the word REFERENCES at the end of the section title and the corresponding close section tag, an artificial <BIBLIST> region is created.
note 5 - Acknowledgment sections and <AFF> regions are combined to create an artificial <ORGNAME> region.
note 6 - Inline math <F> regions are combined with display math <FD> to create a <FORMULAX> region.
note 7 - <AFF> and <BRAFF> regions are combined to create an artificial <ORGNAME> region.
note 8 - <MATH> and <SGMLMATH> regions are combined to create an artificial <FORMULAX> region.
note 9 - Keying off the opening <ARTICLE> tag and the first opening <H1> section title tag, an artificial <FRONT> region is created.
note 10 - For IEEE, <BODY> includes all parts of the article, including front matter.
note 11 - Keying off the opening <TABLE-CAPTION> tag and the closing </FIGURE-BLOCK> tag, an artificial <TBLHEAD> region is created.
note 12 - This content can be deduced easily during pre-processing, but is difficult to determine post-indexing, in part because these regions are delineated by entities and/or processing instructions that are normally filtered out during index build. Work-arounds are currently being evaluated.
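
The notes above describe two recurring operations: creating an artificial region keyed off a pair of tags (notes 4, 9, and 11) and combining existing regions into one normalized region (notes 3, 5, 6, 7, and 8). The following Python sketch shows one way these could be expressed over character offsets. It is not the testbed's implementation; the function names, the use of regular expressions, and the choice to start the region at the opening <ARTICLE> tag and end it just before the first <H1> tag are all assumptions made for illustration.

```python
import re


def front_region(sgml):
    """Sketch of note 9: return (start, end) offsets of an artificial
    FRONT region, or None if either keying tag is missing.

    Assumption: the region runs from the start of the opening <ARTICLE>
    tag up to, but not including, the first opening <H1> tag.
    """
    article = re.search(r"<ARTICLE\b[^>]*>", sgml)
    if article is None:
        return None
    # Look for the first <H1> that follows the opening <ARTICLE> tag.
    h1 = re.compile(r"<H1\b[^>]*>").search(sgml, article.end())
    if h1 is None:
        return None
    return (article.start(), h1.start())


def combine_regions(*region_lists):
    """Sketch of notes 3, 6, and 8: pool several lists of (start, end)
    spans into a single list sorted by position."""
    return sorted(span for regions in region_lists for span in regions)
```

Under these assumptions, the inline and display formula spans of note 3 would be passed to combine_regions as two lists, and the result would be treated as the set of <FORMULAX> regions.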

M.M. Pflaum & T.W. Cole - 3/12/96
