Quarterly Report for 2nd Quarter 1996 (May - July)
DLI Project, University of Illinois

Federating Repositories of Scientific Literature
Bruce Schatz, PI, schatz@uiuc.edu

SIGNIFICANT EVENTS

This quarter we held our second annual Partners' Workshop. Both the number of attendees and the presence of higher-level management make it evident that our relationships with our Partners remain solid and that their commitment to the project remains strong. Seven of our partners (AIP, APS, ASCE, IEE, IEEE CS, AIAA) have provided us with SGML from 1995 forward.

In July, the American Institute of Physics sent one of their research programmers (at their own expense) to initiate the first transfer of technology from the DLI to our partners for the purpose of setting up a repository of scientific documents. Scott Johnson was shown how to process and index AIP articles for use by the DLI client interface, how the client sets up OpenText queries for SGML search, how to use the client interface, and various other programming details of the client interface. Tim Cole also met with Mr. Johnson to help formulate specifications for enhancements to SoftQuad Panorama for SGML display.

The UIUC DLI project is emphasizing the collaborative effort needed between university researchers and our publishing partners. A large part of the project will be to look at the future roles of publishers, libraries, A & I services, and authors, and to extend the model of distributed publisher repositories developed in the DLI project. We now have a viable retrieval system and are in the process of setting up a framework for a system of federated repositories. After going through the process with AIP and one or two others, the DLI will be in a position to advise our publishing partners in setting up their own repositories.

This quarter we also hosted a DLI SGML Mathematics workshop with the goal of bringing together a number of SGML vendors and major technical publishers to work on solving some problems of interactive display of mathematical typesetting in SGML. Attendance was very good, surpassing our initial expectations, and a lively debate of the various options ensued. Representatives from the DLI funding agencies also attended the Mathematics workshop, as well as the Partners' Workshop that followed.

The Technology Research directed by Hsinchun Chen and Bruce Schatz completed the semantic retrieval computation of 10,000,000 abstracts within 1000 concept spaces across all of science and engineering literature. This computation on the HP Convex Exemplar supercomputer at NCSA is the largest ever in information science. This unique data set is now being used to experiment with algorithms for vocabulary switching across disciplines in engineering and science. The computation was widely reported in the press, including a full-page news article in Science (June 7, p1419) and a short note in Business Week (Aug 12, p83). There was also a major press release on this computation and the DLI project put out by our partner Hewlett-Packard.

Our collaboration with the UC Santa Barbara DLI project is now actively functioning. This is partially supported by a supplemental DLI grant to UCSB. Chen and Schatz visited Santa Barbara for two days in June to discuss the collaboration and review progress. Extensive meetings were held with the key members of the UCSB project. A small collection of aerial photographs has been processed by UCSB (B. Manjunath) and demonstration software built to show that small texture blocks do re-occur across such maps, thus potentially being useful for similarity searching. A much larger set is being computed now, as input to the co-occurrence computation that Chen will be performing in the next quarter. The goal is to test whether term suggestion with concept spaces works as well with image textures as it does for text phrases. If so, this will provide a path to automatic searching of the content of spatial data images (as opposed to only the metadata). It will then also represent a major technology transfer of the research of the UIUC DLI into the testbed of the UCSB DLI.

TESTBED

Sent out client to NASA (Eugene Miya), INSPEC (Jeff Pache), ASCE (Carol Reese), AIP (Tim Ingoldsby). Received verification of successful install from ASCE and AIP. AIP demonstrated it at a number of their own publisher meetings.

Author Word Wheels were incorporated into the interface.

The client interface and DLITOP (which integrates other library services) were installed in the Beckman Institute library (a major interdisciplinary research institute on campus). Revised client interface to be used on machines with 800 x 600 resolution. Revised client interface to work on Windows 95 and NT machines. Windows NT 4.0 was previewed in beta form and identified as the probable target for development machines and possibly clients.

Enhanced the client so that the user can choose to view individual figures and tables of an article retrieved from a search before deciding to view the entire article. Also added a TOC to the top form, and allowed jumping to journal titles and volumes from the short entry form.

Updated the IEE SGML processing code to handle the new DTD, which includes header information (<PUBFM>, containing dates, page numbers, and volume and issue numbers) that was previously unavailable for use in our uinethead. Also changed the processing code to accommodate a hierarchical structure with 12 IEE journals. Four of the twelve titles have been processed.

Finished implementation of transaction logging as per the sociology group's specs.

OpenText's Latitude Web Server Pro 5.0 was installed on freya to provide an early web-based interface to the OpenText database. Difficulties concerning Latitude's ability to access PEMs of PEMs are currently being investigated by OpenText and a new research programmer, Donal O'Connor.

AIP material was processed using publisher-provided SGML-to-TeX macros, and the resulting output was then converted to GIFs. These documents are now in the http://morrigan/~aip_dvi/ directory.

The new HP UNIX machine yama, obtained through special arrangement with our partner HP, was integrated into the processing pipeline, and massive amounts of processing (2+ GB) have been completed for APS and IEE.

EVALUATION

Major Activities

During the past quarter, the Social Science Team has continued with its efforts to instrument the testbed, study potential and actual users, and develop instruments and analysis techniques to be used in later studies of work and information practices in the evolving digital order, as well as of the nature and extent of testbed use. We have also spent a significant amount of time analyzing the growing body of literature that informs our work, preparing publications based on our work, and leading and participating in activities aimed at nurturing the emerging community of researchers engaged in social science research related to digital libraries, both within and beyond the six DLI projects.

Our specific activities are summarized below:

Data Collection and Analysis Activities

Laura Neumann studied work practices of a group of physicists by interviewing them about their work and observing them as they worked. She wrapped up this study by taking photos of their work area and doing a final interview with one of the most closely observed physicists. Neumann has also produced a draft report on this study.

Robert Sandusky worked closely with the Testbed Team to develop, test, and put the user registration and client instrumentation systems into production. This involved articulating all data to be collected and developing the mechanisms needed to collect, store, and analyze the data. Emily Ignacio prepared a plan for the analysis of the user registration data and worked on testing the plan with pretest data from users.

Robert Sandusky, Emily Ignacio, and Ann Bishop produced the specifications for instrumenting the Testbed to include "Good/Bad" buttons for recording user comments about specific session activities in transaction logs of their sessions.

Laura Neumann explored qualitative research programs to be used in managing and analyzing data gathered from interviews and observations as well as from "open answer" spaces in the user registration form for the testbed.

Emily Ignacio attended a seminar on survey design (6 classes, from July 8 to July 24) so that she could gain additional knowledge and skills to be applied to the conduct of DLI user surveys in 1996/97.

Emily Ignacio prepared a draft survey instrument related to gathering additional data on the information seeking and use activities of scientists and engineers, including their use of digital resources. She also reviewed literature relevant to planning for the analysis of the resulting survey data, as well as other study results.

Disseminating the Results of DL Research

Laura Neumann and Leigh Star revised drafts of their paper "Making Infrastructure: The Dream of a Common Language." One of these was accepted at PDC 96 (participatory design conference) and the other is currently under review for publication in a refereed journal.

Laura Neumann built a new web discussion forum for the community of social scientists of all kinds working on digital library projects of all sorts. This forum was strongly called for by attendees at the project-wide DLI meeting in May. The forum includes the opportunity to gather "team pages" from different social science groups as well as spaces to discuss relevant topics such as methods (http://anshar.grainger.uiuc.edu/dlisoc/SocSci_page.html).

Ann Bishop and Leigh Star wrote a chapter on "Social Informatics of DL Infrastructure and Use," for which other team members performed bibliographic research.

Robert Sandusky wrote a paper on "Network Management as Cooperative Work," which has been submitted for presentation at an international conference.

Emily Ignacio wrote a paper with Frances Jacobson on information seeking and evaluation in DLs used in educational settings that will appear in a prominent journal in library and information science.

Fostering Development of Social Science Research on DLs

Ann Bishop, Emily Ignacio, and Leigh Star worked on planning and organizing activities related to the GSLIS and NSF sponsored Allerton Institute on "Libraries, People, and Change: A Research Forum," which will be held at the University of Illinois on October 27-29, 1996. Leigh Star and Robert Sandusky also began working on the Allerton sessions they will lead.

Ann Bishop led the team in organizing and conducting a range of activities related to user needs and evaluation for the project-wide DLI meeting in May 1996. These included a pre-conference gathering, a plenary session, a working group session, and a demo session.

TECHNOLOGY

Thesaurus Editors

Research explorations centered on classification schedules such as the Dewey Decimal Classification, the INSPEC Classification Schedule, and PACS (Physics and Astronomy Classification Scheme). Reviewed, upon request, the thesaurus developed by the Astrophysical Society, one of our partners.

Development efforts to create a new tool for thesaurus generation and editing were discussed with Eric Johnson and with thesaurus editors. Such a tool would be part of the IODyne system.

Eric Johnson has been revising the design of the IODyne client and has also been designing the abstract retrieval protocol as well as the abstract retrieval gateway architecture.

Information Repositories

Adam Cain has been working on the S-HTTP implementation based on Terisa's toolkit (formerly EIT's S-HTTP toolkit). (Note that Judd Weeks has been working with Adam on this.) The S-HTTP client and server should be available for beta test at the beginning of July. The changes to the server have been extensive enough that it will be called 1.6.

Chris Dunlap (RA) has continued to work on a PGP implementation.

Chris Gibrich has been handling the httpd support mail. We are slowly converting to the @ATS system.

Scott Powers has been working on the 2.0 server design. (A demonstration of depositing a document with metadata was done in May.) Represented NCSA at the IETF meeting.

Brandon & Scott also did a performance evaluation of several different Java servers. All the servers have shown the same performance problems as our initial 2.0 prototype. Our conclusion is that the problem does not lie in our server design. We will continue to re-evaluate server performance as just-in-time (JIT) and native code compilers become available. So far we have not been able to determine whether the performance problem arises because the language is interpreted or is inherent in the language design.

Semantic Retrieval

The implementation of concept spaces for community repositories has reached a state of maturity. The code was transferred from Chen's algorithms lab in Arizona to Schatz's systems lab in Illinois. Bill Pottenger in Schatz's lab, a student of compilers for parallel computers, rewrote the concept space code to be self-contained and well-packaged. This will be used in the coming quarter for the Testbed efforts (to automatically build concept spaces for the incoming journal articles) and for the Research efforts (to generate concept spaces for the personal repositories of the team members).
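As a rough illustration of the idea behind a concept space, the following minimal Python sketch derives term associations from document co-occurrence. It is a simplification under stated assumptions and not the project's code: the function names, the input format (one list of terms per document), and the plain ratio used as the association weight are all hypothetical stand-ins for the actual phrase extraction and weighting.

```python
from collections import defaultdict
from itertools import combinations


def build_concept_space(docs):
    """Map each term to {related term: weight} from document co-occurrence.

    docs is a list of term lists, one per document (a hypothetical input
    format chosen for this sketch).
    """
    doc_freq = defaultdict(int)   # number of documents containing a term
    pair_freq = defaultdict(int)  # number of documents containing both terms
    for terms in docs:
        unique = sorted(set(terms))
        for term in unique:
            doc_freq[term] += 1
        for a, b in combinations(unique, 2):
            pair_freq[(a, b)] += 1

    space = defaultdict(dict)
    for (a, b), both in pair_freq.items():
        # Asymmetric weights (an assumed, simplified formula):
        # how strongly a suggests b, and how strongly b suggests a.
        space[a][b] = both / doc_freq[a]
        space[b][a] = both / doc_freq[b]
    return space


def suggest(space, term, k=3):
    """Return up to k terms most strongly associated with term."""
    related = space.get(term, {})
    return sorted(related, key=lambda t: (-related[t], t))[:k]
```

Calling suggest(space, "bridge") on a space built from a handful of abstracts returns the terms that most often share a document with "bridge", which is the kind of term suggestion a concept space is meant to support.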

Progress continues on algorithms for automatic categorization, using the self-organizing maps of Kohonen. Maps to cluster 10,000 documents have been generated on a sample basis, but the algorithms are still too slow to do automatic classification for a million documents and thus replace the human-generated class codes. Algorithm design continues for automatic categorization.
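For readers unfamiliar with Kohonen maps, the sketch below shows the basic training loop in plain Python. It is an illustrative toy, not the algorithm used in the project: the function names, the linear decay schedules, the Gaussian neighborhood, and the default parameters are assumptions made for this example.

```python
import math
import random


def best_unit(grid, vec):
    """Return (row, col) of the node whose weights are closest to vec."""
    best, best_dist = (0, 0), float("inf")
    for r, row in enumerate(grid):
        for c, node in enumerate(row):
            dist = sum((a - b) ** 2 for a, b in zip(node, vec))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(vectors, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a small self-organizing map and return its grid of weight vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks over time
        for vec in vectors:
            br, bc = best_unit(grid, vec)
            for r in range(rows):
                for c in range(cols):
                    dist2 = (r - br) ** 2 + (c - bc) ** 2
                    influence = math.exp(-dist2 / (2.0 * radius ** 2))
                    node = grid[r][c]
                    # Pull the node toward the input, more strongly near the winner.
                    for i in range(dim):
                        node[i] += lr * influence * (vec[i] - node[i])
    return grid
```

In this naive form every input vector touches every node on every pass, so the cost grows with documents x nodes x vector length x epochs. That scaling gives a sense of why clustering a million documents is much harder than clustering 10,000.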

A complete demonstration of vocabulary switching utilizing the big concept space computation from Compendex and Inspec is being prepared. This is written in Smalltalk by Kevin Powell and others on the DLI and CAN projects, and uses CORBA to communicate with external searchers. The demonstration includes concept spaces for the past 5 years of the two sources including disciplines across all of engineering and science, and includes full-text search in addition to concept spaces. Experiments continue to enhance the algorithms for vocabulary switching.
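One simple way to picture vocabulary switching is as a two-step walk through two concept spaces: from the user's term to its associated terms in the source discipline, and from any of those that also occur in the target discipline on to that discipline's own terms. The Python sketch below illustrates this under loudly stated assumptions. It is not the Smalltalk demonstration described above; the function name, the dictionary format (term mapped to related terms with weights between 0 and 1), and the product scoring rule are all hypothetical.

```python
def switch_vocabulary(term, source_space, target_space, k=3):
    """Suggest target-domain terms for a source-domain term.

    Each space maps term -> {related term: weight in [0, 1]}
    (an assumed format for this sketch).
    """
    scores = {}
    # Terms associated with the query in the source domain act as bridges;
    # the query term itself is a bridge too if the target domain knows it.
    bridges = dict(source_space.get(term, {}))
    bridges[term] = 1.0
    for bridge, w_source in bridges.items():
        if bridge not in target_space:
            continue  # this bridge term is unknown in the target domain
        for candidate, w_target in target_space[bridge].items():
            if candidate == term:
                continue
            score = w_source * w_target  # strength of the two-step path
            scores[candidate] = max(scores.get(candidate, 0.0), score)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

With a source space in which "fatigue" is associated with "crack" and a target space in which "crack" is associated with "fracture toughness", the function suggests "fracture toughness" as a target-discipline term for "fatigue".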

PUBLICATIONS

Bishop, Ann Peterson & Star, Susan Leigh. 1996, in press. "Social Informatics for Digital Library Infrastructure and Use." In: Martha Williams, ed. Annual Review of Information Science and Technology, vol. 30.

H. Chen, J. Martinez, T. D. Ng, and B. R. Schatz, "A Concept Space Approach to Addressing the Vocabulary Problem in Scientific Information Retrieval: An Experiment on the Worm Community System," Journal of the American Society for Information Science, Volume 47, Number 8, Sept 1996.

H. Chen, A. Houston, J. Yen, and J. F. Nunamaker, "Intelligent Meeting Facilitation Agents: An Example on GroupSystems," IEEE Computer, Volume 29, Number 8, August, 1996.

H. Chen, B. R. Schatz, T. D. Ng, J. P. Martinez, A. J. Kirchhoff, C. Lin, "A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project," IEEE Transactions on Pattern Analysis and Machine Intelligence, Special Issue on Digital Libraries: Representation and Retrieval, September, 1996.

Cochrane, Pauline and Johnson, Eric. "Visual Dewey: DDC in a Hypertextual Browser for the Library User", Advances in Knowledge Organization, Vol. 5, International Society for Knowledge Organization, Washington DC, 1996, pp. 95-106.

Harum, S., Mischo, W., and Schatz, B. "Federating Repositories of Scientific Literature: An Update on the Digital Library Initiative at the University of Illinois at Urbana-Champaign", D-Lib Magazine, July/August 1996 http://www.dlib.org/dlib/july96/07harum.html

Jacobson, Frances & Ignacio, Emily. In press. Teaching Reflection: Information Seeking and Evaluation in a Digital Library Environment. To appear in: Library Trends.

Laliberte, Dan and Martin Hamilton. "Experimental HTTP methods to support indexing and searching", June 1996. http://ds.internic.net/internet-drafts/draft-hamilton-indexing-00.txt

Neumann, Laura J. & Star, Susan Leigh. 1996, in press. "Making Infrastructure: The Dream of a Common Language." To appear in: Proceedings of PDC (Participatory Design Conference) '96.

Sandusky, Robert. "Network Management as Cooperative Work: Implications for the Design of Integrated Network Management Systems." Submitted to The Fifth IFIP/IEEE International Symposium on Integrated Network Management to be held May 12-16, 1997 in San Diego.

B. Schatz and H. Chen, "Building Large-Scale Digital Libraries," IEEE Computer, Special Issue on "Building Large-scale Digital Libraries," Volume 29, Number 5, Pages 22-27, May, 1996.

B. Schatz, B. Mischo, T. Cole, J. Hardin, A. Bishop, and H. Chen, "Federating Diverse Collections of Scientific Literature," IEEE Computer, Special Issue on "Building Large-scale Digital Libraries," Volume 29, Number 5, Pages 28-36, May, 1996.

Nancy J. Yeager & Robert E. McGrath, "Web Server Caching", WebTechniques, Volume 1, issue 2, May, 1996, pp. 47-51.

ARTICLES ABOUT THE PROJECT

"Computation Cracks 'Semantic Barriers' Between Databases", Science, vol. 272, June 7, 1996, p.1419. also news articles in Business Week, Canadian Business, HPCwire, WEBster, HPCnews.

PRESENTATIONS

Bishop, Ann. "User Needs and Evaluation", Digital Library Initiatives All-Project Semi-Annual Meeting, May 16, Ann Arbor, MI

Cochrane, Pauline. Chaired a session at the International Society for Knowledge Organization conference, Library of Congress, Washington DC, July 15-18, 1996.

Harum, Susan. "The Impact of Digital Libraries on Libraries and Librarians", WILSWorld '96, Madison, WI, June 13, 1996.

Johnson, Eric. "Visual Dewey: DDC in a Hypertextual Browser for the Library User", International Society for Knowledge Organization conference, Library of Congress, Washington DC, July 15-18, 1996.

Laliberte, Dan. "Centralized and Distributed Searching", Distributed Indexing/Searching Workshop, MIT, May 28-29, 1996, Sponsored by the World Wide Web Consortium http://www.w3.org/pub/WWW/Search/9605-Indexing-Workshop/ http://union.ncsa.uiuc.edu/~liberte/www/searching/distrib-position.html http://union.ncsa.uiuc.edu/~liberte/www/searching/distrib-slides.html

Laliberte, Dan. "Registration / Notification Breakout", Distributed Indexing/Searching Workshop, MIT, May 28-29, 1996. Reported by Mike Heffernan (Fulcrum) and Dan LaLiberte (NCSA), and edited further by Mike Schwartz (@Home)

Schatz, Bruce. "Highlights of New Activities for DLI", Digital Library Initiatives All-Project Semi-Annual Meeting, May 16, Ann Arbor, MI.

WORKSHOPS ATTENDED

McGrath & Folk attended a NASA workshop on Working Prototype - Earth Science Information Partnerships (WP-ESIPs) on June 25-26. An ESIP is equivalent to the current Distributed Active Archive Centers (DAACs), which are the main repositories of data from NASA's Mission to Planet Earth (MTPE). The WP-ESIPs are intended to be prototypes of smaller, cheaper, better repositories. The workshop was specifically intended to revise the draft Cooperative Agreement Notice (CAN), CAN-96-MTPE-01 http://www.hq.nasa.gov/office/mtpe/esip2page.html

[See also: http://webstar.ncsa.uiuc.edu/Horizon/wp-esip.html]

Mischo, W. and Donal O'Connor attended World Wide Live, a live satellite broadcast to developers the world over by Microsoft on web technologies (including ActiveX) on July 24.

Digital Libraries Initiative All-Project Semi-Annual Meeting, May 16-17, Ann Arbor, MI

GRANTS AWARDED

H. Chen, Principal investigator (PI), National Center for Supercomputing Applications (NCSA), "Parallel Semantic Analysis for Spatially-Oriented Multimedia GIS Data," High-performance Computing Resources Grants (Peer Review Board), on Convex Exemplar (64 processors, 24,000 SUs), June 1996-June 1997 (IRI960001N).

H. Chen, Principal investigator (PI), NSF, CISE, "Supplement to Alexandria DLI Project on Semantic Interoperability," $100,000, June 1996-May 1998.

PROFESSIONAL ACTIVITIES

B. Schatz and H. Chen, guest editors, IEEE Computer, Special Issue on "Building Large-scale Digital Libraries," Volume 29, Number 5, May, 1996.

PRESENTATIONS

H. Chen, "Semantic Indexing and Interoperability," Third NSF/ARPA/NASA Digital Library Initiative (DLI) Workshop, Sponsored by NSF/ARPA/NASA, Ann Arbor, Michigan, May 16-17, 1996.

Ingoldsby, Timothy. Demonstration of the DLI at the Council of Engineering and Scientific Society Executives (CESSE), July 11, 1996.

VISITORS

DLI Partners' Workshop May 2-3, 1996 List of Participants
DLI SGML Mathematics Workshop May 1, 1996, List of Participants

May 9 Karen Jambeck, Western Connecticut State University. Demonstration of DLI testbed and discussion of SGML. Karen is a professor of Medieval Literature at Western Connecticut State University. She is interested in setting up an SGML database for Proust scholars around the world.

June 24-28 Scott Johnson, American Institute of Physics. Scott worked with the DLI testbed team for one week in preparation of AIP setting up the first repository of SGML documents for the DLI.

July 8 Tuija Sonkkila, Helsinki University of Technology Library, Finland. Ms. Sonkkila is in charge of researching the best way to publish, index, and distribute electronically Finland's scientific reports.

On July 12, NCSA hosted a very productive visit from Dr. Jeffrey Percival of the University of Wisconsin to discuss potential support for Progressive Image Transmission in a Java image data browser. In the morning, Percival gave a general talk about the work his group has done implementing a progressive image transmission scheme that is especially well-adapted to slow networks. After lunch, Percival met with members of the Project Horizon Java subgroup in a roundtable discussion about Progressive Image Transmission and the image data browser.

Anselm Baird-Smith (ABS) of the WWW Consortium visited NCSA on 11-12 July. He is the developer of the Jigsaw server, which is a "next generation" Web server written entirely in Java (see: http://www.w3.org/pub/WWW/Jigsaw/). NCSA is interested in collaborating with W3C, using Jigsaw as the basis of future information server and repository projects.

July 23 Shirley Streib and colleagues, Caterpillar Inc., Technical Information Center. Shirley and her colleagues returned for an update on the progress of the DLI. Caterpillar is an industrial partner of NCSA.

July 30 Ted Dews and Judith Clark, James Cook University, Queensland Australia. Ted and Judith are researching libraries around the country that distribute electronic documents.

University of Illinois at Urbana-Champaign Digital Libraries Initiative
Comments to: External Relations Coordinator, Tom Habing
10/15/96