Quarterly Report for 4th Quarter 1996-97
(November '96 - January '97)
DLI project, University of Illinois

Federating Repositories of Scientific Literature
Bruce Schatz, PI, schatz@uiuc.edu

SIGNIFICANT EVENTS

The American Institute of Physics has formalized a cooperative arrangement to incorporate much of the technology being developed by the DLI into a later release of its On-line Journal Publishing Service, and to support installation and remote operation of a distributed repository for AIP journals available to the DLI project. AIP is funding the DLI under contract to set up the facility.

Software has now been written and implemented that provides the display of full-text Testbed journal articles from citations retrieved within the Ovid search software. The Ovid DLI search software allows searchers of both the Compendex and INSPEC databases to display the full-text of selected articles from the displayed retrieved citation. These bibliographic databases cover nearly all of the articles in the areas of engineering and science the Testbed is serving, so that their availability greatly broadens the coverage beyond the SGML collection.

As a result of the May 1 SGML Mathematics Workshop hosted by the DLI, SoftQuad has committed to developing capabilities for the successful formatting and rendering of mathematics in Panorama. This commitment follows a meeting and subsequent agreement by DLI partners and other scientific publishers to contribute to the cost and expertise of a program to be developed by SoftQuad.

The Social Science Team conducted usability tests for two of the three different DLI testbed interfaces, the IODYNE Internet and web-based Latitude systems, currently under development (the custom Testbed client was tested in the previous quarter). They prepared reports of the results of the three usability tests and met with the system designers to present and discuss these studies.

Funded by the DLI, NCSA Researcher Robert McGrath completed a thorough technical review and evaluation of the large Testbed scale-up planned in the next two years. McGrath is the recent author of a book on Web server performance, based on his experiences with the NCSA Web servers. This study found that the scale-up is expected to be smooth and successful, but identified a handful of areas for concern and further study. The report is titled "UIUC DLI Project Scale-up: A Technical Evaluation", and is available on line from our Web pages. See Appendix A for a brief summary.

A joint DLI supplement project between Illinois and UCSB has been funded by NSF. Hsinchun Chen, Bruce Schatz, and Terry Smith are experimenting with concept space and self-organizing map (SOM) techniques for GIS visual thesaurus generation. The goal is to test whether the same statistical algorithms that worked well for semantic retrieval of terms within documents also work for textures within maps. A testbed of several hundred digitized aerial photos has been created and will grow to a few thousand in the next three months. The collection has been analyzed using the NCSA SGI Power Challenge and Convex Exemplar supercomputers. Preliminary results are encouraging: they suggest that similarity searches can be done on GIS images, and that a gazetteer can be used to link features to related documents on geography indexed by concept spaces.

PI Schatz had the cover lead article in Science magazine for January 17, 1997, entitled "Information Retrieval in Digital Libraries: Bringing Search to the Net." The article contained screendumps of the Testbed and the Research efforts in the DLI, in the context of describing the evolution of search technology across the Net.

TESTBED

OVERALL ACTIVITIES

The American Institute of Physics (AIP) and the DLI have formalized a cooperative agreement to establish the first distributed repository. AIP intends to incorporate much of the technology being developed by the DLI into a later release of its Online Journal Publishing Services (OJPS). That release, which will become V3.0 of the OJPS, is scheduled for implementation beginning in the Spring of 1998. Technologies such as pre-processing services, all necessary source code, assistance for the installation, configuring and production tasks, etc., will be incorporated in the data preparation and information retrieval aspects of the V3.0 system.

The Testbed team also continues to make significant progress with the processing of materials provided. As of today, there are approximately 18,000 articles from five publishing partners processed and indexed. See Appendix B for a breakdown of the titles that are currently in the Testbed.

SPECIFIC ACTIVITIES

Ovid DLI Software Development

One obvious limitation of the DLI Testbed client is that it provides author, title, and text word access only to the small corpus of titles presently in the testbed. Users wishing to perform more comprehensive author, title, subject, and abstract searching (using Grainger information tools) must also search the on-line A & I Services contained in the IBIS and Ovid systems. The Ovid system, which resides on a server in the Grainger Library, presently comprises the Compendex (Engineering Index) database, with close to two million citations with descriptors and abstracts back to 1987, and the INSPEC database (electrical engineering, computer science, and physics), with some 2.2 million citations with descriptors and abstracts, also dating back to 1987. These two databases collectively cover over 6,000 journals, conference proceedings, and technical reports.

One of the clearly defined goals of the DLI Project has been to integrate the Testbed with major A & I databases by making the full-text Testbed articles available through A & I Service search software. We are happy to report that software has now been written and implemented that provides the display of full-text Testbed journal articles from citations retrieved within the Ovid search software. The Ovid DLI search software allows searchers of both the Compendex and INSPEC databases to display the full text of selected articles from the displayed retrieved citation. For example, a user seeking to perform a comprehensive search for articles and conference papers written by a specific author can retrieve a set of relevant references and display individual citations on the screen. As these records are displayed, items available in the DLI Testbed collection or accessible from an alternative full-text publisher repository available on the Web will be indicated by the appearance on the screen of a button labeled "View the Full-Text." The user can then elect to view the full text from the public workstation via the same mechanisms used in the custom client, that is, the Netscape Web client operating in conjunction with the SoftQuad Panorama SGML viewer or the Adobe Acrobat PDF viewer.

The Ovid DLI software is available at public terminals in the Grainger Library. A web version is in the planning stages. Response by users to this integrated "one-stop-shopping" access to available full-text of articles via the Ovid system has been very positive.

The Testbed team is in the process of upgrading from OpenText 5.0 to OpenText 6.0, which is a complete redesign of OpenText's database system. This includes changing from Latitude to Livelink, OpenText's new version of the search and indexing engine for the World Wide Web. The Livelink Web version is expected to be introduced in February. In addition, DLI project staff are working on another Web-based approach, utilizing Microsoft Active Server Pages technology.

INTERNET

OVERALL ACTIVITIES

The functionality of the IODYNE client has converged to a powerful multiple view client incorporating term suggestion and full-text search engines. Previous work has concentrated on evolving the interactive displays and search capability. Work this quarter began the process of connecting to the production Testbed sources rather than using sample data, so that current information is searchable.

This connection process was greatly aided by the recruiting of Kevin Gamiel as the NCSA lead on the DLI. Kevin comes from CNIDR, an NSF center in North Carolina, where he was a technical lead on the ISITE project, which is the most widely used Z39.50 gateway on the Web. His help has already been invaluable in connecting to Ovid and in planning for connecting to the DRA Library catalog.

SPECIFIC ACTIVITIES

Development of IODyne continues. A major event was porting the source code to a 32-bit development environment. This was done to accommodate the Z39.50 DLL being developed by Kevin Gamiel as well as in response to the general direction of Windows software development; all campus lab computers will have 32-bit operating systems (either Windows 95 or Windows NT Workstation 4.0) by the beta release date.

We now have a Z39.50 DLL (for 32-bit systems only) which seems to work quite well. It allows a client to connect to and query many servers at once, though it does not currently run queries in a non-blocking manner. This DLL is the missing link we have sought to provide connections to Z39.50 servers from PC-based clients. Development of both the Z39.50 DLL and the clients which use it (IODyne in particular) will continue for the foreseeable future.
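As an illustration of the blocking-query limitation noted above, the following minimal Python sketch shows one conventional way a client can overlap blocking search calls: give each server its own worker thread. This is not the DLL's interface. The function query_server and the server names are hypothetical stand-ins, and the stub returns made-up records so that the sketch runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor


def query_server(server: str, query: str) -> list[str]:
    """Hypothetical stand-in for one blocking search call against one server."""
    # A real client would call into its Z39.50 layer here; this stub only
    # builds a recognizable placeholder record.
    return [f"{server}: record matching '{query}'"]


def query_all(servers: list[str], query: str) -> dict[str, list[str]]:
    """Send the same query to every server, one worker thread per server."""
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        futures = {s: pool.submit(query_server, s, query) for s in servers}
        # result() waits on each future in turn, but the calls themselves
        # run concurrently in the pool.
        return {s: f.result() for s, f in futures.items()}


results = query_all(["compendex", "inspec"], "wavelet")
```

With this arrangement the total wait is roughly that of the slowest server rather than the sum of all of them, at the cost of one thread per connection.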

Other developments in IODyne which are currently underway include support for multiple simultaneously connected thesauri, classification systems, keyword, keyword-in-context (KWIC), and concept-space databases. Porting of these databases to SQL server is anticipated during February, although the current method of using Microsoft Access databases will still be supported. This will at last enable all IODyne database connections to be true client/server. A Term Suggestion Service object layer is currently being developed and tested which will work within the IODyne client software to allow dynamic connection to and disconnection from these types of databases during retrieval sessions.

Kevin Gamiel is supervising a new research programmer, Bill Wentling, at NCSA supported by the DLI. His project is designing a universal gateway for the Web, as an extension of ISITE. This is roughly the inverse of the Iodyne client, which maps queries against multiple indexes into a common format to be sent on to the actual search engines. The gateway would translate a single input into multiple outputs. The single input will likely be a variant on Z39.50 (and Iodyne will then convert to this as its common query language) and the outputs will include Z39.50, SQL, HTTP, and specialty protocols such as OpenText. This Java gateway will then enable clients to connect to essentially any server on the Web and issue fully fledged queries to it.
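The single-input, multiple-output idea can be sketched as follows. This is an illustration only, written in Python rather than the Java planned for the gateway. The CommonQuery structure, the table and column names, and the example.org URL are all assumptions made for the sketch, and no attempt is made to emit real Z39.50 or OpenText requests.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class CommonQuery:
    """Hypothetical common query form: one searchable field, one term."""
    field: str
    term: str


def to_sql(query: CommonQuery, table: str = "articles") -> tuple[str, tuple[str, ...]]:
    """Render as a parameterized SQL statement (table and columns are assumed)."""
    if query.field not in {"author", "title"}:
        raise ValueError(f"unsupported field: {query.field}")
    return f"SELECT id FROM {table} WHERE {query.field} LIKE ?", (f"%{query.term}%",)


def to_http(query: CommonQuery, base_url: str = "http://example.org/search") -> str:
    """Render as an HTTP GET URL (the endpoint is a placeholder)."""
    return f"{base_url}?{urlencode({'field': query.field, 'q': query.term})}"


def fan_out(query: CommonQuery) -> dict:
    """Translate one common query into one request per back-end protocol."""
    return {"sql": to_sql(query), "http": to_http(query)}


requests_by_protocol = fan_out(CommonQuery("author", "Schatz"))
```

Keeping one translator function per protocol means a new back end can be added without touching the common query form or the other translators.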

RESEARCH

OVERALL ACTIVITIES

The artificial intelligence group, headed by Prof. Chen, continues to develop various clustering, neural network, and advanced visualization techniques for multimedia digital library collections. Many of the new research outputs are available at their Web site with a Java interface component: http://ai.bpa.arizona.edu.

In particular, the collaboration with UCSB DLI is beginning to bear fruit, as evidenced by a joint poster and demo session at the last DLI-wide workshop at Stanford.

SPECIFIC ACTIVITIES

The large Compendex/Inspec computation is now incorporated into the Smalltalk Interspace prototype. Users can navigate across 1000 concept spaces, generated from 10,000,000 abstracts and representing community repositories across all of engineering. An integrated interface enables the simultaneous use of concept spaces, full-text search, and category maps. Thus indexes and search exist at several levels of abstraction across many different subject domains.

The cover lead article in Science by Schatz (January 17, 1997, "Information Retrieval in Digital Libraries: Bringing Search to the Net") emphasized that this was the first major crack in the semantic barrier, realizing the grand visions of J.C.R. Licklider in 1961 for concept-based retrieval across collections covering the entire literature of science.

Preliminary results from the multimedia semantic interoperability experiment with UCSB are encouraging. Similarity matching on textures from a sample aerial photograph using statistical techniques seems to work well. And the gazetteer is detailed enough to link 20-30 coordinates per map section to named features which can be searched for in the geography literature. Work is proceeding to digitize and index 5000 aerial photographs around Santa Barbara and 500,000 text abstracts from Georef and Petroleum Abstracts. This work is being performed by the UCSB DLI project but funded by the UIUC DLI project.

The Interspace team is also parallelizing the concept space algorithms as part of the process of embedding them into the environment. Optimization results indicate speed-ups of 20-50 times, including a close-to-real-time query refinement algorithm. Such speedup is necessary for the forthcoming world of community repositories and represents a major influx of new knowledge in parallel optimization applied to information retrieval problems.

The system performance team in Computer Science was augmented this quarter by Bob McGrath from NCSA who did a major study of scaling up in the Testbed. The conclusion is that scaling up is a reasonable proposition given our technology choices and collections, but a few areas to watch out for were identified. More details are contained in Appendix A below.

The Computer Science team continued its research into server performance with a study on the use of HTTP servers as object servers. The goal is to analyze the general problems of the HTTP server and how to improve its performance in the application of digital libraries. Various caching schemes have been proposed and are being simulated to take advantage of the bursty nature of digital library traffic and the structured formats of the objects in the collections.
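A trace-driven cache simulation of the general kind mentioned above can be sketched in a few lines of Python. The report does not say which caching schemes were proposed, so least-recently-used (LRU) replacement is used here purely as a placeholder policy, and the request trace is invented to mimic bursts of repeated requests for the same objects.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity):
    """Replay a request trace through an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for obj in trace:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)         # mark as most recently used
        else:
            cache[obj] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(trace) if trace else 0.0


# Invented trace: short bursts of requests for the same object.
bursty_trace = ["a1", "a1", "a1", "a2", "a2", "b1", "b1", "b1", "a1", "c1"]
hit_rate = simulate_lru(bursty_trace, capacity=2)
```

Replaying the same trace with different capacities or replacement policies gives a simple basis for comparing schemes before any server code is changed.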

EVALUATION

OVERALL ACTIVITIES

During the past quarter, the Social Science Team continued its work in developing DLI testbed instrumentation and registration procedures. They completed a round of usability testing and practices. Team members also continued their professional contributions and the dissemination of their work by preparing and presenting papers, submitting proposals, and planning for the third Allerton Institute on user-centered design and analysis of DLs.

SPECIFIC ACTIVITIES

The Social Science Team conducted usability tests for two of the three different DLI testbed interfaces, the IODYNE and web-based systems, currently under development (the custom client was tested in the previous quarter). They prepared reports of the results of the three usability tests and met with the system designers to present and discuss these studies. These reports are available on the DLI Web pages.

One team member continued work with the Testbed Team to refine and improve the testbed instrumentation and data collection, including creating a user comment facility that is available from every testbed interface screen. The comment facility is intended to capture specific problems and suggestions that arise in individual DLI testbed use sessions.

Team members redesigned the user registration process to reduce the number of questions and steps presented to testbed registrants, and implemented the new registration process. The Team also continued its reporting of DLI usage statistics.

Usage statistics for the period August to December 1996 have been collected during the first period of limited deployment of the Testbed custom client in the Grainger Engineering Library. There were 102 registrations and 190 logins. Note: during the same five-month period, 20 DLI project members and U of I library employees registered to use the testbed. These 20 people logged in 441 times, mostly for testing and demonstration purposes.

Social Science Team members designed a new naturalistic study of work and information practices, to be conducted in a campus computer engineering lab. Lab inhabitants have expressed interest in participating in DLI research. One goal of the new study is gaining a single subject pool for the conduct of research that previously has been carried out among different groups. A single subject pool will provide a focal point for integrating results from transaction logs, interviews, observations, and surveys, as well as for considering the methodological issues related to the Team's multifaceted approach to studying DL practices. In addition, the investigation of desktop and office classification work, begun by one Team member in Summer 1996, continued this quarter.

Efforts to contribute to the emerging research community interested in social aspects of DLs also continued, with team members presenting and submitting papers, and submitting workshop and research proposals. The Team began planning for the third Allerton Institute on user-centered DL design and analysis. Their current plan calls for integrating the Institute with the production of a monograph that would serve as a culmination of all three meetings by presenting research reports and essays devoted to what has been learned about and from the DL studies conducted by Institute participants.

SUPPLEMENTAL GRANT FOR IMAGE PROCESSING

First, the content-based retrieval engine being developed as part of the MARS system was extended with the information retrieval technique of relevance feedback to improve retrieval performance. The initial algorithms were tested over a testbed of 386 texture images divided into 26 classes and showed a significant improvement in retrieval performance, warranting a detailed study which is now being conducted.

Second, a new testbed consisting of georeferenced satellite images was developed for the MARS system. Experiments were conducted on the developed testbed to measure MARS's performance for information retrieval over geographical data. Research issues and challenges in extending MARS for geographical information retrieval were identified. These research issues will be explored in depth in the following quarters.

Third, significant advances were made in developing efficient techniques for supporting concurrent operations over multidimensional data structures. Multidimensional indexing is the key for a scalable multimedia retrieval system. Supporting such index structures in the database motivates the development of algorithms to support concurrent operations on the data structures.

Fourth, a new wavelet-based image coding technique was devised which combines high compression efficiency featuring a successively refinable bitstream with the segmentation of semantically meaningful objects directly in the wavelet domain, to generate a bitstream in which each object is encoded independently of every other object in the image, without the need to explicitly store expensive shape boundary information. The potential of this novel paradigm will be studied fully in the near future.

OVERALL ACTIVITIES

Over the past quarter, the supplementary grant team continued with its efforts towards development of a scalable multimedia retrieval system. The team spent a significant amount of time in researching the growing body of related work being done at other institutions, preparing publications on our own work, extending the original prototype in many ways (as discussed above), instrumenting new testbeds into the MARS framework, and identifying new challenges to MARS brought forward by these testbeds.

SPECIFIC ACTIVITIES

Team members extended the MARS system with the relevance feedback mechanism to improve the retrieval effectiveness. With some feedback from the user, the algorithm grasps what the user really wants and adjusts the weighting function accordingly. The extended system was applied to a testbed of 384 carefully selected texture images belonging to 26 classes. First, wavelet decomposition was used to do the texture analysis.

More specifically, each image was decomposed into 10 sub-bands, and the mean and standard deviation for each sub-band are extracted as the texture representation. The extracted information was stored in a MARS database and the retrieval algorithm (extended with relevance feedback) was used to retrieve similar images. A report describing MARS extension and experiments was prepared.
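The texture representation and feedback step described above can be sketched as follows. The report does not name the wavelet, the number of decomposition levels, or the weighting rule, so this sketch makes three assumptions: a Haar wavelet, three levels (which yields 3 * 3 + 1 = 10 sub-bands), and an inverse-standard-deviation re-weighting heuristic. All function names are hypothetical and this is not the MARS implementation.

```python
from statistics import fmean, pstdev


def haar_step(image):
    """One 2D Haar level on an even-sized image: (approx, [three detail bands])."""
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, len(image), 2):
        rows = ([], [], [], [])
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 4)  # local average
            rows[1].append((a - b + d - e) / 4)  # left-right difference
            rows[2].append((a + b - d - e) / 4)  # top-bottom difference
            rows[3].append((a - b - d + e) / 4)  # diagonal difference
        for band, row in zip((approx, horiz, vert, diag), rows):
            band.append(row)
    return approx, [horiz, vert, diag]


def texture_features(image, levels=3):
    """[mean, std] per sub-band; levels=3 gives 3 * 3 + 1 = 10 sub-bands."""
    features, approx = [], image
    for _ in range(levels):
        approx, details = haar_step(approx)
        for band in details:
            values = [v for row in band for v in row]
            features += [fmean(values), pstdev(values)]
    values = [v for row in approx for v in row]
    return features + [fmean(values), pstdev(values)]


def reweight(relevant_vectors, eps=1e-6):
    """Assumed feedback rule: features that agree across relevant images weigh more."""
    raw = [1.0 / (pstdev(col) + eps) for col in zip(*relevant_vectors)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_distance(x, y, weights):
    """Weighted city-block distance between two feature vectors."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))


# Two tiny 8x8 test images: a flat field and vertical stripes.
flat = [[5] * 8 for _ in range(8)]
stripes = [[0, 2] * 4 for _ in range(8)]
flat_features = texture_features(flat)
stripe_features = texture_features(stripes)
```

Each image yields a 20-number vector (mean and standard deviation for each of 10 sub-bands). After the user marks some retrieved images as relevant, reweight gives new weights that can be passed to weighted_distance for the next round of ranking.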

In another experiment, a geospatial testbed was created for MARS. For this testbed, satellite imagery of the Fort Irwin area at 30 meter resolution was used. The data comprised seven bands corresponding to properties like water penetration, visible green, vegetation detection, etc. The role of MARS in supporting higher-level reasoning based on low-level geospatial features was explored. This resulted in a large set of research issues which will be examined in the following year.

Database management and scalability issues in multimedia retrieval systems were explored. Specifically, research was conducted on scalable multidimensional data structures. Existing mechanisms do not scale beyond 5-7 dimensions. Multimedia data may, however, correspond to a 50-100 dimensional space (depending upon the feature representation). Techniques to overcome the dimensionality curse of existing data structures were explored.

Our research in this direction is not yet complete and will be continued and reported over the coming year. Another problem, that of supporting concurrent operations over multidimensional index structures (a necessity for scalability over large databases), was explored. Techniques for supporting concurrent operations were developed and a report on the developed algorithms was prepared.

ARTICLES ABOUT THE PROJECT

3rd runner-up for Breakthrough of the Year in Science magazine, Dec. 27, is "electronic publishing of journals" with a link to the UIUC DLI as the primary research project in this area. See http://www.sciencemag.org/science/content/current/ under Cyber crush.

PUBLICATIONS

H. Chen, J. Martinez, T. D. Ng, and B. R. Schatz, ``A Concept Space Approach to Addressing the Vocabulary Problem in Scientific Information Retrieval: An Experiment on the Worm Community System,'' Journal of the American Society for Information Science, Volume 48, Number 1, Pages 17-31, January, 1997.

Kaushik Chakrabarti, and Sharad Mehrotra, Phantom Protection in R-trees, First Symposium on the Federated Lab on Interactive and Advanced Display, Army Research Labs, Aberdeen, MD, Jan 28-30, 1997.

Kaushik Chakrabarti, and Sharad Mehrotra, Concurrency Control in Multidimensional Access Methods, Technical Report, Department of Computer Science, University of Illinois at Urbana-Champaign, Jan, 1997. (to be submitted for publication).

R. E. McGrath, ``UIUC DLI Project Scale-up: A Technical Evaluation'', http://www.ncsa.uiuc.edu/People/mcgrath/DLI/Scaling.

R. Orwig, H. Chen, D. Vogel, and J. F. Nunamaker, ``A Multi-Agent View of Strategic Planning Using Group Support Systems and Artificial Intelligence,'' Group Decision and Negotiation, Volume 6, Number 1, Pages 35-58, January, 1997.

Schatz, B.R. "The Development Of The Information Sciences At UCSB" Report of an External Advisory Panel, February, 1997.

Schatz, B.R. ``Information Retrieval in Digital Libraries: Bringing Search to the Net,'' Volume 275, pp 327-334, Science, January 17, 1997 cover story and lead article. see reference at http://www.canis.uiuc.edu/keytalks.html

S. L. Star, "Working Together: Symbolic Interactionism, Activity Theory and Information Systems," pp. 206-257 in Yrjo Engestrom and David Middleton, eds. Communication and Cognition at Work. Cambridge: Cambridge University Press, 1996.

GRANTS AWARDED

NSF CAREER AWARD: Kannan Ramchandran (for period 1997-2001, Total Amount: $200,000).

PROFESSIONAL ACTIVITIES

Chen, Hsinchun:

Guest editor, Journal of the American Society for Information Science special issue on "Artificial Intelligence Techniques for Emerging Information Systems Applications," 1997.

Member, ACM SIGIR-97 Conference Organizing Committee, 1997.

Member, ACM Digital Library Conference Program Committee, 1997.

Partner, Data and Collaborative Computing Team, NCSA (National Computational Science Alliance), NSF 96-31 Program Solicitation for Partnerships for Advanced Computational Infrastructure, 1996-1997.

Schatz, Bruce, Chair, Advisory Panel at UCSB Workshop, December 13, 1996. This workshop is an outgrowth of the DLI project at UCSB and concerned starting a School for Information Sciences there. Schatz chaired the panel consisting of many DLI investigators and colleagues, and wrote the summary report.

Leigh Star continued service on NSF Ethics and Values in Science Panel. Consulted for white paper to NSF, "Machinery for Predictability of Complex Systems," September, 1996; co-organizer, with Rob Kling, of Social and Organizational Section, NSF Workshop on Human Centered Computing and Intelligent Systems (February, 1997).

Robert Sandusky submitted a proposal for a workshop on collaboration activities in digital library use (with Michael Twidale) to Digital Libraries 97. He is also organizing a panel on DLI user studies for the Mid-Year Meeting of the American Society for Information Science.

Ann Bishop submitted a proposal entitled "Digital Libraries and the Disaggregation of Knowledge: An Investigation of the Use of Journal Article Components by Researchers," which would grant a semester's release from teaching to pursue a research project, to the Center for Advanced Study at the University of Illinois.

Bishop also oversaw the development of material for the Allerton 1996 website and drafted a plan for Allerton 1997 which would result in the publication of an edited monograph devoted to user-centered design and analysis for DLs.

Kannan Ramchandran, Special Sessions Chair, 1998 IEEE International Conference on Image Processing, Chicago.

Sharad Mehrotra, Publications Chair, 1997 IEEE Workshop on Research Issues in Database Engineering.

PRESENTATIONS

Bishop, Ann P., "The Digital Library Initiative: Studying Use and Users," Technologies for Learning Seminar Series, University of Illinois, November 22, 1996.

Chen, Hsinchun, ``Semantic Retrieval for CancerLit,'' National Cancer Institute, Bethesda, MD, Nov 5, 1996. ``The Interspace Knowledge Architecture,'' National Library of Medicine, Bethesda, MD, Nov 6, 1996.

Sharad Mehrotra, ``Concurrency control in R-trees'', First Symposium on the Federated Lab on Interactive and Advanced Display, Army Research Labs, Aberdeen, MD, Jan 28-30, 1997.

Mike Ortega, ``Terrain Similarity Matching'', First Symposium on the Federated Lab on Interactive and Advanced Display, Army Research Labs, Aberdeen, MD, Jan 28-30, 1997.

Neumann, Laura, "Making Infrastructure: The Dream of a Common Language," Participatory Design Conference (PDC 96), Cambridge MA, November 1996.

Schatz, Bruce, ``Federating Repositories of Scientific Literature,'' Fourth NSF/ARPA/NASA Digital Library Initiative (DLI) All-Project Workshop, Sponsored by NSF/ARPA/NASA, Stanford, CA, December 16, 1996. http://dli.grainger.uiuc.edu/ppt/stfdwksp/index.htm

Schatz, Bruce, Information Analysis in the Net: The Interspace of the Twenty-First Century, lecture for National Technological University, video satellite broadcast to industrial partners, Nov. 12.

Schatz, Bruce, The World of a Billion Publishers. invited lead lecture at CSC Index Vanguard meeting on Implications of Universal Publishing, Pasadena, CA, Nov 22. (high-level CIO consulting group)

Schatz, Bruce, Collaboration on the Net: The Worm Community System as a Model Community, Electronic Collaboratories session, AAAS Annual Meeting, Seattle, Feb 15.

Schatz, Bruce, Information Analysis on the Net: The Interspace of the 21st Century, Prospecting for Knowledge session, AAAS Annual Meeting, Seattle, Feb 17.

Star, Susan Leigh, "A Good Infrastructure is Hard to Find: Designing Communication Tools for Scientific Communities," Department of Library and Information Science, UCLA, and Interdisciplinary Humanities Center, UC Santa Barbara, January 9 and 10, 1997.

Star, Susan Leigh, "Whose Voice? Whose Differences? The Politics of Classification," at Workshop on Electronic Orders: Classification, Standardization, Formalization, and Genre in Electronic Orders, UC Santa Barbara, January 11, 1997.

Star, Susan Leigh, "The Feminism Question in Science and Technology Projects," Plenary address to the Conference on Technology and Democracy, TMV Centre, University of Oslo, Norway, January 18, 1997.

Star, Susan Leigh, "Infrastructure, Work and Digital Libraries," Norwegian Computing Centre and Institute for Informatics, System Development Group, University of Oslo, Norway, January 20, 1997.

Visitors

Note that we hosted visits at the highest level from the two largest publishing concerns in the science and engineering business, Elsevier and Thomson, during this quarter.

November 5: Hewlett Packard

The goal of the visit was to discuss ways that more Hewlett Packard Equipment could help the DLI.

Dave Camous, University Relations Manager

Kim Mast, program manager, Automotive Test Business Team/Manufacturing Test Division.
 

November 14: Elsevier at the University of Illinois

The goal of the meeting was to discuss the potential for a CIC-Elsevier license for SGML journals and the potential for a joint UIUC-Elsevier SGML project.

Roland Dietz, Senior Vice President, Elsevier

Edward Hueckel, Account Manager, Elsevier

Darrell W. Gunter, Vice President, Sales and Services Americas, Elsevier

Barbara Allen, Assistant Director, Center for Library Initiatives

Emily Mobley, Dean of the Library, Purdue University

Sheila Curl, Engineering Librarian, Purdue University
 

November 15:

Delegation of Scientists from the People's Republic of China

Goal: To demonstrate the DLI and to discuss the development of telecommunications infrastructure in the PRC.

Mr. Xie Huanzhong, Deputy Director-General, Department of Science and Technology, State Education Commission (SEDC), Beijing, manufacture engineering

Mr. Wang Dongli, Division Chief, Department of Foreign Affairs, SEDC, Beijing, administration of foreign affairs

Mr. Qiang Di, Professor, Deputy Dean of Studies, Beijing University, chemistry

Dr. Chen Ken, Associate Professor, Qinghua University, Beijing, manufacture engineering CIMS

Dr. Chen Liangyao, Professor, Fudan University, Shanghai, solid physics

Dr. Chen Da, Professor, Shanghai Jiaotong University, material science

Dr. Zhang Weiyi, Professor, Nanjing University, solid physics

Dr. Gu Ning, Associate Professor, Southeast University, Nanjing, nano-sci & tech

Dr. Zhang Jing, Professor, Qingdao University of Oceanography, biogeochemistry

Ms. Lang Huiqing, Professor, Northeast Normal University, Jilin, plant ecology

Dr. Fu Hengsheng, Vice Director of Basic Research Division, Department of Science and Technology, SEDC, Beijing, material science

Mr. Jason Rekate, Program Assistant, National Committee on U.S.-China Relations

Ms. Sheree Willis, Interpreter, National Committee on U.S.-China Relations

On November 22 Leigh Star hosted (with STIM) Prof. Anne Figert, Loyola University, who spoke on medical classification at the Grainger Library. She hosted Prof. Bente Elkjaer, Copenhagen Business School, an expert on information systems and organizational learning, and John Christiansen, of Copenhagen Library Council, to explore a research visit in 1998. Star also met with Dr. Charlotte Linde, Institute for Research on Learning, and discussed joint work with her information and training project with State Farm Insurance Company. She hosted Prof. Marc Berg, University of Maastricht, and developed a proposal for a panel on social science and design, 4S meetings, and analysis of his data on medical records.

December 20: CIC/Thomson International Meeting

Goal: To develop a prospectus and outline for one--or several--collaborative electronic reference system projects. The group meeting December 20 agreed that the following areas offered the greatest opportunity for collaboration:

1) Integration of resources

2) Co-publishing

3) Developing access points that allow the user to search many resources

A group of technical staff from the CIC libraries will meet with Thomson technical staff to discuss the capabilities that exist within the Thomson Corporation to meet the development needs, and to identify potential areas of research that transcend the reference product (for example, conducting large-scale automatic indexing with scalable semantics as described by Bruce Schatz).

Angee Baker, Director Electronic Licensing, Gale Research
Dedria Bryfonski, President, Gale Research
Roger Clark, Director, Committee on Institutional Cooperation (CIC)
Barbara Allen, Director, CIC Center for Library Initiatives
Cindy Clennon, Contract Coordinator, CIC
Gary Fouty, Librarian, University of Minnesota
Kenneth Frazier, Director of General Library System, University of Wisconsin-Madison
Katherine Haskins, Acting Head, University of Chicago Library
Sharon Hogan, University Librarian, University of Illinois at Chicago
Keith Lasser, President/CEO, Thomson Corporation Publishing
Christopher McKenzie, Account Manager, Gale Research Inc.
James Mouw, Head, Serials, University of Chicago
Martin Runkle, University Library, University of Chicago

Appendix A: DLI Scale-up Evaluation

In the next two years, the UIUC DLI project will scale up from a small experiment to a production system. This scale-up will occur in several dimensions simultaneously: the number of documents, the number of users, and the number of connected systems will all increase dramatically. During the first two weeks of October, NCSA researcher Robert McGrath conducted a comprehensive review of the proposed expansion, based on interviews with key individuals in the DLI project, as well as published documentation. This work was funded by the DLI grant as part of its strategic planning.

The main finding is that the overall prognosis is good. There is every reason to expect that the projected expansion will be successful. The report gives five significant reasons for this confidence:

  1. The basic architecture is scalable.
  2. The components are mostly based on Commercial-Off-The-Shelf (COTS) products or developed with proven technology. Most of the components have already been demonstrated.
  3. The size and nature of the collection are not extraordinary compared to contemporary Web servers.
  4. The number of users and the expected traffic of the collection are not extraordinary compared to large Web servers.
  5. The networking and server infrastructure at Grainger and the University of Illinois is excellent.

The other major findings are:

  1. The project scale-up is not really very large. Single large Web servers are probably much larger than the projected target of the DLI.
  2. The most difficult issues during scale-up probably involve heterogeneity, not scale.
  3. There is a need to tune OpenText servers and Web servers, and it is possible that more hardware will be needed for servers.
  4. The Latitude gateway (and its future successors) is a crucial bottleneck; its performance needs to be analyzed and optimized.
  5. The DLI client and gateway software is pushing the limits of current technology.
  6. Data collection of all kinds will require continuing attention.
  7. Adding cryptography-based security or payment services will have a potentially serious performance impact.

Appendix B: Testbed Status

As of this month, the UIUC DLI Testbed has approximately 18,000 articles from five of our publishing partners processed and indexed:

American Institute of Physics (AIP)

Applied Physics Letters (1995 to present)
Journal of Applied Physics (11/96 to present)
Review of Scientific Instruments (10/96 to present)

American Physical Society (APS)

Physical Review Letters (1995 to present)

American Society of Civil Engineers (ASCE)

(1995 - 1996)

Journal of Architectural Engineering
Journal of Performance of Constructed Facilities
Journal of Construction Engineering and Management
Journal of Computing in Civil Engineering
Journal of Cold Regions Engineering
Journal of Infrastructure Systems
Journal of Materials in Civil Engineering
Journal of Transportation Engineering
Journal of Environmental Engineering
Journal of Professional Issues in Engineering Education and Practice
Journal of Engineering Mechanics
Journal of Energy Engineering
Journal of Geotechnical and Geoenvironmental Engineering
Journal of Hydraulic Engineering
Journal of Irrigation and Drainage Engineering
Journal of Management in Engineering
Journal of Structural Engineering
Journal of Surveying Engineering
Journal of Urban Planning and Development
Journal of Water Resources Planning and Management
Journal of Waterway, Port, Coastal and Ocean Engineering
Journal of Bridge Engineering
Journal of Hydrologic Engineering
Practice Periodical on Structural Design and Construction

Institution of Electrical Engineers (IEE)

(1995-96, partial both years)

Electronics Letters
IEE Proceedings on
-Science, Measurement and Technology
-Electric Power Applications
-Generation, Transmission and Distribution
-Control Theory and Applications
-Computers and Digital Techniques
-Radar, Sonar and Navigation
-Circuits, Devices and Systems
-Microwaves, Antennas and Propagation
-Communications
-Optoelectronics
-Vision, Image and Signal Processing

IEEE Computer Society

(1995; 1996 will be added to the testbed in March 1997)

IEEE Computer
IEEE Software
IEEE Multimedia
IEEE Design and Test of Computers
IEEE Computational Science and Engineering
IEEE Graphics and Applications
IEEE Expert: Intelligent Systems and their Applications
IEEE Micro: Chips, Systems, Software and Applications
IEEE Parallel and Distributed Technology
IEEE Annals of the History of Computing



University of Illinois at Urbana-Champaign Digital Libraries Initiative
Comments to: External Relations Coordinator, Tom Habing
2/12/97