Quarterly Report for 1st Quarter 1997
(February - April '97)
DLI Project, University of Illinois

Federating Repositories of Scientific Literature
Bruce Schatz, PI,
schatz@uiuc.edu
contact:
dli@uiuc.edu

SIGNIFICANT EVENTS:

This quarter we held our third annual Partners Workshop. Our partners' main concerns were technology transfer and the continuation of the testbed beyond the end of the DLI. The attendee list is provided in Appendix A and the minutes in Appendix B.

Superlinear speedups for the generation of concept spaces have been achieved through the implementation of a C++ dynamic memory allocator.

Dynamic linking was added to testbed document bibliographies.

  • The custom client for the UIUC DLI testbed has been modified to provide links from article bibliographies to records in the INSPEC database or to full text when available.

    TESTBED

    This past quarter the testbed team concentrated on the upgrade to OpenText 6.0, a complete redesign of OpenText's database system. The upgrade included purchasing software and hardware and developing software to aid the conversion of testbed databases. New processing software was developed, and metadata structures were revised and updated. The testbed was converted from Latitude to Livelink, which required writing new configurations to make it work on the World Wide Web. Livelink (OpenText's new version of the search and indexing engine for the World Wide Web) supports searching across multiple repositories and more complex Boolean searching. The custom client for the testbed has been modified to provide links from article bibliographies to records in the INSPEC database or to full text when available. In addition, the testbed team has been working closely with the Social Science Team on the interface for a Web-based client that will provide simulated stateful connections.

    In addition to our annual Spring Partners Workshop, the testbed team also hosted the American Institute of Physics and the IEEE CS on separate occasions to go over the current status of the testbed and future plans in greater detail.

    EVALUATION

    During the past quarter, the Social Science Team has been involved in system design, testbed usage evaluation, and the conduct of research aimed at producing new insights into the intersection of digital infrastructure with scientific and engineering work and communication. Team members became heavily involved in the early phase of design and implementation for our web client. They also began producing more detailed analyses of registration and transaction log data. A study of the work and information practices of an engineering workgroup was launched, a preliminary analysis of data on the use of document structure was completed, and a study of office classification practices continued. The Social Science Team also continued its efforts to foster the emerging research community in the social informatics of DLs through presentations, publications, and planning for the production of a monograph devoted to human-centered design and analysis of DLs.

    The Social Science Team began working closely with both intended users and designers on the preliminary development of our new web client. A focus group with science and engineering librarians provided suggestions for design as well as deployment. Librarians contributed ideas about how to enhance use of the web client through improved functionality and interface design, marketing, user instruction and support, and increased involvement of librarians in the design and implementation process. The Social Science Team began meeting on a weekly basis with web client designers, bringing their expertise on user needs and preferences to bear on the design of system features and promotional material.

    Team members increased their analysis and reporting of usage data produced by the custom client's registration and transaction logging procedures. The usage data reveal such things as the field and career level of testbed users, the number of searches performed by patrons in different fields, the frequency of log-ins per user, and the number of transactions per search session. In addition, in-depth analysis of individual sessions began. The particular question pursued was how individual components of documents are used, and exploring session logs to address this question also served as a good trial of log analysis techniques.

    One team member conducted interviews with a research group in mechanical and industrial engineering and began observing their work practices. This study will contribute to understanding group and individual work practices and information use. It encompasses study of the use of physical materials, electronic information systems and software as well as interaction of people in the work group. This will lead to further development of the concepts of information convergence and floating information, as well as other emerging points of interest, such as how groups manage and share electronic information resources and collaborate across large geographic distances.

    Another team member began work on a study of how researchers use the individual components of journal articles. This study will seek insights into the nature of document disaggregation as well as produce data on usage of DLI testbed capabilities related to searching and viewing, via SGML, the individual components of journal articles. In this initial phase of the study, a preliminary categorization of component use was developed from data from earlier interviews and focus groups, several new interviews on component use, and both usability and transaction logging data on use of SGML features.

    In the third significant study currently under way, another team member continued exploring the organization of people's office spaces (piles of papers on desks, file cabinets, and personal computer desktops) through the lens of everyday or folk classification. Following the analysis of data gathered in visits to people's offices this past fall, a web site was designed this quarter, inviting people to describe their work sites. The study will encompass both DLI-impacted communities and others. It will link people's everyday classification of information with web tools, structures, and practices and shed light on federated DLs, broadly conceived.

    Social Science Team members continued their efforts to foster a research community focused on human-centered studies of information systems. One significant contribution in this area was Leigh Star's service as co-organizer, with Rob Kling, of the Social and Organizational Section for an NSF Workshop on Human Centered Computing and Intelligent Systems held in February, 1997. As a culmination of the Allerton Institutes, Ann Bishop is working with Barbara Buttenfield (from UC Santa Barbara's DLI project) and Nancy Van House (from Berkeley's DLI project) to publish an edited volume devoted to human-centered design and analysis of DLs. Authors will be solicited, primarily from Allerton participants, to write chapters (1) serving as case studies of particular DL user needs and evaluation projects, (2) describing relevant theories and methods for human-centered DL analysis, and (3) discussing important social, political, and management issues related to DL design and evaluation.

    RESEARCH

    Recently, the Interspace team completed the largest-ever full-text computation of a semantic index. The input consisted of 19,526 full-text SGML documents. The number of concepts represented as terms (noun phrases) in the collection was 2,225,090, and the number of co-occurrences (pairs of concepts which occur in the same document) was 2,027,940,870. The sparsity of the (logical) matrix can be computed as 2,027,940,870 / 2,225,090^2 ~= 0.041%. The computation was thus extremely sparse. It was completed using the cSpace application, which was implemented as part of a Research Programmer's work with the DLI.
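
The sparsity figure above can be checked with a few lines of arithmetic. This is only a sketch using the two counts quoted in this report; the function name is ours and is not part of cSpace.

```python
# Counts quoted in this report.
num_concepts = 2_225_090
num_cooccurrences = 2_027_940_870


def matrix_fill_fraction(nonzeros: int, dimension: int) -> float:
    """Fraction of populated cells in a (logical) dimension x dimension matrix.

    Hypothetical helper for illustration; not taken from cSpace.
    """
    return nonzeros / (dimension * dimension)


fill = matrix_fill_fraction(num_cooccurrences, num_concepts)
# Expressed as a percentage, this is roughly the 0.041% quoted in the text.
print(f"{fill:.3%}")
```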

    The cSpace application is a parallel C++ hybrid symbolic/numeric application that determines relationships between terms in a collection of documents. The resulting map between terms is designated a Concept Space and is useful for refining queries presented to the collection. Detailed information on the cSpace algorithms and implementation can be found in Chapters 2 and 3 of the following:

    ftp://ftp.cs.uiuc.edu/pub/dept/tech_reports/1997/UIUCDCS-R-97-1999.ps.Z
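
To make the idea of a term-to-term map concrete, the following is a minimal sketch of co-occurrence counting over a toy collection. It is not the cSpace implementation: the function names, the sample documents, and the simple weight (co-occurrence count divided by the document frequency of the query term) are all assumptions chosen for illustration. The actual algorithms are those described in the technical report cited above.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """For each unordered pair of terms, count the documents containing both."""
    pair_counts = Counter()
    for terms in documents:
        unique_terms = sorted(set(terms))
        pair_counts.update(combinations(unique_terms, 2))
    return pair_counts


def document_frequencies(documents):
    """Count the number of documents in which each term appears."""
    freq = Counter()
    for terms in documents:
        freq.update(set(terms))
    return freq


def related_terms(term, pair_counts, doc_freq):
    """Rank terms related to `term` using an illustrative weight.

    The weight is an assumption for this sketch, not the cSpace similarity.
    """
    scores = {}
    for (a, b), n in pair_counts.items():
        if a == term:
            scores[b] = n / doc_freq[term]
        elif b == term:
            scores[a] = n / doc_freq[term]
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy collection: each document is reduced to its list of noun phrases.
docs = [
    ["digital library", "concept space", "semantic index"],
    ["concept space", "semantic index"],
    ["digital library", "testbed"],
]
pairs = cooccurrence_counts(docs)
freq = document_frequencies(docs)
ranked = related_terms("concept space", pairs, freq)
print(ranked)
```

Storing only the pairs that actually occur, as the counter does here, is what makes a very sparse term-by-term matrix tractable.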

    The actual computation took ~24h:40m with a peak memory usage of over 8GB. Of this, ~19h:40m were spent in creating the initial multi-gigabyte sparse data structure which contains the concepts. The remaining ~5h was spent performing the bulk of the computation on 24 processors. This involved computing co-occurrences and similarities, and producing the output Concept Space (~3GB in size).

    Speedup numbers are not yet available for this run: measuring them requires an estimated ~500-600 hours of single-processor time on a machine with ~7GB of core. Estimates of speedup can, however, be made based on runs performed on the SGI Power Challenge, which achieved 100% efficiency (reported in UIUCDCS-R-97-1999.ps.Z).
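
For readers less familiar with the terms, the standard definitions of speedup and parallel efficiency can be written out as below. The timings in the example are hypothetical and are not measurements from any run described in this report.

```python
def speedup(single_processor_time: float, parallel_time: float) -> float:
    """Speedup S = T1 / Tp."""
    return single_processor_time / parallel_time


def efficiency(single_processor_time: float, parallel_time: float,
               processors: int) -> float:
    """Parallel efficiency E = S / p; E > 1.0 indicates superlinear speedup."""
    return speedup(single_processor_time, parallel_time) / processors


# Hypothetical timings in hours, for illustration only.
linear = efficiency(48.0, 2.0, processors=24)       # speedup of 24 on 24 processors
superlinear = efficiency(48.0, 1.6, processors=24)  # speedup above 24 on 24 processors
print(linear, superlinear > 1.0)
```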

    In order to accomplish the speedups reported in UIUCDCS-R-97-1999.ps.Z, it was necessary to parallelize the outermost loop in cSpace. This loop, however, performs output, which is ordinarily sequential. In UIUCDCS-R-97-1999.ps.Z we outline how sequential output can be modeled as an associative append operation. Such operations can be parallelized based on a transformation described in Chapter 2 of UIUCDCS-R-97-1999.ps.Z.
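
The general idea of treating output as an associative append can be sketched as follows. This is an illustration under our own assumptions, not the transformation from the technical report and not cSpace code: each worker appends to a private buffer for its block of iterations, and the buffers are concatenated in block order afterwards. The loop body `emit_rows` is hypothetical. Python threads are used only to show the structure, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def emit_rows(term_ids):
    """Hypothetical loop body: produce the output lines for a block of iterations."""
    return [f"term {t}: processed\n" for t in term_ids]


def sequential_output(term_ids):
    """Reference version: one iteration at a time, appending to a single buffer."""
    out = []
    for t in term_ids:
        out.extend(emit_rows([t]))
    return out


def parallel_output(term_ids, workers=4):
    """Each worker fills a private buffer; buffers are appended in block order."""
    chunk = max(1, -(-len(term_ids) // workers))  # ceiling division
    blocks = [term_ids[i:i + chunk] for i in range(0, len(term_ids), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(emit_rows, blocks))  # map preserves block order
    out = []
    for part in partials:
        out.extend(part)
    return out


ids = list(range(10))
print(parallel_output(ids, workers=3) == sequential_output(ids))
```

Because list concatenation is associative, the grouping of iterations into blocks does not change the final output, only the order of the blocks matters.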

    A second factor in achieving the superlinear speedups reported in UIUCDCS-R-97-1999.ps.Z involved the implementation of a C++ dynamic memory allocator. In http://polaris.cs.uiuc.edu/reports/1511.ps.gz we discuss the process whereby we determined that libraries implementing dynamic memory allocation in C++ on the SGI Power Challenge prevent significant speedup of the application in parallel. We have confirmed this result on the new SGI/Cray Origin2000 architecture as well, and the above results for the DLI Testbed SGML collection are based on a private-memory allocator implemented specifically for the SGI/Cray Origin2000, which successfully bypasses vendor-provided dynamic memory allocation routines.

    Further experiments on distributed-shared-memory architectures such as the SGI/Cray Origin2000 and the Convex/HP Exemplar are underway.

    The Artificial Intelligence Group has continued to refine its visual thesaurus techniques and is now able to extract information from more than one aerial photograph. The system has been enhanced to allow generation of a visual SOM (self-organizing map) based on photographic region similarity. A similarity search feature has been added that allows the user to select an image from the SOM and see the associated regions. The user may then select one of those regions and return to the original photograph, where the region is displayed in red and all SOM-associated regions on that photograph are displayed in blue.

    The AI Group is also exploring the use of natural language parsing and the UMLS Metathesaurus. Currently the Arizona Noun Phrases and the Arizona application of the Metathesaurus have been used to index a small concept space. Testing will begin in June and July to compare the automatic indexing, natural language parsing, and UMLS tools.

    PUBLICATIONS

    R. E. Orwig, H. Chen, and J. F. Nunamaker, ``A Graphical, Self-Organizing Approach to Classifying Electronic Meeting Output,'' Journal of the American Society for Information Science, Volume 48, Number 2, Pages 157-170, February, 1997.

    H. Chen, Y. Chung, M. Ramsey, C. Yang, P. Ma, and J. Yen, ``Intelligent Agents on the Internet,'' Proceedings of the 30th Annual Hawaii International Conference on System Sciences (HICSS-30), Maui, Hawaii, January 7-10, 1997.

    H. Chen, J. Martinez, A. Kirchhoff, T. D. Ng, and B. R. Schatz, ``Alleviating Search Uncertainty Through Concept Associations: Automatic Indexing, Co-occurrence Analysis, and Parallel Computing,'' Journal of the American Society for Information Science, Special Issue on ``Management of Imprecision and Uncertainty in Information Retrieval and Database Management Systems,'' 1997, forthcoming.

    H. Chen, G. Shankaranarayanan, A. Iyer, and L. She, ``A Machine Learning Approach to Inductive Query by Examples: An Experiment Using Relevance Feedback, ID3, Genetic Algorithms, and Simulated Annealing,'' Journal of the American Society for Information Science, 1997, forthcoming.

    H. Chen, T. R. Smith, M. L. Larsgaard, and L. L. Hill, ``A Geographic Knowledge Representation System (GKRS) for Multimedia Geospatial Retrieval and Analysis,'' International Journal of Digital Libraries, 1997, forthcoming.

    H. Chen, Y. Chung, M. Ramsey, and C. Yang, ``An Intelligent Personal Spider (Agent) for Dynamic Internet/Intranet Searching,'' Decision Support Systems, 1997, forthcoming.

    H. Chen, B. Schatz, J. Martinez, and T. Ng, ``Generating a Domain-specific Thesaurus Automatically: An Experiment on FlyBase,'' Information Processing and Management, 1997, forthcoming.

    H. Chen, A. L. Houston, R. R. Sewell, and B. R. Schatz, ``Internet Browsing and Searching: User Evaluation of Category Map and Concept Space Techniques,'' Journal of the American Society for Information Science, Special Issue on AI Techniques for the Emerging Information Systems Applications, 1997, forthcoming.

    H. Chen, Y. Chung, M. Ramsey, and C. Yang, ``A Smart Itsy Bitsy Spider for the Web,'' Journal of the American Society for Information Science, Special Issue on AI Techniques for the Emerging Information Systems Applications, 1997, forthcoming.

    Yong Rui, Thomas S. Huang, and Sharad Mehrotra, Content-based Image Retrieval with Relevance Feedback in MARS, submitted to ICIP-97.

    Sergio Servetto, Kannan Ramchandran, and Thomas S. Huang, A Successively Refinable Wavelet-Based Representation for Content-Based Image Retrieval, submitted to the First IEEE SP Society Workshop on Multimedia Signal Processing, Princeton, June 1997.

    Sharad Mehrotra, Yong Rui, Michael Ortega-B., and Thomas S. Huang, Supporting Content-based Queries over Images in MARS, submitted to ICMCS97.

    L. Neumann, G. Bowker and S. L. Star, "Things Come Together: Information Convergence," submitted to Journal of the American Society for Information Science, February, 1997.

    S. L. Star and A. Strauss, "Layers of Silence, Arenas of Voice: The Dialogues between Visible and Invisible Work," submitted to Journal of Computer-Supported Cooperative Work, April, 1997.

    PRESENTATIONS

    Bishop, Ann P. UIUC DLI Spring '97 Partners Workshop, research presentation: ``Social Science Research,'' April 3, 1997.

    Bishop, Ann P. NSF/DARPA/NASA Illinois site visit, research presentation: ``Social Science Research,'' April 24, 1997.

    Chen, Hsinchun, NSF Knowledge and Distributed Intelligence (KDI) initiative Knowledge Networking Workshop, organizing committee member and session leader, May 8-9, 1997.

    Chen, Hsinchun, NSF/DARPA/NASA UCSB site visit, research presentation: ``The Geographic Knowledge Representation System,'' May 2, 1997.

    Chen, Hsinchun, NSF/DARPA/NASA Illinois site visit, research presentation: ``The Interspace Knowledge Architecture,'' April 24, 1997.

    Chen, Hsinchun, NSF/ITO Grantee Meeting, Session Chair, Washington, DC, April 18-19, 1997.

    Chen, Hsinchun, ``Internet Categorization and Search,'' The NSF Information Technology and Organizations Grantee's Workshop, Arlington, Virginia, April 18-20, 1997.

    Chen, Hsinchun, ``Research from the Illinois Digital Library Initiative Project'' and ``Research from the UCSB Digital Library Initiative Project,'' 5 posters for the 20th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '97), Philadelphia, PA, July 27-31, 1997.

    Harum, Susan. "UIUC DLI Testbed", UIUC Cyberfest, Urbana, Illinois, March 13-14, 1997.

    Johnson, Eric. "Using IODyne as an Indexing Tool", 34th Annual Clinic on Library Applications of Data Processing: Visualizing Subject Access for 21st Century Information Resources, Urbana, Illinois, March 3, 1997.

    Johnson, Eric. "IODyne, an Internet Client", UIUC Cyberfest, Urbana, Illinois, March 13-14, 1997.

    Johnson, Eric. UIUC DLI Spring '97 Partners Workshop, research presentation: ``IODyne, an Internet client,'' April 3, 1997.

    Johnson, Eric. NSF/DARPA/NASA Illinois site visit, research presentation: ``IODyne, an Internet client,'' April 24, 1997.

    McGrath, Robert. NSF/DARPA/NASA Illinois site visit, research presentation: ``UIUC DLI Project Scale-up,'' April 24, 1997.

    Mischo, Wm. "UIUC DLI Testbed", College of Engineering Annual Dinner, Sponsored by AT&T, University of Illinois, Urbana, Illinois, February 12, 1997.

    Mischo, Wm. "UIUC DLI Testbed", 34th Annual Clinic on Library Applications of Data Processing: Visualizing Subject Access for 21st Century Information Resources, Urbana, Illinois, March 3, 1997.

    Mischo, Wm. "UIUC DLI Testbed", UIUC Cyberfest, Urbana, Illinois, March 13-14, 1997.

    Mischo, Wm. UIUC DLI Spring '97 Partners Workshop: ``UIUC DLI Testbed Updates,'' April 3, 1997.

    Mischo, Wm. NSF/DARPA/NASA Illinois site visit, research presentation: ``UIUC DLI Testbed Updates,'' April 24, 1997.

    Pottenger, Wm. NSF/DARPA/NASA Illinois site visit, research presentation: ``Parallel Optimization of Concept Spaces for DLI Testbed,'' April 24, 1997.

    Schatz, Bruce. UIUC DLI Spring '97 Partners Workshop, research presentation: ``Federating Repositories,'' April 3, 1997.

    Schatz, Bruce. NSF/DARPA/NASA Illinois site visit, research presentation: ``Federating Repositories,'' April 24, 1997.

    Star, Susan Leigh. "The Politics of Classification," Colloquium, School of Library and Information Science, Indiana University, Bloomington, April, 1997.

    Star, Susan Leigh, NSF/DARPA/NASA Illinois site visit, research presentation: ``Toward a language for Convergence in the Workplace and Digital Library: Office Classification Project,'' April 24, 1997.

    Star, Susan Leigh. "Layers of Silence, Arenas of Voice: The Dialogues between Visible and Invisible Work," Science and Technology Studies Program, Rensselaer Polytechnic Institute, April, 1997.

    Star, Susan Leigh. "Invisible Work and Information Technology," Science Studies Program, University of California at San Diego, March, 1997.

    Star, Susan Leigh. "Infrastructure and Learning," to the Technologies in Learning Program, Graduate School of Education, University of Illinois, February, 1997.

    Star, Susan Leigh. "Understanding Infrastructure and Work," Colloquium, Department of Industrial Engineering and Engineering Management, Stanford University, February, 1997.

    Star, Susan Leigh. "Studies of Infrastructure, Libraries and Classification," Colloquium, Graduate School of Education, Library and Information Studies, University of California, Los Angeles, February, 1997.

    PROFESSIONAL ACTIVITIES

    Ann Bishop participated in a grantees workshop sponsored by NSF's Information Technology and Organizations Program. The goal of the workshop, held in Arlington, VA on April 18-20, 1997, was to build on the cross-fertilization of research in ITO and help plan for the future of ITO-related research initiatives.

    Chen, Hsinchun: Member, Journal of the American Society for Information Science, Editorial Board, 1997.

    Chen, Hsinchun: Member, Convening Committee (5 members), NSF, Knowledge and Distributed Intelligence (KDI) Initiative ($50M). Assisting in organizing workshops and planning RFP for the new KDI initiative, 1997.

    Chen, Hsinchun: Member, ACM SIGIR-97 Conference Organizing Committee, 1998.

    Chen, Hsinchun: Member, ACM SIGIR-97 Conference Organizing Committee, 1997.

    Chen, Hsinchun: Member, ACM Digital Library Conference Program Committee, 1997.

    Chen, Hsinchun: Partner, Data and Collaborative Computing Team, NCSA (National Computational Science Alliance), NSF 96-31 Program Solicitation for Partnerships for Advanced Computational Infrastructure, 1996-1997.

    Chen, Hsinchun: Coordinator, ``Semantic Indexing and Interoperability,'' Third NSF/ARPA/NASA Digital Library Initiative (DLI) Workshop, Sponsored by NSF/ARPA/NASA, Ann Arbor, Michigan, May 16-17, 1996.

    Chen, Hsinchun: Reviewer for IEEE Transactions on Systems, Man, and Cybernetics, IEEE Transactions on Knowledge and Data Engineering, IEEE Computer, IEEE Expert, International Journal of Man-Machine Studies, Information Processing and Management, Journal of the American Society for Information Science, Communications of the ACM. 1989-present.

    Star, Susan Leigh, Elected Scholar of the Year (elected by graduate students), Science Studies Program, University of California, San Diego, March, 1997.

    Star, Susan Leigh. National Science Foundation. Ethics and Values Proposal Review Panelist, 1995-1998, continuing service.

    Star, Susan Leigh (with Geoffrey Bowker), Co-organizer, "Opening the Pod Bay Doors: Social and Cultural Perspectives on Cyberspace," Day-long conference in association with Cyberfest '97, at University of Illinois, February, 1997.

    Star, Susan Leigh, Consultant and Group Leader, NSF Workshop on "Human Centered Computing and Intelligent Systems," February, 1997.

    GRANTS AWARDED

    Department of Defense, Advanced Research Projects Agency (ARPA), ``The Interspace Prototype: An Analysis Environment based on Scalable Semantics,'' $3,309,000, April 1997-March 2000 (subcontract to H. Chen, PI, University of Arizona: $1,078,991).

    VISITORS

    Star, Susan Leigh, hosted a two-day visit from Prof. Charles Goodwin, Program in Applied Linguistics, who gave a seminar on "Vision as Practice" to a well-attended cross-campus group and met to discuss issues of visualization and practice in the digital library.

    Feb. 6, Scantech Systems, Inc.
    Bob Venable, President
    Mike Smith, Systems Manager

    Illinois State Geological Survey
    Donal Luman, Senior Geologist
    Chris Stohr, Associate Professional Scientist

    Feb. 20, American Institute of Physics (AIP)
    Marc Brodsky, Executive Director and CEO
    Tim Ingoldsby, Director of Product Development
    Peggy Judd, Director of Information Technology and Products
    Darlene Walter, Vice President of Publishing

    Feb. 27, American Astronomical Society (AAS)
    Chris Biemesderfer
    Archie Warnock

    March 14, State of Illinois
    Robert Kustra, Lieutenant Governor, State of Illinois

    March 15, Advanced Micro Devices
    Jerry Sanders, CEO

    April 1, AT&T Research Laboratories
    Joel Winthrop, AT&T

    April 3-4, Spring Partners Workshop
    See Appendix A for list of attendees

    April 4, Hewlett-Packard
    Richard W. Sevcik, University Manager

    April 15, University of Pretoria
    Monica Hammes, Academic Information Service, University of Pretoria, Pretoria, South Africa

    April 18, IEEE Computer Society (IEEE CS)
    T. Michael Elliott, Executive Director
    Bob Care, Director, Electronic Publishing
    Mathew S. Loeb, Publisher

    April 21, Getty Information Institute
    Christie Stephenson, Director, Museum Educational Site Licensing Project, Santa Monica, California

    April 28, W.W. Grainger, Inc.
    David Grainger, CEO

     

    Appendix A

    DLI Spring 1997 Partners Workshop, April 3-4, 1997, List of Participants

    Appendix B

    DLI Spring 1997 Partners Workshop, April 3-4, 1997, Minutes