Quarterly Report for 1st Quarter 1996 (Feb-April)
DLI Project, University of Illinois

Federating Repositories of Scientific Literature
Bruce Schatz, PI, schatz@csl.ncsa.uiuc.edu

Testbed

Testbed activities continue to emphasize the development of document servers for our publishing partners. Journals from the American Physical Society (APS), the American Society of Civil Engineers (ASCE), and the Institution of Electrical Engineers (IEE) were added to the testbed and are now available from the top interface in multiple at the Grainger Engineering Library. A remote terminal was also set up in the Beckman Institute for a small campus Physics research group.

The testbed team incorporated the OpenText Parallel Execution Monitor for searching multiple journals at a time. Tags from different publishers were normalized to allow a consistent search across different journals, and access to the interface was enhanced to give users the option of searching one or all of the journals currently online. Scripts were developed to generate and process metadata. Word Wheels containing every word occurring in the collection of articles accessed by a user are being created and processed for incorporation into the interface. These will help prevent spelling errors by users and will avoid time lost to empty searches.
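
As a rough illustration of the word wheel idea, the following minimal Python sketch builds a sorted list of every distinct word in a small set of articles and returns the entries that begin with what a user has typed. The function names, the tokenization rule, and the sample text are assumptions made for illustration; they are not taken from the testbed code.

```python
import bisect
import re


def build_word_wheel(articles):
    """Return a sorted list of the distinct lowercase words in the articles."""
    words = set()
    for text in articles:
        # Assumed tokenization: runs of ASCII letters, lowercased.
        words.update(re.findall(r"[a-z]+", text.lower()))
    return sorted(words)


def wheel_entries(wheel, prefix, limit=10):
    """Return up to `limit` wheel entries that start with `prefix`."""
    prefix = prefix.lower()
    # Binary search for the first entry that sorts at or after the prefix.
    start = bisect.bisect_left(wheel, prefix)
    matches = []
    for word in wheel[start:start + limit]:
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches


# Hypothetical sample data, not drawn from the testbed collection.
sample_articles = [
    "Relativity and the behavior of black holes",
    "Reliability of reinforced concrete beams",
]
wheel = build_word_wheel(sample_articles)
print(wheel_entries(wheel, "rel"))
```

Because the user picks from words that are known to occur in the collection, a misspelled term or a term with no hits can be caught before the search is sent.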

The usability group met with the programmers of the interface and presented problems their users encountered during usability studies. Changes are being made to resolve problem areas. A copy of the interface in its current state was sent to the American Institute of Physics (AIP).

A client was developed to unpack the SGML, GIF, catalog, and entityrc files; it integrates PGP security with the unpack and OLE clients, Panorama, and the Ulead viewer.

To address permissions and user authentication, we have implemented standard HTTP IP address checking and unencrypted (uuencoded) login/password security on HTTP gateways and document repository servers. We have also implemented standard unencrypted UNIX login/password security on index servers, maintaining separate accounts and groups for gateways, in-library clients, on-campus clients, and off-campus clients (currently limited to sponsor agencies and publisher partners).
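
The two gateway checks described above can be sketched in Python as follows. This is a minimal sketch, assuming the login/password scheme is HTTP Basic authentication, which sends credentials Base64-encoded rather than encrypted. The network range, account name, password, and function names are all placeholders invented for the example and do not reflect the project's actual configuration.

```python
import base64
import ipaddress

# Hypothetical access table: the network range and account are placeholders.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
ACCOUNTS = {"library-client": "example-password"}


def ip_allowed(remote_addr):
    """Return True if the client address falls inside an allowed network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in ALLOWED_NETWORKS)


def parse_basic_auth(header_value):
    """Decode an HTTP Basic Authorization header into (user, password).

    The credentials are only Base64-encoded, not encrypted, so anyone who
    can read the request can recover them.
    """
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


def request_permitted(remote_addr, header_value):
    """Require both an allowed address and a matching login/password."""
    if not ip_allowed(remote_addr):
        return False
    credentials = parse_basic_auth(header_value)
    if credentials is None:
        return False
    user, password = credentials
    return ACCOUNTS.get(user) == password
```

Separate accounts per client class (gateway, in-library, on-campus, off-campus) would correspond to separate entries in a table like `ACCOUNTS`, each paired with its own list of allowed networks.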

Internet Interface

February was spent implementing the first demonstration of the IODyne (Information Objects Dynamically) information retrieval client. This involved defining repository configuration file formats and the Abstract Retrieval Protocol (ARP), including abstract query generation and conversion to "real" query languages, abstract representations of query results, conditions for client-side vs. server-side set combination, and standard methods required of all repository proxy and Abstract Retrieval Gateway (ARG) implementations.
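
To make the idea of converting an abstract query into "real" query languages concrete, here is a minimal Python sketch. It is not the ARP itself: the `Term` and `BoolOp` classes, the field names, and both output syntaxes are assumptions chosen for illustration. In particular, the prefix syntax is made up and merely stands in for a second, non-SQL query language.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Term:
    """A single field/value condition in the abstract query."""
    field: str
    value: str


@dataclass(frozen=True)
class BoolOp:
    """A boolean combination ("AND" or "OR") of sub-queries."""
    op: str
    operands: Tuple["Query", ...]


Query = Union[Term, BoolOp]


def to_sql_where(query):
    """Render the abstract query as (WHERE clause, parameter list)."""
    if isinstance(query, Term):
        # Field names are assumed to come from trusted configuration.
        return f"{query.field} = ?", [query.value]
    clauses, params = [], []
    for operand in query.operands:
        clause, sub_params = to_sql_where(operand)
        clauses.append(clause)
        params.extend(sub_params)
    return "(" + f" {query.op} ".join(clauses) + ")", params


def to_prefix_syntax(query):
    """Render the same query in a made-up prefix syntax."""
    if isinstance(query, Term):
        return f'{query.field}:"{query.value}"'
    inner = " ".join(to_prefix_syntax(o) for o in query.operands)
    return f"({query.op.lower()} {inner})"


# One abstract query, two concrete renderings.
abstract = BoolOp("AND", (
    Term("author", "Schatz"),
    BoolOp("OR", (Term("keyword", "thesaurus"), Term("keyword", "retrieval"))),
))
print(to_sql_where(abstract))
print(to_prefix_syntax(abstract))
```

The client only ever builds the abstract form; each repository proxy would supply its own renderer, which is what lets protocol-specific code be replaced by configuration.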

In general, all of the code dealing with specific protocols and query languages in earlier demonstrations was eliminated and replaced with generalized code controlled by configuration parameters.

Other improvements included eliminating the sequential files required for keyword file scroll indexes (*.sdx files). All data required for keyword lists is now integrated into the KWIC/KWOC SQL databases, making server-side maintenance cleaner and runtime access faster.
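
The following Python sketch suggests how a scrolling keyword list can be served directly from a keyword-in-context (KWIC) table, with no separate sequential index file. The table layout, column names, context width, and sample titles are assumptions for illustration only; the project's actual schema is not described in this report.

```python
import sqlite3


def kwic_rows(doc_id, title, width=2):
    """Yield (keyword, left context, right context, doc_id) for each word."""
    words = title.split()
    for i, word in enumerate(words):
        left = " ".join(words[max(0, i - width):i])
        right = " ".join(words[i + 1:i + 1 + width])
        yield word.lower(), left, right, doc_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kwic "
    "(keyword TEXT, left_ctx TEXT, right_ctx TEXT, doc_id INTEGER)"
)
# The index on keyword is what makes ordered scrolling cheap.
conn.execute("CREATE INDEX kwic_keyword ON kwic (keyword)")

# Hypothetical sample titles.
titles = {1: "Digital Library Infrastructure", 2: "Building Digital Libraries"}
for doc_id, title in titles.items():
    conn.executemany(
        "INSERT INTO kwic VALUES (?, ?, ?, ?)", kwic_rows(doc_id, title)
    )


def keyword_list(conn, start, limit=5):
    """Scroll the keyword list from `start` using only the KWIC table."""
    rows = conn.execute(
        "SELECT DISTINCT keyword FROM kwic "
        "WHERE keyword >= ? ORDER BY keyword LIMIT ?",
        (start, limit),
    )
    return [row[0] for row in rows]


print(keyword_list(conn, "d", 3))
```

With the keyword list derived from the same table as the contexts, the server has one structure to maintain instead of a database plus a parallel scroll file.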

The actual running implementation demonstrates how the user can, with no knowledge of the actual protocols and query languages involved, connect to more than one repository and query them simultaneously using IODyne. The two demonstration repositories are on an OpenText server and a SQL server, and use completely different protocols and query languages. IODyne also statelessly and modelessly integrates use of keyword lists, keyword-in-context lists, and thesauri with the retrieval interface. It demonstrates a seamless drag-and-drop environment in which the user can put information objects into different spaces to make them do different kinds of things, such as term suggestion and bibliographic retrieval. In bibliographic retrieval, the user can build boolean trees of information objects, in effect constructing boolean searches.
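
A boolean tree evaluated across more than one repository, with the result sets combined on the client side, might look like the minimal Python sketch below. The in-memory dictionaries stand in for the two demonstration repositories, and the tuple-based tree format and all names are assumptions; the sketch does not reproduce IODyne's actual data structures or protocols.

```python
# Hypothetical stand-ins for two repositories: term -> set of document IDs.
REPOSITORIES = {
    "repo_a": {"relativity": {"a1", "a2"}, "black holes": {"a2", "a3"}},
    "repo_b": {"relativity": {"b1"}, "black holes": {"b1", "b2"}},
}


def search_all(term):
    """Send one term to every repository and merge the document IDs."""
    hits = set()
    for index in REPOSITORIES.values():
        hits |= index.get(term, set())
    return hits


def evaluate(tree):
    """Evaluate a boolean tree: a term string, or a tuple (op, left, right)."""
    if isinstance(tree, str):
        return search_all(tree)
    op, left, right = tree
    left_hits, right_hits = evaluate(left), evaluate(right)
    # Client-side set combination of the per-term results.
    if op == "AND":
        return left_hits & right_hits
    if op == "OR":
        return left_hits | right_hits
    if op == "NOT":  # interpreted as: left AND NOT right
        return left_hits - right_hits
    raise ValueError(f"unknown operator: {op}")


print(sorted(evaluate(("AND", "relativity", "black holes"))))
```

Each leaf corresponds to an information object the user has dropped into the retrieval space; nesting tuples corresponds to building a deeper tree.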

Distributed Repositories

Adam has been discussing funding from USPS (United States Postal Service).

Chris Dunlap (RA) has joined the group. He's working primarily on PGP.

Carlos Pero left, and we have not yet been able to replace him (he was doing the CGI support mail).

Chris Gibrich has been handling the httpd support mail. He has also installed a new search engine (see http://www.excite.com/). We have had several discussions with Bob Dow of Architext Software and are in the process of setting up a cooperative agreement about the use of the free version of their search engine.

Scott Powers has replaced Stan Guillory on the server team. (This happened at the end of December and may have been included in last quarter's report.) He represented NCSA at the IETF meeting.

Micheal Shapiro decided not to renew his IPA, and has left the server group. We have not yet replaced him. We may not be able to replace him without more funding. (Joseph said go ahead, but Jae said she doesn't think there's enough money, so it's still up in the air.)

Brandon Long released 1.5.1 this quarter, and has primarily worked on fixing bugs, responding to CERT advisories, and putting out fires. He also did our evaluation of FastCGI and helped Scott Powers develop our position on the CGI/API standardization issues.

Matt Holiman, undergrad programmer, has left. We have not yet been able to replace him.

We submitted a proposal in response to ARPA BAA96-07. The Digital Libraries Initiative (DLI) NSF site visit was held in mid-March.

Work on the next generation server (2.0) continues. We plan on having a demo in May.

(I don't know if this is appropriate for the QSR - but I've set up several separate protected HyperNews Domains on the SDG server including areas for W3C2 discussions.)

User Evaluation/Social Science Research

Social Science Team Quarterly Report
May 1, 1996

During the previous quarter, the Social Science team continued its work in user needs assessment, research tool development, usability testing, targeted sociological studies, and fostering the development of a community of researchers committed to social studies of digital libraries. Specific contributions to the Digital Libraries Initiative are outlined below, as are plans for the next quarter. Particular analytic threads that link past and current Social Science team work include: the nature of journal use; problems with electronic systems (especially semantic retrieval); changing patterns of information seeking and use behavior; the context for the use of federated online repositories, including the manner in which individuals organize and classify the material in their offices; the nature of large-scale infrastructure development; and changing relationships between knowledge creators, consumers, and intermediaries. For more information about the activities outlined here, see the activity summaries presented in the March-April 1996 section of the Social Science team's homepage (http://anshar.grainger.uiuc.edu/dlisoc/home_page.html).

Needs Assessment

In order to explore the user context for federated repositories of scientific literature, the Social Science team began an in-depth study of researchers in Physics. Beginning in late January, semi-structured interviews were conducted with five people belonging to a small campus research group working in the area of general relativity and the behavior of black holes. The physicists were asked to describe their work flows and practices, how they find information, why they need information, and how they use libraries and computer-based tools. They also described in detail one to two recent incidents in which they sought and used journal material; these data are being analyzed to develop a more specific picture of, on the one hand, the clues researchers bring to library systems and, on the other, the cues in documents and retrieval systems that researchers need to complete their searches.

Research Tool Development

The Social Science team has completed the development and pretesting of an online user registration form that will be used to limit testbed access to legitimate users and to collect demographic data from them (see http://anshar.grainger.uiuc.edu/dlisoc/rfpt.html). Also completed this quarter was the specification of the format for instrumenting the testbed to log all usage transactions that occur. The Social Science and Testbed teams are currently working together to integrate the registration form and instrumentation programming into the testbed itself. In addition, a survey instrument for collecting user needs data from additional members of the Physics research community at UIUC has been designed.

Usability Testing

In February, the Social Science team conducted cooperative evaluation sessions of the DLI prototype with ten Physics researchers. This series of usability tests uncovered 31 specific problems with prototype functionality, interface design, and ease of use. A summary report was prepared for and discussed with the Testbed team and a number of the user-identified problems have already been corrected. This first step also allowed the Social Science team to begin work to design strategies for collecting user feedback aimed specifically at the design and evaluation of mechanisms for user support.

Targeted Sociological Studies

Laura Neumann and Leigh Star have written a paper on the process of building infrastructure, based on their earlier study of DLI development at the University of Illinois, which they plan to submit for the upcoming Participatory Design conference. Star has also completed an informal prospectus for sociological studies that the Social Science team would like to undertake in the future, perhaps in collaboration with other researchers. This set of studies will focus on how people use digital material in their workplaces, what difference it makes to the work organization, and implications of "folk classifications" for creating and maintaining collections of digital document representations. Neumann has begun a targeted study of the information flows of the Physics research team by following up the interviews described above with ongoing observation at the Relativity group's student office.

Development of a Research Community

The Social Science team has continued its commitment to fostering the development of a community of researchers committed to social studies of digital libraries. To encourage the sharing of knowledge across the six DLI projects during the upcoming synchronization meeting in Ann Arbor, Ann Bishop is organizing a preconference seminar, a plenary session, a working group breakout session, and a demo/discussion session in which all DLI social science researchers have been invited to participate. Bishop also helped to organize a workshop on User Needs Assessment at the ACM Conference on Digital Libraries in March 1996; Star served as a leader for this workshop, delivering the session on ethnography. Bishop submitted a proposal to NSF to sponsor the second Allerton Institute on user-centered design and evaluation in digital libraries, to be held in October 1996. Star and other members of the Social Science team have continued their work in planning for this conference, which will provide an interdisciplinary forum for researchers engaged in social studies of digital libraries. Star attended an NSF-sponsored invitational workshop at the UCLA Department of Library and Information Sciences in February, where she presented a discussion paper and worked on a national agenda for research in social contexts and impacts of digital libraries. Star and Bishop began working with Nancy Van House (UCLA) and Barbara Buttenfield (Santa Barbara) on the development of a monograph which will be devoted to user needs assessment for digital libraries.

Plans for the Next Quarter

During the next quarter, the Social Science team will conduct usability tests of the next version of the DLI prototype, including the online documentation portion, which is currently under development. The online user registration forms and transaction logs should become operational in the next quarter, allowing preliminary analysis of the resulting user and usage data. In addition, a new type of system instrumentation will be added: targeted surveys and user comment buttons. Analysis of user needs will continue with the study of user interaction logs, which are being manually maintained by librarians to record their interactions with patrons using the DLI testbed in the Grainger Library. A user needs survey for Physics researchers that will serve as a follow-up to the interviews conducted with members of the Relativity research group will be conducted; targeted study of the Relativity group, through ethnographic observations, will also continue. Finally, the Social Science team will continue planning for the 1996 Allerton Institute and for initiating a new set of sociological studies related to the organization and use of digital material in the workplace.

Publications

Bishop, Ann P. (Guest Editor). SIGOIS Bulletin, Special Issue on Digital Libraries, Volume 16, Number 2 (December 1995), 2-44.

Gross, Benjamin, "Preserving and Securing the Electronic Record," Finding Common Ground, Harvard College Library, Cambridge, MA, March 30-31, 1996, in proc. http://interspace.grainger.uiuc.edu/~bgross/pubs/preserving.ps

H. Chen, C. Schuffels, and R. Orwig, "Internet Categorization and Search: A Machine Learning Approach," Journal of Visual Communication and Image Representation, Special Issue on Digital Libraries, Volume 7, Number 1, Pages 88-102, March 1996.

H. Chen, J. Martinez, T. D. Ng, and B. R. Schatz, "A Concept Space Approach to Addressing the Vocabulary Problem in Scientific Information Retrieval: An Experiment on the Worm Community System," Journal of the American Society for Information Science, Volume 47, Number 8, August 1996, forthcoming.

H. Chen, B. R. Schatz, T. D. Ng, J. P. Martinez, A. J. Kirchhoff, C. Lin, "A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project," IEEE Transactions on Pattern Analysis and Machine Intelligence, Special Issue on Digital Libraries: Representation and Retrieval, 1996, forthcoming.

H. Chen, B. R. Schatz, T. D. Ng, and M. Yang, "Breaking the Semantic Barrier: A Concept Space Experiment on the Convex Exemplar Parallel Supercomputer," submitted to IEEE Parallel and Distributed Technology, 1996.

H. Chen, B. R. Schatz, T. D. Ng, and M. Yang, "Large-scale Digital Library Analysis Using Supercomputers: An Experiment on CM-5, SGI Power Challenge, and Convex Exemplar," submitted to IEEE Computer, Special Issue on Applications for Shared-Memory Multiprocessors, 1996.

H. Chen, J. Martinez, A. Kirchhoff, T. D. Ng, and B. R. Schatz, "Alleviating Search Uncertainty Through Concept Associations: Automatic Indexing, Co-occurrence Analysis, and Parallel Computing," submitted to Journal of the American Society for Information Science, Special Issue on "Management of Imprecision and Uncertainty in Information Retrieval and Database Management Systems," 1996.

Ignacio, Emily, et al. Usability Report, February 1996. [http://anshar.grainger.uiuc.edu/dlisoc/usability.rept.html]

Johnson, E., Schatz, B., Cochrane, P. and Chen, H. "Interactive Term Suggestion for Users of Digital Libraries: Using Subject Thesauri and Co-occurrence Lists for Information Retrieval." Proceedings from Digital Libraries '96: 1st ACM International Conference on Research and Development in Digital Libraries, March 20-23 1996 in Bethesda, MD.

C. Lin and H. Chen, "An Automatic Indexing and Neural Network Approach to Concept Retrieval and Classification of Multilingual (Chinese-English) Documents," IEEE Transactions on Systems, Man, and Cybernetics, Volume 26, Number 1, Pages 1-14, February 1996.

B. Schatz and H. Chen, "Building Large-scale Digital Libraries," IEEE Computer, Special Issue on "Building Large-scale Digital Libraries," May 1996.

B. Schatz, B. Mischo, T. Cole, J. Hardin, L. Jackson, A. Bishop, L. Star, P. Cochrane, and H. Chen, "Digital Library Infrastructure for a University Engineering Community: Towards Search in the Net via Structure and Semantics," IEEE Computer, Special Issue on "Building Large-scale Digital Libraries," May 1996.

Star, Susan Leigh. (Staff editor). Special double issue of Computer-Supported Cooperative Work on Computer Mediated Communication and Small Groups with guest editors Joseph McGrath and Holly Arrow. Appearing in Spring 1996.

Presentations

Bishop, Ann P. "Engineers and Scientists on the Net" [Presentation to the Manufacturing 2002 Colloquium Series sponsored by the University of Texas at Austin's Engineering School, Austin, Texas, March 1996].

Johnson, E., Schatz, B., Cochrane, P. and Chen, H. "Interactive Term Suggestion for Users of Digital Libraries: Using Subject Thesauri and Co-occurrence Lists for Information Retrieval." Digital Libraries '96: 1st ACM International Conference on Research and Development in Digital Libraries, March 20-23 1996 in Bethesda, MD.

Laliberte, D. "NCSA Repository Research." CNRI Repository Interfaces Workshop, March 11-12, 1996, Reston, VA.

Star, Susan Leigh. "Problems of Learning and Communication in Establishing an Information System for Scientists." [Colloquium presentation to Education Policy Studies, College of Education, UIUC]

Star, Susan Leigh. "Slouching toward Infrastructure." [Discussion paper presented at the NSF-sponsored invitational workshop on Social Impacts of Digital Libraries at the UCLA Department of Library and Information Sciences, Feb. 1996]

Star, Susan Leigh. "User Needs Assessment: Ethnography." [Session presented at the User Needs Assessment Workshop at the ACM Conference on Digital Libraries, Bethesda, Maryland, March 1996].

Star, Susan Leigh. "To Classify is Human." [Keynote address, ACM Hypertext '96, University of Maryland, Bethesda, March 1996].

Professional Activities

Chen, H. Guest editor, Journal of the American Society for Information Science special issue on "Artificial Intelligence Techniques for Emerging Information Systems Applications," 1997.

Schatz, B. and Chen, H. Guest editors, IEEE Computer May 1996 special issue on "Building Large-scale Digital Libraries."

Recognition, Awards, and Grants

Featured in Volume 5, Number 15 of HPCWire ("Digital Library Initiative Tackles Grand Challenge of Information Science," April 12, 1996).

Chen, H. AT&T Foundation Award in Science and Engineering, 1995-1996.

Chen, H. Co-Principal investigator (Co-PI, PI: B. Schatz, University of Illinois), ARPA, BAA 96-06, "Breaking the Semantic Barrier: Protocols and Environments for Scalable Domain-Independent Concept Spaces," $3,000,000, June 1996-May 1999. (pending)

Chen, H. Principal investigator (PI), NSF, CISE, "Supplement to Alexandria DLI Project on Semantic Interoperability," $100,000, June 1996-May 1998. (pending)

Chen, H. Principal investigator (PI), National Center for Supercomputing Applications, "Information Analysis and Visualization for Business Applications," $35,000, May 1996-December 1996. (pending)

Visitors

Feb. 1, Ralph Sanchez, Senior Systems Engineer; Tom Stemen, Corporate Account Representative, Microsoft

Feb. 22, Rob Bouzon, Hewlett Packard

March 5, Dick Lampman; Paul Bemis; Gallagher, Hewlett Packard

April 2, Mr. Hatsuhito Mitsuhashi, Computer Center for Agriculture, Forestry and Fisheries; Mr. Chuiti Tanaka, Library and Information Section, Tohoku National Agricultural Experiment Station; Takanori Hayashi, Ministry of Agriculture, Forestry and Fisheries in Japan.

April 10, Mike Levell, Vice President, Hewlett Packard


University of Illinois at Urbana-Champaign Digital Libraries Initiative
Comments to: External Relations Coordinator, Tom Habing
10/15/96