Internet Interface
The goal here is to provide a multiple-view user interface across the Internet. The responsible programmer is Eric Johnson. Multiple views means that phrases can be dragged and dropped across the individual interfaces for each information source. The views also integrate the parts of our project, as they include the Testbed SGML collection, the Research concept spaces, and traditional library A&I thesauri. A complete prototype in Visual Basic was completed this year, integrating all these sources (concept spaces and subject thesauri from INSPEC for electrical engineering and computer science, and an interface to the OpenText engine).
Development on the multiple-view interface continues. We have recently addressed some of the stickier client-server issues and modified the database schema accordingly. Design of the thesaurus browser seems to have reached a stable point, and we have reimplemented the thesaurus import software to better construct server databases, allow for error detection and correction, and process the transaction files generated by the thesaurus editor we will implement in February.
Efforts towards the end of the year are directed toward demonstrating use of the multiple-view interface on federated databases, with the federation occurring in the client software. We will also make the client software use IP sockets instead of depending on a Novell network.
User interface development will concentrate on the search document, to allow the user to construct and edit boolean trees through manipulation of graphic objects. Search documents will be saveable and loadable just like documents in other applications, and users will be able to load and display multiple search documents, cut and paste search terms between them, and drag and drop search subtrees between search documents. The short record display on the search document will have various sorting and filtering capabilities, and will allow hierarchical display of the contents of journals.
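As an illustration of the search document idea, the following is a minimal sketch of a boolean search tree whose subtrees can be moved between documents. It is not the project's Visual Basic code: the class names `Term` and `Op`, the function `move_subtree`, and the rendered query syntax are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Term:
    """Leaf node: a single search phrase."""
    phrase: str

    def render(self) -> str:
        return f'"{self.phrase}"'


@dataclass
class Op:
    """Interior node: a boolean operator applied to child subtrees."""
    operator: str  # "AND", "OR", or "NOT" (hypothetical operator set)
    children: List["Node"] = field(default_factory=list)

    def render(self) -> str:
        if self.operator == "NOT":
            return f"(NOT {self.children[0].render()})"
        joined = f" {self.operator} ".join(c.render() for c in self.children)
        return f"({joined})"


Node = Union[Term, Op]


def move_subtree(source: Op, index: int, target: Op) -> None:
    """Detach child `index` from `source` and append it to `target`,
    mimicking a drag-and-drop of a subtree between two search documents."""
    target.children.append(source.children.pop(index))


# Two search documents, each holding one boolean tree.
doc_a = Op("AND", [Term("digital library"), Op("OR", [Term("SGML"), Term("HTML")])])
doc_b = Op("AND", [Term("thesaurus")])

# "Drag" the OR subtree from document A and "drop" it into document B.
move_subtree(doc_a, 1, doc_b)
print(doc_a.render())
print(doc_b.render())
```

Keeping each search document as a plain tree means saving, loading, and cut-and-paste all reduce to operations on subtrees, which is one reason a tree representation suits this kind of interface.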
This project has been planned from the beginning with an eye toward cross-platform development, even though so far it runs only on a Windows PC. To that end, we have not used any of the proprietary GUI objects available for use with Microsoft Visual Basic. Now we have Java as a more specific cross-platform goal. Our current plan is to continue design and prototype development in Visual Basic and have a complete implementation in time for the NSF site visit in early March. We will then reimplement it in Java. Reimplementation should be complete by August 1996.
Stateful Gateways
The goal here is to connect transparently to distributed repositories across the Internet. The responsible programmer is Jason Ng. This involves protocol translation across multiple protocols and the saving of the search state. Gateways have been built using CGI and using Java, and extensive design has been done to place these results into the NCSA repository package.
A basic prototype demonstrated in March showed how existing Grainger documents (stored in an MS-SQL database) can be put online and made accessible through the web. The software was a suite of Perl, shell and C scripts that connect HTTP to the SQL engine. It was also a rudimentary demo of state, i.e. intermediate results are saved to prevent recomputing.
In September, we demonstrated searching across 2 databases (OpenText and SQL) from one search form. User inputs are translated into the respective query language of each search engine. This work underscores the differences between search languages, and emphasizes the need for an Abstract Query Language general enough to support the differences. This work also shows that the richness of the search is always limited by the degree of intelligent tagging and indexing of the documents. The demo also showed that the application can "remember" previous user requests, i.e. maintain state. A user can continue the query the next time he/she accesses the search engine.
The first external distributed repository was connected through a collaboration with the NSF-funded project by the AAS (American Astronomical Society). We extensively surveyed the available software for the Z39.50 protocol and worked with the NSF Center CNIDR in North Carolina to customize their software (Isite) for use on the web. Efforts are under way to link to the documents of the Astrophysical Journal (ApJ), which are accessible through Z39.50 software.
These stateful gateway efforts are also being embedded into the NCSA Server Repository software (see more below), in particular the Search Module.
This is the specific component of the new HTTP design that the DLI work can contribute towards. I am putting together a work-in-progress draft of requirements for a search module. This would be the basis for a lightweight implementation of an application interface allowing users to customize their web sites to include their favorite search engines.
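The two ideas in this section, translating one user query into each engine's own language and remembering the query between visits, can be sketched roughly as follows. This is only an illustration, not the gateway's actual Perl/C code: the `AbstractQuery` shape, the `documents` table, the class `StatefulGateway`, and especially the full-text syntax are invented for the example and do not represent real OpenText query syntax.

```python
from typing import Callable, Dict, List, Tuple

# A toy abstract query: (field, phrase) pairs that are implicitly ANDed.
AbstractQuery = List[Tuple[str, str]]


def to_sql(query: AbstractQuery) -> str:
    """Render as SQL against a hypothetical `documents` table."""
    clauses = []
    for field_name, phrase in query:
        safe = phrase.replace("'", "''")  # minimal quoting, sketch only
        clauses.append(f"{field_name} LIKE '%{safe}%'")
    return "SELECT id FROM documents WHERE " + " AND ".join(clauses)


def to_fulltext(query: AbstractQuery) -> str:
    """Render in an invented full-text syntax (NOT real OpenText syntax)."""
    return " AND ".join(f'{f} CONTAINS "{p}"' for f, p in query)


class StatefulGateway:
    """Translates one abstract query for several engines and keeps
    the last query per session so a user can resume later."""

    def __init__(self, backends: Dict[str, Callable[[AbstractQuery], str]]):
        self.backends = backends
        self.sessions: Dict[str, AbstractQuery] = {}

    def search(self, session_id: str, query: AbstractQuery) -> Dict[str, str]:
        self.sessions[session_id] = query  # saved state
        return {name: translate(query) for name, translate in self.backends.items()}

    def resume(self, session_id: str) -> Dict[str, str]:
        """Re-issue the query remembered for this session."""
        return self.search(session_id, self.sessions[session_id])


gateway = StatefulGateway({"sql": to_sql, "fulltext": to_fulltext})
first = gateway.search("user-1", [("title", "laser")])
for engine, native_query in first.items():
    print(engine, "->", native_query)
```

A real gateway would also cache intermediate result sets rather than only the query, and the abstract query would need operators beyond an implicit AND; both are left out to keep the sketch short.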
Distributed Repositories
In addition to the direct implementation for the Testbed, the DLI is also being used as a stimulus to the NCSA Server Group, managed by Beth Frank. This server, still the most popular on the Web, is moving from a simple HTTP server to a Repository Package, with multiple protocols and integrated security. The DLI efforts described above are one of the major inputs to the design and implementation. The NCSA vision of a repository is fairly broad reaching.
In November 1995, NCSA officially released version 1.5 of our HTTPd server. The new release focused on improving security and adding some heavily requested features like virtual hosts. Several scripts were updated or created in conjunction with release 1.5. Most significant are the conversion from script-based imagemaps to internal imagemaps and the creation of web-based installation scripts. In conjunction with the 1.5 release, we have updated our documentation. We anticipate one more major revision/reorganization phase, after which the documentation should remain relatively stable.
The export-restricted server for distribution of non-exportable source code has been set up. However, due to the loss of a developer and a change in the PGP protocol, the export-restricted code is not yet complete. The PGP protocol was modified to make it more robust and easier to implement. We anticipate making an export-restricted version of the server available in 1Q96. Work continues on negotiating with EIT for the right to use their S-HTTP implementation. Discussions with Microsoft have begun to gain access to PCT libraries. Adam Cain, our security guru, has been traveling worldwide presenting web security information at several conferences.
Work has begun on our next generation server, currently referred to as a repository. It is still in the architecture and design phase.
Semantic Federation across Repositories
We also have longer-term research of two types. The major effort is in information science and systems, as discussed here. The smaller effort in computer science is discussed below. The leads on the information science are Hsinchun Chen at the University of Arizona on algorithms and Bruce Schatz at the University of Illinois (Community Systems Laboratory at NCSA) on architectures.
Our ongoing experiment, which involves concept space generation for 5 million Compendex engineering abstracts, is currently based on a 64-processor Convex Exemplar provided by NCSA. This year we generated a concept space for 1000 different community repositories based on the Compendex class codes, which we will be using for vocabulary switching experiments across all of engineering. We also have designed a complete analysis environment, based on concept spaces, called the Interspace, which will be our research vehicle for testing future generations of information systems. Through a newly begun collaboration with the Santa Barbara DLI project, we will be testing images as well as text (textures from maps instead of phrases from documents).
Future work also will include generating individualized concept spaces for assisting in user-specific concept-based information retrieval. Results from our research will be incorporated into an operational SGML search interface for the Illinois DLI engineering testbed in 1996. We are also investigating methods by which this semantic retrieval capability might be extended and scaled up to large distributed repositories (the Net or the NII).
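To give a rough feel for what a concept space is, the sketch below builds a tiny one from term co-occurrence: terms that appear together in many documents become strongly related. This is a deliberately simplified stand-in, not the project's algorithm; the weighting (co-occurring documents divided by documents containing the first term), the function names `concept_space` and `related`, and the sample data are all assumptions for illustration.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Set, Tuple


def concept_space(docs: List[Set[str]]) -> Dict[Tuple[str, str], float]:
    """Return an asymmetric weight for each ordered term pair (a, b):
    the fraction of documents containing `a` that also contain `b`."""
    term_count: Counter = Counter()
    pair_count: Counter = Counter()
    for terms in docs:
        term_count.update(terms)
        pair_count.update(permutations(sorted(terms), 2))
    return {(a, b): n / term_count[a] for (a, b), n in pair_count.items()}


def related(space: Dict[Tuple[str, str], float], term: str, k: int = 3):
    """List the k terms most strongly suggested by `term`."""
    scored = [(b, w) for (a, b), w in space.items() if a == term]
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical index terms extracted from three abstracts.
abstracts = [
    {"laser", "diode", "optics"},
    {"laser", "optics"},
    {"diode", "circuit"},
]
space = concept_space(abstracts)
print(related(space, "laser"))
```

Because the weight is asymmetric, a rare term can point strongly at a common one without the reverse holding, which is the property that makes such a structure useful for suggesting alternative vocabulary during a search.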
Computer Science Research
The computer science research is directed by co-PI Roy Campbell. In the past year, we focused on two issues: distributed naming and retrieval, and performance in client-server digital library systems. A number of papers on this work have been presented at major conferences. Our main contributions are summarized as follows.
* a new implementation of the handle system
* a proposal to solve the inverse mapping problem
* a new object representation and retrieval scheme
* a new server scheduling algorithm
* a dynamic extensible operating system kernel
* a new efficient disk scheduling algorithm
* security issues for continuous media in digital libraries
Social Science Research
The evaluation team is led by Ann Bishop, supported by graduate students in sociology and library science. Our focus in 1995 was on needs assessment and developing programs needed for instrumenting the DLI testbed. In 1996, with the prototype available, we will conduct a number of use and usability studies of the testbed. Our sociological investigations related to the changing nature of information infrastructure figure strongly throughout 1995 and 1996.
Most significant accomplishments include:
--Conducted focus groups with faculty, grads, and undergrads in Computer Science, Engineering, and Physics to gather data on user needs for the DLI (Jan. 1995)
--Conducted usability tests for the online thesaurus developed as a potential component of the DLI (March 1995)
--Observed users of current online library systems in the Grainger Library to gather data on user needs and capabilities regarding electronic information services (June-Aug. 1995)
--Conducted NSF-sponsored Allerton Institute on "How We Do User-Centered Design and Evaluation for Digital Libraries: A Methodological Forum," which attracted over 50 top academic, industry, and government researchers from five countries (Oct. 1995)
--Developed plan for instrumenting Mosaic and the DLI testbed, in preparation for automatically collecting data on individual and aggregate use (throughout 1995)
--Developed registration program to collect basic data from all DLI users and assure authorized access to the testbed (throughout 1995)
Image Processing Research
A supplementary grant (two years, starting September 1995) was awarded to us to investigate issues related to image databases. The lead is Tom Huang in the Beckman Institute. Of special interest are image compression/representation methods which are suitable for browsing and retrieval, and image content-based indexing and retrieval. A long-term goal is to link image databases to the main Digital Libraries Initiative Project's engineering journal database.
As an initial testbed we are using subsets of images from the Getty Museum sponsored image databases of paintings and photographs of artifacts. A short-term goal is to build a simple prototype image database system which incorporates novel image compression/representation methods and queries based on image content such as color, texture, shape, and spatial layout. The interplay between image representation and retrieval is to be explored. Internal UIUC funds were used to purchase a SUN SPARC 20 Workstation and hard disks for the sole use of this project.
During the last few months we have been writing front-end interfaces for image queries, investigating useful color, texture, and shape features, and preparing the image database (images and associated text materials). In 1996 we plan to put the pieces together to build a simple prototype image database system, which can be used as a framework for future research. Simultaneously we shall search for suitable image databases that can be linked to the main DLI project's engineering journal database.
Publications
How We Do User-Centered Design and Analysis for Digital Libraries: A Methodological Forum [Collected papers from the 37th Allerton Institute, Oct. 29-31, 1995. Compiled by A. Bishop]. URL:http://edfu.lis.uiuc.edu/allerton/95
Cain, A. and McGrath, R.E. "Digital Commerce on the World Wide Web", NCSA Access, Summer 1995. URL:http://www.ncsa.uiuc.edu/Pubs/access/95.2/DigitalCommerce.html
Chen, H. and J. Kim, "GANNET: A Machine Learning Approach to Document Retrieval," Journal of Management Information Systems, 11(3): 9-43, Winter 1994/95.
Chen, H., Schatz, B., Yim, T. and Fye, D. "Automatic Thesaurus Generation for an Electronic Community System," Journal of the American Society for Information Science, 46(3): 175-193, April 1995.
Chow, H., Chen, H., Ng, T., Myrdal, P. and Yalkowsky, S.H. "Using Backpropagation Networks for the Estimation of Aqueous Activity Coefficients of Aromatic Organic Compounds," Journal of Chemical Information and Computer Sciences, American Chemical Society, 35(4): 723-728, July/August 1995.
Cole, Tim and Michelle Kazmer. SGML As a Component of the Digital Library. Library High Tech, 13(4): 17-90, 1995.
Kwan, T., McGrath, R.E. and Reed, D.A. "NCSA's World Wide Web Server: Design and Performance", IEEE Computer, 28(11): 69-74, November, 1995.
Lin, C., and Chen, H. "An Automatic Indexing and Neural Network Approach to Concept Retrieval and Classification of Multilingual (Chinese-English) Documents," IEEE Transactions on Systems, Man, and Cybernetics, 26(1): 1-14, February 1996.
McGrath, R.E. "Caching for Large Scale Systems: Lessons from the WWW", D-Lib Magazine, January, 1996. URL:http://www.dlib.org/dlib/january96/ncsa/01mcgrath.html
McGrath, R.E., Yeager, N. and Cain, A. "World Wide Web servers at NCSA", NCSA Access, Spring 1995. URL:http://www.ncsa.uiuc.edu/Pubs/access/95.1/WWWservers.html
McGrath, R.E. "Performance of Several HTTP Demons on an HP 735 Workstation", April, 1995. URL:http://www.ncsa.uiuc.edu/InformationServers/Performance/V1.4/report.html
McGrath, R.E. "Comments on Haynes & Company CGI Benchmarks", November, 1995. URL:http://www.ncsa.uiuc.edu/InformationServers/Performance/CGI/cgi-nsapi.html
Sandusky, R. A Dynamic Repository for Organization Specific Information. December, 1995. URL:http://anshar.grainger.uiuc.edu/dlisoc/dynamic-repository-report.html
Star, S. L. Ecologies of Knowledge: Work and Politics in Science and Technology. Editor. Albany: SUNY Press, 1995.
Star, S. L. "Work and Practice in Social Studies of Science, Medicine and Technology," Science, Technology and Human Values, (1995) 20: 501-507.
Star, S. L. "Infrastructure and Organizational Transformation: Classifying Nurses' Work," (with Geoffrey Bowker and Stefan Timmermans) Pp. 344-370 in W. Orlikowski, G. Walsham, M. Jones and J. DeGross, eds. Information Technology and Changes in Organizational Work. (Proceedings IFIP WG8.2 Conference, Cambridge, England.) London: Chapman and Hall.
Star, S. L. "Steps toward an Ecology of Infrastructure: Problems of Design and Access in Large Information Systems" (with Karen Ruhleder), Information Systems Research, forthcoming.
Star, S. L. "Working Together: Symbolic Interactionism, Activity Theory and Information Systems," in Yrjö Engeström and David Middleton, eds. Communication and Cognition at Work, Cambridge: Cambridge University Press, in press, 1996.
John and Jane Q. Engineer: What About Our Users? [A report by the Social Science Team on the Grainger Library observations and an integrated analysis of all data collected to date] URL:http://www.grainger.uiuc.edu/dli/socintro.htm
Yeager, N., and McGrath, R.E. Web Server Technology: An Advanced Guide for Information Providers, Morgan Kaufmann Publishers, in press, 1996.
Yen, J., Chen, H., Ma, P. and Bui, T. "An Issues Identifier for Online Financial Databases," Proceedings of the International Society for Decision Support Systems Third International Conference, ISDSS'95, Hong Kong, June 22-23, 1995.
Yongcheng Li and Roy Campbell, A Dynamic Priority-based Scheduling Method in Distributed Systems, Proc. of International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'95), pp.177-187, Georgia, Nov. 1995.
Yongcheng Li, Varna Puvvada, and Roy Campbell, Dynamic Retrieval of Remote Digital Objects, Proc. of ACM Fourth International Conference on Information and Knowledge Management (CIKM'95), pp.182-187, Maryland, Nov. 1995.
Yongcheng Li, Zhigang Chen, See-Mong Tan, and Roy Campbell, Security Enhanced MPEG Player, to appear in Proc. of IEEE First International Workshop on Multimedia Software Development (MMSD'96), Berlin, Germany, March 25-26, 1996.
Technical Reports
Lagoze, C., McGrath, R., Overly, E., and Yeager, N. "A Design for Inter-Operable Secure Object Stores (ISOS)". Cornell Computer Science Technical Report TR95-1558. 1995.
Yongcheng Li, See-Mong Tan, Mohlalefi Sefika, Roy Campbell, and Willy Liao, Experience with Scripting Kernel, Technical Report, Department of Computer Science, UIUC, Nov. 1995.
Varna Puvvada and Roy Campbell, Inverse Mapping in the Handle Management System, Technical Report, Department of Computer Science, UIUC, Oct. 1995.
Yongcheng Li, Varna Puvvada, and Roy Campbell, Index Management of the Digital Library Repository, Technical Report, Department of Computer Science, UIUC, May 1995.
Yongcheng Li, The Design of Physical Support for Digital Libraries, Technical Report, Department of Computer Science, UIUC, Dec. 1994.
Yongcheng Li, Issues in Distributed File System and the Digital Library Project, Technical Report, Department of Computer Science, UIUC, Dec. 1994.
Editing Work and Related Grant Activities
Chen, H. and Schatz, B. Guest editors, IEEE Computer, May 1996 special issue on "Building Large-scale Digital Libraries".
Chen, H. "Digital Library Concept Space Research," featured in Volume 9, No. 2 of Access, High-performance Computing Magazine (Page 17), National Center for Supercomputing Applications, Summer 1995, and Volume 269 of Science (Page 1361, "Off-the-Shelf Chips Conquer The Heights of Computing," September 8, 1995).
Chen, H. AT&T Foundation Award in Science and Engineering, 1995-1996.
Chen, H. Principal investigator (PI), AT&T Foundation Special Purpose Grants in Science and Engineering, "Intelligent Internet Resource Categorization and Discovery," $10,000, October 1996-September 1996.
Chen, H. Principal investigator (PI), National Center for Supercomputing Applications (NCSA), "Information Analysis and Knowledge Discovery for Digital Libraries," High-performance Computing Resources Grants (Peer Review Board), on SGI Power Challenge Array (48 R8000 processors, 3000 SUs), October 1995-September 1996 (IRI950001N).
Chen, H. Co-Investigator (Co-I, PI: N. Strausfeld, University of Arizona), National Science Foundation, Database Activities Program -- Division of Biological Instrumentations and Resources, Database Activities Relating to identifiable Neurons, "FLYBRAIN, The First in a Federation of Databases for Insect Neurology," $896,424, September 1995-August 1998.
Chen, H. Co-Investigator (Co-I, PI: J. Yen, Hong Kong University of Science and Technology), Hong Kong Research Grants Council, "Intelligent Agent for the Financial Databases," HK$544,000, July 1995-June 1997.
Chen, H. Principal investigator (PI), National Science Foundation, CISE, IRIS, "Concept-based Categorization and Search on Internet: A Machine Learning, Parallel Computing Approach," $200,755, September 1995-August 1998 (IRI9525790).
Chen, H. Principal investigator (PI), National Center for Supercomputing Applications (NCSA), "NSF/ARPA/NASA Digital Libraries," High-performance Computing Resources Grants, on Cray CS6400 (8 processors), July 15, 1995-Dec 31, 1995 (IRI950003N).
Chen, H. Principal investigator (PI), National Center for Supercomputing Applications (NCSA), "Terabyte Information Analysis and Knowledge Discovery," High-performance Computing Resources Grants, on Connection Machine CM-5 (521 nodes), SGI Power Challenge (16 R8000 processors), and Convex Exemplar (24 HP processors), September 1994-July 1995 (IRI950001N).
Chen, H. Principal investigator (PI), AT&T Foundation Special Purpose Grants in Science and Engineering, "An Artificial Intelligence Approach to Creating an Intelligent Group Systems Environment," $10,000, September 1994-August 1995.
Presentations
Chen, H., Schatz, B.R. and Lin, C. "Concept Classification and Search on Internet Using Machine Learning and Parallel Computing Techniques," World Wide Web 4 Conference, Boston, December 11-13, 1995.
Lagoze, C. A Design for Inter-Operable Secure Object Stores (ISOS). Digital Library Initiative All Project Workshop, November 9-10, Santa Barbara. URL:http://www.ncstrl.org/Dienst/UI/2.0/Describe/ncstrl.cornell%2fTR95-1558
McGrath, R. "Scalable Web Servers: Architecture and Caching", invited talk at IBM T. J. Watson Labs, Hawthorne NY, October, 1995.
Rui, Y., She, A. and T. S. Huang, Automatic Object Segmentation Using Attraction and Rule Based Clustering, submitted to the IEEE International Conference on Image Processing, Sept. 16-19, 1996, Lausanne, Switzerland.
Leigh Star was guest scholar at the University of California Humanities Research Institute, University of California at Irvine, November 20-21. Her work was the topic of a two-day seminar on feminism and methodology, with a focus on linking information systems and feminist theory.
Star, S. L. "Infrastructure and Organizational Transformation: Classifying Nurses' Work," Conference on Information Technology and Changes in Organizational Work, IFIP WG 8.2, University of Cambridge, England, December, 1995.
Star, S. L. Plenary commentator, "Social theory and the study of computerized work sites," (Burn Latter), Conference on Information Technology and Changes in Organizational Work, IFIP WG 8.2, University of Cambridge, England, December, 1995.
Yongcheng Li, Nov. 1995 at PDPTA'95.
Yongcheng Li, Nov. 1995 at CIKM'95.
Visitors
Nov. 3: Annelise Pejtersen, Riso National Research Lab, Denmark
Nov. 3: Mrs. Lock Thi Xuan, Librarian and Mr. Ling Siew Kheong, Librarian, Ngee Ann Polytechnic, Singapore
Nov. 21: Brian Mittelstaedt, Scientist, Skip Janis, Director for Educational Sales and Marketing, and Debby Markley, Business Research, Digital and Applied Imaging, Kodak
Nov. 3: Marilyn Beamish, Librarian, Griffith University, Australia
Nov. 13-16: Kent Summers, Electronic Book Technologies
Dec. 4-8: Howard Pell, Manager of Professional Services, OpenText
Dec. 18: Susumu Nakagawa, Engineer, Software Development Center, Hitachi, Ltd.
Jan. 25: Darrel Ram, Database Technology Research, Silicon Graphics, Inc., David Belanger, Head, Communications Information Systems, AT&T, Eleftherios Koutsofios, Technical Staff, AT&T
Appendix: Status of Publisher Material in Illinois DLI
Partner's Update 1/29/96
1. The first few PCs at the Grainger Engineering Library have been configured to provide direct end-user access to the UIUC DLI testbed. We will be bringing up the DLI client on the rest of the public terminals at Grainger and the other sites around campus over the next week to 10 days.
2. The UIUC DLI testbed database currently accesses, via the public client, approximately 2400 articles from Applied Physics Letters (American Institute of Physics). This includes all APL issues from 1995 excepting Volume 66, issue 2, Volume 66, issue 4, and Volume 67, issue 24. The first 3 APL issues of 1996 have also been indexed.
3. Note that the full-text articles are fetched via HTTP and rendered in Panorama. Users search and retrieve DLI testbed materials through searches of indexes constructed by the OpenText database management system. At present, users do not see the HTML table of contents pages for the various journals and publications in the testbed. Those HTML routes into the testbed were set up for internal development use and for the use of publishing partners. The testbed index is frequently more up to date than the table of contents pages.
4. We'll be updating and expanding the UIUC DLI testbed index frequently over the next month. In addition to weekly updates of the index to incorporate new issues of APL, we expect to incorporate the 1995 volume of Journal of Computing in Civil Engineering (American Society of Civil Engineers) within 2 weeks. Following that we'll be adding (not necessarily in this order) Physical Review Letters (American Physical Society), Computer (Institute of Electrical and Electronics Engineers, Computer Society), sections of IEEE Letters, sections of IEE Letters, AIAA Journal, and other titles from ASCE and IEEE CS.
5. In bringing up APL and preparing to add in CCE, we have incorporated a metadata approach in indexing author names and affiliations, figure captions, and other display information. We are also in the process of using index aliasing of SGML tag structures and indexing filters to allow us to search and retrieve using the OpenText Parallel Execution Monitor (PEM) over multiple heterogeneous DTDs. We'll detail that work in 3 separate notes you'll receive over the course of the next 3 to 4 weeks.
6. In preparing to bring up material from IEEE, IEE, and AIAA, we've also made some headway in developing an interim work-around (pending implementation of the SGML catalog or equivalent approach) for handling SGML with embedded TeX instances. We'll detail that in another note to follow soon.
7. We'll also be providing you updates on work being done in regard to online thesaurus development, Web browser implementations of a testbed searching client, and SGML math rendering issues.
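The index aliasing mentioned in item 5 can be pictured with a small sketch: tags that mean the same thing in different DTDs are mapped to one canonical index field, so a single search covers every publisher's material. This is only an illustration of the general idea and does not use or describe the OpenText software; the DTD names, tag names, the `ALIASES` table, and both functions are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical alias table: (DTD name, SGML tag) -> canonical index field.
ALIASES: Dict[Tuple[str, str], str] = {
    ("dtd_a", "AU"): "author",
    ("dtd_b", "AUTHOR"): "author",
    ("dtd_a", "TI"): "title",
    ("dtd_b", "ARTTITLE"): "title",
}

# A record is (document id, DTD name, tag, text content of the element).
Record = Tuple[str, str, str, str]


def build_index(records: List[Record]) -> Dict[str, Dict[str, List[str]]]:
    """Index each word under the canonical field for its tag.
    Elements whose tag has no alias are simply not indexed."""
    index: Dict[str, Dict[str, List[str]]] = {}
    for doc_id, dtd, tag, text in records:
        field_name = ALIASES.get((dtd, tag))
        if field_name is None:
            continue
        for word in text.lower().split():
            postings = index.setdefault(field_name, {}).setdefault(word, [])
            if doc_id not in postings:
                postings.append(doc_id)
    return index


def search(index: Dict[str, Dict[str, List[str]]], field_name: str, word: str) -> List[str]:
    """Look a word up in one canonical field, regardless of source DTD."""
    return index.get(field_name, {}).get(word.lower(), [])


records = [
    ("apl-1", "dtd_a", "AU", "Smith"),
    ("cce-1", "dtd_b", "AUTHOR", "Smith"),
    ("cce-1", "dtd_b", "FIG", "Smith"),  # no alias, so not indexed
]
index = build_index(records)
print(search(index, "author", "Smith"))
```

Keeping the mapping in a table rather than in the documents means a new publisher's DTD can be added by extending the table, without re-tagging any source material.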
Industry Contact and Support: DLI Partners' Workshop, November, 1995
