Digital Libraries Initiative Spring 1996 Partners' Workshop
May 2-3, 1996
Minutes

I. SUMMARY

The UIUC DLI project is emphasizing the collaborative effort needed between university researchers and our publishing partners. A large part of the project will be to look at the future roles of publishers, libraries, and authors, and to extend the model of distributed publisher repositories developed in the DLI project. We now have a viable retrieval system and are in the process of setting up a framework for a system of federated repositories. The DLI will be in a position to advise our publishing partners in setting up their own repositories, and we will continue to work on the research needed for effective indexing, retrieval, and display of these collections.

The University of Illinois Library announced its plans for an industrial partnership program to augment the research and development work going on in the DLI project. Because of the amount of time and personnel needed to process and mount new titles, and the limited grant resources that can be applied to the testbed, the DLI needs outside support in order to include more publishers and journals in the testbed. The Industrial Partner support will also allow the DLI project team to extend the testbed past the four-year NSF/ARPA/NASA grant period. The UIUC Library hopes to establish a collaborative environment with DLI publishing partners in order to more fully explore the potential of SGML retrieval and display and the distributed repository model. More details regarding the Digital Library Industrial Partners Program will be forthcoming.

Ron Larsen, Head of Digital Projects for ARPA, thanked all the Partners for participating in the UIUC DLI project and urged everyone to work together in order to move forward beyond the four years of the grant. Mr. Larsen spoke of ARPA's fundamental role in research as a 5-10 year opportunity and stressed the importance of making technology available to the community at large. It is also critical that those in Washington hear what is needed in order to plan for future research. Mr. Larsen would appreciate the involvement of the UIUC DLI partners and encourages those with ideas or suggestions to contact him at (703) 696-2227.

II. UPCOMING DLI MILESTONES

- Summer '96: Using the OpenText Latitude Web server, we will provide Web access to the Testbed databases from the Netscape and Microsoft Internet Explorer Web browsers. The Latitude server allows Boolean searching of OpenText databases. Articles will be displayed through Panorama (SoftQuad) and as Adobe Acrobat PDF files.

- Summer-Fall '96: We will produce the first Web version of the Visual Basic custom client program tested during the spring 1996 semester. This version will include thesaurus features, word co-occurrence, and enhanced full-text searching features. The Web version will incorporate Java and/or ActiveX technologies, will access the OpenText databases, and will use HTTP protocols being developed by NCSA.

- Fall '96: NetBill will conduct commercial trials with real goods and publish a progress report. They are currently working on Mac and PC clients and will be extending beta trials for these in Fall '96, with commercial release in early 1997.

- 1997: Further client enhancements. We will be testing and implementing distributed server models for all publishers that are ready. The UIUC DLI User Evaluation group will conduct extensive surveys.

- August '98: The user population will expand to the CIC (Big Ten universities) and to publisher repositories (distributed repositories, federations).

Distribution of custom software to publishers: Source code is always available free. Our custom client is available by ftp to any of our publishing partners. Please contact Susan Harum (dli@uiuc.edu) for further information regarding access to the client.

III. SYNOPSIS OF MAJOR DISCUSSION POINTS:

URLs to the speakers' viewgraphs:

Bruce Schatz: DLI Overview Viewgraphs

Bill Mischo: UIUC Testbed

Tim Cole: UIUC DLI Testbed Processing Customization

Ann Bishop: Sociology Evaluation

Bruce R. Schatz and Hsinchun Chen: Information Analysis in the Net: The Interspace of the New Millennium Viewgraphs

Chris Deephouse: Overview of NETBILL Home Page

IV. CHRONOLOGY OF MEETING

DLI collaborators: To learn more about the Digital Library Initiative as a whole, see the IEEE Computer Society's May 1996 special issue on Digital Libraries.

To summarize briefly what is being done within the other 5 DLI projects across the country:

Researchers at Stanford University are developing the technologies for heterogeneous repositories to be integrated into a "universal" library. They are sending queries across different search systems using technologies such as CORBA and wrappers.

The University of California at Santa Barbara DLI would like to provide easy access to a large collection of maps and images, using semantic technology to map concepts instead of matching text and images. This kind of research hasn't been done in this subject area before.

Carnegie Mellon University DLI research revolves around establishing and searching a large, online video collection. Progress has also been made on NetBill, which we will use a year from now when our testbed is working on the Web. NetBill will provide fine-grained charging (page by page, members/non-members) and trials for charging. The technology is almost there.

The University of Michigan is working on the location of sources across the network, or what used to be called a gateway. User interface agents will conduct interviews with users to help establish their information needs.

Researchers at the University of California at Berkeley are setting up an electronic library dealing with the California environment, including satellite photos, videos, and remote sensing data. Their automatic processing is of a high quality and utilizes feature recognition. This is a higher-volume version of our project for images that makes use of hand markup.

Eric Johnson gave a demonstration of Jason Ng's Java applets, which show server-end combination of queries. The Java applets expand the functionality of the Web browser and provide keyword suggestion. A server-side CGI program generates queries for the OpenText server in the basement of Grainger (with testbed documents stored in SGML) as well as queries for the ASJ server in Phoenix, AZ (with documents stored in HTML). These servers use different query languages. The CGI binaries also combine short-record results before they are returned to the client.

When you access the Web page containing the demo, a Java applet displays a floating window with text entry fields, checkboxes for selecting databases, and a word wheel. The word wheel is generated from document titles in both databases. If you have both databases selected, search results from both appear in the same window in the Web browser. Choosing an ASJ document fetches the HTML and displays it in various ways within the Web browser; choosing a testbed document displays the document abstract, optionally the figures, and a link to the full document. Selecting this link launches Panorama to display the SGML.

The upshot of all this is that server-end binaries (CGI) can support simultaneous queries to multiple repositories, and client-end services (CCI) can support display of different kinds of documents.
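As a rough illustration of the server-end combination described above, here is a minimal Python sketch. It is not the demo's CGI code: the `ShortRecord` type, the `make_backend` and `federated_search` functions, and the in-memory title lists are all hypothetical stand-ins for the real OpenText and ASJ servers, and keyword matching on titles is assumed only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ShortRecord:
    """Brief result entry shown in the merged hit list (hypothetical shape)."""
    repository: str
    doc_id: str
    title: str


# A backend takes a list of keywords and returns short records.
Backend = Callable[[List[str]], List[ShortRecord]]


def make_backend(name: str, titles: Dict[str, str]) -> Backend:
    """Build a toy in-memory repository; a real one would query a remote server."""
    def search(keywords: List[str]) -> List[ShortRecord]:
        wanted = [k.lower() for k in keywords]
        return [
            ShortRecord(name, doc_id, title)
            for doc_id, title in titles.items()
            if all(k in title.lower() for k in wanted)
        ]
    return search


def federated_search(keywords: List[str],
                     backends: Dict[str, Backend],
                     selected: List[str]) -> List[ShortRecord]:
    """Query every selected repository and merge the short records server-side."""
    merged: List[ShortRecord] = []
    for name in selected:
        merged.extend(backends[name](keywords))
    # One combined list goes back to the client, ordered by title.
    return sorted(merged, key=lambda r: (r.title.lower(), r.repository))


backends = {
    "testbed": make_backend("testbed", {"t1": "SGML Retrieval Systems",
                                        "t2": "Video Indexing"}),
    "asj": make_backend("asj", {"a1": "Retrieval of Star Catalogs"}),
}
hits = federated_search(["retrieval"], backends, ["testbed", "asj"])
for hit in hits:
    print(hit.repository, hit.doc_id, hit.title)
```

The client only ever sees the merged list, which mirrors the checkbox behavior in the demo: deselecting a database simply drops it from `selected`.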

Questions and comments: Demo software doesn't preserve. Questions were raised about location of documents, indexing, and metadata. The point was made that when you have remote repositories and the data changes, Perl and CGI scripts do the work (formatting) and the client works like a dumb terminal.

Eric Johnson's demonstration of his hypertextual thesaurus (IODyne): Unlike HTML-forms-based retrieval, IODyne really exploits the power of client machines, which today are typically as powerful as servers, and in some cases more so. The name IODyne comes from the dynamic use of "Information Objects," which display different behavior depending on which space they inhabit, and from the suffix -dyne, meaning a device that combines objects without loss of information (as in heterodyne). IODyne lets the user construct queries abstractly, translating them into real queries when it submits them to repositories.
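The idea of building a query abstractly and translating it only at submission time can be sketched as follows. This is an illustrative assumption, not IODyne's design: the `Term`, `And`, and `Or` classes are hypothetical, and the two output syntaxes are invented for the example rather than being the actual OpenText or ASJ query languages.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Term:
    word: str


@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"


Query = Union[Term, And, Or]


def to_infix(query: Query) -> str:
    """Render for a hypothetical repository that accepts infix Boolean syntax."""
    if isinstance(query, Term):
        return f'"{query.word}"'
    op = "AND" if isinstance(query, And) else "OR"
    return f"({to_infix(query.left)} {op} {to_infix(query.right)})"


def to_prefix(query: Query) -> str:
    """Render for a hypothetical repository that accepts prefix syntax."""
    if isinstance(query, Term):
        return query.word
    op = "and" if isinstance(query, And) else "or"
    return f"{op}({to_prefix(query.left)}, {to_prefix(query.right)})"


# The user builds one abstract query; each repository gets its own rendering.
abstract_query = And(Term("sgml"), Or(Term("retrieval"), Term("indexing")))
print(to_infix(abstract_query))
print(to_prefix(abstract_query))
```

Keeping the query as a tree until submission means a new repository only needs one more rendering function; the user-facing query construction does not change.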

Questions and comments: Does it scale up? If the result set is large, the server returns the number of hits along with the choice of getting all of them, getting a sample, or canceling the search altogether. It won't work well if you combine 10K hits with 10K hits (as in a set intersection); you may end up with 8 hits. However, there are provisions in the design for doing such large set combinations server-side. Subject searches depend on A&I services, and if these are not available, Bill's indexing methods so far would be the only service usable within the searching abilities. Because the duties of publishers, users, and others are changing rapidly, anyone can publish now, and the future of A&I services is uncertain. Full-text indexing does not offer the same degree of semantic usage. It would not be difficult to build co-occurrence databases from existing indexes.
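The scaling concern can be made concrete with a small Python sketch. Everything here is assumed for illustration: the `ResultServer` class, its method names, and the `hit_limit` threshold are hypothetical and do not describe the actual server. The point is only that the two 10K-hit sets stay on the server, and the client receives a count, a choice, and then the small intersection.

```python
from typing import Dict, List, Optional, Set


class ResultServer:
    """Toy server that keeps result sets and reports counts before sending hits."""

    def __init__(self, hit_limit: int = 100) -> None:
        self.hit_limit = hit_limit  # assumed threshold for a "large" result
        self._sets: Dict[str, Set[int]] = {}

    def store(self, name: str, doc_ids: Set[int]) -> dict:
        """Save a result set under a name and return only its summary."""
        self._sets[name] = set(doc_ids)
        return self.summary(name)

    def summary(self, name: str) -> dict:
        """Report the hit count and the choices offered to the user."""
        count = len(self._sets[name])
        options = ["all"] if count <= self.hit_limit else ["all", "sample", "cancel"]
        return {"set": name, "hits": count, "options": options}

    def intersect(self, new_name: str, left: str, right: str) -> dict:
        """Combine two stored sets server-side; no document ids cross the network."""
        self._sets[new_name] = self._sets[left] & self._sets[right]
        return self.summary(new_name)

    def fetch(self, name: str, sample_size: Optional[int] = None) -> List[int]:
        """Return all ids in sorted order, or only the first sample_size of them."""
        ids = sorted(self._sets[name])
        return ids if sample_size is None else ids[:sample_size]


server = ResultServer(hit_limit=100)
first = server.store("a", set(range(0, 10000)))
second = server.store("b", set(range(9992, 19992)))
combined = server.intersect("c", "a", "b")
print(first["hits"], second["hits"], combined["hits"])
```

With these made-up ranges, two sets of 10,000 ids overlap in only 8 documents, so only those 8 ids would need to be sent to the client.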

Ann Bishop: Focus groups:
http://anshar.grainger.uiuc.edu/dlisoc/home_page.html

Tom Magliery: Report on Math workshop:
http://dli.grainger.uiuc.edu/workshops/minutesmath96.htm

Questions and answers:

What articles are being accessed and for how long? Transaction logs pull this out at a general level (how many databases and at what level). Detailed logging can capture keystrokes. The SGML structure is good for doing instrumentation.

How much effort is going into the end state of using Web browsers as the client, and how much depends on Netscape? Alternately, is Panorama going to become Web-aware?

Panorama might have a Netscape plugin in the future. The OLE side needs an SGML renderer. We would like to see better competition. Will you be linked into a Web-aware browser? Even the current Panorama version is CCI-aware. We're going to do PC and Windows and wait until the infrastructure gets better.

The original proposal projected 100,000 documents with 100,000 users at the end of the four years. Has this been scaled back?

We know we will have in excess of 20,000 documents within 4 years. In terms of critical mass, we need IEEE documents for breadth and depth (we originally planned on this to reach the projected figure). We will have 1995 and forward for several publishers (IEEE CS, IEE, ASCE, AIP, APS). It will not be 100,000 documents, and it will not include pre-1995 documents. We think we can handle a lot of users and have strategies for splitting off publishers onto their own servers as it becomes appropriate.

How long will it be until the DLI can give publishers what they need to be a repository?

We would want it to look like a single virtual collection, and we would maintain an index repository so that it can be searched in a seamless fashion. We will get in touch with Tim Ingoldsby of AIP in mid-summer to give details of what publishers need, then negotiate standards for the structure of the SGML so that they can generate metadata, etc. Hopefully by next year we'll be able to set up more repositories.

What obstacles have you had using ASCE and IEEE data?

When we tried to catch images and convert them to GIFs on the first pass, our algorithms broke down and we lost a string of text. Once we have a sample up, the question is what the quality of the indexing is and whether it is sufficient. There are a large number of TeX images and we don't know how this is going to scale. We might be able to use typography to decipher citations. We will have samples of Electron Device Letters up in a couple of weeks.

What happens when the grant runs out? Is UIUC proposing to run a centralized indexing service?

The DLI is an integral part of the UIUC Library, and we expect to be in the forefront if that means keeping up a centralized repository. People on campus will be used to the testbed by the end of the four years.

Is UIUC DLI having discussions with other libraries?

Only a few places are working in the same areas. The University of Michigan is an example, and we have good communication with them. The Humanities Text Initiative is providing a body of literature on an OpenText server made available through Panorama. The CIC is working on a chemical database project. The Stanford library is also heavily involved with SGML through HighWire Press. They are putting up HTML versions of magazines.

The term co-occurrence research: where do you see that going?

We're going to incorporate that in our project very soon. We have large amounts of co-occurrence terms already generated: 5 years each of INSPEC and Compendex, with both broad and good coverage. Ten years down the road, we expect that concept space switching will intersect across different subject areas.
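For readers unfamiliar with term co-occurrence, the following Python sketch shows the basic counting step on a few made-up records. It is a deliberate simplification under stated assumptions: the function names and sample index terms are hypothetical, and raw pair counts are used here in place of whatever weighting the project's own co-occurrence analysis applies to INSPEC and Compendex.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def cooccurrence_counts(records: Iterable[Iterable[str]]) -> Counter:
    """Count how often each pair of index terms appears on the same record."""
    counts: Counter = Counter()
    for terms in records:
        unique = sorted({t.lower() for t in terms})
        # Sorting makes each pair canonical, so (a, b) and (b, a) are one key.
        counts.update(combinations(unique, 2))
    return counts


def related_terms(counts: Counter, term: str, top_n: int = 3) -> List[Tuple[str, int]]:
    """Suggest the terms that co-occur most often with the given term."""
    term = term.lower()
    scores: Counter = Counter()
    for (a, b), n in counts.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Made-up index terms for three records (illustrative only).
records = [
    ["SGML", "retrieval", "indexing"],
    ["SGML", "retrieval"],
    ["indexing", "thesaurus"],
]
counts = cooccurrence_counts(records)
print(related_terms(counts, "SGML"))
```

A table like `counts` is what a keyword-suggestion feature could consult: given a search term, it offers the terms that most often share a record with it.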



University of Illinois at Urbana-Champaign Digital Libraries Initiative
Comments to: External Relations Coordinator, Tom Habing
11/20/96