Digital Library Instrumentation Outline
DRAFT - November 1994
Alaina Kanfer
I. What data will be collected?
As a prerequisite for using the alpha version of the Digital Library, all users must agree to participate in the study.
A. Demographics
- At first use (registration), collect the following:
- Name
- Gender
- Age
- Education level
- Occupation
- Geographic location of primary work site: organization, city, state, country
- Other frequently used work sites
- Major area of interest
- If at a university, departmental affiliation(s)
- Primary type of computer used
- A user name, password, and participant ID will be assigned at registration.
- User name, name, contact information, and participant ID will be stored together in one file with relatively closed (restricted) access. A script will be written so that when someone logs in, this file is consulted and the participant ID is attached to the user's log.
- ID and demographics will be stored in a separate file that is relatively open for analysis.
- At each login, record:
- ID (obtained from the closed file of user names and IDs)
- Cyberspace location: client host IP address
- System use (see C below)
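The separation between the closed identity file and the open demographics file, and the login-time lookup that attaches the ID, can be sketched as follows. This is a minimal sketch, assuming in-memory dictionaries in place of the two files; `ParticipantRegistry`, `register`, and `login_record` are hypothetical names, and a real deployment would keep the closed file on an access-restricted host.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ParticipantRegistry:
    """Hypothetical sketch of the closed/open split described above."""
    # Closed file: user name -> participant ID, name, contact (restricted access).
    closed_file: dict = field(default_factory=dict)
    # Open file: participant ID -> demographics (available for analysis).
    open_file: dict = field(default_factory=dict)
    next_id: int = 1

    def register(self, user_name: str, name: str, contact: dict,
                 demographics: dict) -> int:
        # Assign a participant ID at first use and split the answers.
        if user_name in self.closed_file:
            raise ValueError(f"{user_name!r} is already registered")
        pid = self.next_id
        self.next_id += 1
        self.closed_file[user_name] = {"id": pid, "name": name, "contact": contact}
        # The open record carries only the ID, never the name or contact details.
        self.open_file[pid] = dict(demographics)
        return pid

    def login_record(self, user_name: str, host_ip: str, when: datetime) -> dict:
        # Look up the ID in the closed file and attach it to the login record.
        pid = self.closed_file[user_name]["id"]
        return {"id": pid, "host_ip": host_ip, "time": when.isoformat()}
```

Analysts would receive only `open_file` and the usage logs keyed by ID; the mapping back to names stays in `closed_file`.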
B. Contact Information
- For follow-up surveys:
- Email address
- Telephone
- Name
- Postal address
C. System Use
- For each action, record: user ID, host IP, date, time, action on a file, file acted upon, and [where a word is entered] the word entered
- actions on a file:
- go back to a url
- go forward to a url
- open hotlist and go to url
- open hotlist and delete a url
- open window history and go to url
- open url
- print file
- save file to disk
- annotate file (can annotation capability be expanded?)
- add file to hotlist
- mail file [to record to whom the file is mailed, we need an extra field]
- actions on files with words:
- search for (something in file) - record word searched for
- thesaurus lookup - record word entered
- Files can be:
- table of contents of a journal
- contents of an article
- abstract of an article
- body of an article
- references
- tables, charts and other appendices (algorithms?)
- (Each of these is stored as a separate file but can be displayed together; is this optimal in terms of thought processes?)
- Look into physical simulations or animations to include; a database was mentioned on 10/28 (animation, simulation, video, sounds?)
**As this list is finalized, we need exact wording for each data-collection item to be included in the instrumentation.
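One way to pin down the record format while the list is being finalized is a fixed-field record with one line per action. The sketch below is an assumption, not a settled design: the field order follows the list in C, the action names are hypothetical labels for the actions listed above, the `recipient` field is the optional extra field for mailed files, and tab-separated text is just one plausible encoding for the usage file.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    # Hypothetical labels for the actions on a file listed in C.
    BACK = "back"
    FORWARD = "forward"
    HOTLIST_GOTO = "hotlist_goto"
    HOTLIST_DELETE = "hotlist_delete"
    HISTORY_GOTO = "history_goto"
    OPEN_URL = "open_url"
    PRINT = "print"
    SAVE = "save"
    ANNOTATE = "annotate"
    HOTLIST_ADD = "hotlist_add"
    MAIL = "mail"
    # Actions on files with words.
    SEARCH = "search"
    THESAURUS = "thesaurus"


WORD_ACTIONS = {Action.SEARCH, Action.THESAURUS}


@dataclass(frozen=True)
class UsageRecord:
    user_id: int
    host_ip: str
    date: str                        # e.g. "1994-11-01"
    time: str                        # e.g. "09:30:15"
    action: Action
    file: str                        # URL or file acted upon
    word: Optional[str] = None       # only for search / thesaurus
    recipient: Optional[str] = None  # optional extra field, only for mail

    def __post_init__(self):
        # A word is required for word actions and forbidden otherwise.
        if (self.word is not None) != (self.action in WORD_ACTIONS):
            raise ValueError("word must be given exactly for word actions")
        if self.recipient is not None and self.action is not Action.MAIL:
            raise ValueError("recipient applies only to mail")

    def to_line(self) -> str:
        # One tab-separated line per action; empty strings for absent fields.
        fields = [str(self.user_id), self.host_ip, self.date, self.time,
                  self.action.value, self.file,
                  self.word or "", self.recipient or ""]
        return "\t".join(fields)
```

Validating in `__post_init__` catches a missing search word at the client, before the record reaches the usage file.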
II. How will the data be collected?
- Administrative information will be conveyed to the user, and surveys will be administered, from the server.
- Data on actual usage will be collected at the client and sent to a file system (location to be determined).
- Two stages:
- Alpha: the alpha version will be installed on a relatively small number of desktops, to practice data collection and management and to conduct preliminary analyses that will inform the beta version.
- Beta: the beta version will be installed at some public sites as well as on private desktops, and may differ from the alpha version based on the alpha findings.
- This gives two distinct deployments of the instrumented system, and usage can be compared at least within each version.
- Each user logs onto the system. At registration, the program will assign the user an ID number, which will be used to record the user's activities on the system. The program will ask the user the set of demographic and contact questions. User information will be kept with the user ID in a separate file.
- A session will end when the user logs off or when the system has remained idle for a set period of time.
**This is an issue we need to address.
- If the timeout is built into the instrumented Mosaic, a user could be logged off while working in an external viewer, or at least be interrupted frequently. This is particularly problematic because SGML is used to view many of the files. When the idle period expires, the user should first be prompted to press Return to continue the session, and be logged out automatically only if there is no response.
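The prompt-before-logout behavior proposed above can be sketched as a small state machine. This is a minimal sketch, assuming the client calls `check()` periodically and reports every user action (including pressing Return at the prompt) via `activity()`; the class name, method names, and the use of seconds are hypothetical, and the outline does not fix the idle limit or the grace period.

```python
from enum import Enum


class SessionState(Enum):
    ACTIVE = "active"
    PROMPTED = "prompted"      # user asked to press Return to continue
    LOGGED_OUT = "logged_out"


class IdleTimeout:
    """Hypothetical sketch: after `idle_limit` seconds without activity the
    user is prompted; if nothing happens within a further `grace` seconds,
    the session is logged out automatically."""

    def __init__(self, idle_limit: float, grace: float, start: float = 0.0):
        self.idle_limit = idle_limit
        self.grace = grace
        self.last_activity = start
        self.state = SessionState.ACTIVE

    def activity(self, now: float) -> None:
        # Any user action restarts the idle clock, unless already logged out.
        if self.state is not SessionState.LOGGED_OUT:
            self.last_activity = now
            self.state = SessionState.ACTIVE

    def check(self, now: float) -> SessionState:
        # Called periodically by the client; advances the state if needed.
        if self.state is SessionState.LOGGED_OUT:
            return self.state
        idle = now - self.last_activity
        if idle >= self.idle_limit + self.grace:
            self.state = SessionState.LOGGED_OUT
        elif idle >= self.idle_limit:
            self.state = SessionState.PROMPTED
        return self.state
```

Because the prompt comes before the logout, a user reading a file in an external viewer sees an occasional prompt rather than losing the session outright.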
- While the user is working with the system, a usage file will be kept. When the user logs off, the system will send the usage file to "DATA STORAGE CENTRAL".
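The keep-locally-then-send-at-logoff step might look like the sketch below. It assumes the usage file is a plain text file with one line per action and models "DATA STORAGE CENTRAL" as a directory (`central_dir`); the class and method names are hypothetical, and the actual transport to central storage is not decided in this outline.

```python
import shutil
from pathlib import Path


class UsageLog:
    """Hypothetical sketch: keep a local usage file during the session and
    hand it to central storage (here just a directory) at logoff."""

    def __init__(self, local_path: Path, central_dir: Path):
        self.local_path = Path(local_path)
        self.central_dir = Path(central_dir)
        # Start a fresh usage file for this session.
        self.local_path.write_text("", encoding="utf-8")

    def record(self, line: str) -> None:
        # Append and close after each action, so earlier records are
        # already written out rather than held in memory.
        with self.local_path.open("a", encoding="utf-8") as f:
            f.write(line.rstrip("\n") + "\n")

    def logoff(self) -> Path:
        # Move the finished file to central storage and return its new path.
        self.central_dir.mkdir(parents=True, exist_ok=True)
        dest = self.central_dir / self.local_path.name
        shutil.move(str(self.local_path), str(dest))
        return dest
```

Sending the whole file once at logoff keeps network traffic out of the user's way during the session, at the cost of losing the file if the client never logs off cleanly.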
**We will need to give SDG an estimated timeline for the alpha and beta versions.
III. How will the data be processed and stored?
- At this point there are more questions than proposals:
- Who will need access to data, in what form, and at what stage of the process?
- How many users are expected?
- How much usage is expected (e.g., estimated usage-file size)?
- For how long will data need to be accessed?
- We need to identify what will be obtained from the usage files so that we can develop automated data analysis tools.
- We should provide an estimate of the overall storage space required. Files can be kept in storage indefinitely, but once archived they are not necessarily easy to access for analysis.
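Two of the questions above, the storage estimate and the automated analysis tools, can be started with very small helpers. This sketch assumes one fixed-size record per user action and tab-separated usage lines in the field order given in section C (user ID first, action fifth); both function names are hypothetical.

```python
from collections import Counter


def estimate_storage_bytes(users: int, sessions_per_user: int,
                           actions_per_session: int,
                           bytes_per_record: int) -> int:
    # Back-of-envelope total, assuming every action yields one fixed-size record.
    return users * sessions_per_user * actions_per_session * bytes_per_record


def action_counts(lines):
    """Count actions per participant ID from tab-separated usage lines,
    assuming field 1 is the user ID and field 5 is the action."""
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed lines rather than abort the whole run
        pid, action = fields[0], fields[4]
        counts.setdefault(pid, Counter())[action] += 1
    return counts
```

For instance, purely illustrative inputs of 100 users, 20 sessions each, 50 actions per session and 200-byte records give 100 × 20 × 50 × 200 = 20,000,000 bytes (about 20 MB); these are placeholders, not projections for the study.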