Digital Libraries and Knowledge Disaggregation: The Use of Journal Article Components

Ann Peterson Bishop
Digital Libraries Initiative

From DL98: Proceedings of the 3rd ACM International Conference on Digital Libraries. New York, NY: ACM.

ABSTRACT

A scientific journal article is comprised of standard components, such as author names, an abstract, figures, a bibliography, and sections describing methods and results. With the creation of digital documents and new tools for manipulating them comes the ability to facilitate the disaggregation of journal articles into separate components. This paper describes how article components are identified, mobilized, and used by students and faculty members, based on the preliminary analysis of data collected through focus groups, workplace interviews, transaction logging, and usability testing associated with the University of Illinois Digital Libraries Initiative project. The paper presents a schema of component use purposes, discusses the intellectual and physical processes of component use, identifies several issues and implications for digital library design, and highlights the need for multiple methods in studying document disaggregation.

KEYWORDS: User studies, documents, information seeking and use

INTRODUCTION

This paper explores the nature of journal article disaggregation in the work practices of researchers and considers implications for both knowledge construction and digital library (DL) design. Journal article disaggregation refers to the ability to access and manipulate individual components of a document, such as its figures, conclusions, or references. This paper draws on data gathered in the NSF/DARPA/NASA Digital Library Initiatives (DLI) project at the University of Illinois [1] to investigate how article components are identified, mobilized, and used in the work of researchers (i.e., students and faculty members).
The DLI testbed contains the fulltext of recent articles from over 50 science and engineering journals. One innovative aspect of our DLI testbed is its capacity, through SGML and enhanced search features, to support retrieval of newly foregrounded document components: information in individual parts can be disaggregated from the surrounding textual package and retrieved for use in a way not possible with traditional bibliographic retrieval systems. Through DeLIver, the web interface to the DLI testbed [http://dli.grainger.uiuc.edu/deliver.htm], researchers at the University of Illinois can search for terms in particular document components (e.g., MIT in the author affiliation, spectrum in a figure caption) to make searching more precise. They can also view certain components that have been extracted from the complete document. Currently, users have the option of viewing the abstracts, figures and tables, bibliographies, and author affiliations before viewing the fulltext of the document.

As part of its user-centered design and evaluation work, the University of Illinois DLI Social Science Team has sought to develop an understanding of researchers' work and communication practices, with a focus on the use of journals and digital infrastructure. Focus groups, workplace interviews with individual researchers, transaction logging on the DLI testbed, and usability testing are building a picture of how article components in science and technology journals are used by faculty and students.

This paper presents preliminary findings in two areas. First, it presents a schema describing the purposes of component use. Second, it describes the process by which document components are extracted by researchers and then eventually incorporated into their own writing.
The paper also highlights several DL design considerations based on study findings, identifies issues related to document disaggregation and knowledge creation, and discusses the contribution of multiple methods to understanding component use in both online and offline environments.

BACKGROUND

Documents shape and are shaped by social activities [2]. Document genres (e.g., journal articles, office memos, romance novels, email messages) are social constructions in that they represent conventional forms imposed upon information [3, 4]. A particular genre brings together a prescribed form with a particular set of functions related to work practices [5]. The scientific journal article as a genre has remained relatively stable for several hundred years. It exhibits a conventional structure and is embedded in social and intellectual activity [6, 7]. The use of journal article components, then, can be viewed as a behavior that is embedded in work practice and social norms and acts upon a particular genre. The genre itself implies standard components in a predictable order -- the components, in fact, reflect work practices associated with the research reported in the document and are designed to facilitate the communication of that research to others.

Journal articles exhibit two types of structure. The organization of the standard scientific journal article into narrative sections ostensibly provides a description of how the reported work was done, moving from the review of background literature to methods, results, and conclusions. A standard structure is also visible in the different segments of the physical item, e.g., abstract, text, figures, and bibliography.

People have, of course, always mobilized individual article components. Individual researchers develop different techniques for identifying, extracting, annotating and compiling pieces of what they've read as they work toward creating a document that reflects their own ideas and work.
Quoting particular passages in their own work and photocopying only the bibliography are common examples of how researchers disaggregate paper articles for use. Rayward's description of Paul Otlet's "monographic principle" (which Otlet developed around 1920) suggests that the vision of disaggregating documents into individually usable components is not new: "The indexing of materials could be at any level of detail: the whole document, chapters, sections and so on, down to 'facts' which the indexing process would 'detach' from the text of a document" [8].

Commonplace books, which were especially prominent during the Renaissance, represent one tradition of compiling information extracted from existing sources for subsequent use [9, 10]. Commonplace books are similar to scrapbooks or notebooks in that they are comprised of bits of material compiled from other sources. They were used by teachers, philosophers, clergymen, poets, and others as a source of pertinent material to incorporate into their own talks and writing or otherwise apply to their own activities. The University of Illinois library holds, among other examples of the commonplace tradition, a cookery commonplace book, a literary commonplace book created by Thomas Jefferson, and a collection of dance step notations by an 18th century dancemaster. Some commonplace books were intended for publication and use by others; others, like Jefferson's, appear to have been created for personal use. Some commonplace books contain only excerpts from the source documents, while others include annotations of one sort or another.

With the advent of digital texts and new tools for manipulating them, the fundamental concept of the document is up for grabs [11] and attention is beginning to turn to the social arrangements surrounding the creation and use of electronic texts.
Electronic journal articles and other digital documents are increasingly seen as interfaces through which creators and users of knowledge interact [12, 13]. The malleability of digital documents is fundamental to the articulation by Paepcke [14] of the concept of "information compounds," or documents--like commonplace books--constructed from pieces of previously retrieved information. Following an empirical investigation of the information practices of technical workers, the author concluded that malleable documents aid the structuring and retrieval of information for group use and also improve incorporation of information into one's personal knowledge repository. Malleability also allows the creation of texts whose appearance and content may vary for different audiences. In describing Perseus (a digital library for Greek scholars that contains texts, lexicons, images, maps, etc.), Crane proposes: "What better contribution could a scholar make than an article which could . . . provide a clear, but vivid argument to the tenth grader but which, if unraveled, could provide the rigor demanded by the most crusty specialist?" [15, p. 9]. The multivalent documents developed as part of the Berkeley DLI project are based on technologies that foster the ability to create and use different document layers [16]. The provision of different layers underlying the Perseus and Berkeley DLI projects is conceptually similar to the ability to view and manipulate selected components -- one is able, as either an author or reader, to create a layer whose components are unique to a particular situation. A recent addition to the suite of Berkeley multivalent document tools facilitates this process by allowing the user to "fold" a digital document so that only the relevant pieces are visible [17]. 
This type of malleability is also fostered by Standard Generalized Markup Language (SGML), which permits documents to be treated as structured data sources whose separate components may be individually identified and used. The assumption that document components can and will be indexed is something to keep in mind as we look at how SGML is implemented and used by individuals and groups in various settings. The retrieval of data from within documents is similar to selecting documents from within a collection and identifying appropriate collections to search, but it brings a unique set of difficulties and opportunities [18].

Little is known about how the individual components of articles are used in the work of researchers. Studies of scientific and technical information seeking focus primarily on identification and use of documents. Nor have the implications of the potential fragmentation of knowledge due to the disaggregation of documents been fully explored. Maurice Line presents an early critique of the notion of compiling "info bricks"--the extraction of individual facts and ideas as separate units from a text--noting potential negative implications of trying to pull out and use bits of information from documents in a manner that divorced the individual pieces from their original context [19].

PURPOSE OF JOURNAL ARTICLE COMPONENT USE

During the needs assessment phase of our DLI research, we held three focus groups [20], one each with faculty, graduate students, and undergraduates. Each focus group combined participants from physics, computer science, and electrical engineering [21]. In this set of interviews, participants discussed how they found and used journal articles. A number of comments were made about individual article components in the retrieval and subsequent use of journal material. We also conducted a set of semi-structured interviews with six members of a particular research team in physics.
One section of these interviews elicited a critical incident report [22] of why and how particular journal articles were sought and used. The results of these two data collection activities were reviewed to identify and categorize instances of component use. Based on these preliminary findings, five categories of purpose in component use were identified, as outlined in the schema presented below. But the nature of disaggregation is complex. A particular component may be used for different purposes, e.g., author affiliation may be used either to identify documents or to assess the relevance of documents retrieved. The same component may also be used in somewhat different ways for the same basic purpose. For example, a professor used figures as a synopsis of a paper to assess its relevance, while an undergraduate used figures to gauge whether reading the paper would hold his interest, another type of relevance assessment.

1. Finding relevant documents

"I was looking for an alternative model for storing stuff in neural nets... I got a bunch of abstracts and picked out truly related articles of the original author I knew about. Then I took the relevant articles and cross-checked the coauthors to see what they had their names on. I discovered the same names kept cropping up--a group of collaborators--a common situation. I also found 'hot spots' of research: MIT, UCSD, Santa Fe. It's important to search on institutional affiliation of the author; research groups usually clump in places."

2. Assessing relevance before retrieving or reading an article

Many of the examples given highlighted the importance of non-topical relevance criteria [23]. For example, relevance assessments can depend on the perceived authority of the retrieved document. One person commented that checking first to see who was cited in a paper revealed something about the degree to which the paper's author had a firm grasp on the topic in question.
The author's affiliation also grants authority; sometimes decisions to read a paper are based on the prestige of the author's home institution. A graduate student commented that reading an article's introduction revealed whether the author's perspective on the topic matched the student's particular need, another factor in assessing relevance. Interest level was also determined by scanning individual components. An undergraduate, for example, claimed that "some tables get me to read an article... words are boring, but pictures get my interest."

3. As a document surrogate after retrieval

4. To provide needed bits of information

"I look for specific surface tensions, experimental measurements."

"I recently looked for the best efficiency for an electric motor [...] and had to just search the entire database for the term 'electric motor.' You can spend hours looking this way."

"I sometimes want to look specifically at others' methods and theories."

"I often need multiple copies of a specific piece, like a table, for class."

As these examples illustrate, at times the needed piece of information corresponds to a component from the document's physical structure (e.g., a figure), while at other times the needed information is associated with the narrative structure of an article (e.g., an experimental measurement reported in the findings section).

5. Conveying knowledge not easily rendered by words

COMPONENT USE AS REVEALED IN TRANSACTION LOGS AND USABILITY TESTS

According to transaction logs recorded during the first year our DLI testbed was deployed, only six testbed users (about 5%) took advantage of the ability to search for terms in individual components of articles--such as author names, titles, author affiliations, figure captions, or section headings--as opposed to searching the fulltext of an article. Most of the component searching was for authors.
Users appeared to incorporate cited authors and personal names in the fulltext of an article in order to broaden their author searches, though some seemed to be just trying to figure out how the system worked. One user tried multiple searches on an author's name using each of several combinations of initials and the last name. When no hits resulted, the user tried invoking the online hints feature and then marked the checkbox to search for occurrences of the author's name in the fulltext of an article. Another user toggled back and forth between looking for an author's name as an article author versus the author of a cited article. That user then invoked hints and never actually executed a search. It appears that no one has used the DLI testbed to view individual components from the short record format, i.e., no one looked at the available lists of author affiliations, cited works, or figures and tables. Similar results were reported by the CORE project, which developed an online collection of chemistry journals: no one used that system's ability to search for terms in figure captions or the text of tables [25]. Given the suggestions from interviews and focus groups that component searching and viewing would indeed be valuable, this lack of use is puzzling. In fact, an early version of the DLI testbed displayed the list of table and figure captions associated with each article's short bibliographic record, but did not allow tables and figures to be viewed without first calling up the fulltext. In usability tests (conducted according to guidelines presented in [26]) of this version of the testbed, several users complained that they wanted to be able to retrieve tables and figures before viewing the complete article. From our usability tests [27], it appears that lack of use is due in part to not noticing these features on the testbed interface or not knowing how to use them. 
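
The distinction at issue in these logs, between searching one named component and searching anywhere in an article, can be illustrated with a minimal sketch. This is not the DeLIver implementation: the article records, the field names (`authors`, `cited_authors`, `fulltext`), and the `search` function are all hypothetical, chosen only to show how restricting a query to particular components narrows the result set.

```python
# Hypothetical records: the field names below are illustrative assumptions,
# not the SGML element names actually used in the DLI testbed.
ARTICLES = [
    {
        "id": "a1",
        "authors": ["J. Doe"],
        "cited_authors": ["R. Roe"],
        "fulltext": "We extend the approach of R. Roe to thin films.",
    },
    {
        "id": "a2",
        "authors": ["R. Roe", "J. Doe"],
        "cited_authors": ["R. Roe"],
        "fulltext": "This follow-up reports new measurements.",
    },
    {
        "id": "a3",
        "authors": ["A. Poe"],
        "cited_authors": ["J. Doe"],
        "fulltext": "Thanks to R. Roe for helpful comments.",
    },
]


def _as_text(value):
    """Flatten a component (a string or a list of strings) into one string."""
    return " ".join(value) if isinstance(value, list) else value


def search(articles, term, components=None):
    """Return ids of articles matching `term`.

    components=None searches every component and matches if any of them
    contains the term (the broad search). Otherwise the term must occur in
    every listed component (the component-restricted search).
    """
    needle = term.lower()
    hits = []
    for article in articles:
        if components is None:
            fields = [name for name in article if name != "id"]
            matched = any(needle in _as_text(article[f]).lower() for f in fields)
        else:
            matched = all(needle in _as_text(article[f]).lower() for f in components)
        if matched:
            hits.append(article["id"])
    return hits


# Broad search: the name appears somewhere in all three records.
print(search(ARTICLES, "R. Roe"))
# Restricted search: the name must appear as both author and cited author.
print(search(ARTICLES, "R. Roe", ["authors", "cited_authors"]))
```

In this toy collection the broad search returns all three records, while the restricted search returns only `a2`. The broad query is "safe" in the sense that it misses nothing, but it leaves the user to sort out which hits reflect authorship, citation, or a passing mention.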
Several users had no problem finding and using either the checkboxes that allowed component searching or the displays for component viewing, when given a task that specifically required component searching and viewing. But other users did not employ component searching for tasks such as "Find all articles in which C. J. Summers appears as both an author and someone cited in the bibliography"; rather, they would just search for C. J. Summers anywhere in the article. One such user remarked that he preferred to make the search as broad as possible, "just to be safe." Others appeared not to have noticed the component searching and viewing checkboxes; one recommendation stemming from the usability tests was to make these more prominent.

While we need additional data to explore more fully the lack of use of component searching and viewing, I think this is not simply an interface design problem. It appears that some people fail to use these features not because they are unaware of them or can't figure out how to use them, but because they do not expect them and thus are not inclined to notice them, let alone to seek their availability. Because these capabilities are novel and people are not familiar with their use, it appears that it may be difficult to even recognize instances in which their use would be helpful. Even if the component search and viewing features are noticed and their potential utility grasped, some people--naturally enough--may simply not trust a computer feature that from their point of view is untested.

DOCUMENT DISAGGREGATION: NOTES ON PROCESS

Four interviews conducted in Summer and Fall 1997 have focused on the process of using components to identify documents, read them, and incorporate them into one's ongoing work. These semi-structured interviews were conducted with three graduate students (called here Sara, Steven, and Dave) and one faculty member (Dr. Lane) who work together in the area of human-computer interaction.
Each interview lasted approximately 1 1/2 hours and was held in the participant's office or lab. The participants were asked to describe how they searched for journal articles, to describe how they read articles and incorporated them into their own work (by focusing on one or two specific articles the subject had at hand and identified as being important), and to discuss how they managed their personal collections of articles. They were also explicitly asked to describe how they used article components and to comment on how a DL might foster component use. Findings from the earlier focus groups and individual interviews were more oriented toward pinpointing types of use, rather than rendering the overlay of physical and intellectual practices that occur in the use of document components.

People read to learn, to figure something out. While they read and think, they are doing something. Aside from turning pages or clicking, they are making notes, annotating, sorting papers into piles. The reading, thinking, and doing are intertwined. The process that the people in this set of interviews described, at its broadest, was the movement toward developing, from previous work, one's own understanding, encapsulated in one's own documents. As one interviewee put it: "coming up with my ideas in my own language." (This formulation echoes Ong's description of commonplace books as "organized trafficking in one way or another with what is already known" [10, p. 151].)

This intellectual process is accompanied by physical practices related to a stream of documents. One begins by identifying and reading a source document and presumably ends with the production of a document representing one's own work. For two of the graduate students interviewed (Steven and Dave), the basic downstream document was their prelim statement, the paper they completed in order to officially end coursework and begin their dissertation research. For Sara, it was her dissertation. For Prof. Lane, typical downstream documents were research proposals and articles. Between these two points of reading a source document and creating a new document, people create a variety of transitional, transitory texts, such as annotated citations, notes, and rough drafts. These documents include fragments of the source material, but become something else as they shadow and shape and eventually become the final downstream document. I imagine people shaking a journal article like a patchwork quilt and having certain pieces wiggle loose, float up, and settle down in new arrangements on a fresh length of fabric. Dave remarked that "These interim forms turn into reusable pieces. But the scraps don't live long; they turn into a draft or a presentation."

For all four interviewees, the process of reading a source document--and the use of components during this activity--was similar. Each chose an article or two that they had used in their work and told me how they had read it the first time. All reported the same basic process:

1) Read the abstract and introduction in order to ascertain key points about the article. These key points seem to be as useful for determining relevance according to the situation at hand as for getting a quick picture of what the paper is about. Sara reported for one article that "the introduction told me what they based their work on"; for another, "the introduction told me what, how much to look at in rest of article."
2) Skim article headings for a synopsis of the work done, to get a sense of the flow of the author's argument, and decide which sections to look at more closely.
3) Look for and at bulleted lists, summary statements, definitions, and pictures in order to capture the key points of the article.
4) Zero in on any particular sections that seem especially relevant, like methods, findings, directions for further research.
5) Read the conclusions to check your understanding, identify any other key points.
6) Skim references (this step was sometimes done first, sometimes last, sometimes throughout skimming the body of the article).

It was also reported that document components were used for orientation and attention markers in the context of reading, telling the reader when to read carefully or skim or skip a particular section. Sara said: "When I saw the equations, I skipped that section" and "the headings and pictures told me what the surrounding text would be about."

But for each of the four interviewees, the physical practices that accompanied the manipulation of articles and the nature of the transitional documents crafted were unique (see Table 1). The differences reflect different ways of thinking and working and, perhaps, different degrees of tenure in the field. Steven and Dave began adding their own ideas when reading each article, by annotating and making notes. Steven's approach seemed to rely on working through his thoughts on each paper very carefully, then basing further work on integrating his ideas across the papers. He created an elaborate literature database that represented both the articles read and his ideas about them. Dave leapt more quickly from the source document to a set of notes and used less of a building block approach than Steven. Sara didn't start adding her own ideas until she began the draft of her own thesis chapters. She perused all the extracts from all the articles in the context of thinking about each chapter, remarking that this gave her a sense of bigger issues and helped organize her thoughts. Prof. Lane said that she digested an article as she read it the first time and immediately extracted any key point (such as a theoretical concept or validation of an experimental method) that would serve her in her work. She noted that "the little nuggets I get out, that's it" and that she often will not even bother retaining a copy of the article.
She has developed what she called her "canon," a set of articles she refers to frequently and whose citations she has memorized. The "nugget" (i.e., a key point filtered through the lens of her own work) and citation show up immediately in drafts of her own documents; thus, she moved directly from the source document to her own draft, with the intellectual aspect of the extracted pieces stored in her own memory, and the physical residue--the citations--becoming the only pieces actually extracted and kept on file somewhere until they, too, were committed to memory.

It appears that the intellectual and physical practices associated with reading, and integrating what is read into one's own work and ideas, vary by individual style and stage of work. These four researchers evince different cognitive styles and represent different stages in their academic careers. The types of transitional documents they created and the way they worked through the literature seem tied to their own stage of development as researchers. The underlined sections in Table 1 indicate the type of "commonplace book" or compilation of excerpts created by each researcher. The doctoral students are moving toward intellectual independence and writing documents for which close study of previous work is especially important; Sara, a bit further along in her work, created fewer intermediate documents representing the gradual articulation and integration of ideas from articles into her own draft document. The faculty member created no intermediate documents at all.

As the three graduate students described the way they annotated articles, it became obvious that annotation can be thought of as another form of filtering, rather than augmenting, the source document. As evident from Marshall's study of annotation practices [28], you are not adding so much as extracting only the pieces of interest.
You are deleting, mentally (and eventually physically), the bulk of the document that isn't relevant, while at the same time recording your own interpretation and thoughts.

Relating their reading and document practices to potential capabilities of the DL, it appears that it would be useful to view certain components in order to decide whether to read, obtain or keep a copy of the full paper. It would also help to be able to cut and paste pieces into transitional documents while still retaining one intact copy. The notion of actually working from some standard, predetermined patchwork of pieces that represented the full article didn't seem to make much sense, however, to the researchers interviewed in this study. Dave said "you have to read the whole thing before you can decide what you want to do with it." Sara said "I could usefully see the title, abstract, headings, figures, conclusions, and references without looking at any other text. But this is just a temporary artifact to use for photocopy decisions -- then I'd throw it away." Prof. Lane noted that she would appreciate the ability to easily extract a picture to use in class lectures. But she also noted the desirability of a DL that would allow her to develop an online conceptual map of the literature. One example she gave was the ability to draw a circle around three abstracts and label them in a way that made their relationship explicit.

ISSUES IN THE DISAGGREGATION OF KNOWLEDGE

Targeting searches to desired portions of documents may aid information seeking, but with what consequences as context and completeness are lost? Under what circumstances can journal articles be treated as structured databases from which researchers can extract only the needed pieces of information? How do researchers maintain links between extracted pieces and the context for understanding those pieces that is created by the rest of the article?
Some study participants claimed to use only extracted components, while others provided evidence of reliance on context:

"I would want to see the graphic embedded in the article where it belongs, not all by itself."

"I read the abstract, then the introduction, then conclusions."

"I like to see the whole page; it tells me how they lay out their argument."

Those interviewed about the process by which they moved from relevant bits extracted from articles to draft documents of their own described different approaches to maintaining the context of the source documents. Sara's compilation of quotes from the source documents offers the closest approximation to a commonplace book, with no attempt made to incorporate the original context except by appending a citation to the quote so that the original document could be consulted later, if desired. Prof. Lane internalized the context to the extent it was relevant and rarely returned to the fulltext of an article from which she had extracted a critical bit of information. Steven's complex literature database, on the other hand, kept a great deal of the context of the source document handy in the form of a summary, definitions, and other notes. It seems that the way in which researchers "let go" of the original context in moving toward developing their own ideas is as important as the way in which context is kept.

In some cases, the nature of component use is determined by a researcher's level of knowledge about his or her field. In other words, context brought to the search is also important. An undergraduate stated in a focus group that "the author isn't important," suggesting perhaps that only the topic of the paper is important for completing undergraduate assignments. Or it may be that undergraduates do not know enough about their fields for the author name or affiliation to provide meaningful clues about an article's topic or more subtle aspects of relevance, like authority or perspective.
A graduate student, on the other hand, said: "I look for author names, because I know who's important in my field, and who writes in a way I can understand." In our DLI investigation, we've also heard about ways that digital journal article disaggregation is challenging how research is done and new knowledge is created. An undergraduate was surprised that his professor insisted upon the retrieval and reading of entire articles, as opposed to "those little summaries on the computer" (i.e., the abstracts) when writing papers. A graduate student noted that his work group was weighing the trade-offs between mounting papers on their website and publishing in peer-reviewed print journals. Website papers had the advantage of allowing the incorporation of nontraditional (e.g., video clips, data sets) and revisable document components. Another issue in the use of document components by researchers is the danger that a restricted view of the field may result from searches that are, in effect, too precise. One study participant commented, for example, that "reference links are more important than keywords, but just following references could give too narrow an outlook on the field." Similarly, restricting a search to institutions that represent research "hot spots" or papers reporting the use of a particular method could limit new knowledge and ideas flowing to the researcher. This study also raises interesting methodological issues related to the study of document disaggregation and provides some insight into how different methods might work together in attempts to study DL use. How can we effectively observe practices spanning online and offline environments? How can insights into document use habits be translated into DL design specifications? How can we make sense of the relationship between underlying needs and practices and use of a prototype system? 
Each of the data collection techniques used in this investigation added something valuable to the description of component use; each reinforced, augmented, or elucidated the other. For example, without going back to the usability test results, it would have been difficult to know what to make of the disconnect between what people said in focus groups about how they used document components and what the transaction logs revealed about actual component use in the DLI testbed. The descriptions of process that emerged in the last round of in-depth interviews were in some sense unanticipated; without incorporating more open and in-depth querying about component use, the nature of the process of using document components would not have been revealed.

CONCLUSION

This study has been conducted during a period that is just starting to see a transition to the web and other forms of digital infrastructure as the basic work environment (a space for both working and thinking) for some researchers. This transition is bringing new capabilities and rules, new ways of identifying and assessing information, and new means of creating personal collections of usable scraps and full documents.

This paper presented preliminary findings on how individual document components are used by researchers to identify, assess, read, and use journal articles. It also identified several issues associated with the disaggregation of knowledge in documents and pointed to several implications for the design of DL features related to searching, viewing, and manipulating individual journal article components. Clearly, the nature of journal article disaggregation is complex: work context, cognitive style, and the affordances of technology are all decisive in determining how an individual uses article components.

The findings reported here are not conclusive, but exploratory. They offer a preliminary framework for studying component use and are based on substantial empirical evidence.
Approximately 40 people participated in interviews (whether in focus groups, workplace interviews, or usability tests) for this study, while transaction log data were derived from about 160 more. I plan to continue this study by interviewing undergraduates and several additional faculty members working in the area of human-computer interaction, before moving on to interview students and faculty in other disciplines. Transaction log analysis will continue, and DLI user surveys will include questions related to component use.

The document use practices described in this study can be viewed in the context of current discussion and standards activity related to the development of metadata in the digital environment: How might customized document representations emerge as journal articles become more malleable? Metadata help people identify, describe, collocate, locate (and re-locate) documents, according to objectives of the library catalog originally set forth by Cutter in 1904 [29]. Metadata are traditionally thought of as bits of information that describe the document (e.g., author's name, title, subject descriptors). But it is clear that components of the document itself can also function in these roles. A figure can summarize the content of an article better than the subject headings do, or can help one remember the content better than the title can. What emerges from this study is a different view of what counts as metadata as well as a fuller picture of the functioning of metadata.

Several people commented in a faculty focus group that the process of stepping from "a little information" to "more information" (e.g., from titles and authors, to introductions and section headings, to fulltext) was crucial to their review of documents and that it would help if digital systems supported this process and made it more customizable to individual needs.
Further, we have yet to see what new practices will emerge in digital libraries to support the full lifecycle of component use, from document identification, to reviewing and filtering, to reading, and ultimately to the application of existing knowledge to ongoing research.

ACKNOWLEDGEMENTS

The research reported in this paper was conducted under grant no. NSF IRI 94-11318COOP. I'm grateful to other members of the University of Illinois DLI Social Science Team--Leigh Star, Laura Neumann, Emily Ignacio, Bob Sandusky, Cece Merkel, Eric Larson, and Rebecca Baldwin Engsberg--for their role in collecting the data reported here and for sharing their ideas on component use. I also appreciate the support and insights provided by Andreas Paepcke, Cathy Marshall, David Levy, Susan Anderson, Boyd Rayward, and Les Gasser.

REFERENCES

1. University of Illinois at Urbana-Champaign NSF/DARPA/NASA Digital Libraries Initiative (DLI) project homepage. [http://dli.grainger.uiuc.edu/]
2. Brown, John S. and Duguid, Paul. The social life of documents. First Monday, Issue 1 (1996). [http://www.firstmonday.dk/issues/issue1/documents/index.html]
3. Agre, Philip E. Designing genres for new media: Social, economic and political contexts. The Network Observer 2, 11 (1995). [http://communication.ucsd.edu/pagre/tno/november-1995.html#designing]
4. Orlikowski, Wanda and Yates, J. Genre repertoire: The structuring of communicative practices in organizations. Administrative Science Quarterly 39, 4 (1994), pp. 541-574.
5. Levy, David and Marshall, Catherine C. Going digital: A look at assumptions underlying digital libraries. Communications of the ACM 38, 4 (1995), pp. 77-83.
6. Bazerman, Charles. Shaping Written Knowledge: The Genre and Activity of the Experimental Article in Science. University of Wisconsin Press, Madison, 1988.
7. Latour, Bruno and Woolgar, Steve. Laboratory Life: The Social Construction of Scientific Facts. Sage, Beverly Hills, 1979.
8. Rayward, Boyd. Madness, hype, or vision of hope: The World Brain and the organization of the knowledge in all of the world. Draft manuscript, 1997.
9. Moss, Ann. Printed Commonplace-Books and the Structuring of Renaissance Thought. Clarendon Press, Oxford, 1996.
10. Ong, Walter J. Interfaces of the Word: Studies in the Evolution of Consciousness and Culture. Cornell University Press, Ithaca, 1977, pp. 147-188.
11. Buckland, Michael K. What is a document? Journal of the American Society for Information Science 48, 9 (1997), pp. 804-809.
12. Burbules, Nicholas C. and Bruce, Bertram C. This is not a paper. Educational Researcher 24, 8 (Nov. 1995), pp. 12-18. [http://www.ed.uiuc.edu/EdPsy-387/This-is-Not-a-Paper-folder/orig.This-is-Not-a-Paper.html]
13. Erkes, J.W., Kenny, K.B., Lewis, J.W., Sarachan, B.D., Sobolewski, M.W., and Sum, Jr., R.N. Implementing shared manufacturing services on the World-Wide Web. Communications of the ACM 39, 2 (Feb. 1996), pp. 34-45.
14. Paepcke, Andreas. Information needs in technical work settings and their implications for the design of computer tools. Computer-Supported Cooperative Work 5, 1 (1996), pp. 63-92.
15. Crane, Gregory. Building a Digital Library: The Perseus Project as a Case Study in the Humanities, in Proc. 1st ACM International Conference on Digital Libraries (Bethesda, MD, March 20-23, 1996), ACM Press, pp. 3-10.
16. Phelps, Thomas A. and Wilensky, Robert. Toward Active, Extensible, Networked Documents: Multivalent Architecture and Applications, in Proc. 1st ACM International Conference on Digital Libraries (Bethesda, MD, March 20-23, 1996), ACM Press, pp. 100-108.
17. Wilensky, R. and Phelps, T. Multivalent Documents: From Presentation to Collaboration. Presented at the DLI 98 Project-wide workshop (Berkeley, Jan. 5-6, 1998). [http://HTTP.CS.Berkeley.EDU/~wilensky/ucb-mvd.ppt]
18. Buckland, Michael K. and Plaunt, Christian. Selecting Libraries, Selecting Documents, Selecting Data. To appear in Proc. International Symposium on Research, Development, & Practice in Digital Libraries (ISDL 97) (Tsukuba City, Japan, Nov. 18-21, 1997). [http://bliss.sims.berkeley.edu/papers/isdl97/isdl97.html]
19. Line, Maurice B. The death of Procrustes? Structure, style and sense. Scholarly Publishing (July 1986), pp. 291-301.
20. Stewart, David W. and Shamdasani, Prem N. Focus Groups: Theory and Practice. Sage, Newbury Park, CA, 1990.
21. University of Illinois at Urbana-Champaign NSF/DARPA/NASA Digital Libraries Initiative (DLI). DLI Social Science Team Home Page: Internal Reports -- Focus Group Summaries. [http://anshar.grainger.uiuc.edu/dlisoc/socsci_site/internal-reports.html]
22. Flanagan, J.C. The critical incident technique. Psychological Bulletin 51, 4 (July 1954), pp. 327-358.
23. Barry, Carol L. and Schamber, L. User-Defined Relevance Criteria: A Comparison of Two Studies, in Proc. 58th ASIS Annual Meeting (Chicago, Oct. 9-12, 1995), Information Today, pp. 103-111.
24. Breton, Ernest J. Why engineers don't use databases: Indexing techniques fail to meet the needs of the profession. ASIS Bulletin 7, 6 (Aug. 1981), pp. 173-177.
25. Entlich, Richard, Garson, Lorrin, Lesk, Michael, Normore, Lorraine, Olsen, Jan, and Weibel, Stuart. Testing a digital library: User response to the CORE Project. Library Hi Tech 14, 4 (1996), pp. 99-118.
26. Monk, Andrew, Wright, Peter, Haber, Jeanne, and Davenport, Lora. Improving your Human-Computer Interface: A Practical Technique. Prentice-Hall, New York, 1993.
27. University of Illinois at Urbana-Champaign NSF/DARPA/NASA Digital Libraries Initiative (DLI). DLI Social Science Team Home Page: Internal Reports -- Usability Reports. [http://anshar.grainger.uiuc.edu/dlisoc/socsci_site/internal-reports.html]
28. Marshall, Catherine C. Annotation: From Paper Books to the Digital Library, in Proc. 2nd ACM International Conference on Digital Libraries (Philadelphia, July 23-26, 1997), ACM Press, pp. 131-140.
29. Cutter, Charles Ammi. Rules for a Dictionary Catalog. 4th ed., rewritten. Government Printing Office, Washington, D.C., 1904.
Table 1. Movement from Source Journal Article to Creation of Own Document, as Described by Four Researchers