To date, most transaction log analysis (TLA) has been done with OPACs and CD-ROM databases (Peters, 1993). However, Peters (1993) points out that "aggregate usage patterns of new types of IR systems, such as Gopher, are useful and enlightening" (p. 46). Peters goes on to say that "studies of reference service providers' use of IR systems still need to be undertaken" (p. 57) and that TLA studies need to move into examining the use of IR systems over the Internet.
Why use TLA? Kaske (1993) and Crawford (1987) identify the two main purposes of TLA as performing statistical analysis of system performance and use, and undertaking analysis of searching behavior and problems. Sandore (1993) identifies many ways in which the results of TLA can be applied to improve information systems. These include anticipating the evolution of system use and demands, determining user preference for experimental changes, monitoring the use of help systems, determining instructional needs, and monitoring user searching patterns. Wallace (1993) demonstrated how TLA can identify bibliographic instruction needs and point out weaknesses in information system design. Young (1992) illustrated the use of TLA as a collection management tool.
The unobtrusive nature of TLA, while in many respects a strength, can also be a weakness. Kurth (1993) states: "Transaction log data effectively describe what searches patrons enter and when they enter them, but they don't reflect, except through inference, who enters the searches, why they enter them, and how satisfied they are with their results" (p. 98). Kurth goes on to explain that errors in TLA can arise through limitations of the online system, the inability to isolate and characterize individual users, and the decisions and biases of the researcher analyzing the logs. To account for some of these shortcomings, Cochrane & Markey (1983) suggest combining TLA with another type of analysis (either questionnaire or protocol) to provide a more complete picture that draws on the strengths of both types of study.
Polly & Cisler (1994) point out two weaknesses of the Web as an information system: slowness and "chaotic disorganization [sic]" (p. 34). While the issue of speed will have to be taken up by computer scientists and engineers, the disorganization of the Web is a prime target for librarians to tackle. Powell (1994) was one of many to identify the uses to which libraries could put the WWW in the creation of library information systems. To date, hundreds of libraries around the world have created and mounted various documents on the Web, from simple informational 'handouts' to Internet resource subject guides to a library (the Internet Public Library) which exists solely on the Web (Goldberg, 1995). It is hoped that applying TLA to WWW systems will go a step or two towards evaluating and improving library WWW information systems.
The Internet community is also starting to recognize the need for Web server TLA (though they may not know the term). Cutler & Hall (1995) point out that businesses with Web sites "want to know the answers to some relatively simple questions:
Goals & Methodology
This project had three goals:
1. Devise a program to extract transaction logs from NCSA HTTP server access logs, containing the following information:
All programming was done in Perl, a widely used, relatively easy, cross-platform, free language (Potter, 1995). The program was designed to use the access logs generated by NCSA's HTTP server, one of the most popular and widely used Web servers; this access log format has been adopted as a common log format for many other HTTP servers. Clark has been tested on access logs from the Internet Public Library and the University of Michigan Engineering Library's WWW sites. The program works well, though it is slow and still has a few minor issues that need to be ironed out. See the online documentation for Clark for more information.
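To make the input concrete, the following is a minimal sketch of reading access-log lines in the common log format and grouping them by requesting host. It is not Clark's code: it is written in Python rather than Perl, the function names (parse_clf_line, group_by_host), the dictionary keys, and the sample host names are all hypothetical, and grouping by host is only an assumed stand-in for the way a transaction log might be assembled.

```python
import re
from collections import defaultdict
from datetime import datetime

# Common log format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf_line(line):
    """Return a dict for one access-log line, or None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps look like 21/Jun/1995:14:03:27 -0400
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" in the size field means no byte count was logged
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # The request field is normally "METHOD path [protocol]"
    parts = entry["request"].split()
    entry["method"] = parts[0] if parts else ""
    entry["path"] = parts[1] if len(parts) > 1 else ""
    return entry


def group_by_host(lines):
    """Collect parsed entries per requesting host, in time order.

    Assumption: one host is treated as one user, which the log alone
    cannot confirm.
    """
    by_host = defaultdict(list)
    for line in lines:
        entry = parse_clf_line(line)
        if entry is not None:
            by_host[entry["host"]].append(entry)
    for entries in by_host.values():
        entries.sort(key=lambda e: e["time"])
    return dict(by_host)


# Hypothetical sample lines, made up for illustration only
SAMPLE_LOG = [
    'host1.example.edu - - [21/Jun/1995:14:03:27 -0400] "GET /index.html HTTP/1.0" 200 2048',
    'host2.example.org - - [21/Jun/1995:14:03:40 -0400] "GET /ref/ HTTP/1.0" 200 512',
    'host1.example.edu - - [21/Jun/1995:14:04:02 -0400] "GET /ref/ HTTP/1.0" 304 -',
]

if __name__ == "__main__":
    for host, entries in group_by_host(SAMPLE_LOG).items():
        print(host, [e["path"] for e in entries])
```

Treating each host as a single user is a simplification: as Kurth (1993) notes, transaction logs cannot by themselves isolate and characterize individual users.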
2. Devise one or more programs which use the transaction log to analyze the information contained therein. Ideas include:
3. Develop documentation so that others can use the programs, write analysis programs of their own, and modify the transaction log generating program as their own research needs require.
The documentation includes this report, manual pages for Clark and statanal, and comments within the code, all of which can be found at the Clark distribution site on the World-Wide Web. The clark.pl and statanal.pl Perl scripts can also be downloaded from this site.
Limitations
Besides the limitations inherent in transaction logs, the transaction logs I develop will be hindered by two additional limitations inherent in WWW systems:
Future Work
Clark is in a good, usable state right now, but can certainly use more work. Here's what I'd like to accomplish in the near future:
References
Andreessen, M. & Bina, E. (1994). NCSA Mosaic: a global hypermedia system. Internet Research, 4(1), 7-17.
Cochrane, P. A. & Markey, K. (1983). Catalog use studies--since the introduction of online interactive catalogs: impact on design for subject access. Library & Information Science Research, 5(4), 337-363.
Crawford, W. (1987). Patron access: issues for online catalogs. Boston: G. K. Hall.
Cutler, M. & Hall, D. (1995). Sizing 'em up. Internet World, 6(8), 22-24.
Fielding, R. (1994). wwwstat -- distribution information. http://www.ics.uci.edu/WebSoft/wwwstat/
Goldberg, B. (1995). Virtual patrons flock into the Internet Public Library. American Libraries, 26, 387-388.
Kaske, N. K. (1993). Research methodologies and transaction log analysis: issues, questions, and a proposed model. Library Hi Tech, 11(2), 79-85.
Kerr, E. (1995). Personal Communication, June 21, 1995.
Kurth, M. (1993). The Limits and limitations of transaction log analysis. Library Hi Tech, 11(2), 98-104.
Nickerson, G. (1992). World Wide Web: hypertext from CERN. Computers in Libraries, 12(11), 75-77.
Ottaviani, J. S. (1995). Archimedes: analysis of a HyperCard reference tool. College & Research Libraries, 56(2), 171-182.
Peters, T. A. (1993). The History and development of transaction log analysis. Library Hi Tech, 11(2), 41-66.
Peters, T. A., Kurth, M., Flaherty, P., Sandore, B., & Kaske, N. K. (1993). An Introduction to the special section on transaction log analysis. Library Hi Tech, 11(2), 38-40.
Peters, T. A., Kaske, N. K., & Kurth, M. (1993). Transaction log analysis. Library Hi Tech Bibliography, 8, 151-183.
Polly, J. A. & Cisler, S. (1994). What's wrong with Mosaic? Library Journal, 119(7), 32-34.
Potter, S. (1995). comp.lang.perl.* FAQ. http://www.cis.ohio-state.edu/hypertext/faq/usenet/perl-faq/top.html
Powell, J. (1994). Adventures with the World Wide Web: creating a hypertext library information system. Database, 17(2), 59-66.
Sandore, B. (1993). Applying the results of transaction log analysis. Library Hi Tech, 11(2), 87-97.
Wallace, P. M. (1993). How do patrons search the online catalog when no one's looking? Transaction log analysis and implications for bibliographic instruction and system design. RQ, 33(2), 239-252.
Young, I. R. (1992). The Use of a general periodicals bibliographic database transaction log as a serials collection management tool. Serials Review, 18(4), 49-60.
Copyright 1995 David S. Carter, All rights reserved