Testing the Usability of Synchronous Computer-Supported Cooperative Work Products

Lynellen Perry
July 25, 1994
Dr. Carter
CS 9253 Topics in Software Engineering: Usability
10-week Summer Term

"The needs of a group using a tool collaboratively are different from those of an individual user." -- John C. Tang

"The needs of the many outweigh the needs of the few, or the one..." -- Star Trek II: The Wrath of Khan

Introduction

Computer-Supported Cooperative Work (CSCW) goes by many names: groupware, computer-supported collaboration, workflow, group decision-support systems (Palmer, 15), electronic meeting systems (Valacich, 261), and probably several others. There are nearly as many definitions of CSCW as there are authors on the subject; the following are representative. Palmer et al. define CSCW as "people working together on a product, research area, topic, or scholarly endeavor with help from computers" (Palmer, 15), but also, in the same paper, as "a system that integrates information processing and communications activities to help individuals work together as a group" (Palmer, 16). Palmer does not distinguish between the term "CSCW" and any of the other terms mentioned above. Greenberg, however, states that 'groupware' is merely software that "supports and augments group work," while 'CSCW' is "the scientific discipline that motivates and validates groupware design . . . the study and theory of how people work together, and how the computer and related technologies affect group behavior" (Greenberg, 133). In this view, CSCW collects research from computer science, cognitive science, psychology, sociology, anthropology, ethnography, management, and management information systems.

Many software products fit the 'groupware' concept: email, bulletin boards, asynchronous conferencing, group schedulers, group decision support systems, collaborative authoring tools, screen-sharing software, computer equivalents of whiteboards, video conferencing (Greenberg, 133), multigroup decision-support systems (Palmer, 16), computer-assisted design/computer-assisted manufacturing (CAD/CAM), computer-assisted software engineering (CASE), concurrent engineering, workflow management, distance learning, telemedicine, real-time network conferences such as MUDs and MUSHs (Grudin, 20), and even spreadsheet programs (Nardi, 161). Each of these software types fits its users into one of the space/time categories shown below (Grudin, 25). Many research papers describe the testing of distributed systems, where the users are not in the same place and/or not working at the same time. This research review, however, focuses on products whose intended users are in the same place (the same room), working together at the same time on the same project. This is called synchronous work.

Grudin's Space/Time Categories

Grudin crosses three time categories (same; different and predictable; different but unpredictable) with three place categories (same; different and predictable; different but unpredictable), yielding nine cells:

    1. same time; same place
    2. different and predictable time; same place
    3. different but unpredictable time; same place
    4. same time; different and predictable place
    5. different and predictable time; different and predictable place
    6. different but unpredictable time; different and predictable place
    7. same time; different but unpredictable place
    8. different and predictable time; different but unpredictable place
    9. different but unpredictable time; different but unpredictable place

Research on the Usability of Synchronous CSCW Systems

Now that some of the terms in the title of this paper have been defined, let us turn to the usability testing aspect.
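Grudin's taxonomy is easy to make concrete in code. The following minimal sketch models the two axes and classifies a few of the groupware types mentioned above; the names and the example classifications are illustrative assumptions for this review, not taken from Grudin:

    from dataclasses import dataclass
    from enum import Enum

    class Time(Enum):
        SAME = "same time"
        DIFFERENT_PREDICTABLE = "different, predictable time"
        DIFFERENT_UNPREDICTABLE = "different, unpredictable time"

    class Place(Enum):
        SAME = "same place"
        DIFFERENT_PREDICTABLE = "different, predictable place"
        DIFFERENT_UNPREDICTABLE = "different, unpredictable place"

    @dataclass
    class GroupwareProduct:
        name: str
        time: Time
        place: Place

        def is_synchronous_colocated(self) -> bool:
            # The cell this review focuses on: same time, same place.
            return self.time is Time.SAME and self.place is Place.SAME

    # Illustrative classifications (assumptions, for the sketch only):
    products = [
        GroupwareProduct("electronic meeting system", Time.SAME, Place.SAME),
        GroupwareProduct("email", Time.DIFFERENT_UNPREDICTABLE, Place.DIFFERENT_UNPREDICTABLE),
        GroupwareProduct("video conferencing", Time.SAME, Place.DIFFERENT_PREDICTABLE),
    ]

    for p in products:
        print(p.name, "->", p.is_synchronous_colocated())

Only products falling in the first cell (same time, same place) are considered in the reviews that follow.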
Systematic, formal, scientific usability testing is still a rather new research area. The basic methods of testing usability include heuristic evaluation, user testing, and cognitive walkthroughs. Nielsen describes five usability attributes that testing should measure: learnability, efficiency, memorability, errors, and satisfaction (Nielsen, 26). From the literature, it appears that user testing is the most widely used form of usability testing, and that efficiency and satisfaction are the attributes most often measured. All of the papers reviewed below implement some variety of user test to study the usability of a groupware product.

Cognoter

Tatar et al. performed usability tests on Cognoter, a software package designed to aid small work groups (two to five people) in the creation of a plan or outline. To test the usability of this software, Tatar and colleagues ran several user tests, which took the form of a series of two-hour working sessions with two groups, each consisting of three users who were experienced with computers. The test room contained a workstation for each test user and a large screen configured to display the software's shared work area to the group.

The Cognoter software is divided into a private editing area and a shared work area. Any individual may type a brief note in the private area and then release it to the shared area. Released notes appear at unpredictable positions in the shared area as icons labeled with a keyword; any icon may be clicked to reveal the full annotation that was entered.

The experiment was run twice. The first run, a pilot test, gave the experimenters severe problems in observing the user groups as they worked. Given the way the experiment was set up, it was impossible for the observers to see the details of the work because each user had a separate machine. Also, the observers were people who were very familiar with the performance characteristics of the Cognoter software, and they tended to compensate for any problems the users had, thus biasing the results. For the actual experiment, therefore, Tatar et al. videotaped each test session and also logged all messages sent between machines. This solved the problem of not being able to observe everything that went on during the test, but the observer-bias problem was not corrected.

The three users tested in each group were expert users of Cognoter: long-term collaborators who were familiar with the editor, window system, and mouse conventions used in the software. Three developers were available to help the test users with any problems that arose. The tasks for the two groups were not the same; each group was asked to use Cognoter to brainstorm about a subject of its own choosing that would be useful for its own work. By not specifying the task for both groups, another variable was introduced into the experiment, yet the authors did not comment on this fact or analyze its effect on the test. No usability goals were set for this test, and no methods of measuring usability were described. The goal appears to have been simply the discovery of usability problems, though the problems found were not rated by severity after the test. The reported result was that users expressed "extreme frustration and reduced efficiency" compared to working with traditional paper and/or a whiteboard.
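For concreteness, the private/shared division described above can be sketched in code. This is a minimal illustration under assumed names (SharedWorkspace, PrivateArea, release_draft, and so on), not Tatar et al.'s actual implementation:

    import itertools

    class SharedWorkspace:
        """Group-visible area; released notes appear as icons with a keyword."""

        def __init__(self):
            self.icons = {}             # icon id -> (keyword, full annotation)
            self._ids = itertools.count(1)

        def release(self, keyword, annotation):
            # Icons appear at unpredictable positions and in no visible
            # order -- the "lack of sequentiality" problem noted below.
            icon_id = next(self._ids)
            self.icons[icon_id] = (keyword, annotation)
            return icon_id

        def expand(self, icon_id):
            # Clicking an icon reveals the full annotation behind the keyword.
            keyword, annotation = self.icons[icon_id]
            return annotation

    class PrivateArea:
        """Per-user editing area; a note is invisible to others until released."""

        def __init__(self, shared):
            self.shared = shared
            self.draft = None

        def type_note(self, keyword, annotation):
            self.draft = (keyword, annotation)

        def release_draft(self):
            keyword, annotation = self.draft
            self.draft = None
            return self.shared.release(keyword, annotation)

    shared = SharedWorkspace()
    alice = PrivateArea(shared)
    alice.type_note("budget", "Cut travel costs by pooling conference trips.")
    icon = alice.release_draft()
    print(shared.expand(icon))

Note that nothing in this model records who released an icon or when it was released -- the anonymity and sequentiality problems discussed next.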
Though classed as "experts," neither group understood the software well enough to use its full potential. In Group A, everyone first worked alone, using the private editing area but not looking at each other's work, talking, or sharing ideas in any way. They then left the computers and worked together on paper. The effect was that the software was used as a sort of word processor, not as a tool for group interaction. Group B, on the other hand, figured out how to use the video capability of the software so that the whole group could see the work of whoever was typing. They did not understand, though, that there could be more than one typist at a time. Despite this bit of success in using the software for group interaction, Group B also expressed frustration visibly, with people putting their heads in their hands, raising their voices, and threatening to walk out.

This rather disastrous user test of the Cognoter software indicated that the users were experiencing two types of problems. First, users wanted to see things in the workspace that the system would not let them see. Second, users mistook references in one another's speech or actions (pointing to an individual screen and saying "there," "this," "that," etc.) and could not resolve the difficulty satisfactorily. These problems led to the conclusion that eight design decisions were at fault for the user difficulties (Tatar, 198):

1. Separate screens -- gaze and gestures were missed by group members because each was busy looking at their own screen.
2. Lack of sequentiality -- there was no way to know where the next icon would appear, or in what order the icons had been created.
3. Short labels on the icons limited the information the group could see.
4. Anonymity -- items gave no indication of who had contributed them.
5. Private editing allowed someone to change a previous contribution, thus losing information.
6. There was an unpredictable delay between the release of a privately edited item and the time it appeared on other users' screens.
7. Private moving of icons caused an icon to change position suddenly on other users' screens and thus lose its identifiable position.
8. Individually tailorable windows caused confusion when attempting to reference a particular item on the screen.

These design decisions "made Cognoter items more difficult both to create and to use than whiteboard objects" (Tatar, 203). In a redesign of the software, only the last four of these design decisions were changed. In addition to the software usability problems above, Tatar's paper also discusses what the authors learned about groups and modes of conversation. These topics, while interesting and necessary for understanding groups in order to build worthwhile software to support them, are beyond the scope of this research review.

GroupSystems

Another electronic meeting support (EMS) system, GroupSystems, is described by Valacich et al. EMSs can be used to support distributed groups; however, the University of Arizona (where this research was conducted) has focused on face-to-face (synchronous) meetings. Tasks that can be accomplished with the GroupSystems facilities at the University of Arizona include "communication, planning, idea generation, negotiation, conflict resolution, systems analysis and design, and collaborative group activities such as document preparation and sharing" (Valacich, 261). Valacich et al.
measure the usability of the facility in terms of the productivity of the meeting, as manifested by the reduction or elimination of the "dysfunctions of the group interaction (i.e. process losses), so that a group reaches or exceeds (i.e. process gains) its task potential" (Valacich, 262). In effect, a meeting's actual productivity is its task potential minus process losses plus process gains. There are many process losses; Appendix A discusses the ones relevant to the GroupSystems environment at the University of Arizona. Variables researched that affect the productivity of a group include group size, group task, anonymity, and proximity.

The hardware setup at the GroupSystems facility is as follows. Each participant has a work area, all of which are arranged to face the front of the room. Each work area has a separate color graphics microcomputer networked to the others. At the front of the room are a facilitator's console to control the EMS, at least one large-screen video display, and other audio-visual support such as whiteboards and overhead projectors. A control room next door to the meeting room has a laser printer and a copier.

Valacich et al. summarize seven laboratory studies conducted in the GroupSystems environment where the group's task was idea generation, review two laboratory studies where the task was decision making, and then present six GroupSystems field and case studies. All of these studies were conducted via user testing. The results validated the high usability of the GroupSystems software and hardware environment. Participants in the studies, which included several groups from real-world corporations, stated that meetings supported by GroupSystems were much more satisfying, effective, and productive than traditional meetings. IBM has even installed more than 36 electronic meeting rooms around the world using the GroupSystems software and hardware environment.

In addition to validating the usability of GroupSystems for synchronous group work, these studies uncovered and confirmed data about how groups work. Again, details on this subject are interesting but beyond the scope of this paper. Briefly, the studies indicate that CSCW support benefits large groups (9+ people) more than small groups (2-5 people), and that group members should be anonymous to obtain the highest productivity and satisfaction ratings.

Amsterdam Conversation Environment

Dykstra and Carasik discuss the "theory and concepts in designing a synchronous shared workspace to support human interaction" (Dykstra, 419), and describe an implementation of such a system, the Amsterdam Conversation Environment (ACE). In agreement with Palmer's definition of CSCW (above), the authors feel that technology should support groups rather than replace or automate activities. To this end, ACE is not task specific; it is meant to provide users with "a common workspace through which they can share and manipulate individual products, where the focus is on stimulating interaction rather than on producing a product" (Dykstra, 420).

Dykstra describes how ACE evolved through several iterations of user tests. The current prototype of ACE runs on a network of Macintoshes, with a main server maintaining links between objects, keeping track of users, and controlling the simultaneous update of the users' screens (this server role is sketched below). However, ACE was designed and user tested first as a physical model, and then on overhead transparencies. If done carefully, these can be cheap ways of performing user testing, because no software has been written at that point.
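As for the software prototype itself, the paper does not give implementation details for the main server, so the following is only a minimal sketch of the role described above: one process that tracks users, holds the shared objects and the links between them, and pushes every change to every screen at once. All names here are assumptions for illustration, not ACE's real code:

    class AceStyleServer:
        """Hypothetical sketch of a shared-workspace server."""

        def __init__(self):
            self.objects = {}      # object id -> content
            self.links = set()     # (object id, object id) pairs
            self.screens = []      # connected user screens

        def register(self, screen):
            # Keep track of users as they join the workspace.
            self.screens.append(screen)

        def update_object(self, obj_id, content):
            self.objects[obj_id] = content
            self._broadcast(("object", obj_id, content))

        def link(self, a, b):
            self.links.add((a, b))
            self._broadcast(("link", a, b))

        def _broadcast(self, change):
            # Simultaneous update: every screen sees every change,
            # so no user's view drifts from the group's.
            for screen in self.screens:
                screen.apply(change)

    class Screen:
        def __init__(self, user):
            self.user = user

        def apply(self, change):
            print(f"{self.user}'s screen redraws:", change)

    server = AceStyleServer()
    server.register(Screen("anna"))
    server.register(Screen("bert"))
    server.update_object(1, "proposal draft")
    server.update_object(2, "meeting notes")
    server.link(1, 2)

Notably, ACE reached this design only after the physical and transparency prototypes had already been user tested.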
Mistakes and redirections are much cheaper when they occur in the design phase than after code has been written. The authors were still developing testing procedures for the ACE software prototype at the time they wrote the paper, so no results are available from that phase of user testing. Dykstra does mention, though, the usability attributes they wish to measure. These include user satisfaction (especially in light of the fact that the software places so few restraints on group process), productivity (as indicated by the amount of "process paralysis" that occurs during use), and learnability (they hope to avoid the need for a specially trained facilitator and to keep the required documentation minimal) (Dykstra, 433).

Spreadsheets

Nardi and Miller observe that spreadsheets are, most of the time, actually developed by the cooperative work of several people. They use a simple definition of cooperative work, "multiple persons working together to produce a product or service" (Nardi, 162), and state that two forms of cooperative work central to CSCW have not received much attention: the sharing of programming expertise and the sharing of domain knowledge. Most CSCW research, they argue, focuses on computer systems that encourage communication between group members; with this focus, researchers have overlooked the fact that collaboration in programming itself is very common. Both forms of sharing are obvious in real-world uses of spreadsheets. Though spreadsheets are usually considered a single-user application, Nardi and Miller found that "spreadsheet co-development is the rule, not the exception," and they thus include spreadsheets in the category of synchronous groupware. As further justification of this categorization, they note that spreadsheet users "1) share programming expertise through exchanges of code; 2) transfer domain knowledge via spreadsheet templates and the direct editing of spreadsheets; 3) debug spreadsheets cooperatively; 4) use spreadsheets for cooperative work in meetings and other group settings; and 5) train each other in new spreadsheet techniques" (Nardi, 163).

Nardi and Miller did not perform usability testing of cooperatively developed spreadsheets in the traditional way: they did not set up their own experiment, find test users, and then evaluate the results. Instead, they tape-recorded interviews with experienced spreadsheet users in those users' own offices and homes, asking a fixed set of open-ended questions in whatever order the questions arose in conversation. Through this process, they found that spreadsheet programs are easier to learn cooperatively (spreadsheet experts share their programming knowledge with novices), that spreadsheets are developed more efficiently when developed cooperatively (domain knowledge is shared), and that there are fewer errors in cooperatively developed spreadsheets (a collaborator can often spot a programming or logic mistake faster than the author). In addition, novice spreadsheet programmers feel more satisfied when they develop spreadsheets cooperatively. These results cover four of the five attributes that Nielsen says are part of usability. Using this rather unorthodox method of experimenting, Nardi and Miller thus found that a "single-user" application could be used in a groupware manner, and they give software developers of "single-user" applications a few suggestions to help make those products usable by small groups as well as by individual users.
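Mapped onto Nielsen's five attributes, the interview findings can be tabulated in a short sketch; the mapping itself comes from the results just stated, and only the code framing is an illustrative assumption:

    # Nardi and Miller's interview findings, keyed by Nielsen's five
    # usability attributes. Memorability is the one attribute their
    # results do not address.
    findings = {
        "learnability": "easier to learn cooperatively (experts coach novices)",
        "efficiency":   "developed more efficiently (domain knowledge is shared)",
        "errors":       "fewer errors (collaborators spot mistakes faster)",
        "satisfaction": "novices feel more satisfied developing cooperatively",
        "memorability": None,  # not measured in the interviews
    }

    covered = [attr for attr, result in findings.items() if result]
    print(f"{len(covered)} of 5 attributes covered:", ", ".join(covered))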
Summary and Conclusions

Each of the research papers reviewed contains a slightly different idea of the term CSCW and uses different methods to test the usability of products that support groups. CSCW and usability testing are still relatively new fields of research, so the vocabulary and the methods of testing have not yet solidified. In addition, groupware products can be divided into nine space/time areas, making the research arena rather large. This research review has focused on research that tests the usability of synchronous computer-supported cooperative work products.

It appears that the most popular method of conducting usability tests is the user test. User tests have been used to validate the usability of rather different types of groupware products, from spreadsheet applications originally intended for a single user, to electronic meeting rooms that provide both a software and a hardware environment to support decision making, brainstorming, and other typical group activities.

In all of the above research, the authors show that usability testing of CSCW or groupware products has many side benefits besides the discovery of usability problems in a particular implementation. We do not yet fully understand how groups work and how computers can best support group processes. Researchers are trying to understand, among many other variables, how group members communicate among themselves, how shared work areas (computer-supported and traditional) are used, how important gestures and other visual and social cues are to getting work done, and how proximity and anonymity affect the productivity and satisfaction of a group. Usability studies have provided, and probably will continue to provide, many insights into this area at the same time that they show developers specific problems with particular packages.

Appendix A -- Valacich's Process Losses

Production blocking. "Refers to the fact that only one member of a group can speak at a time during verbal communication" (Valacich, 262). This has three effects on a meeting:
1. Others waiting to speak may forget or suppress their ideas because the ideas eventually seem less relevant or original (attenuation blocking).
2. While waiting to speak, group members may not be truly paying attention to the speaker; rather, they are focusing on trying to remember their own idea (concentration blocking).
3. While listening to a speaker, other group members are not generating their own new ideas (attention blocking).

Unequal air time. As groups get larger, the amount of time each person can possibly use for verbal communication gets smaller.

Evaluation apprehension. Individuals may shy away from sharing ideas and comments for fear of negative evaluation by the others present.

Free-riding. Individuals may try to make the other group members accomplish the task without any contribution of their own. Free-riding may be caused by social loafing, but it can also increase when individuals think their contributions are unnecessary for the group's success (i.e. "it is a large group, so surely someone else will think of the same things I would have").

Cognitive inertia. The "tendency of discussions to move along one line of thought without deviating from the current topic" (Valacich, 263).

Socializing. Chatting, drinking coffee, eating refreshments, and other non-task-related activities.

Domination.
"Occurs when some group member(s) exercise(s) undue influence or monopolize(s) the group's time in an inefficient manner" (Valacich, 263). Failure to remember. Individuals do not pay attention to and/or remember comments that others have said. Incomplete analysis. This occurs when the group does not use all the information available to it, or fails to challenge assumptions. WORKS CITED Dykstra, E. A., and R. P. Carasik. 1991. Structure and Support in Cooperative Environments: the Amsterdam Conversation Environment. International Journal of Man- Machine Studies 34:419-434. Greenberg, S. 1991. Computer-Supported Cooperative Work and Groupware: An Introduction to the Special Issues. International Journal of Man-Machine Studies 34:133-141. Grudin, J. 1994. Computer-Supported Cooperative Work: History and Focus. Computer 27:19-26. Nardi, B. A., and J. R. Miller. 1991. Twinkling Lights and Nested Loops: Distributed Problem Solving and Spreadsheet Development. International Journal of Man-Machine Studies 34:161-183. Nielson, J. Usability Engineering. Boston: AP Professional, 1993. Palmer, J. D., and N. A. Fields. 1994. Computer-Supported Cooperative Work. Computer 27:15-16. Tang, J. C. 1991. Findings from Observational Studies of Collaborative Work. International Journal of Man-Machine Studies 34:143-160. Tator, D. G., G. Foster, and D. Bobrow. 1991. Design for Conversation: Lessons from Cognoter. International Journal of Man-Machine Studies 34:185-209. Valacich, J. S., A. R. Dennis, and J.F. Nunamaker, Jr. 1991. Electronic Meeting Support: the GroupSystems Concept. International Journal of Man-Machine Studies 34:261-279.