Why is GIS Difficult?
Sally E. Goldin and Kurt T. Rudahl
Goldin-Rudahl Systems, Inc.
University Drive, #213, Amherst, MA 01002 USA
FAX: +1-413-549-6401
E-mail: seg@goldin-rudahl.com

Abstract

This project used knowledge-engineering techniques to identify obstacles to more successful and less effortful GIS use. We conducted in-depth, structured interviews with sixteen GIS users working in a variety of environments. We then analyzed the content to identify fundamental knowledge elements, common problem-solving strategies, and areas of perceived difficulty. Results indicated that institutional and environmental constraints, management of details, and a mismatch between GIS software structure and users' conceptual models are significant factors that interfere with effective GIS use.

Introduction

The use of computer-based Geographic Information Systems (GIS) technology in the government, business, and non-profit sectors has expanded tremendously in the last decade. GIS has become a pivotal component for decision making and planning in government agencies and in business, and has already had significant impact in applications ranging from facilities management through marketing analysis to the monitoring of global change and environmental degradation. The audience for GIS technology continues to diversify, and it is expected that GIS will be adopted by millions of new users in the years ahead. While many of these new users will be seeking simplified tools custom-tailored to their applications (Jordan, 1993), a significant fraction will need the rich repertoire of data development, data management, and spatial analysis functions provided by a full-scale GIS software environment.

Despite this growth, obstacles to the widespread use of GIS technology remain. One of the most serious is the apparent complexity of GIS analysis techniques and systems. Because GIS commonly uses graphical presentation methods, almost anyone can understand and appreciate its results.
However, highly trained personnel are usually required to produce these results. GIS systems are actually becoming more complex due to the increasing integration of remote sensing and other functions (Dobson, 1993). Their complexity remains an obstacle preventing many organizations from reaping the benefits of GIS (Gordon & Subra, 1992).

Our objective in this research was to identify barriers to rapid learning and effective use of GIS, with the long-term goal of creating knowledge-based software tools to assist experienced GIS users in being more productive. The results were also expected to provide guidelines for GIS education and training. The research used knowledge-engineering methodologies originally developed within the cognitive science and artificial intelligence research communities. These techniques have been applied primarily in the development of expert systems, but have broad potential for more general task analysis and modeling.

Knowledge Engineering

Knowledge engineering (Waterman, 1986) is the process of creating a formal description of human knowledge or expertise and implementing a computer system that incorporates that knowledge in a usable form. The content of the knowledge derives from interviews with and/or observations of human experts in the problem domain. Techniques exist for analyzing records of experts' behavior and inferring rules and other information structures. This activity is known as protocol analysis (Newell and Simon, 1972). Knowledge engineering traditionally consists of four stages:
Methodology

Sixteen subjects participated as experienced GIS users in the data-gathering phase of this research. We attempted to include individuals from a variety of application areas and institutional categories (Table 1). GIS experience ranged from two years to more than twelve years; the majority of subjects indicated that they had been working with GIS for at least four years.

Although the subject population was moderately diverse, the subjects shared one common characteristic: nearly all of them had worked primarily with ESRI's ARC/Info™ GIS products. This reflects the dominant position of this software in the current GIS market. Six subjects indicated that they had worked with other GIS packages or systems. Only one subject did his primary work using a different package. Another common feature was that all the subjects were involved in natural resources or environmental applications, in the broadest sense. However, given their varying institutional affiliations, their perspectives on environmental issues differed considerably.
Each subject was interviewed for a period of 1.5 to 3.0 hours. With one exception, the interviews took place in the subject's office, work area, or place of business. A number of subjects referred to and explained maps and other analysis products as part of their interviews, while others produced laboratory notebooks and other aids they use in working with the GIS. Interviews were tape-recorded. The structure of the interviews was somewhat flexible, in order to encourage subjects to follow associative connections and "think out loud". The interviewer had a checklist of topics to be explored, and attempted to address each of the following questions:
Approximately 34 hours of interview data were collected. The taped interviews were transcribed for further analysis. The transcribed interviews represent nearly 600 pages of single-spaced text. Hardcopy transcripts were then coded by hand, using a system of colors and line-styles. The coding process identified words, phrases, or sentences in the following categories:
These coding categories were designed to capture the content of users' discussion of GIS, and represented our initial theory concerning important primitive elements in the GIS knowledge base. After each interview had been coded, the coding results were transferred to several summary files. These files listed each distinct item within a data coding category (e.g. each GIS operation or action mentioned), with a frequency count. References to obviously equivalent concepts were grouped together, retaining the users' wording. Table 2 presents a section of the summary file for data layers. This is intended to provide some idea of the types of items coded, as well as the frequencies and the variation in user terminology. (Note that the full summary file for data layers and data objects included about 380 separate items.)
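
The tallying step described above can be illustrated with a minimal Python sketch. The original coding was done by hand on hardcopy transcripts, so this is not the authors' procedure; the sample mentions, the `EQUIVALENTS` mapping, and the function names are all hypothetical and chosen only to show how distinct items can be counted per category while equivalent concepts are grouped and the users' own wording is retained.

```python
from collections import Counter, defaultdict

# Hypothetical coded mentions as (coding category, user's wording).
# These values are invented for illustration, not taken from the study data.
coded_mentions = [
    ("data layer", "wetlands"),
    ("data layer", "wetland areas"),
    ("data layer", "parcels"),
    ("data layer", "wetlands"),
    ("action", "buffer"),
]

# Hypothetical mapping from a user's wording to the concept it is grouped under.
EQUIVALENTS = {"wetland areas": "wetlands"}


def summarize(mentions, equivalents):
    """Build {category: {concept: Counter of the wordings users actually used}}."""
    summary = defaultdict(lambda: defaultdict(Counter))
    for category, wording in mentions:
        concept = equivalents.get(wording, wording)
        summary[category][concept][wording] += 1
    return summary


def concept_frequency(summary, category, concept):
    """Total number of mentions of a concept, across all of its wordings."""
    return sum(summary[category][concept].values())


summary = summarize(coded_mentions, EQUIVALENTS)
for concept, wordings in summary["data layer"].items():
    # One summary row: grouped concept, frequency count, and wording variants.
    print(concept, sum(wordings.values()), dict(wordings))
```

Keeping a separate counter of wordings under each grouped concept is one way to preserve the variation in user terminology that the summary files were meant to show, while still yielding a single frequency count per item.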
The coding phase produced nine summary files, one for each of the coding categories listed above, which served as input to the knowledge modelling phase of our research. Knowledge modelling is an inductive process. Working from a set of examples, the knowledge engineer attempts to derive a small set of primitive knowledge categories that systematically account for the example data. We approached our large but diffuse set of sample data in several steps. First, we worked with each content category independently, to devise a taxonomy, hierarchy, or set of organizational principles that reduced the data set volume while retaining the data content. Second, we examined the interview transcripts from the perspective of these categories, looking for interactions or relationships among them. In the process, we concluded that some of the content categories were more important than others in our users' conceptualization of the GIS domain.

Results

The study reported here produced an enormous amount of data, which has been only partially analyzed, and which can be applied to a number of different research questions. In this paper, we address only the primary issue in the title, namely, why is GIS difficult.

Openness of the GIS Domain

GIS is difficult because institutional and environmental constraints restrict the analyst's activities or interfere with his/her objectives.

In the real world, GIS is not distinguished from other activities that strictly might be considered to be unrelated to GIS. In particular, the general hardware and software environment that supports the GIS becomes part of the domain. Many of the problems that people cited as "GIS problems" actually stemmed from difficulties with hardware configurations, operating systems, and so on. To the users that we interviewed, the GIS, the hardware, the operating system, the network, etc., are all one environment.
Thus, strategies related to working with the GIS are intertwined with strategies related to other aspects of the environment. Similarly, activities related to assembling or compiling information (e.g. reading reports, contacting town planning boards, tracing out boundaries on paper maps, doing ground checks) also appear to be viewed as part of the GIS domain. Problems of source data accessibility or quality, as well as organizational barriers related to data ownership, were all identified by our subjects as "GIS problems".

This finding is not terribly surprising. The subjects are professionals focusing on the problems that they need to solve. They use the GIS as a tool in the context of these problems; they derive no advantage from distinguishing GIS operations from other actions they may take in moving toward a solution. However, this observation does pose a problem for someone trying to model the knowledge used in GIS analysis. The domain of "strictly" GIS concepts and knowledge is large and complex in its own right. These interviews suggest that the real knowledge base needed for GIS tasks is significantly larger, and is open-ended.

Management of Details

GIS is difficult because it requires recording, recall, and use of many details that are outside of the GIS database proper.

Practically every subject reported using some kind of external aid for maintaining information about his or her GIS work. Many subjects keep notebooks or logs of their project work, recording a wide variety of items such as coordinates, tolerances, lists of data layers or maps, special cases or problems to be considered, etc. They also use sketches and other graphics to record problems, procedures, or plans. In addition to these notes on project work, several subjects indicated that they keep notebooks on topics related to the GIS software itself. These notebooks are used to record problems, workarounds, detailed procedures, shortcuts, and so on.
In general, subjects view this information as (essential) supplements to the GIS product documentation. Subjects noted that many of the errors they made were due to the failure to record, or inability to recall, this type of detailed information. They had learned, through experience, the importance of maintaining and managing the specific items of information that they, personally, tended to forget.

These findings refute an initial hypothesis. We had expected to find that one of the most difficult aspects of using GIS is generating and expressing an analysis plan, an outline of the steps that will lead from the initial statement of objectives to the desired GIS products. This hypothesis was based on observations made during GIS training courses taught by the authors, as well as other sources (Marble, personal communication). While high-level analysis planning may present problems to novice GIS users, our results do not suggest that this is true of experienced users. To quote one subject.

Data versus Operations as the Focus of Work

GIS is difficult because users focus on data, while GIS software focuses on operations.

GIS is typically described as a set of operations applied to data: overlaying polygons, creating buffers, calculating viewsheds. There have even been attempts to specify a formal "algebra" of spatial operations (e.g. Tomlin, 1990). Most commonly used GIS software is also organized around operations. Originally, each operation was associated with a command verb. Although many GIS software packages now use a graphical, menu-based interface, the action-oriented organization has not changed significantly. First, the user selects an action from the menu; second, he or she supplies information on the data sets to be processed, as operation parameters. The present research suggests that this organization does not match the way that experienced GIS users work.
Users conceptualize GIS work in terms of data objects and relations, not in terms of operations or actions. This conclusion is supported by the frequency analysis of content categories as well as by subjects' direct statements. The number of distinct data items coded is more than twice the number of actions (380 versus 165); similarly, the total frequency of data items is close to double the frequency for actions (1588 versus 973). In short, subjects talked more about data than about operations. Looking just at the actions, 41% of the items and 45% of the frequency are accounted for by references to actions involved in creating or editing data. (Other categories of actions include presenting or visualizing data, manipulating data, and software procedures.)

One possible explanation for these statistics is that the subjects were more involved in building databases than in manipulating and analyzing the data. Further examination does not support this argument, however. The coded results include 85 examples of non-trivial GIS applications described by the subjects. Nearly all these applications require some operations beyond simple data capture or editing, and some involved quite sophisticated modelling techniques. Similarly, there were 55 statements of selection or allocation criteria used in modelling. The subjects clearly are doing data manipulation and modelling, but they tend to think about it and talk about it in terms of building data layers or making maps, rather than in terms of applying particular GIS operations.

Direct statements by the subjects indicate that the semantic attributes of specific data layers guide analysis strategy. To the GIS software, a wetlands dataset, a land use dataset, and a parcel-boundaries dataset are all collections of polygons that can be operated upon in defined ways. The GIS analyst, though, considers the physical, logical, and social characteristics of these datasets, as well as their spatial structure.
He or she knows that a lake "feature" is likely to have at least one stream "feature" physically connected with it, or that an agricultural polygon is more likely to be adjacent to a forest polygon than to a high-density urban polygon. In addition, the subjects recognize and use relations between datasets that depend on the real-world meaning of data items; they know that a stream may serve as part of a parcel boundary, and also delimit farmland from residential land.

This rich knowledge about data appears to be central in the strategies that experienced GIS users bring to bear on analysis problems. However, the GIS software that they use does not support these strategies. At best, the software is neutral; it simply provides no way to capture or utilize the meaning-based relationships inherent in the data. At worst, the software organization may interfere with the analysis process, requiring a translation from a data-centered paradigm to an operation-centered one that loses information and introduces errors.

Implications for GIS Training and Education

This research, although conducted with experienced GIS users, has relevance to novices as well, and to the design of training programs to turn novices into experts. Assuming that the goal of GIS education is to produce professionals who can use existing GIS tools successfully to solve actual problems, this research suggests the following recommendations.
The present research indicates that GIS is difficult even for experienced users because: 1) it subsumes a wide range of institutional and environmental issues, outside the formal definition of the domain; 2) it requires management of a large amount of detailed information, distinct from the GIS data; 3) the task structure enforced by current GIS software does not match the way experts approach GIS problems. Given these findings, we should be able to augment GIS educational programmes to better prepare practitioners to use the powerful capabilities of GIS effectively, thus enhancing the benefits of this technology for society.

References