Management Information and Program Evaluation Systems by Alan Walter Steiss
PROGRAM EVALUATION
Program evaluation has been a watchword in government for over three decades. The systematic assessment of public programs, however, has remained more a promise than a practice. Public goals and objectives often are nebulous and ill-defined. Consequently, the identification and measurement of program results is even more elusive. The first major task of evaluation is to decide what to evaluate and how to evaluate it. Not all programs or projects need to be, can be, or should be evaluated in depth. Less expensive, short-term programs or programs that may be politically vulnerable, for example, may not warrant a costly, multi-layered statistical analysis. As Wholey has noted: "From the point of view of decision-makers, evaluation is a dangerous weapon. They don't want evaluation if it will yield the 'wrong' answers about programs in which they are interested." [1] In such situations, political pressures frequently override empirical evidence available from formal evaluations. Nevertheless, decision-makers, who may be operating in the dark, may welcome evaluations that provide systematic data--basic descriptive program information--on a consistent basis. Evaluation of program results should be a critical component of financial planning and control. Conducting such assessment is, in fact, what management control ultimately is all about.
Evaluation: A Many Splendored Thing
Evaluation activities range from simple to complex analyses and include: (1) program monitoring--analyses of data that count the number and/or frequency of activities and operations; (2) process evaluation--analyses of data to assess program processes and procedures and the links between various program activities; and (3) outcome evaluations--analyses of data to assess program results. Evaluations may look at specific program aspects or at a whole program. Components may be compared across programs or a number of programs may be compared across sites. Such comparisons provide the basis for determining if a program worked, or if one program worked better than something else. Comparisons also can simply track program differences. Complex comparative outcome evaluations can be expensive to conduct, involving consultants, programmers, and statisticians who may not be readily available on agency staffs. Good, useful, credible evaluation research carried out on a more limited scale often can yield critical program data.
The term evaluation has been applied to many different activities. Perkins identifies six basic types of evaluations [2]:
(1) Strategic evaluations are concerned with underlying causes of social problems and focus on "implicit theories" as a basis for broad ameliorative programs.
(2) Intervention effect assessments attempt to establish the relation between program intervention and outcomes; or, in some cases, the processes involved in producing those outcomes.
(3) Compliance evaluations examine the consistency of program objectives with broader legislative aims and attempt to ensure that public funds are allocated in accordance with policy guidelines.
(4) Program design evaluations test the measurability of program assumptions, the overall logic of the program approach, and the assignment of responsibility and accountability for program results.
(5) Management evaluations focus on the efficiency and effectiveness with which available resources are deployed to achieve program objectives.
(6) Program impact evaluations deal with program delivery systems and the relation between program results and the legislated goals and program objectives.
The last three types of evaluations are perhaps most relevant in the context of financial planning and management control.
A Working Definition of Evaluation
Some authors have suggested that the term evaluation should be reserved for relatively high-order assessments of the effectiveness of policy decisions. This focus characterized many early efforts at systematic assessments--what has been labeled evaluation research. "In its humble beginnings. . . evaluation research was much like the buzzard, attacking only dead programs. These postmortems were useful in developing a conceptual basis for evaluations but did little to improve policy formulation." [3] Since such full-blown scholarly research sometimes evolved over a number of years, significant improvements to on-going programs often were impossible to achieve. Many of the programs chosen for such rigorous analyses were short-term pilot projects. Even when these programs continued, managers were unlikely to utilize the results of these evaluations because: (1) program evaluators were "outsiders"--academic types--often with different perceptions and opinions about the goals of the program; and (2) evaluators tended to focus on the negative aspects of a program and rarely offered constructive advice. [4]
The scale and time frame of evaluations must be such that management is assisted in formulating viable program improvements. Moreover, such evaluations must specify program problems in a way that provides clear indications of alternative courses of action to resolve these problems. As Clark has observed, unless evaluation is keyed to meeting specific information requirements and decision-making needs in a timely fashion ". . . it risks being irrelevant--a monument to what might have been." [5]
An evaluation can focus on process--the extent to which programs are implemented according to predetermined guidelines--or on impact--the extent to which a program produces change in the intended direction. It also is necessary to decide whether the program or the organization responsible for the program is to be evaluated. A program may be evaluated in terms of its effectiveness and costs, but an organization should not be evaluated solely on the basis of its success (or failure) in carrying out a particular program. As Quade has observed, an organization should be judged not by an initial program failure, but by its capacity to learn from failure and to improve the operation of the program. [6]
For the purposes of this discussion, a program evaluation is: (1) an assessment of the effectiveness of ongoing and proposed programs in achieving agreed-upon goals and objectives and (2) an identification of areas needing improvement through program modification (including the possible termination of ineffective programs), which (3) takes into account the possible influence of external and internal organizational factors.
Efficiency Versus Effectiveness
The purpose of many program evaluations has generally been to improve efficiency. Questions of efficiency often are defined and answered strictly in least-cost terms, with minimal consideration of priorities or of the relative worth of the programs pursued. It is possible to do things very efficiently, but if they are the wrong things to do, they will have little positive impact on the problems to which a public program is directed. Improving efficiency may not require any drastic changes in program strategies. Increasing effectiveness, however, often entails radical program adjustments--one reason why evaluations that focus on effectiveness may not be fully utilized.
The notion of a criterion of efficiency, first formulated by Herbert Simon, asserts that a choice among alternatives should be made in favor of the course of action that produces the largest result for a given application of resources. [7] To guide this choice, however, Simon notes that it is necessary to determine appropriate levels of goal attainment or program adequacy (e.g., a minimum acceptable level of performance). In the absence of such definitive statements of goals and objectives, measures of efficiency cannot provide the insights necessary to make appropriate judgments about program achievements or benefits.
Recent developments in cost-benefit analysis illustrate the shift from efficiency to effectiveness. Cost-benefit analyses require estimates of direct and indirect costs and of tangible and intangible benefits. Once specified, costs and benefits are translated into a common measure, usually although not necessarily a monetary unit. Comparisons are then made by computing a benefit-cost ratio; net benefits (benefits minus costs); or some other value, such as an internal rate of return. Such evaluations focus on issues of efficiency, that is, the greatest benefits for the lowest cost.
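The three comparison values named above can be illustrated with a minimal sketch. The function names, the yearly cash-flow layout, the bisection search, and the sample figures are assumptions made for illustration; they are not taken from the text.

```python
def present_value(flows, rate):
    """Discount a list of yearly amounts (year 0 first) to present value."""
    return sum(amount / (1 + rate) ** year for year, amount in enumerate(flows))


def cost_benefit_summary(benefits, costs, rate):
    """Return the benefit-cost ratio and net benefits in present-value terms."""
    pv_benefits = present_value(benefits, rate)
    pv_costs = present_value(costs, rate)
    return {
        "benefit_cost_ratio": pv_benefits / pv_costs,
        "net_benefits": pv_benefits - pv_costs,
    }


def internal_rate_of_return(benefits, costs, low=0.0, high=1.0, tol=1e-7):
    """Find the discount rate at which net benefits equal zero, by bisection.

    Assumes net present value is positive at `low` and negative at `high`.
    """
    net_flows = [b - c for b, c in zip(benefits, costs)]
    while high - low > tol:
        mid = (low + high) / 2
        if present_value(net_flows, mid) > 0:
            low = mid   # still profitable at this rate, so the root lies higher
        else:
            high = mid
    return (low + high) / 2


# Hypothetical program: 1,000 spent up front, 600 in benefits in each of two later years.
benefits = [0.0, 600.0, 600.0]
costs = [1000.0, 0.0, 0.0]
summary = cost_benefit_summary(benefits, costs, rate=0.0)
irr = internal_rate_of_return(benefits, costs)
```

With no discounting, this hypothetical program shows a benefit-cost ratio of 1.2 and net benefits of 200; its internal rate of return is roughly 13 percent. All three values rank alternatives on efficiency grounds and say nothing about whether the program's objectives are the right ones.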
Cost-effectiveness analysis requires a model that can relate incremental costs to increments of effectiveness. Costs may be expressed in monetary terms, but program benefits or outputs are expressed in terms of the actual substantive performance associated with program objectives. A cost curve is developed for each alternative, representing the sensitivity of costs (inputs) to changes in the desired level of effectiveness (outputs).
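A cost curve of this kind can be sketched as a list of (effectiveness, cost) points for each alternative. The alternative names, the data, and the two helper functions below are hypothetical and serve only to show how incremental costs relate to increments of effectiveness.

```python
def incremental_costs(curve):
    """Given (effectiveness, cost) points sorted by effectiveness, return the
    added cost per added unit of effectiveness between consecutive points."""
    return [
        (c1 - c0) / (e1 - e0)
        for (e0, c0), (e1, c1) in zip(curve, curve[1:])
    ]


def least_cost_alternative(curves, target):
    """Return (name, cost) of the cheapest alternative that reaches `target`
    effectiveness, or None if no alternative reaches it."""
    feasible = {}
    for name, curve in curves.items():
        costs = [cost for level, cost in curve if level >= target]
        if costs:
            feasible[name] = min(costs)
    if not feasible:
        return None
    best = min(feasible, key=feasible.get)
    return best, feasible[best]


# Hypothetical cost curves: (households served, total cost in dollars).
curves = {
    "alternative_a": [(100, 10_000), (200, 18_000), (300, 30_000)],
    "alternative_b": [(100, 14_000), (200, 20_000), (300, 26_000)],
}

steps_a = incremental_costs(curves["alternative_a"])
steps_b = incremental_costs(curves["alternative_b"])
choice_at_200 = least_cost_alternative(curves, target=200)
choice_at_300 = least_cost_alternative(curves, target=300)
```

In this made-up case, alternative A is cheaper at a target of 200 households but its incremental cost rises from 80 to 120 dollars per household, so alternative B becomes the less costly choice at 300. The preferred alternative therefore depends on the level of effectiveness sought.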
Formative and Summative Evaluations
The highest priority in evaluations often is given to instrumental outcomes that are related to program goals and objectives and serve as indicators of program effectiveness. Other measurable outcomes can be critical, however. The key product of an evaluation may be knowledge about the implementation of the program (rather than the program itself) or the quality of the larger system in which the program is located. Evaluation also may produce understanding about a program among constituents at odds or among factions of the system under scrutiny. This information, in turn, may make consensus-building another important outcome.
A comprehensive evaluation should be based on both formative and summative techniques (see Exhibit 1). Formative evaluations provide the information necessary to design and/or modify service delivery systems. Such evaluations include (1) an analysis of the needs to be met or the problems to be solved; (2) a determination of whether or not a public program should be initiated to meet such needs; and if so, (3) how the program should be designed. Summative evaluations measure performance and program impacts. These two types of evaluations are closely inter-related. Information derived from summative evaluations of program impacts provides input for continuing formative evaluative efforts.
At first glance, designing a measurement system capable of providing this evaluative information might appear to be an awesome undertaking. When seen in a historical context, however, practically all public services are provided as a result of decisions made over time, based directly on such formative and summative information. The mix of services provided by local government reflects a variety of commitments made by the governing body, regulations imposed by other levels of government, and administrative decisions made by appointed officials.
Formative decisions are expressed through local ordinances, budget documents, state statutes and regulations, intergovernmental contracts and agreements, federal laws, and so forth. While administrators can make important contributions to these decisions, it is more likely that formative evaluations will be useful in developing better decisions concerning the improvement of service delivery systems once these broader commitments are made. As Weiss notes: "The analysis of program variables begins to explain why the program has the effect it does. When we know which aspects of the program are associated with more or less success, we have a basis for recommendations for future modifications." [8] In short, effective evaluation not only describes what is happening, it also helps to determine which features of a program are successful and which are not.
In order to make such determinations, both input and intervening variables must be measured. Input variables include information that might be considered extraneous to the program itself. Analysis of input variables, however, can provide information necessary to identify more clearly why a program might or might not be successfully implemented in a particular jurisdiction. Data collection on input variables should be undertaken with the limitations of time and cost constraints in mind. As Weiss suggests, ". . . most evaluations have limited resources, and it is far more productive to focus on a few relevant variables than to go on a wide-ranging fishing expedition." [9]
Two kinds of intervening variables must be measured: (1) program operation variables; and (2) bridging variables, i.e., the intermediate steps selected as a means to achieve program objectives. A clear understanding of the causal relationships between intermediate activities and their consequences has a direct impact upon the ability of a government agency to meet its objectives. A poorly conceived program, no matter how effectively implemented, contributes relatively little to the overall effectiveness of an agency.
Organizational constraints again will limit the time and resources that can be devoted to the analysis of intervening variables. One approach is to involve program managers, either through formal or informal procedures, in seeking answers to such questions. Whether or not the connections between program design and objectives are formally determined, "there are almost always some prevailing notions, however unexplicit, that certain intermediary actions or conditions will bring about the desired outcomes." [10]
Clarifying Program Objectives
Complete clarity as to the anticipated program impacts seldom comes from an examination of the final statements of the program planning process. Therefore, before an evaluation can be initiated, it often is necessary to determine the exact character and intent of specific program goals and objectives. Shortell and Richardson have identified ten criteria for clarifying program objectives (see Exhibit 2).
Exhibit 2. Criteria for Clarifying Program Objectives
(1) Nature or content of the objective. It is important to determine the intended changes to be brought about by the program.
(2) Ordering of objectives. Objectives should be clearly presented at each level of abstraction, with corresponding operational indicators to determine if the objectives have been met.
(3) Target groups. The specific group(s) to which the program is directed should be identifiable in terms of age, sex, ethnic categories, geographic boundaries, etc.
(4) Short-term versus long-term effects. The short-term impacts and the long-term effects of any program should be documented.
(5) Magnitude of results. It is necessary to determine how large (or small) an effect will be acceptable as a positive indicator of success.
(6) Stability of outcomes. For many programs, the effects are meant to be lasting; for others, particularly programs involving behavioral changes, additional exposure (reinforcement) to the program may be necessary.
(7) Multiplicity of objectives. It is important to clarify objectives to the extent that possible conflicts among them can be identified and dealt with.
(8) Importance. While objectives often differ in importance, and individuals may disagree on their relative value, some attempt should be made to place objectives in some general priority order.
(9) Interrelatedness. Linkages should be identified, especially when a set of lower-order objectives may serve as an important component in the achievement of higher-order objectives.
(10) Second-order consequences. It is important to identify possible side effects of the program--effects not intended but anticipated, or even unanticipated, by the initiators of the program.
The final products of the formative evaluation process should be: (1) a service delivery plan, based on an understanding of the causal relations between the activities to be performed and the desired results; (2) a set of goal statements, outlining a course of action in broad terms; and (3) supporting objectives, which provide for the quantification of progress toward goal achievement. The goals and objectives developed through formative evaluation techniques should represent the best available solution for a particular problem (within the constraints of available resources). They should also provide a foundation for the subsequent development of mechanisms with which to measure the actual performance of public programs and their impacts on the community. The complexities inherent in an analysis of the relationships between government programs and desired results, and the difficulties surrounding the development of adequate goals and objectives, nevertheless represent a significant challenge to the program manager.
Traditional Performance Measures
It is important to pick and choose what is to be measured and how it is to be measured. The tendency to measure everything and then attempt to sort the wheat from the chaff can result in considerable waste of time and money. In this regard, the computer can be a dangerous instrument. There may be a fine line between what is interesting and even useful and what really needs to be known.
Whenever possible, measures should be pared down and selected to tie in with the goals and objectives of the program and its implementation. Another selection criterion is accessibility of the data. Measures have to be selected that have a high probability of being generated by the program. This requires some advance preparation.
Program objectives should include (or be capable of being translated into) explicit measures of performance. When a workload measure is related to a unit of input (e.g., cost), it is transformed into an efficiency measure. Output can be related to input costs, to units of labor (such as staff-hours), or to units of time. Thus, efficiency measures might include miles of street paved per unit of dollars expended, acres of park land mowed per staff-hour, or number of buildings inspected per month.
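The conversion of a workload measure into an efficiency measure amounts to dividing output by the chosen unit of input. The short sketch below uses an assumed helper name and invented figures for the three examples just mentioned.

```python
def efficiency(output_units, input_units):
    """Output produced per unit of input (dollars, staff-hours, or time)."""
    if input_units <= 0:
        raise ValueError("input_units must be positive")
    return output_units / input_units


# Hypothetical workload and input figures for one fiscal year.
miles_per_thousand_dollars = efficiency(12.0, 480.0)   # 12 miles paved for $480 thousand
acres_per_staff_hour = efficiency(90.0, 60.0)          # 90 acres mowed in 60 staff-hours
inspections_per_month = efficiency(264, 12)            # 264 buildings inspected in 12 months
```

The same workload count yields different efficiency measures depending on the input chosen as the denominator, which is why the unit of input should be stated along with the measure.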
Traditional performance measures also include work standards, that is, measures of the amount of effort that should be required to complete specific tasks. In an evaluation system, performance is recorded relative to such standards. If the work standard for reading electric meters is 200 per day, for example, a meter reader collecting data from 175 meters would have met 87.5 percent of the standard. The use of such measures requires service outputs that are characterized by fairly routine procedures, which themselves are standardized. Minimum quality standards also are required.
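The meter-reading arithmetic can be written as a one-line calculation. The function name is an assumption for illustration; the figures are those of the example above.

```python
def percent_of_standard(actual, standard):
    """Recorded performance as a percentage of the work standard."""
    if standard <= 0:
        raise ValueError("standard must be positive")
    return 100.0 * actual / standard


# Figures from the meter-reading example: 175 meters read against a standard of 200.
meter_reading = percent_of_standard(175, 200)
```

Here `meter_reading` evaluates to 87.5, matching the percentage cited in the text.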
Utilization statistics provide another kind of performance measure--e.g., percentage of total capacity utilized, equipment downtime, unbillable hours or nonproductive staff time, etc. Utilization statistics should be integrated into the program evaluation system in a manner similar to other indicators; that is, they should be linked directly to goals and objectives.
Finally, performance measures may include some effectiveness criteria. For example, rather than stating police patrol performance in terms of arrests made per officer, the number of arrests might be further qualified in terms of those clearing the initial judicial screening. Instead of measuring the number of households provided with a given service, an assessment might be made in terms of those households satisfied with the service. Such information can also be related to unit costs; for example, the number of households satisfied per input dollar.
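The difference between a raw output measure and one qualified by an effectiveness criterion can be shown with a small sketch. The household counts, the cost figure, and the function names are hypothetical.

```python
def qualified_share(qualified_count, total_count):
    """Share of outputs that meet the effectiveness criterion."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return qualified_count / total_count


def output_per_dollar(count, cost):
    """Units of output obtained per input dollar."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return count / cost


# Hypothetical service: 400 households served, 320 report satisfaction, $8,000 spent.
households_served = 400
households_satisfied = 320
program_cost = 8_000.0

satisfaction_share = qualified_share(households_satisfied, households_served)
served_per_dollar = output_per_dollar(households_served, program_cost)
satisfied_per_dollar = output_per_dollar(households_satisfied, program_cost)
```

In this invented case the program serves 0.05 households per dollar but satisfies only 0.04, so a measure based on service counts alone would overstate performance by a quarter.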