Basic Approaches to Evaluation
Ideally, program evaluation seeks to compare what actually happened to what would have happened if the program had not been initiated. Since it often is difficult, if not impossible, to determine exactly "what would have happened if. . . ," the problem is to apply evaluative procedures that can approximate this state. Standard approaches for conducting an evaluation include: (1) before-and-after comparisons; (2) time-trend-data projections; (3) with-and-without comparisons; (4) comparisons of planned versus actual performance; and (5) controlled experimentation.
The First and Final Steps
Each approach begins and ends with the same procedural steps. The first step is to identify the relevant objectives of the program or activities under evaluation and the corresponding evaluative criteria or effectiveness measures. The final step should include an explicit and thorough search for other plausible explanations for the observed changes and, if any exist, an estimate of their effects on the data.
The major purpose of evaluation is to identify changes in those criteria that can be reasonably attributed to the program or activities under study. A major problem, however, is that other factors--such as external events or the simultaneous introduction of other related programs--may have occurred during the time period covered by the evaluation. One of these factors may have been the significant cause of the observed changes and not the program under evaluation. Explicit provisions for controlling at least some of these exogenous factors are included in the second, third, and fifth approaches described below.
Rossi and his colleagues have identified a number of "competing processes" that may influence program effects [11]:
(1) Endogenous change: The condition for which the program is seen as a remedy or enhancement may change of its own accord. In medical research, the phenomenon is known as "spontaneous remission."
(2) Secular drift: Relatively long-term trends in the target population or in the broader community may produce changes that enhance or mask the effects of the program.
(3) Interfering events: Short-term events also may produce enhancing or masking changes.
(4) Program-related effects: The actual evaluation effort may contribute to a bias in the program results--the problem for the evaluator is to maintain the role of the "uninvolved observer."
(5) Stochastic effects: Chance or random fluctuation in any measurement effort may make it difficult to judge whether a given outcome, in fact, is large enough to warrant attention. Sampling theory can identify how much variation can be expected by chance.
(6) Unreliability: Data collection procedures are subject to a certain degree of unreliability. The measurement instrument itself may be a major source of the problem.
(7) Self-selection: Segments of the target population easiest to reach are those most likely to change in the desired direction for other reasons. Similar processes in the opposite direction may lead to differential attrition. Dropout rates vary from project to project, but are always troublesome in evaluations.
(8) Maturation trends: Programs directed toward changing persons at various stages in their life cycle must cope with the fact that considerable changes also are associated with the process of maturation.
The outcome of any program is a function of net program effects and these confounding elements. These competing processes must be isolated and addressed in each of the evaluation approaches described in the following sections.
Evaluators who are in touch with the program can spot common problems that can threaten the validity and accuracy of evaluation or limit the ability to generalize from the findings. These problems include: maturation effects, or changes that occur naturally in individuals due to the passage of time; history effects--outside changes that can affect dependent and independent variables; and selection effects resulting from biased selection of participants. Participants who drop out of a program also can skew the results, so a credible evaluation must also account for participants who did not complete the program.
Evaluators need to be aware of the history, trends, politics, policies, values, and philosophies behind service programs. For example, programs that deal with juvenile justice may reflect a bias toward treatment or toward punishment, or may focus on the youth or on the family. It is necessary to know how all of these forces affect the services being evaluated. Evaluators may have to deal with client groups that receive services from many different programs, and such multiple service systems may have competing and contradictory orientations.
Before-and-After Comparisons
Before-and-after comparisons are the simplest, least costly, and most common evaluative approaches. Such comparisons involve the examination of conditions in a given target population at two points in time--immediately before a program is introduced and at some appropriate time after its implementation. The assumption is that any change in the "after" data, as measured by appropriate evaluation criteria, has occurred as a consequence of the new program. This approach is valid only in situations where program-related changes are clearly measurable and where comparisons are not likely to reflect short-term fluctuations.
The effectiveness of this approach can be increased if the evaluation is carefully planned prior to the implementation of the program, so that appropriate data can be collected as a basis for the evaluation criteria. Reliance on data available in established collection procedures seldom provides an adequate basis for such evaluations. Special data collection procedures will increase the cost of the evaluation, but this approach is still the least expensive of the methods outlined.
Time-Trend Data Projections
Time-trend-data projections draw comparisons between actual program data and extrapolated data that suggest conditions which would have prevailed without the program. Data on each evaluative criterion should be obtained at several intervals before and after the initiation of the program activities. Pre-program data are projected to the end of the evaluation period by means of standard statistical methods. Actual and projected estimates are then compared to determine the amount of change resulting from the introduction of the program.
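The projection step can be sketched in Python. This is a minimal illustration that assumes a straight-line (ordinary least-squares) trend, which is only one of the "standard statistical methods" an evaluator might use; the function names and the annual figures in the usage lines are hypothetical.

```python
def fit_linear_trend(years, values):
    """Ordinary least-squares fit of value = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def actual_minus_projected(pre_years, pre_values, post_years, post_values):
    """Extend the pre-program trend over the post-program years and return,
    for each year, the actual value minus the value the trend projects."""
    intercept, slope = fit_linear_trend(pre_years, pre_values)
    return {
        year: actual - (intercept + slope * year)
        for year, actual in zip(post_years, post_values)
    }


# Hypothetical criterion values: five pre-program years, two program years.
gaps = actual_minus_projected(
    [2015, 2016, 2017, 2018, 2019], [100, 104, 108, 112, 116],
    [2020, 2021], [115, 114],
)
print(gaps)  # {2020: -5.0, 2021: -10.0}
```

A negative gap means the criterion ended below where the pre-program trend pointed; whether that counts as improvement depends on the criterion. If the pre-program series is unstable, the fitted line, and therefore the gap, carries little meaning.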
This approach is most appropriate when an underlying trend can be identified over a period of time which would likely continue if the new program had not been introduced. The program objective is to change the direction of this trend--to dampen some undesirable condition or to amplify some desirable change. Statistical projections may be relatively meaningless, however, if data for prior years are unstable. Likewise, if there is strong evidence that underlying conditions have changed in very recent times, data for prior years probably should not be used.
The time-trend approach adds two cost elements to the first method: (1) the cost of technical expertise to undertake the statistical projections; and (2) the added data collection for prior years. This latter requirement may make it difficult to ensure that pre-program data are compatible with post-program or current data.
With-and-Without Comparisons
With-and-without comparisons examine a population to which a particular program has been applied and one or more "control groups" to which comparable programs have not been applied. This approach can be used, for example, if some segment of the population within a community is to be served by a given program while others are not, as is the case when a pilot program is tested. Changes in the values of the evaluative criteria (rates of change as well as amounts) for the "with" and the "without" groups form the basis for the comparison. The characteristics determining the choice of comparative groups will vary with the types of programs under evaluation. The choice ultimately is based on the judgment of the evaluator as to what nonprogram-related factors might influence the effectiveness of the program under study. Although this approach controls for some important external factors, it generally is not a fully reliable measure of program effects. It is best applied in conjunction with other evaluative methods.
The identification of comparable communities or populations may require considerable effort. The cost may be reasonable if standard data categories are adequate (such as similar population size, proximity, and so forth). The costs may rise significantly, however, if communities are selected for particular combinations of characteristics or to ensure that a similar program effort does not exist in the "without" communities. Since the type of data collection and the precision of these data are likely to vary from community to community, the availability of comparable data may be severely limited. Thus, the cost of this approach will be considerably higher if special data-collection efforts are required.
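A with-and-without comparison reduces to setting the change in the "with" group against the change in the "without" group, both as an amount and as a rate. The sketch below assumes a single before value and a single after value per group (for example, a group-level rate); the class and function names and the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GroupCriterion:
    """One evaluative criterion measured before and after the program period."""
    before: float
    after: float

    def amount_of_change(self) -> float:
        return self.after - self.before

    def rate_of_change(self) -> float:
        # Proportional change relative to the starting level.
        return (self.after - self.before) / self.before


def with_without_difference(with_group: GroupCriterion,
                            without_group: GroupCriterion) -> dict:
    """Change in the program ("with") group net of the change in the comparison group."""
    return {
        "amount": with_group.amount_of_change() - without_group.amount_of_change(),
        "rate": with_group.rate_of_change() - without_group.rate_of_change(),
    }


# Hypothetical rates per 1,000 residents in a pilot area and a comparison area.
result = with_without_difference(GroupCriterion(50, 40), GroupCriterion(60, 57))
print(result)  # amount -7, rate approximately -0.15
```

Here the pilot area fell 7 points more than the comparison area. The comparison area controls only for the external factors it shares with the pilot area, so this difference is not by itself a fully reliable measure of the program's effect.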
Comparisons of Planned Versus Actual Performance
After-the-fact comparisons involve rather straightforward procedures and yet are surprisingly rare in their use. This approach requires that specific, measurable objectives or targets be established prior to the initiation of the program. Targets should be identified for a specific achievement within specific time periods (for example: "a reduction in the incidence of juvenile delinquency by 15 percent in two years," rather than: "the elimination of juvenile delinquency"). The actual performance (program outcomes) is then compared to these targets. Such evaluations can be readily undertaken if program targets are expressed in terms of effectiveness measures.
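Using the example target above, a 15 percent reduction in two years, the comparison of planned and actual performance amounts to converting the target into a value and checking the outcome against it. A minimal sketch; the function name and the baseline and outcome figures are hypothetical, and it assumes a criterion for which lower values are better.

```python
def assess_reduction_target(baseline: float, actual: float,
                            target_reduction_pct: float) -> dict:
    """Compare an actual outcome with a planned percentage reduction from a baseline."""
    target_value = baseline * (1 - target_reduction_pct / 100)
    achieved_pct = (baseline - actual) / baseline * 100
    return {
        "target_value": target_value,   # level the plan called for
        "achieved_pct": achieved_pct,   # reduction actually obtained
        "met": actual <= target_value,  # lower is better for this criterion
    }


# Hypothetical: 400 delinquency cases at baseline, 330 two years later.
report = assess_reduction_target(400, 330, 15)
print(report)
```

With these figures the target level is 340 cases, the achieved reduction is 17.5 percent, and the target is met. For a criterion where an increase is the goal, the comparison would be reversed. Meeting the target, however, does not by itself show that the program caused the change.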
Like the before-and-after approach, this method provides no direct means of indicating the extent to which the changes in values of the effectiveness criteria can be attributed solely to the new program. As with other techniques, an explicit search should be made for other plausible explanations as to why the targets have been met, exceeded, or not met.
Appropriate, realistic objectives must be established as the basis for evaluation criteria. The task of setting objectives may not be taken seriously if the evaluations are not used seriously--a problem with all evaluation techniques. Targets may be overstated and, therefore, unattainable, or they may be understated to make the program achievements look better. If the evaluation findings are used seriously by decision makers, however, a valuable spin-off of this approach is that the establishment of targets is likely to become an important issue. Higher-level officials, as well as program managers, should participate in this process, and the targets should explicitly encompass all key program effects.
The after-the-fact approach can be applied more widely once provision is made for the regular collection of the data necessary for measuring effectiveness. This approach is particularly useful for annual program evaluations. Targets can be set each year for one or more future years. Much can be learned from a careful, systematic examination of the immediate, short-term consequences of a program, even if a more elaborate evaluation method is not applied.
This evaluative technique is relatively inexpensive compared to other methods. Costs depend primarily on the expenditures necessary to gather additional data for the evaluation criteria selected. The setting of appropriate (measurable) objectives is likely to entail relatively small costs--at least in dollar terms.
Controlled Experimentation
Controlled experimentation is by far the most potent approach to evaluation. Unfortunately, it also is the most difficult and costly to undertake. The procedures may involve many steps of experimental design techniques and can become very complex with respect to a particular program evaluation. The basic steps, however, are as follows:
(1) Identify relevant objectives and corresponding evaluation criteria.
(2) Select target populations that have similar characteristics with respect to their likelihood of being effectively treated by the program.
(3) Assign the target population (or a probability sample of that population) to control and experimental groups in a scientifically random manner.
(4) Measure the pre-program performance of each group using the selected evaluation criteria.
(5) Apply the program to the experimental group but not to the control group.
(6) Continuously monitor the operations of the experiment to determine if any actions occur that might distort the findings.
(7) Adjust any such deviant behavior, if appropriate and possible; if not, at least identify and estimate its impact on eventual findings.
(8) Measure post-program performance of each group using the selected evaluation criteria.
(9) Compare pre- and post-program changes in the evaluation criteria of the groups.
(10) Search for plausible alternative explanations for observed changes and if any exist, estimate their effects on the data. [12]
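Steps (3) and (9) are the quantitative core of this design: random assignment, and a comparison of pre- to post-program change between the groups. The sketch below is a simplified illustration; it assumes each unit has one numeric pre-program and one post-program score, splits units into two equal-sized groups, and uses hypothetical function names. It omits the monitoring, adjustment, and significance testing called for in the other steps.

```python
import random
from statistics import mean


def assign_randomly(units, seed=None):
    """Step (3): shuffle the units and split them into experimental and control groups."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def net_change(pre, post, experimental, control):
    """Step (9): mean pre-to-post change in the experimental group minus
    the mean change in the control group."""
    experimental_change = mean(post[u] - pre[u] for u in experimental)
    control_change = mean(post[u] - pre[u] for u in control)
    return experimental_change - control_change


# Hypothetical: ten participants, all scoring 10 before the program.
experimental, control = assign_randomly(range(10), seed=42)
pre = {u: 10 for u in range(10)}
post = {u: (13 if u in experimental else 11) for u in range(10)}
print(net_change(pre, post, experimental, control))  # 2
```

Because the groups are formed by chance rather than by self-selection, competing processes such as maturation or secular drift should, on average, affect both groups alike, which is what allows the difference in changes to be attributed to the program.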
The controlled experiment is most appropriate for the evaluation of programs directed toward specific individuals, such as health programs, manpower training, and so forth, and for a variety of treatment programs such as those of drug and alcohol abuse, correction and rehabilitation, or work-release. It is not likely to be appropriate, however, for programs requiring large capital investments in equipment or facilities.
An important variation on this approach involves the comparison of different geographical areas. Many programs can be split geographically--introduced initially in some localities and not in others. For example, new crime prevention programs, solid waste collection procedures, programs of traffic control, and so forth often are tried out and evaluated in a few areas before receiving widespread application. If it is possible to identify areas with similar characteristics with respect to the program being tested, some of these areas might be designated as program recipients. If trends in the evaluation data before and after the new program was in operation show significant improvements in those areas with the program, then a basis would be provided for attributing the change to the introduction of the program.
This approach is not without its special problems that can bring observed results into question. Some of these are as follows:
(1) Members of an experimental group may respond differently to a program if they realize they are part of an experiment. This problem is known as the Hawthorne effect, after studies by Dickson and Roethlisberger in the late 1920s at the Western Electric Company's Hawthorne Works near Chicago. In these studies, the productivity of the test group increased even under adverse conditions as a consequence of their selection for evaluation. To help reduce this problem, it may be necessary to inform members of the control group that they too are part of an experiment.
(2) Results may differ significantly when the program is shifted from a pilot basis to full-scale application. For example, a new crime prevention program introduced on a pilot basis may merely cause a shift in the incidence of crime to other parts of the community without any overall reduction in the crime rate.
(3) In some situations, political pressures may make it impractical to provide services to one group, while withholding them from others. Such problems may be lessened by testing variations of a program in several locations rather than allocating program resources on an all-or-nothing basis.
(4) It may be considered morally wrong to provide a service temporarily if the service might cause dependency and leave individuals worse off after the program is withdrawn.
(5) If persons are permitted to volunteer to participate in the experimental group, the two groups are not likely to be comparable. A self-selected group will probably be more receptive to the program and thus may not be typical of the whole target population.
(6) Administrative problems may arise and may introduce a bias into the program results. For example, a specially trained staff may be able to deliver the pilot program at a level that cannot be sustained by regular agency personnel who will be called on to administer the full-scale program.
The use of the controlled experiment approach generally costs considerably more than the other evaluation techniques because of: (1) the greater time required to plan and conduct the experiment and to analyze the data; and (2) the higher level of analytical and managerial skills required. This approach implies certain indirect costs arising from the temporary changes made in the way the program operates in order to achieve differential benefits. Innovative projects can be evaluated more readily because pools of "unexposed" potential targets usually are available. Established projects, on the other hand, may require statistical methods that measure the effects in degrees of exposure, as well as by reflective controls that utilize time-series analysis. [13]
Combined Approaches
The selection of an appropriate approach will depend on the timing of the evaluation, the costs involved and resources available, and the desired accuracy. It should be evident that these approaches are not either/or choices. Some or all of the methods can be used in combination. The before-and-after method is relatively weak when applied alone, but becomes much more useful in combination with other approaches. The after-the-fact approach, involving comparisons of planned versus actual performance, is likely to be used more extensively once management information systems become more widely accepted and implemented in the public sector. Although the experimental approach provides the most precise evaluation, its costs and special characteristics result in its being applied on a very selective basis.
Decisions about public programs inevitably are made under conditions of considerable uncertainty. Evaluations can reduce this uncertainty but cannot eliminate it totally. Even though it may not be possible to isolate the effects of a program from other concurrent events, it may be unnecessary to be overly concerned if the evaluation indicates significant program benefits to the community or target population.
Applications of Evaluation Findings
The most comprehensive evaluations are little more than academic exercises if their findings have no impact on the processes by which policies are made and programs are developed. As Rossi has observed: "Evaluations cannot influence decision-making processes unless those undertaking them recognize the need to orient their efforts toward maximizing the policy utility of their evaluation activities." [14] At the same time, the need for evaluation must be recognized and accepted by those public officials with responsibility for the development and implementation of programs and policies. Management and performance audits, sunset legislation, and program reconstruction are examples of mechanisms for the further application of findings of evaluations.
Management and Performance Audits
The traditional emphasis of auditing has been on an assessment of fiscal transactions for accuracy, legality, and fidelity--on the issues of financial compliance. Gradually, more emphasis has been placed on audits that ask: "Were the program milestones achieved in the most efficient and economical way possible?" Management audits involve an assessment of resource utilization practices, including an examination of the adequacy of management information systems, administrative procedures, and organizational structure. A performance audit extends the focus of a management audit to include an examination of program results to determine whether (a) the desired benefits were achieved, (b) program objectives were met, and (c) alternatives were considered that might yield the desired results at a lower cost. A performance audit generally is undertaken when a program or project has been completed or has reached a major milestone in its funding.
The distinctions among three basic types of audits, as described by the U.S. Comptroller General, are shown in Exhibit 3. Regardless of the scope or emphasis, an audit must include the following elements:
(1) Audit criteria--appropriate standards that can be used to measure the actions of management, employees, or their delegated agents in any audit situation.
(2) Causes--actions that took place or that should have taken place to carry out assigned program responsibilities.
(3) Effects--results achieved as determined by comparing actions taken (causes) with the appropriate standards (criteria).
Exhibit 3. Types and Characteristics of Audits
(1) Financial and compliance--determines (a) whether financial operations are properly conducted, (b) whether the financial reports of an audited entity are presented fairly, and (c) whether the entity has complied with applicable laws and regulations.
Sufficient audit work must be carried out to determine whether the audited entity (a) is maintaining effective control over revenue, expenditures, assets, and liabilities; (b) is properly accounting for resources, liabilities, and operations; (c) is providing financial reports which contain accurate, reliable, and useful financial data that are fairly presented; and (d) is complying with the requirements of applicable laws and regulations.
(2) Economy and efficiency--determine whether the entity is managing or utilizing its resources (personnel, property, space, and so forth) in an economical and efficient manner and the causes for any inefficiencies or uneconomical practices, including inadequacies in management information systems, administrative procedures, or organizational structure.
A review of efficiency and economy shall include inquiry into whether the audited entity, in carrying out its responsibilities, is giving due consideration to conservation of its resources and minimum expenditure of effort. Examples of uneconomical practices or inefficiencies include (a) procedures, whether officially prescribed or merely followed, which are ineffective or more costly than justified; (b) duplication of effort by employees or between organizational units; (c) performance of work which serves little or no useful purpose; (d) inefficient or uneconomical use of equipment; (e) over-staffing in relation to the work to be done; (f) faulty buying practices and accumulation of unneeded or excessive quantities of property, materials, or supplies; and (g) wasteful use of resources.
(3) Program results--determine whether the desired results or benefits are being achieved, whether the objectives established by the legislature or other authorizing body are being met, and whether the agency has considered alternatives which might yield desired results at a lower cost.
The auditor should consider: (a) the relevance and validity of the criteria used by the audited entity to judge effectiveness in achieving program results; (b) the appropriateness of the methods followed by the entity to evaluate effectiveness in achieving program results; (c) the accuracy of the data accumulated; and (d) the reliability of the results obtained.
Adapted from: The Comptroller General of the United States. Standards for Audit of Governmental Organizations, Programs, Activities, and Functions (Washington, D.C.: General Accounting Office, 1974), pp. 2, 11, 12.
Audit evidence represents facts and information used by an auditor as a basis to come to a conclusion on the audit objective. The information must be relevant, material, and competent. The auditor cannot reach a conclusion from evidence unless fairly specific guidelines are available as to the nature of what is to be audited. Evidence should only be gathered relating to the specific objectives of the audit. The audit objective is a question or a statement at the start of the detailed examination concerning the results expected. The evidence gathered should permit the auditor to reach a conclusion on the statement or to answer the question.
Sunset Legislation
Added impetus for more systematic evaluation procedures has emerged with the adoption of sunset legislation by various states and localities. This mechanism of legislative oversight requires periodic evaluations of programs and the termination of those programs for which continuance cannot be justified. While differing from state to state, most sunset legislation provides for the following:
(1) Agencies and/or programs are assigned a mandatory termination date, and if the legislative body takes no formal action, the enterprise is concluded (that is, the sun sets) on that date.
(2) The agency is given an opportunity to justify its continued existence (or the continuance of certain programs) prior to termination. This justification may entail any number of evaluation indices (and may involve a performance audit or may be undertaken in conjunction with zero-base budgeting or service level analyses).
(3) The legislative body has the option to reinstate, to reconstruct, or to terminate the agency or programs. Reinstatement may leave the agency/program unchanged, whereas reconstruction may lead to significant modifications in the mandate and responsibilities of the agency/program.
(4) If reauthorized or reconstructed, the agency or program will again be subject to review and possible termination at the end of the next cycle. [15]
As initially conceived (in Colorado and Florida), sunset laws were to be relatively selective in application, focusing for the most part on state regulatory agencies. If such laws are applied across the board, legislators are likely to take the safe route and allow the agencies and programs to continue. While sunset laws can be a much more pervasive tool than experience to date has evidenced, their application remains highly dependent on previously constituted management decisions.
Program Reconstruction
The scale and time frame of evaluations must be such that the findings can assist in formulating program improvements. Moreover, evaluations must specify program problems in a way that alternative courses of action are clearly indicated.
The real art of program improvement is not bold guillotining of unpromising programs, but rather the reconstruction or renegotiation of the program developing process. The concept of program reconstruction is based on the feedback stage of the systems model, wherein initial program outputs are modified in response to the reactions of affected groups and sources of support. Reconstruction suggests a refining and retargeting of programs (and policies) rather than setting totally new directions.
Program terminations are rare; curtailment is likely to be a more common approach. A number of problems of organizational and socio-political inertia may be encountered. Complex organizations have an uncanny instinct for survival, and as a consequence, programs may be constantly adapted to emerging situations in order to avoid termination. Given the hard-fought battles necessary to obtain a policy or program in the first instance, public officials have a natural reluctance to consider the issue of termination. Significant political and/or client groups often support programs beyond their span of effectiveness, and programs have certain rights of "due process." Thus, mounting campaigns for termination often can be costly, both monetarily and politically.
Strategic reconstruction often is possible with public programs, particularly if such adjustments are acceptable to entrenched interests. Peter de Leon offers several guidelines for program modification:
(1) Modification and/or termination should not be viewed as the end of the world; rather it is an opportunity for program improvement.
(2) Modification and/or termination should coincide with systematic evaluation.
(3) Policies and programs have certain "natural points"--times and places in their life spans--where reconsiderations are more likely and more appropriate.
(4) The time horizon for gradual change is a significant factor.
(5) The structure of incentives might be changed to promote modifications; for example, agencies might be permitted to retain a portion of the program funding that they voluntarily cut.
(6) Agencies might employ a staff of "salvage specialists," trained in reallocating resources. [16]
Increasingly, government activities are constrained by impending fiscal crises, and thus, terminations, or at least reconstructions, are becoming more viable.
Summary
In applying the techniques of program evaluation, the effectiveness of ongoing and proposed programs is assessed in terms of agreed-upon goals and objectives, and areas needing improvement through program modification, including the possible termination of ineffective programs, are identified. A program evaluation must take into account the possible influence of external as well as internal organizational factors.
Formative evaluations provide information necessary to design or modify service delivery systems and to set goals and objectives for these systems. Summative evaluations measure performance and program impacts. Information derived from summative evaluations of program impacts provides input for continuing formative evaluative efforts.
The repertoire of evaluative techniques includes: (1) before-and-after comparisons; (2) time-trend-data projections; (3) with-and-without comparisons; (4) comparisons of planned versus actual performance; and (5) controlled experimentation. An evaluation should begin with an identification of the relevant program objectives and the corresponding evaluative criteria. The major purpose of evaluation is to identify changes in those criteria that can be reasonably attributed to the program or activities under study. Other factors--such as external events or the simultaneous introduction of other related programs--may have precipitated the observed changes and not the program under evaluation. Thus, the final step in any evaluation should include an explicit search for other plausible explanations for the observed changes and, if any exist, an estimate of their effects on the data.
Endnotes
[1] Joseph S. Wholey, "What Can We Actually Get from Program Evaluations?" Policy Sciences, Vol. 3, No. 3 (1972), pp. 361-369.
[2] D.N.T. Perkins, "Evaluating Social Intervention: A Conceptual Schema," Evaluation Quarterly 1 (November 1977), pp. 642-645.
[3] Alan Walter Steiss and Gregory A. Daneke, Performance Administration (Lexington, Mass.: Lexington Books--D.C. Heath and Co., 1980), p. 226.
[4] See: Rekha Agarwala-Rogers, "Why Is Evaluation Research Not Utilized?" Evaluation Studies, Vol. 2 (Beverly Hills, Calif.: Sage Publications, 1979); Carol Weiss and Michael J. Bucuvalas, "The Challenge of Social Research to Decision-Making," in Using Social Research in Public Policy Making, edited by Carol Weiss (Lexington, Mass.: Lexington Books--D.C. Heath and Co., 1977), pp. 213-234.
[5] Robert Clark, "Policy Implementation: Problems and Potentials," (Paper presented at the Southern Political Science Association meeting, October, 1976).
[6] E. S. Quade, Analysis for Public Decisions (New York: American Elsevier Publishing Company, 1975), p. 235.
[7] Herbert A. Simon and C.E. Ridley, Measuring Municipal Activities (Chicago: International City Managers' Association, 1938).
[8] Carol H. Weiss, Evaluation Research (Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1972), p. 46.
[9] Ibid., p. 47.
[10] Ibid., p. 50.
[11] Peter H. Rossi, Howard E. Freeman, and Sonia Wright, Evaluation: A Systematic Approach (Beverly Hills, Calif.: Sage Publications, 1979), pp. 172-175.
[12] Adapted from Harry P. Hatry, et al., How Effective Are Your Community Services? (Washington, D.C.: The Urban Institute, 1977), pp. 207-213.
[13] Rossi, op. cit., p. 224.
[14] Ibid., p. 283.
[15] See: Bruce Adams, "Guidelines for Sunset," State Government (Summer, 1976).
[16] Peter de Leon, "A Theory of Termination," a paper presented at the American Political Science Association conference, September, 1977; available as a publication from the Rand Corporation, Santa Monica, California.