Results and Performance of the World Bank Group 2020

3 | Part II: Assessing Outcome Levels

Introduction

Project and program ratings give a helpful picture of Bank Group achievement against stated objectives, but objectives set outcomes at different levels, and the line of sight to higher-order development goals varies considerably. Beneath every rating is a wealth of information about where the Bank Group is focusing its efforts and what these efforts mean for its outcome orientation. This part of the RAP classifies objectives according to their outcome levels and examines links between performance and outcome levels.

This chapter presents a theory of change framework to classify outcome levels. Because of a lack of data, the framework is by no means exhaustive, but it still offers a common lens to understand outcomes and outcome levels across sectors and Bank Group institutions, thus providing essential new information about the most typical types of outcomes that make up the project portfolio. The next section looks at the distribution of project outcome levels in samples of World Bank and IFC projects. This is followed by an assessment of the relationship between projects’ outcome levels and ratings, which may help shed light on some of the risk-return trade-offs when project teams are formulating project objectives. This part concludes by reviewing the outcome orientation of key thematic areas.

Outcome Classification Framework

The novel outcome classification framework uses a theory of change logic to define comparable and complementary outcome levels. IEG synthesized sectoral theories of change derived from World Bank and IFC projects, among other sources, to build the outcome classification framework and validated the classifications on World Bank and IFC projects. Box 3.1 describes the framework and the samples that IEG applied it to. The framework defines four outcome levels. Each level corresponds to a step in a theory of change for how the Bank Group’s work influences clients’ development outcomes, ranging from outputs at level 1 to early, intermediate, and long-term outcomes at levels 2 to 4. IEG defined shifters to distinguish one outcome level clearly from another (figure 3.1).1

For example, level 1 outcomes include project deliverables, but level 2 outcomes include changes in the development status quo that resulted from the level 1 deliverables. Hence, level 2 outcomes follow quite directly from project outputs and often focus on improved access, capacity, regulation, planning, provision, and quality of public services—all of which represent relatively immediate benefits to beneficiaries. Level 3 outcomes follow indirectly from project interventions and are beyond the direct control of the World Bank and its clients. At level 3, the level 2 outcomes have led to material improvements that solved development problems, causing sectorwide ripple effects that benefit end beneficiaries. The ripple effects of level 4 outcomes are even deeper and wider. These are outcomes with systemic effects nationally or across sectors that contribute to general well-being. Level 4 outcomes correspond to the Sustainable Development Goals, the twin goals, and other higher-level outcomes to which the Bank Group aspires. Figure 3.2 shows representative examples taken from World Bank project objectives.2
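The coding logic described above, including the convention of coding composite objectives at the highest level present, can be made concrete with a toy sketch. The cue words and the default level below are illustrative assumptions only; IEG's actual classification relied on trained human coders following detailed guidance, not keyword matching.

```python
# Toy sketch of the four-level outcome classification. The cue words are
# hypothetical illustrations; IEG's real coding used human coders with
# detailed guidance, not keyword matching.
OUTCOME_CUES = {
    1: ["construct", "deliver", "train"],                          # outputs
    2: ["access", "capacity", "quality", "provision"],             # early outcomes
    3: ["productivity", "incomes", "resilience", "connectivity"],  # intermediate outcomes
    4: ["poverty", "shared prosperity", "economic growth"],        # long-term outcomes
}

def classify_objective(pdo: str) -> int:
    """Return the highest outcome level whose cues appear in the objective,
    mirroring the rule of coding composite objectives at the highest level."""
    text = pdo.lower()
    matched = [level for level, cues in OUTCOME_CUES.items()
               if any(cue in text for cue in cues)]
    return max(matched) if matched else 2  # assumed default: level 2

print(classify_objective("Improve access to water services"))                # 2
print(classify_objective("Increase agricultural productivity and incomes"))  # 3
```

A composite objective such as "Improve access and raise productivity" would be coded at level 3, the higher of its two subobjectives, under this rule.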

Figure 3.1. Steps in the Outcome Levels

Source: Independent Evaluation Group.

The framework captures World Bank projects’ intended and achieved outcomes and IFC projects’ intended outcomes. IEG designed the framework to classify outcomes in a comparable manner across sectors (box 3.1). At project design, all project documents state a clear objective, called a project development objective at the World Bank and, at IFC, claims under its new Anticipated Impact Monitoring and Measurement framework. During project implementation, teams and clients manage projects to achieve these objectives. When projects close, self-evaluations review whether they achieved their stated objectives, with validation by IEG.

Figure 3.2. Representative World Bank Project Objectives

Source: Independent Evaluation Group.

Note: IT = information technology.

Box 3.1. The Outcome Level Framework

This framework and its application have strengths and weaknesses. Comparability across sectors and countries is a key strength, which the Independent Evaluation Group (IEG) ensured by defining comparable yardsticks and applying internal and external quality assurance. For example, IEG validated the framework through pilots and expert consultations, defined key terms in project development objectives (PDOs) that are indicative of different outcome levels, developed detailed coding guidance, and tested for interrater reliability by having multiple team members independently code the same PDOs and comparing their scores. However, the framework is a blunt tool. It focuses on stated and measured objectives, which may not be the same as the actual outcomes. It simplifies outcomes’ complex social realities into four categories that do not factor in context, so one country’s simple achievement could be another’s ambitious outcome.
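The interrater reliability test mentioned above can be illustrated with a standard agreement statistic. The sketch below computes Cohen's kappa for two hypothetical coders; the report does not specify which statistic IEG used, so both the choice of statistic and the data are assumptions.

```python
from collections import Counter

# Illustrative interrater reliability check. Cohen's kappa is one standard
# agreement statistic; the report does not name the statistic IEG used.
def cohens_kappa(coder_a, coder_b):
    """Observed agreement between two coders, corrected for the agreement
    expected by chance given each coder's marginal label frequencies."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two coders independently code the same ten PDOs (invented data).
coder_1 = [2, 2, 3, 2, 4, 3, 2, 2, 3, 2]
coder_2 = [2, 2, 3, 2, 3, 3, 2, 2, 3, 2]
print(round(cohens_kappa(coder_1, coder_2), 2))  # 0.81
```

Values near 1 indicate agreement well beyond chance; disagreements such as the level 3 versus level 4 call discussed below would pull the statistic down.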

Coding focused on PDOs, which are summaries that approximate projects’ intended outcomes, but sometimes they are vague or may not comprehensively reflect all of a project’s objectives. To overcome this challenge, IEG consulted the project’s indicators when in doubt; it did this less often for investment project financing than for development policy financing, which was harder to assess because of long PDOs with multiple parts. For composite PDOs with more than one subobjective, IEG coded the highest level. For two samples, new World Bank projects and new International Finance Corporation (IFC) projects, IEG followed a different approach that reviewed the outcome levels of PDOs and indicators separately to compare them. Another challenge of the framework was differentiating between level 3 and level 4 outcomes. The difference between these outcomes is conceptually clear, but in practice, it can require the coder to subjectively judge the projects’ real objectives and then gauge how deep and systemic the outcomes they aspire to are.

IEG applied the framework to four project samples:

Recently closed projects: all 989 Implementation Completion and Results Report Reviews completed from fiscal year (FY)17 to FY20 (April). This sample is large enough to allow Global Practice comparisons.

Older projects with IEG field evaluations: all 42 Project Performance Assessment Reports from FY19 and FY20 available in March 2020. Analysis focused on achieved outcomes in the sample’s 114 component objectives. This sample shows actual project outcomes that IEG verified in the field.

New World Bank projects: a statistically representative sample of 161 projects approved in FY19, indicative of recent approvals.

New IFC projects: a random sample of 29 recently approved IFC investment projects. Analysis covered the 100 project and market claims in this sample.a

Source: Independent Evaluation Group.

Note:

a. The Independent Evaluation Group assessed project objective statements and related indicators in International Finance Corporation projects approved in FY20 to understand the types and levels of outcomes in projects processed under its new Anticipated Impact Monitoring and Measurement (AIMM) system. IEG identified all AIMM claims in the project summaries for 29 randomly selected investment projects and indicators in 21 of these projects. This was not an evaluation of AIMM as a tool. IEG did not review AIMM scores or the underlying methodologies for calculating AIMM scores or review projects’ actual outcomes. The 29 sampled projects made 100 claims, of which 61 were project claims and 39 were market claims. All project and market claims were clearly formulated objective statements. Sampled projects contained 142 indicators, of which 58 percent were project indicators, 18 percent were market indicators, and the remaining 24 percent were corporate and other indicators unrelated to AIMM claims.

Project Outcomes

This section analyzes the distribution of projects’ objectives to understand what types of outcomes most projects intend to achieve and measure. It does so partly in response to Board members’ demands for more evidence on outcomes. Until now, the understanding has been that projects pursue diverse objectives across diverse sectors, contexts, and instruments, with limited room for generalization. For the first time, this research shows that project objectives cluster in clear patterns depending on sector and lending instrument. Most investment project financing (IPF) objectives cluster at level 2 around quality of and access to services. A few sectors, most notably agriculture and environment, state IPF objectives at level 3 with a clearer focus on end beneficiaries, and most development policy financing (DPF) operations state their objectives at level 3 with a focus on policy reform outcomes. Recently approved IFC projects often state their objectives at level 3, particularly in relation to market creation objectives.

Most IPFs have project development objectives that aim for level 2 outcomes. IEG classified 72 percent of IPF projects’ objectives in the recent Implementation Completion and Results Report Review (ICRR) sample at level 2 (figure 3.3).3 By far, the most common level 2 IPF objectives improve quality of or access to social- or infrastructure-related public services. Most IPFs—and by extension most of the World Bank’s work—intend to strengthen public sector capacity. This reflects the strong emphasis in World Bank operations on improving public sector capacity and performance as an enabler of higher-level change. The prevalence of service access objectives also reflects that such objectives are relatively easier to measure and attribute to World Bank support.

Figure 3.3. Outcome Levels in IPF and DPF Projects

Source: Independent Evaluation Group.

Note: DPF = development policy financing; IPF = investment project financing.

IPFs in a few sectors pursue level 3 outcomes more often. Level 3 outcomes were found in 26 percent of all IPF project objectives and level 4 outcomes in 1 percent (figure 3.3). However, there is clustering in some sectors: half of Agriculture and Environment Global Practice (GP) projects, 35 percent of Transport GP projects, 31 percent of Water GP projects, and 27 percent of Energy and Extractives GP projects pursue level 3 objectives. The share of level 3 and 4 objectives is far lower in other GPs, ranging between 10 and 14 percent. Common examples of IPF level 3 outcomes include improved agricultural productivity, yields, and incomes; improved management of protected areas; climate resilience; and transport connectivity. A focus on sectorwide change and end beneficiaries characterizes these types of level 3 outcomes. Such outcomes differ from most IPFs’ focus on level 2 service access and capacity. The reason for the variation across GPs in objectives’ outcome levels is not entirely clear, though the ability to define suitable indicators plays a role in how teams set objectives.

DPFs’ objectives cluster around yet other types of outcomes. Objective statements at outcome levels 3 and 4 were found in 54 and 19 percent, respectively, of World Bank DPFs (figure 3.3). DPFs seek to induce change through policy, institutional, and governance reforms. DPFs achieve their objectives less often, resulting in lower ratings compared with IPFs, as seen in part I. Representative examples of level 3 DPF objectives include macrofiscal stability, improved transparency and accountability, and increased domestic tax revenue. However, 26 percent of DPF outcomes in the ICRR sample were at level 2. Examples include technical support to policy and regulatory reforms and the first of a series of planned DPFs, with higher intended outcomes for subsequent DPFs.

Level 1 outputs are rare in project objective statements. Only 1 percent of objectives in recent ICRRs and 9 percent of objectives in recent approvals had level 1 outputs in the objective statement. No DPFs or IFC projects in the samples had output objective statements. It is established good practice to focus on outcomes, so it is positive that so few projects have level 1 objective statements.

Project objectives in countries affected by fragility, conflict, and violence (FCV) are not distributed differently from those elsewhere. In FCV projects, 71 percent of objectives are at level 2, 25 percent are at level 3, and 4 percent are at level 4. This compares with 64 percent at level 2, 31 percent at level 3, and 4 percent at level 4 in non-FCV-affected countries. The similarity of outcome levels in FCV and non-FCV countries is surprising because of the higher contextual risks in FCV-affected countries and the need for quick, simple, attainable goals, as discussed in part I. IEG also observed that results frameworks in FCV projects did not commonly capture conflict drivers or outcomes on fostering the country’s resilience to conflict and violence. FCV countries need agile responses to their unique challenges, an aspect that falls outside the outcome level classification framework.

IEG also compared projects’ indicators to their objective statements to assess whether the indicators’ levels matched the objectives’ outcome level. It found that 14 percent of recently approved projects have no indicator of the same level as the objective, which could suggest that there are no indicators able to measure the objective’s achievement.
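The consistency check described above amounts to flagging projects whose results framework contains no indicator at the objective's outcome level. A minimal sketch, with hypothetical field names and invented records:

```python
# Minimal sketch of the indicator-consistency check: flag projects where no
# results indicator sits at the same outcome level as the objective. The
# field names and sample records are hypothetical.
def lacks_matching_indicator(project):
    return project["objective_level"] not in project["indicator_levels"]

projects = [
    {"id": "P001", "objective_level": 3, "indicator_levels": [2, 3]},
    {"id": "P002", "objective_level": 3, "indicator_levels": [1, 2]},  # no match
    {"id": "P003", "objective_level": 2, "indicator_levels": [2]},
]
flagged = [p["id"] for p in projects if lacks_matching_indicator(p)]
print(flagged)  # ['P002']
```

Running such a check across the sample yields the kind of share reported above (14 percent of recently approved projects flagged).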

Recently approved IFC projects under the Anticipated Impact Monitoring and Measurement system often state their objectives in relation to higher-level outcomes because they are aligned with IFC’s goals of creating markets and fostering private sector development (box 3.2). Outcome level 3 and 4 objective statements were found in 67 and 15 percent, respectively, of recent IFC market claims, and in 39 and 13 percent, respectively, of project claims (figure 3.4). Level 2 objective statements were found in 18 percent of market claims and 48 percent of project claims. Figure 3.5 shows representative examples of IFC claims. Twelve percent of IFC project and market claims did not have indicators that matched the claim’s outcome level, which could suggest that none of the selected proxy indicators are able to measure the objective’s achievement.

Figure 3.4. IFC Project and Market Claims’ Outcome Levels

Source: Independent Evaluation Group.

Note: IFC = International Finance Corporation.

Box 3.2. IFC’s AIMM System for Setting Project Objectives

The Independent Evaluation Group (IEG) analyzed objectives in recent International Finance Corporation (IFC) projects to understand how IFC’s new Anticipated Impact Monitoring and Measurement (AIMM) system articulates intended outcomes.a Under AIMM, IFC projects include multiple project claims and market claims, whereas the World Bank sets only one objective per project. Project claims are defined as a project’s direct and indirect effects on stakeholders, the economy, and the environment and are comparable to World Bank projects’ project development objectives. Market claims are derived effects, defined as a project’s ability to catalyze systemic changes beyond those effects brought about by the project itself. IEG did not review AIMM scores (an index combining the depth and likelihood of project outcomes with the contribution to market creation).

Overall, IEG found that the system of project and market claims contained clear objective statements that aligned well with IFC’s higher-level goals of creating markets and fostering private sector development. AIMM ensures that project objectives align with IFC’s goals. Although IFC can ensure such alignment because of its focused business model and goals, the World Bank operates with objectives that are more diverse because of its diverse sector and country contexts. It is too early to tell what impact AIMM will have on outcome achievement, ratings, evidence, and incentives because no project under AIMM has been evaluated yet.

Source: Independent Evaluation Group.

Note: a. This sample is different from the rated sample analyzed in part I, which did not include projects with AIMM claims.

Figure 3.5. Representative Examples of IFC Claims

Source: Independent Evaluation Group.

Note: IFC = International Finance Corporation.

Project Outcome Levels and Ratings

This section combines the outcome level classification and ratings to examine the relationship between projects’ outcome levels and projects’ performance. This analysis was motivated by World Bank management’s efforts to better identify the risk-return trade-offs when formulating project objectives and related questions about whether the rating system influences project teams’ incentives when setting objectives. The analysis also aimed to explore potential explanations for key performance patterns identified in part I. Efficacy ratings (which assess to what extent projects achieve their stated objectives) and outcome ratings (which consider the project’s relevance and efficiency) were used.4

The relationship between objectives’ outcome levels and projects’ performance is only modest and becomes insignificant when controlling for other factors. Specifically, ratings for projects with level 3 and 4 outcomes are modestly lower than for projects with level 2 outcomes, and the difference in ratings is insignificant when controlling for instrument and other factors such as M&E quality (box 3.3). This finding runs counter to a key assumption going into the analysis: that one reason for not setting higher-level objectives is the risk of a lower rating on outcome and M&E. Instead, the finding shows no systematic trade-off between projects’ outcome level and ratings. This implies that many projects with higher-level objectives manage to achieve good outcome ratings, in part by having strong results frameworks to measure outcome achievement. Although the model does not provide any more detail on the causal relationship between objectives set at design and projects’ eventual performance—both depend on specific country and sector contexts—it does point to larger questions about when it makes sense for projects to set higher-level objectives and what it takes for such projects to be successful in reaching their intended outcomes.

Box 3.3. Regression of Projects’ Performance on Outcome Levels and Other Factors

A regression analysis on the Implementation Completion and Results Report Review sample shows that projects’ outcome levels do not play a statistically significant role in these projects’ efficacy rating when controlling for lending instrument. Regressing efficacy ratings on outcome levels and a dummy for lending instrument shows that investment project financing projects have markedly higher efficacy ratings than development policy financing projects do, in line with the findings of part I (model 2 in table B3.3.1). The difference in efficacy rating between lending instruments is statistically significant at the 0.001 level, whereas the outcome level is not statistically significant in this model. The negative relationship between efficacy and outcome level in model 1 is driven by the fact that investment project financing projects, which have higher efficacy ratings than development policy financing projects, also have lower outcome levels. The results are also robust to including projects’ monitoring and evaluation quality rating (model 3). In this model, the lending instrument and monitoring and evaluation quality affect the efficacy rating at the 0.001 significance level, and outcome level remains statistically insignificant. The results are the same when controlling for Global Practice as a random effect (model 4).

Table B3.3.1. Regression Results

| Variable | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Outcome level | −0.0970** (0.0323) | −0.0500 (0.0356) | −0.0305 (0.0298) | −0.0305 (0.0328) |
| IPF (vs. DPF) | | 0.1982*** (0.0538) | 0.2151*** (0.0427) | 0.2151*** (0.0304) |
| M&E rating | | | 0.4793*** (0.0243) | 0.4793*** (0.0231) |
| GP as random effect | No | No | No | Yes |
| Observations (no.) | 949 | 946 | 944 | 944 |

Source: Independent Evaluation Group.

Note: Standard errors in parentheses. DPF = development policy financing; GP = Global Practice; IPF = investment project financing; M&E = monitoring and evaluation.

**p < 0.01; ***p < 0.001
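The specification in model 2 can be reproduced in outline as an ordinary least squares regression of efficacy ratings on the outcome level and an IPF dummy. The data below are synthetic, generated with coefficients set near the published estimates; none of IEG's actual project-level data are used.

```python
import numpy as np

# Synthetic re-creation of model 2 in table B3.3.1: OLS of efficacy ratings
# on outcome level and an IPF dummy. The coefficients used to simulate the
# data are set near the published estimates; no actual IEG data are used.
rng = np.random.default_rng(0)
n = 946  # sample size reported for model 2

outcome_level = rng.choice([2, 3, 4], size=n, p=[0.65, 0.30, 0.05])
ipf = rng.choice([0, 1], size=n, p=[0.20, 0.80])  # 1 = IPF, 0 = DPF
efficacy = 2.3 - 0.05 * outcome_level + 0.20 * ipf + rng.normal(0, 0.5, n)

# Fit by least squares: efficacy = b0 + b1 * outcome_level + b2 * ipf
X = np.column_stack([np.ones(n), outcome_level, ipf])
beta, *_ = np.linalg.lstsq(X, efficacy, rcond=None)
print({"intercept": beta[0], "outcome_level": beta[1], "ipf": beta[2]})
```

With data generated this way, the fitted IPF coefficient is positive and precisely estimated while the outcome-level coefficient is small, mirroring the pattern in the table.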

Pairwise comparisons of efficacy and outcome ratings illustrate the same tendency of modestly lower ratings as outcome levels increase (figure 3.6 and table 3.1). DPFs at level 3 are rated modestly lower on outcomes than DPFs at level 2—true in both the ICRR and the Project Performance Assessment Report samples—and DPFs at level 4 are rated lower still. Efficacy ratings are marginally lower for DPFs at higher outcome levels. IPFs tell a similar story. IPFs with level 2 outcomes have marginally higher efficacy ratings and somewhat higher outcome ratings than IPFs with level 3 outcomes—77 percent MS+ compared with 72 percent. (The result for level 4 is not robust because of the small sample size.) Similar patterns are seen in many GPs and in a large sample of older projects (box 3.4).5 IEG next examines whether outcome levels help explain ratings differences reported in part I between GPs and project types.

Table 3.1. Ratings and Outcome Levels, by Instrument

| Outcome level | IPF: projects (no.) | IPF: MS+ (percent) | IPF: average efficacy ratinga | DPF: projects (no.) | DPF: MS+ (percent) | DPF: average efficacy ratinga |
|---|---|---|---|---|---|---|
| Level 1 | 9 | 89 | 2.8 | 0 | n.a. | n.a. |
| Level 2 | 580 | 77 | 2.7 | 45 | 71 | 2.5 |
| Level 3 | 211 | 73 | 2.7 | 93 | 68 | 2.5 |
| Level 4 | 7 | 57 | 2.6 | 33 | 61 | 2.3 |

Source: Independent Evaluation Group.

Note: DPF = development policy financing; IPF = investment project financing; MS+ = moderately satisfactory or above; n.a. = not applicable.

a. IEG uses a numerical conversion of the four-point efficacy rating.
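Note a's numerical conversion can be illustrated with a small sketch, assuming the four-point efficacy scale maps Negligible to 1 through High to 4; the note does not spell out the mapping, so treat it as an assumption.

```python
# Sketch of note a's numerical conversion, assuming the four-point efficacy
# scale maps Negligible = 1, Modest = 2, Substantial = 3, High = 4 (the
# note does not spell out the mapping).
EFFICACY_SCORE = {"negligible": 1, "modest": 2, "substantial": 3, "high": 4}

def average_efficacy(ratings):
    """Average efficacy score for a portfolio, rounded to one decimal
    as in table 3.1."""
    scores = [EFFICACY_SCORE[r.lower()] for r in ratings]
    return round(sum(scores) / len(scores), 1)

print(average_efficacy(["Substantial", "Modest", "Substantial", "High"]))  # 3.0
```

Under this assumed mapping, a portfolio averaging 2.7 sits between "modest" and "substantial" overall achievement.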

Figure 3.6. Ratings and Outcome Levels, by Instrument

Source: Independent Evaluation Group.

Note: Circles show efficacy ratings, and lines show the percentage of projects rated MS+. DPF = development policy financing; IPF = investment project financing; MS+ = moderately satisfactory or above.

Box 3.4. Outcome Levels and Ratings over a Longer Period

The Independent Evaluation Group used machine learning to extend the outcome classification to older projects. It used the population of all 3,119 projects that were rated since 2009 for which the relevant information was readily available. The machine learning algorithm classified projects based on their objectives at either level 2 or level 3, with 92 percent accuracy on a test data set. Precision was lower for levels 1 and 4 because of small sample sizes, and therefore these results are not used here. Looking at only investment project financing (IPF), the outcome levels were broadly constant across the years. The algorithm coded 75 percent of projects at level 2 and 25 percent at level 3. Development policy financing projects had higher outcome levels than IPF projects in the machine-coded data. Furthermore, consistent with the other findings of this section, the outcome ratings for level 3 IPF projects were only marginally lower than level 2 IPF projects—71 percent moderately satisfactory or above compared with 73 percent.

Source: Independent Evaluation Group.
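Box 3.4 does not disclose IEG's model, but the general approach of training a text classifier on objective statements to predict level 2 versus level 3 can be sketched with a tiny bag-of-words naive Bayes model. The training examples, cue vocabulary, and function names below are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch of box 3.4's approach: a bag-of-words naive Bayes
# classifier that labels objective text as level 2 or level 3. IEG's actual
# model and training data are not public; the examples are invented.
def train(samples):
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    for text, label in samples:
        word_counts[label].update(text.lower().split())
    return class_counts, word_counts

def predict(model, text):
    class_counts, word_counts = model
    total = sum(class_counts.values())
    best_label, best_logprob = None, -math.inf
    for label, count in class_counts.items():
        vocab = word_counts[label]
        denom = sum(vocab.values()) + len(vocab) + 1  # Laplace smoothing
        logprob = math.log(count / total)
        for word in text.lower().split():
            logprob += math.log((vocab[word] + 1) / denom)
        if logprob > best_logprob:
            best_label, best_logprob = label, logprob
    return best_label

training = [
    ("improve access to basic education services", 2),
    ("strengthen institutional capacity for service delivery", 2),
    ("increase agricultural productivity and farmer incomes", 3),
    ("enhance climate resilience of rural communities", 3),
]
model = train(training)
print(predict(model, "expand access to water services"))  # 2
print(predict(model, "raise productivity and incomes"))   # 3
```

A production model trained on thousands of labeled objectives, as in the box, could plausibly reach the reported accuracy for the two well-populated classes while struggling with the rare levels 1 and 4, exactly as described.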

When revisiting the key performance patterns identified in part I, IEG finds that projects’ outcome levels do not explain the low ratings for projects in the MTI and Governance GPs. IEG combined the Education; Urban, Resilience, and Land; and Transport GPs (which tend to deliver basic services and are among the highest-rated GP portfolios) and combined the MTI and Governance GPs (both of which focus on policy and institutional reforms, often using DPFs, and are the lowest-rated GP portfolios). Table 3.2 shows that MTI and Governance projects have lower ratings compared with Urban, Education, and Transport projects, regardless of their outcome level. The table also shows that MTI and Governance projects with level 3 outcomes achieved the same ratings as projects with level 2 outcomes (61 percent MS+ compared with 60 percent), and there was only a limited ratings decline for projects with level 4 outcomes (55 percent MS+). Looking only at FCV-affected countries, MTI and Governance projects with level 2 and 3 outcomes are again rated equally. Instead, the explanation for these key performance trends is related to the DPF instrument, which the MTI and Governance GPs use much more often than other GPs do.

Multiple factors help explain lower ratings for DPFs (and thus for MTI and Governance GP projects). Policy and institutional reform objectives are more prone to risk and uncertainty than service delivery objectives. Some of those risks relate to the longer time frame needed for DPFs’ policy reforms to lead to outcomes. Such reforms must successfully proceed through a long change pathway to arrive at desired outcomes. For example, a policy reform supported by a DPF must build from a prior action (for example, a parliamentary proposal for a legislative change) to approving and enacting the change and waiting for that change to achieve intended higher-level outcomes, such as people or firms behaving differently and spurring economic growth. Each link in this chain depends on actions by governments, parliaments, and economic actors outside of the project’s control. Some risks relate to the nature of the DPF instrument itself. For example, the World Bank has less room to make course corrections to achieve results in DPFs than it does in IPFs, especially in stand-alone DPFs. Evaluation methods also play a role because they differ in practice between IPFs and DPFs. For example, DPFs’ outcome ratings are based only on assessments of relevance and efficacy, with no assessment of efficiency as is done for IPFs, and there are additional challenges in assessing DPFs’ relevance and efficacy.6

Table 3.2. Ratings and Outcome Levels for Select Global Practices and Project Types

| Outcome level | Education, URL, and Transport: projects (no.) | MS+ (percent) | average efficacy ratinga | MTI and Governance: projects (no.) | MS+ (percent) | average efficacy ratinga |
|---|---|---|---|---|---|---|
| Level 1 | 6 | 100 | 3.1 | 0 | n.a. | n.a. |
| Level 2 | 237 | 84 | 2.8 | 53 | 60 | 2.5 |
| Level 3 | 39 | 77 | 2.7 | 100 | 61 | 2.4 |
| Level 4 | 3 | 100 | 2.8 | 28 | 55 | 2.2 |

Source: Independent Evaluation Group; recent Implementation Completion and Results Report Review sample.

Note: MS+ = moderately satisfactory or above; MTI = Macroeconomics, Trade, and Investment; URL = Urban, Resilience, and Land; n.a. = not applicable.

a. IEG uses a numerical conversion of the four-point efficacy rating.

Projects with level 3 and 4 objectives appear to have adequate results frameworks and M&E systems as often as other projects do. It seems intuitive that it would be harder to design adequate results frameworks for projects with higher-level outcomes, yet the evidence suggests otherwise.7 Project M&E ratings decline little as outcome levels increase. Projects with objectives at level 2 were rated high or substantial on M&E quality in 45 percent of cases, compared with 43 percent for level 3 and 44 percent for level 4 projects (table 3.3). A similar pattern emerges when looking at IPFs only. Relatedly, IEG rates IPF projects low when there is insufficient evidence to confirm the projects’ achievement of objectives. This happened to at least 6 percent of all IPF projects with level 2 outcomes and 9 percent with level 3 outcomes (table 3.3).8

Table 3.3. Project M&E Rating and Evaluated Projects with Lack of Evidence, by Outcome Level (percent)

| Outcome level | Projects rated high or substantial on M&E | IPF projects rated high or substantial on M&E | IPF projects with lack of evidence |
|---|---|---|---|
| Level 1 | n.a. | n.a. | n.a. |
| Level 2 | 45 | 45 | 6 |
| Level 3 | 43 | 41 | 9 |

Source: Independent Evaluation Group.

Note: Values for sample sizes of 10 or fewer projects not shown. IPF = investment project financing; M&E = monitoring and evaluation; n.a. = not applicable.

The risk-return trade-off does not appear to be very pronounced in these data. Outcome levels vary across GPs and instruments, but this is not the reason for performance differences because IPF projects with level 3 objectives and DPF projects with level 3 and 4 objectives do not appear to have markedly higher risk of weak performance compared with projects with lower-level objectives. Half of agricultural and environmental IPF projects set their objectives at level 3 and still register mostly strong achievements. Differences in performance appear to be more closely associated with levels of risk and uncertainty and the time and complexity involved in pursuing policy and institutional reforms. Questions remain about what is required for projects to set and achieve ambitious objectives.

Thematic Area Outcomes

This section considers how the Bank Group aggregates project and program results in key thematic areas and the implications for its outcome orientation. The IEG team reviewed corporate strategies, Corporate Scorecards, and results measurement systems for three Global Themes: Gender; Climate Change; and Fragility, Conflict, and Violence (looking particularly closely at Climate Change).

The Bank Group has clearly articulated higher-level outcomes for its thematic work. Bank Group corporate strategy documents set out clear high-level outcome goals, most famously the twin goals on poverty and shared prosperity and the commitment to the Sustainable Development Goals. Furthermore, there are many other goals, targets, and policy commitments set in different sectoral and thematic areas through the 2018 Bank Group capital package; the IDA Replenishments; World Bank Group Climate Change Action Plan 2016–2020; World Bank Group Gender Strategy (2016–2023): Gender Equality, Poverty Reduction, and Inclusive Growth; and World Bank Group Strategy for Fragility, Conflict, and Violence 2020–2025, among others.

The Bank Group has extensive systems to track and aggregate its results, but these systems often operate at some distance from higher-level outcomes. All projects and country programs have results frameworks with objectives, indicators, and M&E systems to capture those indicators. These projects and programs undergo self-evaluations that IEG validates and rates, and these form the backbone of the Bank Group’s results measurement system. Aggregated data from projects and country programs appear in the Bank Group’s Corporate Scorecards, IDA’s results measurement system, and thematic results measurement systems, such as those for gender and climate change. Yet these data focus on internal processes and the number of people reached by health, water, financial, education, sanitation, electricity, and agricultural services. Such reach indicators correspond to level 2 outcomes, but they convey little about the service’s quality and impact on human well-being and, therefore, do not help staff manage to those outcomes. Only a few of the indicators in the Corporate Scorecards, IDA’s results measurement system, and thematic results measurement systems track higher-level outcomes.

The results measurement systems for thematic areas do little to support the Bank Group’s outcome orientation. The RAP defines outcome orientation as gathering credible evidence on outcome achievement; using this evidence to adapt interventions and portfolios, engage clients, and learn; and thus becoming more effective at achieving positive social change. This definition is not about encouraging staff to aim for any particular level of outcomes. Rather, strong outcome orientation requires collecting credible evidence on progress and achievements and ensuring that staff have the right incentives to use the evidence to pursue positive social change relevant to the context of countries and sectors. Outcome orientation is different from achieving targets and monitoring processes.

Instead, corporate results measurement systems help senior management track and incentivize operational fulfillment of corporate policy commitments. The Bank Group’s ability to track and report on its policy commitments confers legitimacy and credibility on the organization and has undoubtedly helped it secure strong IDA replenishments and IBRD and IFC capital increases. Corporate indicators incentivize operations to integrate these themes into their work streams and meet targets. For example, when the World Bank committed to engage citizens in all applicable projects and started tracking this, the share of projects with citizen engagement indicators in their results frameworks increased quickly, but there was limited evidence on the quality, influence, or outcomes of citizen engagement (World Bank 2018a). Box 3.5 examines how the climate change results measurement system has helped the Bank Group meet or exceed its climate action targets.

Box 3.5. The Climate Change Results Measurement System

The World Bank Group Climate Change Action Plan (CCAP), adopted in April 2016, lays out ambitious climate-related targets for 2016–20. The Bank Group has reported annually on progress against more than 30 climate change–related actions and targets and is preparing a retrospective summary report. Through these targets, the Bank Group monitors how well it integrates climate change into operations and strategies. The vast majority of indicators, 90 percent, relate to actions under the World Bank's direct control, including inputs, such as financing for climate action; internal processes, such as greenhouse gas accounting and risk screening; and outputs, such as the number of products that support countries and cities with climate-related policies, strategies, and capacity building.

The results measurement system used for tracking CCAP targets has a limited focus on projects’ and programs’ quality and higher-level outcomes. The system incentivizes operations to adhere to process requirements, for example, to adjust cost-benefit analysis for the shadow price of carbon and screen or assess projects’ potential climate risks. However, it is unclear if requiring risk screening influences projects’ and country programs’ design, quality, and outcomes. Furthermore, there is no evidence on how well projects address climate risks.

The CCAP's headline commitment, to increase the share of climate change–related financing commitments to 28 percent, has driven all subsequent outputs and outcomes. Of the many institutional CCAP targets, only about 10 percent relate to outcomes, including level 2 outcome indicators, such as the amount of commercial funds mobilized for clean energy or the number of people covered by climate-adaptive social protection and early-warning services.

The CCAP reporting does not, however, assess the Bank Group’s contributions to greener or more resilient national development trajectories. On the whole, the CCAP results measurement system has driven accountability and internal incentives to mainstream climate action across the Bank Group and has tracked progress in meeting targets, but it does not guide operations toward key outcomes or assess the quality of those outcomes.

Source: Independent Evaluation Group.

Corporate mandates and indicators cascade down to operational departments and can potentially drive box-checking behaviors. If operations sought to maximize the reach indicators for service access in the Bank Group Corporate Scorecard, they could increase the number of people covered by water, health, electricity, and other services at the cost of service quality. However, if operations instead had evidence of the quality of services, the capacity of institutions, and beneficiaries' productivity and well-being, they might be better able to manage for those outcomes. In another example, an emergency health project in an Ebola-affected country was held back at one point because it did not meet a minimum threshold for climate cobenefits. Overall, the challenge is to ensure that targets create incentives that are compatible with outcome orientation, as discussed in the concluding chapter.

  1. These terms come from standard evaluation and results-based management literature. Although they call attention to outcomes’ time dimension, the coding framework emphasized the sequential steps in the logic of how interventions lead to outcomes.
  2. This is a departure from IEG’s traditional project ratings, which IEG determines after projects close, based on achieved outcomes relative to intended objectives. Those ratings consider the declared project development objective only to a limited extent, namely, in assessing the project’s relevance.
  3. The share was similar in the recently approved sample, at 68 percent of investment project financing objectives at level 2.
  4. IEG uses a numerical conversion of the four-point efficacy rating. Where efficacy ratings are given for subobjectives, IEG calculated the average of the subobjective ratings.
  5. For example, Transport projects with level 3 outcomes have somewhat lower outcome ratings (74 percent moderately satisfactory or above) than projects with level 2 outcomes (83 percent moderately satisfactory or above), though efficacy ratings are identical for both outcome levels: 2.7 out of a maximum of 4.
  6. It is hard to assess how and how much development policy financing contributed to overall reform outcomes, given that development policy financing’s prior actions are part of broader reform plans. Instead, evaluators can focus on the relevance of the prior actions and the results indicators. Planned reforms of the evaluation methodology for development policy financing aim to strengthen these dimensions.
  7. Recall that the quality of projects’ monitoring and evaluation is important for ratings, according to the regression analysis and the analysis presented in part I.
  8. These figures are a lower bound estimate based on Implementation Completion and Results Report Reviews, in which the IEG reviewer explicitly noted weak evidence as a reason for the rating decision.