Results and Performance of the World Bank Group 2023

Chapter 2 | World Bank Results and Performance

This is the first Results and Performance of the World Bank Group with a substantial number of closed projects affected by the COVID-19 pandemic during implementation; however, these projects still had limited exposure to the pandemic, and successful projects are likely overrepresented in the sample.

The World Bank’s overall project performance was not undermined by the effects of the COVID-19 pandemic and the Russian invasion of Ukraine. World Bank projects encountered pandemic-related and other obstacles that hindered implementation, even though many of them were already at an advanced stage of implementation when the pandemic began.

The World Bank’s adaptive management and project restructuring during the pandemic contributed to improved project performance.

Improvements in the World Bank’s monitoring and evaluation quality facilitated project adaptation and contributed to providing sufficient evidence of projects’ achievements.

This chapter presents performance rating trends for World Bank projects that closed in FY12–22 and were evaluated by June 30, 2023. It also analyzes the factors that affected the implementation and performance of investment project financing (IPF) projects, including project restructuring patterns. In addition, it explores the evolution of intended project-level development outcomes, assesses the validity of results framework indicators for measuring these outcomes, and examines the associations between indicator validity and efficacy ratings across projects’ intended development outcomes.

Project Exposure to the COVID-19 Pandemic and Sample Selection Bias

This is the first Results and Performance of the World Bank Group (RAP) with a substantial number of projects under implementation during the COVID-19 pandemic. Previous RAPs had limited findings on COVID-19 because very few of the projects they covered were implemented during the pandemic. For example, RAP 2020 found that the pandemic did not disrupt the World Bank’s self-evaluation or the Independent Evaluation Group’s (IEG’s) validation processes (World Bank 2020). RAP 2022 found that among the 10 closed projects specifically designed in response to the pandemic, 7 received satisfactory outcome ratings and 3 received moderately satisfactory outcome ratings (World Bank 2022a). RAP 2023’s analysis of rating trends covered projects that operated during the pandemic, including 684 lending operations that closed in FY20–22 and were evaluated by IEG by June 30, 2023. This RAP’s in-depth analyses focused on 273 IPF projects (the RAP 2023 cohort) that closed between March 2020 and April 2022 and were evaluated by IEG by December 2022; all of them operated during the pandemic. For comparison, the in-depth analyses also included a prepandemic cohort of 398 projects that closed in FY18–20, before the pandemic began. Figure 2.1 illustrates the differences in composition among the overall World Bank portfolio, the FY20–22 RAP 2023 cohort, and the FY18–20 prepandemic cohort.

Projects in the RAP 2023 cohort still had limited exposure to the COVID-19 pandemic. Many projects were already at an advanced stage of implementation when the pandemic began, minimizing the severity of their exposure. On average, only 14 percent of the total life span of the cohort’s projects was during COVID-19.1 Approximately half of the cohort’s projects were exposed for less than 12 percent of their project life span, and 90 percent were exposed for less than 23 percent of their life span. In fact, some projects reported that the pandemic had a limited impact on the quality, nature, or extent of implementation because these projects were already nearing completion when the pandemic began (see box A.1).
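The exposure statistics in the preceding paragraph can be illustrated with a short calculation. The sketch below is hypothetical and is not IEG’s documented procedure: it assumes that a project’s life span runs from a start date to its closing date, takes 1 March 2020 as the onset of the pandemic, and summarizes exposure shares with a mean, a median, and a 90th percentile. All dates and shares are invented for illustration.

```python
from datetime import date
from statistics import mean, median, quantiles

# Assumption: "March 2020" is represented as 1 March 2020.
PANDEMIC_ONSET = date(2020, 3, 1)


def exposure_share(start: date, close: date, onset: date = PANDEMIC_ONSET) -> float:
    """Share of a project's life span (start to close) that falls on or after onset."""
    life_span = (close - start).days
    if life_span <= 0:
        raise ValueError("closing date must be after start date")
    # Projects that closed before the onset have zero exposure.
    exposed_days = max(0, (close - max(start, onset)).days)
    return exposed_days / life_span


def summarize_exposure(shares: list[float]) -> dict[str, float]:
    """Mean, median, and 90th percentile of exposure shares (needs at least 2 values)."""
    return {
        "mean": mean(shares),
        "median": median(shares),
        # quantiles(n=10) returns the nine deciles; the last one is the 90th percentile.
        "p90": quantiles(shares, n=10, method="inclusive")[-1],
    }


# Hypothetical project: starts 1 March 2019 and closes 1 March 2021.
print(round(exposure_share(date(2019, 3, 1), date(2021, 3, 1)), 3))  # 0.499
print(summarize_exposure([0.05, 0.10, 0.12, 0.14, 0.23]))
```

Computing a share per project before averaging keeps long and short projects on the same footing, which matches how the paragraph reports the cohort average and percentiles.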

In addition, the RAP 2023 cohort is susceptible to sample selection bias that overrepresents successful, higher-rated projects. This bias arises because projects whose Implementation Completion and Results Reports (ICRs) and IEG reviews of those reports (ICRRs) are completed shortly after closing tend to have higher ratings than projects with delayed ICRs and ICRRs. In other words, the longer it takes to complete the ICR and ICRR, the lower the project ratings tend to be. This pattern also affects the rating trends and was present in previous years (figure A.3, panel a). However, the RAP 2023 cohort is even more likely to overrepresent projects with higher ratings because of its evaluation cutoff date of December 2022, which was set early to accommodate the time required for the RAP’s in-depth analysis and data collection. An exploration of the latest Implementation Status and Results Report ratings on progress toward achieving project development objectives (PDOs) in FY22 shows that projects with completed ICRRs (which were therefore included in this RAP’s analysis of rating trends) have higher average Implementation Status and Results Report ratings than projects with completed ICRs but in-progress ICRRs, and even higher ratings than projects whose ICRs are not yet completed (figure A.3, panel b). Therefore, rating trends should be interpreted with caution because they are likely to be revised downward in the future, especially as more projects with longer exposure to COVID-19 are incorporated into the project rating trends (see appendix A for more details on the limitations of the data).
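The selection mechanism described above can be illustrated with invented numbers. In this sketch, each hypothetical project has an outcome rating and a number of months between closing and a completed ICRR, with later-evaluated projects given lower ratings to mimic the pattern in the text; the 12-month cutoff is also an assumption.

```python
from statistics import mean

# Hypothetical (outcome rating on a 6-point scale, months from closing to completed ICRR).
# Later evaluations carry lower ratings to mimic the pattern described in the text.
PROJECTS = [(5, 6), (5, 8), (4, 10), (4, 14), (3, 18), (3, 24)]


def mean_rating(projects, cutoff_months=None):
    """Average rating, optionally keeping only projects evaluated within the cutoff."""
    ratings = [
        rating
        for rating, lag in projects
        if cutoff_months is None or lag <= cutoff_months
    ]
    return mean(ratings)


full = mean_rating(PROJECTS)                     # all projects, once eventually evaluated
early = mean_rating(PROJECTS, cutoff_months=12)  # only projects evaluated by the cutoff
print(f"all: {full:.2f}, evaluated by cutoff: {early:.2f}")
```

Because the projects excluded by the cutoff are disproportionately lower rated, the early-cutoff mean overstates the eventual average, which is why downward revisions are expected as delayed evaluations arrive.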

Figure 2.1. Composition of the Overall Portfolio in Rating Trends, the Prepandemic Cohort, and the RAP 2023 Cohort

Source: Independent Evaluation Group.

Note: FY = fiscal year; DPF = development policy financing; IPF = investment project financing; PforR = Program-for-Results; RAP = Results and Performance of the World Bank Group.

Project Performance Rating Trends

The World Bank’s project outcome ratings remained high in FY22, despite the effects of the COVID-19 pandemic and the Russian invasion of Ukraine.2 The average outcome rating of 181 IPF and Program-for-Results (PforR) projects in FY22 remained at 4.3 on a 6-point scale, as in FY20 and FY21 (the highest average since FY12), and the share of projects rated moderately satisfactory or above also stayed constant at 83 percent between FY21 and FY22. Moreover, the share of IPF and PforR projects rated satisfactory or above improved slightly, from 47 percent in FY21 to 49 percent in FY22 (figure 2.2, panel a). This pattern reflects projects’ resilience to the adverse global context across a large share of project subgroups (including by Region, Global Practice, country income level, and other characteristics) rather than solely a shift in the portfolio toward highly rated project subgroups (figure B.3). The average outcome rating of the 17 development policy financing (DPF) projects in FY22 stayed at 4.0 on a 6-point scale. The share of DPF projects rated satisfactory increased slightly, from 33 percent in FY21 to 35 percent in FY22, while the share rated moderately satisfactory decreased from 45 percent to 41 percent. As a result, the share of DPF projects rated moderately satisfactory or above declined slightly, from 79 percent to 76 percent (figure 2.2, panel b). This decline should be interpreted with caution because of the small sample: there were only 33 DPF projects in FY21, and the number fell to just 17 in FY22.
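The summary statistics used in this section (the average rating on the 6-point scale, the share rated moderately satisfactory or above, and the share rated satisfactory or above) can be computed as in the sketch below. It assumes the usual numeric coding of the six-point scale (1 = highly unsatisfactory through 6 = highly satisfactory) and uses invented ratings. It also shows why small DPF samples call for caution: with 17 projects, a single project moves a share by about 5.9 percentage points.

```python
from statistics import mean

# Assumed numeric coding of the six-point outcome scale.
SCALE = {"HU": 1, "U": 2, "MU": 3, "MS": 4, "S": 5, "HS": 6}


def summarize_ratings(labels: list[str]) -> dict[str, float]:
    """Average rating and shares rated MS or above (>= 4) and S or above (>= 5)."""
    scores = [SCALE[label] for label in labels]
    n = len(scores)
    return {
        "average": mean(scores),
        "share_ms_plus": sum(score >= 4 for score in scores) / n,
        "share_s_plus": sum(score >= 5 for score in scores) / n,
    }


def points_per_project(n_projects: int) -> float:
    """Percentage points by which one project changes a share in a sample of n projects."""
    return 100 / n_projects


# Hypothetical ratings for five projects.
print(summarize_ratings(["MS", "S", "S", "HS", "MU"]))
print(round(points_per_project(17), 1), round(points_per_project(33), 1))
```

In a sample of 17, a share can only move in steps of roughly 6 percentage points, so a 3-point change (79 to 76 percent) is smaller than the effect of reclassifying a single project.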

World Bank projects also maintained or improved their average Bank performance ratings. Bank performance ratings for IPF and PforR projects stayed flat, at an average of 4.3 on a 6-point scale in both FY21 and FY22. Although the share of projects rated moderately satisfactory or above declined marginally, from 87 percent in FY21 to 86 percent in FY22, the share rated satisfactory or above increased from 39 percent to 43 percent (figure 2.3). The average quality-at-entry rating for IPF and PforR projects, a subcomponent of the Bank performance rating, also remained constant at 4.2 on a 6-point scale. The share of projects rated satisfactory or above rose from 42 percent in FY21 to 44 percent in FY22, but the share rated moderately satisfactory or above fell from 82 percent to 75 percent. A decomposition analysis across World Bank Regions shows that this negative shift in quality-at-entry ratings is largely explained by a drop in the South Asia Region, where the share of projects rated moderately satisfactory or above fell from 85 percent in FY21 to 60 percent in FY22. These FY22 quality-at-entry ratings were not linked to project preparation challenges caused by COVID-19, because the vast majority of FY22 projects had been approved before March 2020 (figure B.6). Quality-of-supervision ratings, the other subcomponent of the Bank performance rating, also stayed constant at 4.6 on a 6-point scale; the share of projects rated highly satisfactory increased from 4 percent in FY21 to 8 percent in FY22, and the share rated moderately satisfactory or above decreased slightly, from 92 percent to 91 percent. By contrast, Bank performance ratings for DPF projects improved from an average of 4.3 in FY21 to 4.6 in FY22 on a 6-point scale, and the share of projects rated moderately satisfactory or above rose from 94 percent to 100 percent (figure 2.4). Design and implementation ratings, which replaced quality-at-entry and quality-of-supervision ratings for DPF projects, followed similar patterns: the share rated moderately satisfactory or above increased from 91 percent in FY21 to 100 percent in FY22 for design and from 94 percent to 100 percent for implementation.3

Figure 2.2. World Bank Project Outcome Ratings

Source: Independent Evaluation Group.

Figure 2.3. Bank Performance, Quality at Entry, and Quality of Supervision for Investment Project Financing and Program-for-Results

Source: Independent Evaluation Group.

Figure 2.4. Bank Performance, Design, and Implementation for Development Policy Financing

Source: Independent Evaluation Group.

The World Bank’s monitoring and evaluation (M&E) quality ratings have consistently improved. The share of IPF and PforR projects with M&E quality rated substantial or high increased from 60 percent in FY21 to 63 percent in FY22 (figure 2.5). This increase was driven by improved ratings in the Infrastructure Practice Group (up from 37 percent to 56 percent) and by the portfolio expansion of the high-performing Human Development Practice Group, which grew from 20 percent to 27 percent of the overall portfolio. M&E quality ratings in IDA fragile and conflict-affected situation (FCS) countries also increased significantly, from 48 percent to 60 percent. M&E ratings declined notably in the Western and Central Africa Region, where the share of projects rated substantial or high dropped from 67 percent to 53 percent. The South Asia and Europe and Central Asia Regions had the most pronounced increases, of 15 and 11 percentage points, respectively (figure B.9).
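Attributing the M&E improvement partly to better ratings within a Practice Group and partly to the growth of a high-performing group amounts to separating within-group and composition effects. One standard way to do this, shown here as an illustrative shift-share decomposition rather than IEG’s documented method, splits the change in an overall share into a composition effect, a within-group effect, and an interaction term. The group labels, portfolio weights, and shares below are hypothetical.

```python
def shift_share(weights0, rates0, weights1, rates1):
    """Decompose the change in a portfolio-wide share, sum over groups of w_g * r_g.

    weights*: each group's share of the portfolio (sums to 1 in each period)
    rates*:   share of each group's projects with the rating of interest
    """
    groups = weights0.keys()
    # Composition: portfolio weights change, group ratings held at base-period levels.
    composition = sum((weights1[g] - weights0[g]) * rates0[g] for g in groups)
    # Within-group: group ratings change, portfolio weights held at base-period levels.
    within = sum(weights0[g] * (rates1[g] - rates0[g]) for g in groups)
    # Interaction: joint change in weight and rating.
    interaction = sum(
        (weights1[g] - weights0[g]) * (rates1[g] - rates0[g]) for g in groups
    )
    total0 = sum(weights0[g] * rates0[g] for g in groups)
    total1 = sum(weights1[g] * rates1[g] for g in groups)
    return {
        "total_change": total1 - total0,
        "composition": composition,
        "within": within,
        "interaction": interaction,
    }


# Hypothetical two-group portfolio in two consecutive years.
w0 = {"A": 0.8, "B": 0.2}
r0 = {"A": 0.60, "B": 0.90}
w1 = {"A": 0.7, "B": 0.3}
r1 = {"A": 0.65, "B": 0.90}
result = shift_share(w0, r0, w1, r1)
print({name: round(value, 3) for name, value in result.items()})
```

The three terms add up exactly to the total change. In the terms of the paragraph above, a rating gain inside one group shows up in the within-group term, and the expansion of an already high-performing group shows up in the composition term.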

Figure 2.5. Monitoring and Evaluation Quality Ratings for Investment Project Financing and Program-for-Results Projects

Source: Independent Evaluation Group.

Note: Monitoring and evaluation quality ratings for development policy financing are not reported because they have been dropped from Implementation Completion and Results Report Reviews under the new methodology.

Factors Affecting Project Implementation and Performance

The COVID-19 pandemic was the single most salient challenge facing projects during FY20–22. Despite their limited exposure to the pandemic, 212 projects (78 percent of the RAP 2023 cohort) experienced implementation obstacles caused by the pandemic,4 as reported in ICR documents (figure 2.6).5 Lockdowns and mobility restrictions had adverse effects on countries’ economic activity, disrupting services and the operations of public institutions. Most projects reported implementation delays caused by supply chain shortages and other logistical challenges, which affected the civil works components of projects. The pandemic also led to the postponement of in-person project-related activities and, in some cases, the reallocation of project funds (box 2.1).

Box 2.1. The Impact of the COVID-19 Pandemic on Project Implementation

This Results and Performance of the World Bank Group uncovered the pandemic’s specific underlying effects on projects’ implementation. We conducted a content analysis of 443 extracts of Implementation Completion and Results Report text corresponding to the 212 projects identified with the epidemics factor (see figure 2.6). The underlying effects include the following:

Lockdowns, mobility restrictions, and economic downturn. The outbreak of the COVID-19 pandemic had significant repercussions on project implementation as countries declared states of emergency, imposed nationwide lockdowns, and implemented mobility restrictions, including border closures to curb the virus’s spread. These measures had adverse effects on countries’ economic activity, particularly for informal workers and poor households, and contributed to job losses and even permanent firm closures.

Disruption of services. The implementation of education projects was particularly affected: widespread school closures disrupted ongoing and planned academic activities and led to learning losses among students. In North Macedonia and Afghanistan, the closure of technical and vocational education and training institutions affected students’ practical training, making it harder for small and medium-size companies to absorb these students. Health and transport projects saw reduced service use, which affected the delivery of services supported by World Bank projects. Use of preventive health services declined as people avoided health care facilities for fear of contagion. Transport projects experienced lower demand and service interruptions as travel and railway services decreased; Colombia, for example, had an 85 percent drop in public transport demand.

Disruption of operations of institutions. In addition, across all Global Practices, World Bank projects reported that government agencies at the national and local levels faced temporary disruptions to their work schedules and operations, hampering interactions and active engagement with project stakeholders.

Slow-paced activities. The pandemic created obstacles during the final stages of project implementation across all Global Practices, slowing the pace of activities and, in many cases, leading to extensions of project closing dates to compensate for the time lost during lockdowns.

Shortages in supply chain and logistics challenges delayed civil works. Supply chain and logistics challenges, resulting from lockdowns, border closures, and travel restrictions, caused delays in civil works, particularly in energy, transport, water, and urban projects.

Difficulty of in-person activities. Travel and mass gathering restrictions had a significant impact on project activities requiring physical interaction and mobility (such as training programs, workshops, and technical meetings), leading to the postponement, cancellation, or shift to virtual formats across all Global Practices. This also affected supervision and verification activities, including field missions, making it challenging for technicians and World Bank staff to monitor project progress. Nine projects reported that the pandemic hindered the collection of primary data and field visits, resulting in delayed project reports and the exclusion of certain result indicators from monitoring and Implementation Completion and Results Report preparation.

Reallocation of project funds. In addition, 15 projects reported that the COVID-19 crisis exerted pressure on government budgets and shifted priorities toward pandemic response efforts, leading to cancellation or redirection of project funds to mitigation measures.

Source: Independent Evaluation Group.

Countries’ institutional capacity, procurement, and conflict and instability were other common challenges during project implementation. The low technical capacity of implementing agencies to execute and supervise work quality hindered the implementation of 39 percent of projects. Such weak institutional capacity was common in the South Asia and Eastern and Southern Africa Regions and in IDA and FCS countries.6 About 31 percent of projects reported challenges with procurement management systems, including delays and inefficient contract management. Procurement challenges were more prevalent in low-income countries, in the South Asia and Europe and Central Asia Regions, and in the Infrastructure and Sustainable Development Practice Groups. Conflict and instability, more prevalent in the Western and Central Africa Region and in IDA and FCS countries, also hindered the implementation of 27 percent of projects in the RAP 2023 cohort.

By contrast, project scope, ex ante risk identification and mitigation, and adaptive management facilitated project implementation. Among project-related factors, 35 percent of projects reported that a realistic scope for objectives or a strong overall project design had facilitated implementation. Project teams cited adaptations to unforeseen circumstances as helping implementation in 35 percent of projects and ex ante risk identification and mitigation measures as helping in 27 percent. Whereas projects in other Regions tended to report that risk identification and mitigation measures were adequate for implementation, projects in the Latin America and the Caribbean Region tended to report that teams’ risk identification and mitigation measures were inadequate or insufficient for successful implementation (see figure D.6).

The inadequate identification and mitigation of institutional capacity risks emerged as a challenge in project implementation. Of the 56 projects that acknowledged failing to adequately identify and mitigate risks, 21 (38 percent) reported that weak implementing agency capacity was the most important implementation risk (figure 2.7; table 2.1). These projects commonly reported that the initial risk assessments conducted before implementation were overly optimistic given the complexity of the project. Consequently, the proposed mitigation measures proved insufficient, leading to implementation delays. Moreover, 15 of these 21 projects also encountered implementation obstacles caused by the low technical capacity of implementing agencies, which is captured by the skilled human resources and organizational capacity subcategory. Consistent with this finding, RAP 2022 also found that World Bank country programs were less adept at assessing institutional capacity risks (World Bank 2022a).

Figure 2.6. Factors Affecting Project Implementation: A Comparative Analysis

Source: Independent Evaluation Group.

Note: Negative = the identified factor was reported as a constraint to project implementation. Positive = the identified factor was reported as facilitating implementation. Both = at the project level, there were positive and negative factors in the same category. This is more prominent in categories that were not disaggregated, such as coordination and engagement. For example, the Implementation Completion and Results Report showed that there was a clear allocation of roles and responsibilities (positive), but the bureaucratic structure created challenges to project implementation (negative). FY = fiscal year; RAP = Results and Performance of the World Bank Group.

Table 2.1. Risks Insufficiently Identified and Mitigated

Risk type                    Projects (n = 56; %)
Implementation capacity      38
Not specified                16
Political                    13
Fiduciary                    7
Environmental                5
Governance                   5
Safeguards                   5
Operational                  4
Legislation                  2
Economic                     2
Stakeholders                 2

Source: Independent Evaluation Group.

Figure 2.7. Inadequate Risk Identification and Mitigation of Weak Institutional Capacity and Low Technical Capacity of Implementing Agencies

Source: Independent Evaluation Group.

Note: The figure shows that the majority of projects that failed to adequately identify and mitigate capacity risks also reported the low technical capacity of implementing agencies as a challenge for implementation. Positive only = skilled human and organizational capacity was reported as facilitating project implementation. Negative = skilled human and organizational capacity was reported as a constraint to project implementation. No data = skilled human and organizational capacity issues were not reported by the project.

Factors that hindered project implementation in the prepandemic cohort were more widespread in the RAP 2023 cohort. This RAP’s machine learning exercise, which extended the analysis of factors to the prepandemic cohort, revealed that a larger share of projects experienced obstacles during implementation than in previous years (figure 2.6). Among contextual factors, implementation challenges linked to conflict and instability increased from 8 percent of projects in the prepandemic cohort to 27 percent in the RAP 2023 cohort. Similarly, natural disasters negatively affected the implementation of 23 percent of projects, compared with 9 percent previously. Among stakeholder dynamics factors, coordination and engagement challenges increased from 8 percent of projects in the prepandemic cohort to 26 percent in the RAP 2023 cohort, and challenges caused by changes in stakeholders’ commitment and leadership rose from 9 percent to 27 percent. Project finance–related challenges, particularly procurement, were also reported more frequently in the RAP 2023 cohort (31 percent of projects) than in the prepandemic cohort (15 percent). These challenges cannot be fully attributed to the COVID-19 crisis, however. Because the implementation phase of RAP 2023 projects extends as far back as 2003, it is impossible to determine whether specific factors occurred during the pandemic or before it. Furthermore, previous studies have identified similar challenges to project implementation, indicating that they are not unique to the COVID-19 pandemic.7

Project performance remained resilient to these implementation challenges. Overall, projects in the RAP 2023 cohort performed better than those in the prepandemic cohort across all project ratings (figure 2.8).8 Moreover, the World Bank’s efficacy ratings in pursuing intended development outcomes have improved over the long run (box 2.2). Only a few of the factors that affected implementation were statistically associated with project performance ratings, and these associations were moderate (figure 2.9). For example, the 65 percent of projects that reported skilled human resources and organizational capacity as critical factors had an average outcome rating of 4.3 (moderately satisfactory), compared with 4.6 (satisfactory) for projects that did not report such issues. Previous studies, including Denizer, Kaufmann, and Kraay (2013) and Ortega Nieto, Hagh, and Agarwal (2022), have also identified a negative association between weaknesses in human and organizational capacity and both project outcomes and Bank performance. In addition, projects that identified key risks during project preparation and outlined mitigation measures for them had an average outcome rating of 4.5, compared with 4.3 for projects that did not identify such risks (see box D.1 for other factors with a mild association with project performance ratings).

Figure 2.8. World Bank Project Ratings: The Prepandemic Cohort Compared with the RAP 2023 Cohort

Source: Independent Evaluation Group.

Note: H = high; HS = highly satisfactory; M&E = monitoring and evaluation; MS = moderately satisfactory; MS+ = moderately satisfactory or above; RAP = Results and Performance of the World Bank Group; S = substantial or satisfactory; S+ = substantial or above (satisfactory or above).

Box 2.2. Development Outcomes Underlying Efficacy Ratings

Rating increases in FY20–22 do not result from a systematic difference in projects’ intended development outcomes compared with previous years. The analysis of outcome types indicates that the top three development outcomes pursued by the World Bank, as observed in the Results and Performance of the World Bank Group 2021 across FY12–14 and FY17–20 (second quarter), continue to be increasing institutional capacity, improving service quality, and expanding access to services (figure B2.2.1; see appendix A for methodology and appendix C for more details).

Figure B2.2.1. Top Three Development Outcomes in the RAP 2023 Cohort

Source: Independent Evaluation Group.

Efficacy ratings have improved over time, and the upward shift is statistically significant over the long run. For both outcome types in table B2.2.1, the differences between FY12–14 and FY17–20 (second quarter) and between FY12–14 and FY20–22 are statistically significant, whereas the difference between FY17–20 (second quarter) and FY20–22 is not. This pattern is consistent with the World Bank’s ongoing efforts to enhance project efficacy and effectiveness, which are reflected in the improved performance ratings observed in FY20–22.

Table B2.2.1. Average Efficacy Rating by Objective Outcome Type

Outcome type                         Measure                     FY12–14   FY17–FY20 (Q2)   FY20 (March)–FY22
Capacity of institutions enhanced    Percentage of objectives    37        40               33
                                     Average efficacy rating     2.43      2.72             2.70
Quality of services improved         Percentage of objectives    40        47               36
                                     Average efficacy rating     2.59      2.77             2.83

Statistical significance of the difference in average efficacy rating
Outcome type                         FY12–14 vs. FY17–FY20 (Q2)   FY17–FY20 (Q2) vs. FY20 (March)–FY22   FY12–14 vs. FY20 (March)–FY22
Capacity of institutions enhanced    Yes                          No                                     Yes
Quality of services improved         Yes                          No                                     Yes

Source: Independent Evaluation Group.

Note: The periods of FY12–14 and FY17–20 (second quarter) include only a sample of projects that represent 29 percent and 31 percent of the population, respectively. FY = fiscal year; Q = quarter.

Figure 2.9. Relationship between Implementation Factors and Project Ratings in RAP 2023 Cohort

Source: Independent Evaluation Group.

Note: Differences in average ratings between projects that identified the implementation factor and those that did not were statistically significant, as determined by both t-test and Mann-Whitney U test. M&E = monitoring and evaluation; RAP = Results and Performance of the World Bank Group.
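The note above refers to two standard tests for comparing average ratings between projects that reported a factor and those that did not. The sketch below computes Welch’s t statistic (with Welch-Satterthwaite degrees of freedom) and the Mann-Whitney U statistic with a tie-corrected normal approximation, using only the Python standard library. The report does not specify which test variants IEG used, so these choices are assumptions, and the ratings are hypothetical. In practice, scipy.stats.ttest_ind (with equal_var=False for Welch’s test) and scipy.stats.mannwhitneyu return p-values directly.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of positions i..j, 1-based
        i = j + 1
    return ranks


def mann_whitney(a: list[float], b: list[float]) -> tuple[float, float]:
    """U statistic for sample a and a two-sided p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    combined = list(a) + list(b)
    ranks = average_ranks(combined)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts: dict[float, int] = {}
    for value in combined:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(c**3 - c for c in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Hypothetical outcome ratings (6-point scale) for two groups of projects.
reported = [4, 4, 3, 5, 4, 3, 4, 5]
not_reported = [5, 5, 4, 6, 5, 4, 5, 5]
print(welch_t(reported, not_reported))
print(mann_whitney(reported, not_reported))
```

Running both tests is a reasonable design choice for ratings data: the t-test compares means, while the rank-based Mann-Whitney test does not rely on treating the ordinal rating scale as interval data.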

Adaptation and Restructuring for Results

More project adaptation and restructuring during implementation may explain the improved project performance. The adaptive and learning capacity of project teams enabled them to overcome implementation challenges (which helps explain the limited impact of these challenges on project performance). RAP 2020 anticipated that projects would require more frequent course corrections to adapt and respond to unexpected shocks, including those related to the pandemic (World Bank 2020). Indeed, IEG’s evaluation of the World Bank’s early response to COVID-19 showed that repurposing existing projects allowed the World Bank to rapidly adapt to the pandemic (World Bank 2022b).9 This RAP’s analysis confirms that there was a notable change in project restructuring patterns during the pandemic (figure 2.10). An examination of restructuring dates revealed that restructurings occurred more frequently after March 2020, which coincides with the onset of the pandemic (table D.2). Overall, the number of restructurings increased from an average of 1.9 per project in the prepandemic cohort to 2.6 in the RAP 2023 cohort.

Figure 2.10. Occurrence and Reasons for Restructuring: The Prepandemic Cohort Compared with the RAP 2023 Cohort

Source: Independent Evaluation Group.

Note: RAP = Results and Performance of the World Bank Group.

Extensions of project closing dates seem to have helped projects achieve their intended outcomes. As expected given the delays caused by the pandemic, the share of projects that changed their closing dates increased from 78 percent in the prepandemic cohort to 86 percent in the RAP 2023 cohort. Extensions accounted for 79 percent of these changes, and accelerated closing dates accounted for the rest.10 Project extensions compensated for the time lost during lockdowns, likely providing the time needed to achieve intended development outcomes. At the same time, extensions can increase costs, with implications for projects’ cost-effectiveness and cost-benefit performance. However, extensions do not appear to have lowered project efficiency ratings: the share of projects with substantial or high efficiency ratings was higher in the RAP 2023 cohort (62 percent) than in the prepandemic cohort (48 percent), and the correlations between project extensions and average efficiency ratings were not statistically significant.11

Changes in results frameworks helped projects achieve their intended outcomes. Restructuring data show a notable increase in the share of projects that revised their results frameworks, from 61 percent in the prepandemic cohort to 70 percent in the RAP 2023 cohort. These changes included replacing indicators to improve measurement; adding new indicators to reflect changes in a project’s scope (for example, when a project expands into a new geographical area); and changing indicator targets in response to unexpected changes in the project’s circumstances, such as those caused by the pandemic, or because targets set at appraisal were no longer realistic or had never been realistic. An in-depth review of a sample of 54 ICRRs with modest M&E quality ratings shows that revising results frameworks during implementation helped these projects achieve substantial efficacy ratings. In the sample, 93 percent had shortcomings in the initial design of their results frameworks (table C.9), including (i) inadequate selection of indicators, (ii) lack of a data collection methodology, (iii) unrealistic targets, and (iv) attribution issues. However, many project teams were able to rectify these shortcomings during implementation by refining their M&E methodology, revising indicators, or adjusting targets through project restructuring. Gathering additional evidence on projects’ achievements to supplement results frameworks, such as qualitative information, impact evaluation findings, or beneficiary survey data, also helped projects obtain substantial efficacy ratings (see appendix C). That said, more analysis is needed on the types of revisions that project teams made to results frameworks to understand how specific changes influence efficacy.12

Timely course corrections to results frameworks also helped projects achieve their intended development outcomes. Some of these restructurings led IEG to apply a split rating methodology. A split rating is used when (i) teams revise project objectives or key associated outcome targets during implementation and (ii) the project’s achievements against the original objectives or targets differ from those against the revised objectives or targets.13 Indeed, the share of projects with split ratings increased from 3 percent in the prepandemic cohort to 22 percent in the RAP 2023 cohort. The evidence also shows that the earlier these revisions occur in the project cycle, the greater the likelihood that projects will achieve their intended development outcomes. Figure 2.11 shows that the earlier the revision occurs in the project’s life, the larger the gain in the project’s efficacy rating. The gain is measured relative to the rating the project would have received against its original objectives or key associated outcome targets (see also table D.3).
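Footnote 13 states that a split rating weights the project’s achievements against the original and the revised objectives or targets by the disbursement rate at the time of the revision. The Python sketch below illustrates that weighting under several assumptions:

- The 4-point efficacy scale maps to numbers (negligible = 1 through high = 4).
- The share disbursed before the revision weights the rating against the original objectives.
- IEG’s conversion of the weighted score back into a rating category is not modeled.

The function name is hypothetical.

```python
# Assumed numeric mapping of the 4-point efficacy scale (illustrative only).
EFFICACY_POINTS = {"negligible": 1, "modest": 2, "substantial": 3, "high": 4}


def split_efficacy_score(original: str, revised: str, disbursed_at_split: float) -> float:
    """Disbursement-weighted efficacy score for a project with a split rating.

    original / revised: ratings against the original and the revised objectives or targets.
    disbursed_at_split: share of funds (0 to 1) disbursed when the revision took effect.
    It weights the original rating, and the remaining share weights the revised rating.
    """
    if not 0.0 <= disbursed_at_split <= 1.0:
        raise ValueError("disbursed_at_split must be between 0 and 1")
    w = disbursed_at_split
    return w * EFFICACY_POINTS[original] + (1 - w) * EFFICACY_POINTS[revised]


# Hypothetical project: modest against the original targets, substantial against the
# revised ones, with 25 percent disbursed at the time of the revision.
print(split_efficacy_score("modest", "substantial", 0.25))  # 2.75
```

Take the same two ratings with a revision at 75 percent disbursement instead. The score would be 2.25, because more weight falls on the rating against the original targets.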

Figure 2.11. Timing of Project Revisions and the Shift in Efficacy Rating in Fiscal Years 2019–22 When a Split Rating Is Applied

Source: Independent Evaluation Group.

Note: The percentage disbursed at the split indicates the timing of the project revision. The shift in efficacy rating is the difference between the final efficacy rating and the efficacy rating against the original target only. The blue line is a fitted regression line showing the relationship between disbursement at split (%) and the shift in efficacy rating.
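Footnote 13 describes a disbursement weighting for split ratings. Assume that weighting applies and that the final efficacy score is the unrounded weighted average. Then the shift defined in the note above reduces to (1 − w) × (revised score − original score), where w is the share disbursed at the split. For a given rating gap, the shift therefore shrinks as the revision occurs later.

The Python sketch below computes that shift and fits an ordinary least-squares line of the kind drawn in Figure 2.11. The sample points are hypothetical, not IEG data, and the function names are illustrative.

```python
def efficacy_shift(original_pts: float, revised_pts: float, disbursed_at_split: float) -> float:
    """Final (disbursement-weighted) score minus the score against the original target only."""
    w = disbursed_at_split
    final = w * original_pts + (1 - w) * revised_pts
    return final - original_pts  # equals (1 - w) * (revised_pts - original_pts)


def ols_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical projects that each moved up one rating step (modest = 2 to substantial = 3)
# but revised their targets at different points in disbursement.
disbursed = [0.2, 0.5, 0.8]
shifts = [efficacy_shift(2, 3, w) for w in disbursed]
slope, intercept = ols_line(disbursed, shifts)
print(round(slope, 6), round(intercept, 6))  # -1.0 1.0
```

In real portfolios, projects differ in both the size of the rating gap and the timing of the revision, so a fitted slope also reflects how large the gaps were. The sketch isolates the timing effect alone.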

Monitoring and Evaluation for Adaptation and Results

Improvements in M&E quality facilitated project adaptation and helped provide sufficient evidence on projects’ achievements. Several studies describe M&E as an early warning mechanism that enables effective adaptive management (Denizer, Kaufmann, and Kraay 2013; Ika, Diallo, and Thuillier 2012; World Bank 2016, 2020, 2021). Strong M&E frameworks equip teams with a deep understanding of project challenges, allowing them to address weaknesses, make timely course corrections, and achieve desired development outcomes. Previous evidence shows that World Bank projects with strong M&E frameworks have significantly higher outcome ratings (Raimondo 2016; World Bank 2020, 2021). Similarly, projects in the RAP 2023 cohort with higher M&E quality ratings had higher efficacy ratings (table 2.2). This is not surprising, because efficacy ratings take into account both the validity of the results framework in measuring the intended development outcomes and the actual achievement of those outcome measures. Furthermore, ICRR data show why projects received modest or negligible efficacy ratings (figure 2.12).14 Most of these projects fell short because they did not achieve well-defined target indicators (low achievement). Fewer fell short because they lacked appropriate results framework indicators (insufficient evidence).

Table 2.2. Overall Efficacy and Monitoring and Evaluation Quality Ratings (percentage of projects)

| M&E quality | Efficacy: Negligible | Efficacy: Modest | Efficacy: Substantial | Efficacy: High |
|---|---|---|---|---|
| Low | 0.4 | 1.4 | 0.0 | 0.0 |
| Modest | 1.4 | 13.0 | 23.0 | 0.0 |
| Substantial | 0.0 | 1.4 | 45.0 | 7.0 |

Source: Independent Evaluation Group.

Note: M&E = monitoring and evaluation.
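The association in table 2.2 can be checked with simple arithmetic. The Python sketch below computes the average efficacy score within each M&E quality row from the table’s percentages. It assumes a numeric efficacy scale of negligible = 1 through high = 4. Each row is normalized by its own total, so the result does not depend on the shares summing to 100.

```python
# Efficacy columns in table order: negligible, modest, substantial, high (assumed 1-4 points).
EFFICACY_POINTS = (1, 2, 3, 4)

# Percentages of projects from table 2.2, keyed by M&E quality rating.
TABLE_2_2 = {
    "Low": (0.4, 1.4, 0.0, 0.0),
    "Modest": (1.4, 13.0, 23.0, 0.0),
    "Substantial": (0.0, 1.4, 45.0, 7.0),
}


def mean_efficacy(shares):
    """Share-weighted average efficacy score within one M&E quality row."""
    total = sum(shares)
    if total == 0:
        raise ValueError("row has no projects")
    return sum(points * share for points, share in zip(EFFICACY_POINTS, shares)) / total


for quality, shares in TABLE_2_2.items():
    print(f"{quality:<12}{mean_efficacy(shares):.2f}")
```

The row averages rise from low to modest to substantial M&E quality, consistent with the text. This is a descriptive check only, not a test of statistical significance.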

Figure 2.12. Reason for Low Efficacy among Projects Rated Negligible or Modest

Source: Independent Evaluation Group.

Note: In the Results and Performance of the World Bank Group 2023 cohort, only 15.4 percent of projects received ratings of negligible or modest efficacy.

Results frameworks with well-aligned and adequate indicators contributed to improved efficacy ratings. To assess how well results frameworks measured intended development outcomes, and how this related to efficacy ratings, we examined 4,808 indicators from the 273 projects in the RAP 2023 cohort. Indicators were classified by (i) outcome type; (ii) adequacy (fully, partially, or not adequate) in measuring the individual objectives;15 and (iii) level (output, intermediate outcome, outcome, or high outcome; see appendixes A and C for details on these classifications). Our analysis found strong alignment between indicators and outcome types, with 97 percent of objectives having indicators of the same outcome type. Moreover, objectives with well-aligned indicators tended to have higher efficacy ratings (see table C.10). The analysis also found that 85 percent of development objectives had at least one fully adequate indicator to measure the project’s intended development outcome. On average, individual objectives had 65 percent fully adequate, 35 percent partially adequate, and 0 percent inadequate PDO indicators.16 The adequacy of indicators also matters for both objective-level and overall project efficacy. Objectives with more fully adequate indicators tended to have higher efficacy ratings.
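The two adequacy statistics above work at different levels. The 85 percent figure is a share of objectives. The 65/35/0 split, read here as an average of per-objective shares, is an average over each objective’s indicators. The Python sketch below shows both calculations. The objectives and the adequacy labels ("full", "partial", "none") are hypothetical, not RAP data.

```python
from collections import Counter

LEVELS = ("full", "partial", "none")  # hypothetical adequacy labels


def adequacy_summary(objectives):
    """Summarize indicator adequacy across objectives.

    objectives: mapping of objective name -> list of adequacy labels of its indicators.
    Returns (average per-objective share for each label,
             share of objectives with at least one fully adequate indicator).
    """
    if not objectives:
        raise ValueError("no objectives given")
    per_objective = []
    for name, labels in objectives.items():
        if not labels:
            raise ValueError(f"objective {name!r} has no indicators")
        counts = Counter(labels)  # missing labels count as 0
        per_objective.append({lvl: counts[lvl] / len(labels) for lvl in LEVELS})
    n = len(per_objective)
    average_shares = {lvl: sum(s[lvl] for s in per_objective) / n for lvl in LEVELS}
    with_full = sum(1 for labels in objectives.values() if "full" in labels) / n
    return average_shares, with_full


example = {
    "objective A": ["full", "full", "partial", "partial"],
    "objective B": ["partial", "none"],
}
shares, with_full = adequacy_summary(example)
print(shares)     # {'full': 0.25, 'partial': 0.5, 'none': 0.25}
print(with_full)  # 0.5
```

Averaging per-objective shares gives every objective equal weight. Pooling all indicators would instead give more weight to objectives with many indicators.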

However, the outcome orientation of results frameworks does not explain efficacy ratings. Our analysis found that most results framework indicators were at the outcome and intermediate outcome levels. Of the PDO indicators measuring the achievement of individual project objectives, 40 percent measured outcomes, 46 percent intermediate outcomes, 12 percent outputs, and a mere 2 percent high outcomes. Intermediate results indicators were even less outcome oriented, because they were mainly lower-level indicators designed to track the project’s progress in completing its activities. These indicators mostly measured outputs (54.0 percent), followed by intermediate outcomes (38.0 percent), outcomes (8.0 percent), and high outcomes (0.2 percent). This RAP, however, found no significant association between a project’s indicator level and its efficacy ratings. One explanation is that objectives without outcome-level indicators may still receive a substantial efficacy rating. This happens if lower-level indicators show that the project completed its intended activities and that these activities would plausibly contribute to the intended development outcomes, as outlined in the project’s theory of change. Another explanation relates to the nature of the intended development objectives. Objectives that aim for intermediate outcomes do not need high outcome indicators to measure achievement. This is typically the case for objectives that aim to increase access to services (box 2.3).
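Table B2.3.1 in box 2.3 reports a PDO indicator-level score on a 4-point scale alongside the share of indicators at each level. Assume the points output = 1, intermediate outcome = 2, outcome = 3, and high outcome = 4. This mapping is our assumption, not a stated IEG definition. Under it, a share-weighted average of the levels appears to reproduce the scores in that table, as the Python sketch below shows. The same calculation also applies to the overall PDO shares given in the paragraph above.

```python
# Assumed points per indicator level (not an official IEG definition).
LEVEL_POINTS = {"output": 1, "intermediate outcome": 2, "outcome": 3, "high outcome": 4}


def level_score(shares_pct):
    """Share-weighted average level of a set of indicators (4-point scale)."""
    total = sum(shares_pct.values())
    if total == 0:
        raise ValueError("shares must not all be zero")
    return sum(LEVEL_POINTS[level] * pct for level, pct in shares_pct.items()) / total


# Shares of PDO indicators by level (%), from the text and table B2.3.1.
all_pdo = {"high outcome": 2, "outcome": 40, "intermediate outcome": 46, "output": 12}
access = {"high outcome": 0, "outcome": 26, "intermediate outcome": 68, "output": 6}
quality = {"high outcome": 5, "outcome": 48, "intermediate outcome": 32, "output": 15}

print(round(level_score(all_pdo), 2))  # 2.32
print(round(level_score(access), 2))   # 2.2 (table: 2.20)
print(round(level_score(quality), 2))  # 2.43 (table: 2.43)
```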

Box 2.3. Development Outcomes Underlying Efficacy Ratings and Validity of Results Frameworks

Among the 16 types of development outcomes classified, the World Bank has been more successful in expanding access to services than in improving quality of services or enhancing institutional capacity.

Expanding access to services was the intended development outcome with the highest efficacy rating (average of 3.1 on a 4-point scale, which is substantial). Objectives pursuing this type of outcome also outperformed others in the adequacy of indicators. Many objectives aiming at expanding access to services were stated as lower-level results, thus not requiring high outcome indicators to measure and demonstrate achievements. On average, 74 percent of project development objective indicators measured outputs and intermediate outcomes, whereas only 26 percent measured outcomes.

Objectives aiming to improve the quality of services also had, on average, substantial efficacy (average of 3.0 on a 4-point scale), along with high adequacy of indicators. Indicators measuring the quality of services were found at all four levels, depending on the specific dimensions of quality that the project focused on. On average, these objectives’ indicators were distributed across levels as follows:

- Output level (15 percent): improvements in structural quality, such as rehabilitating or upgrading infrastructure and training service providers.
- Outcome level (48 percent): for example, time savings and user satisfaction with services.
- High outcome level (only 5 percent): for example, fatality rates.

Enhancing the capacity of institutions to perform remains a particularly challenging outcome to achieve. Objectives targeting this outcome type received statistically significantly lower efficacy ratings, with an average of 2.8 on a 4-point scale. Consistent with the findings of Results and Performance of the World Bank Group 2021, the attainment of these development outcomes was measured predominantly by intermediate outcome or lower-level indicators (67 percent). For further details, see appendix C.

Table B2.3.1. Individual Objective Efficacy, Level, and Adequacy of Results Framework Indicators

| Top three development outcomes | Average individual objective efficacy rating (4-point scale) | PDO indicator-level score (4-point scale) | Share of PDO indicators: high outcome (%) | Outcome (%) | Intermediate outcome (%) | Output (%) | PDO indicator adequacy score (4-point scale) |
|---|---|---|---|---|---|---|---|
| Access to services expanded | 3.06 (β) | 2.20 (α) | 0 | 26 | 68 | 6 | 2.78 (β) |
| Quality of services improved | 2.95 | 2.43 (α, γ) | 5 | 48 | 32 | 15 | 2.70 (γ) |

Source: Independent Evaluation Group.

Note: Statistical significance is at the 0.05 level or better, based on the Student t-test and the Mann-Whitney U test. PDO = project development objective. Superscripts mark statistically significant differences between outcome types:

- α: access to services expanded versus quality of services improved.
- β: access to services expanded versus capacity of institutions to perform institutional functions enhanced.
- γ: quality of services improved versus capacity of institutions to perform institutional functions enhanced.

Source: Independent Evaluation Group.
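The table note cites the Mann-Whitney U test, a rank-based comparison suited to ordinal outcomes such as 4-point ratings. The Python sketch below uses only the standard library. It assigns average ranks to ties and computes a normal-approximation, two-sided p-value without the tie correction. Its p-values will therefore be somewhat conservative compared with statistical packages when ties are common, as they are with ratings.

The Student t-test is omitted, because its p-value needs the t distribution, which the standard library does not provide. The function names and sample ratings are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 0-based positions start..end, made 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value by normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    return u_x, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical objective-level efficacy ratings for two outcome types.
access_ratings = [3, 4, 3, 3, 2, 4]
capacity_ratings = [2, 3, 2, 3, 2, 2]
u, p = mann_whitney_u(access_ratings, capacity_ratings)
print(u, round(p, 3))
```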

  1. A project's exposure time to COVID-19 was calculated as the period from March 2020 until the project’s closing date divided by the project’s overall duration. For a small number of still-active projects that had an Implementation Completion and Results Report Review (ICRR) completed, the exposure measure was calculated as the period from March 2020 until the project’s ICRR completion date divided by the project’s overall duration.
  2. These two major overlapping shocks to the global economy over the past three years had a significant impact on economic growth across regions, by stoking uncertainty and disrupting global trade and supply chains. The resulting increases in energy, food, and fertilizer prices also amplified the inflationary pressures (World Bank 2023a).
  3. The evaluation methodology for development policy financing projects changed in mid-2020. Under the old methodology, the overall Bank performance rating was based on quality at entry and quality of supervision. Under the new methodology, it is based on design and implementation (see appendix A).
  4. In addition to the COVID-19 pandemic, seven projects also reported encountering multiple concurrent outbreaks, including Ebola, cholera, and measles. However, the COVID-19 pandemic emerged as the most frequently cited among them.
  5. For this Results and Performance of the World Bank Group (RAP), a content analysis of self-reported factors affecting implementation identified both the challenges and the enablers that projects faced. The analysis drew on the Implementation Completion and Results Report narrative, specifically its Factors Affecting Project Implementation and Performance section. Factors were classified using an adapted version of the Delivery Challenges in Operations for Development Effectiveness taxonomy (see appendix A for methodology and appendix D for more details on factors that affected implementation).
  6. See figures D.4 through D.9 for details on the distribution of factors affecting implementation by project subgroups.
  7. Ortega Nieto, Hagh, and Agarwal (2022) used data from the Delivery Challenges in Operations for Development Effectiveness developed by the Global Delivery Initiative. Their study examined project performance and the attainment of development objectives across 42 specific delivery challenges, drawing from a data set of over 5,000 lending projects spanning the period from 1995 to 2015.
  8. Performance rating improvements between the prepandemic and the RAP 2023 cohorts are not attributable to a systematic difference in the composition of the portfolio. The decomposition analysis shows that the primary driver of the overall increase in performance ratings is not portfolio changes but rating increases within various subgroups (including Global Practice, Region, project size, country income level, lending group, and fragile and conflict-affected situation status; see figures C.11 through C.15).
  9. Approximately 60 percent of World Bank country programs underwent a significant reorientation of their portfolios to adapt to the changing needs caused by COVID-19, involving extensive repurposing of projects, additional support through advisory services and analytics, and the introduction of new initiatives.
  10. However, this does not mean that project duration became longer. There was no statistically significant difference in the length of the extensions because both cohorts had a mean and median extension period of 15 months and 12 months, respectively (see figure D.14).
  11. The December 2020 guidance note, developed by Operations Policy and Country Services in collaboration with the Independent Evaluation Group, titled “Preparing an ICR for a Project Impacted by COVID-19,” acknowledges that certain delays in project implementation may not necessarily indicate inefficiency, particularly if there was an ongoing active response. This helps to explain the lack of correlation between project extensions and efficiency ratings.
  12. Two analyses are outside the scope of this RAP. The first is an in-depth analysis of the project restructuring papers that described the changes made to results frameworks. The second is an assessment of the extent to which the targets set at the outset were realistic. A preliminary review of available guidelines on setting indicator targets, conducted at the Concept Note stage, suggests that assessing the adequacy of target levels requires considering the historical trends of the particular indicator, benchmarking (results achieved by similar projects), expert judgment, and stakeholder expectations. Moreover, setting targets depends on context-specific factors such as available resources, institutional capacity, environmental and political concerns, the duration of the project, the complexity of the intervention, and the contribution of other donors’ inputs. Such a detailed assessment cannot realistically be undertaken at scale by the RAP, which uses other Independent Evaluation Group micro products as its main sources of evidence.
  13. According to ICRR guidelines, Independent Evaluation Group staff independently assess the appropriateness of applying a split rating versus assessing the entire project. A split rating typically applies when (i) the project objectives or key associated outcome targets were revised during implementation and (ii) the project’s achievements based on original objectives and targets differed from those based on revised objectives and targets. For example, if the project expanded its scope, and the targets for the original geographical areas were achieved, but the targets for the new geographical areas added at restructuring were not achieved, then a split rating is applied. When the project’s scope has decreased through a downward revision of targets, and the original target was not achieved, but the revised target was achieved, a split rating is also applied. When deriving the project’s overall efficacy and outcome ratings, the split rating takes into account the project’s achievements against both the original and the revised objectives and targets, weighted by the disbursement rate at the time of the revisions. See section 9 of Guidelines for Reviewing World Bank Implementation Completion and Results Reports: A Manual for IEG ICR Reviewers (World Bank 2017).
  14. The ICRR system began recording reasons for low efficacy ratings in 2017. Since then, the share of unsuccessful projects attributed to either reason has not changed significantly (see figure C.16).
  15. For example, consider a Transport project whose individual objective was to reestablish lasting road access between provincial capitals, districts, and territories in the project impact area. Two of its indicators were fully adequate because they could demonstrate the achievement of that objective: reopened project roads in good to fair condition, and roads in good and fair condition as a share of total classified roads. By contrast, the indicator on the number of condoms distributed was not adequate, because it provided no evidence of improved road conditions. The indicator on implementing an action plan to develop the road construction industry was partially adequate. It helped demonstrate the achievement of the individual objective to some extent, but not sufficiently on its own (see appendix A for methodology and appendix C for more details on the analysis).
  16. When considering all indicators included in project results frameworks (that is, project development objective and intermediate results indicators), individual objectives have on average 30 percent of fully adequate indicators, 69 percent of partially adequate indicators, and just 0.4 percent of not adequate indicators.
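Footnote 1 defines a project’s COVID-19 exposure as the time from March 2020 to the project’s closing date (or ICRR completion date), divided by the project’s overall duration. The Python sketch below illustrates that calculation under several assumptions:

- The footnote does not say which milestone marks the start of a project (for example, approval or effectiveness), so the `start` parameter is an assumption.
- "March 2020" is taken as March 1, 2020.
- The result is clipped to between 0 and 1 for projects that closed before or started after that date, cases the footnote does not address.

```python
from datetime import date

PANDEMIC_START = date(2020, 3, 1)  # assumption: "March 2020" taken as March 1, 2020


def covid_exposure(start: date, end: date) -> float:
    """Share of a project's duration that fell on or after March 2020.

    start: project start date (which milestone counts as the start is an assumption).
    end: closing date, or ICRR completion date for still-active projects.
    """
    duration = (end - start).days
    if duration <= 0:
        raise ValueError("end must be after start")
    exposed = (end - max(start, PANDEMIC_START)).days
    return min(1.0, max(0.0, exposed / duration))


# Hypothetical four-year project running from March 2018 to March 2022.
print(round(covid_exposure(date(2018, 3, 1), date(2022, 3, 1)), 3))  # 0.5
```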