Results and Performance of the World Bank Group 2020
This Results and Performance of the World Bank Group (RAP) assesses the World Bank Group’s performance by analyzing the achievement of project and program objectives through ratings and by classifying project objectives according to their outcome levels.
This report examines performance and outcomes from different perspectives using evidence from the Bank Group’s results measurement systems. Previous RAPs have relied on the project and country program ratings that these systems collect. However, this report breaks with tradition and analyzes the results measurement systems’ larger evidence base beyond ratings to classify outcome levels for World Bank and International Finance Corporation (IFC) projects. It also reviews how results measurement systems for select corporate priorities aggregate results and derives implications for the Bank Group’s outcome orientation. The focus shifted beyond ratings partly in response to the Board of Executive Directors’ request for more evidence on the Bank Group’s development outcomes and outcome orientation.
The data in this report cover a period ending in 2019 and do not show the coronavirus (COVID-19) pandemic’s consequences for outcomes and performance, though the report identifies some implications for the Bank Group’s COVID-19 response.
Part I: Assessing Performance through Ratings
World Bank Projects and Country Programs
Independent Evaluation Group project data for fiscal year (FY)19 show that 79 percent of World Bank lending operations were rated moderately satisfactory or above (MS+) at completion. This compares with 81 percent in FY18. Looking back over a longer period, the share of closed projects rated MS+ was 71 percent in FY09, declining to 63 percent in FY13 and rising since then. Measured by volume, 82 percent of lending operations were rated MS+ in FY19, staying relatively constant since FY13.
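The difference between a share measured by number of projects and a share measured by volume can be illustrated with a minimal sketch. The portfolio below is hypothetical and is not report data; the rating abbreviations (HU, U, MU, MS, S, HS), the `Project` type, and the function names are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Ratings that count as MS+ (moderately satisfactory or above).
# The abbreviations are an assumed shorthand for the six-point outcome scale.
MS_PLUS = {"MS", "S", "HS"}


@dataclass
class Project:
    rating: str          # one of HU, U, MU, MS, S, HS (assumed codes)
    volume_musd: float   # lending volume in hypothetical US$ millions


def share_ms_plus_by_count(projects):
    """Percentage of projects rated MS+, each project counted once."""
    rated = [p for p in projects if p.rating in MS_PLUS]
    return 100.0 * len(rated) / len(projects)


def share_ms_plus_by_volume(projects):
    """Percentage of lending volume that sits in projects rated MS+."""
    total = sum(p.volume_musd for p in projects)
    good = sum(p.volume_musd for p in projects if p.rating in MS_PLUS)
    return 100.0 * good / total


# Hypothetical portfolio, for illustration only (not report data).
portfolio = [
    Project("S", 300.0),
    Project("MS", 500.0),
    Project("MU", 50.0),
    Project("U", 150.0),
]

print(share_ms_plus_by_count(portfolio))   # 50.0
print(share_ms_plus_by_volume(portfolio))  # 80.0
```

In this made-up portfolio the two larger operations are rated MS+, so the volume-based share exceeds the count-based share. This is the same direction as the FY19 figures above (82 percent by volume compared with 79 percent by number).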
When results for FY12–14 and FY17–19 are compared, outcome ratings for investment project financing (IPF) improved from 68 percent MS+ to 81 percent MS+, whereas the share of development policy financing (DPF) operations rated MS+ decreased modestly from 72 to 69 percent.
Ratings increased in nearly all Regions and Global Practices. The Middle East and North Africa Region had the largest outcome ratings increases and now has the highest rating, at 93 percent MS+ in FY17–19. Among the two Africa Regions, Western and Central Africa increased from 52 percent MS+ in FY12–14 to 71 percent MS+ in FY17–19. Eastern and Southern Africa was also at 71 percent MS+ in FY17–19 and had remained stable at that level over the period. Project outcome ratings in countries affected by fragility, conflict, and violence (FCV) show improvement but continue to lag those in non-FCV countries. Between FY12–14 and FY17–19, the share of MS+ projects in FCV-affected countries increased from 69 to 77 percent compared with an increase from 69 to 81 percent in non-FCV-affected countries.
Other types of project ratings also increased over the past decade. Bank performance improved from 69 percent rated MS+ in FY13 to 84 percent in FY18 and 82 percent in FY19. Quality at entry ratings increased from 58 percent MS+ for projects that closed in FY14 to 75 percent for projects that closed in FY18 and FY19. Monitoring and evaluation quality ratings increased from 31 percent of projects rated substantial or above in FY09 to 51 percent rated the same in FY19. The improvement across these aspects of World Bank project performance, together with broadly conducive economic and institutional conditions in many larger countries during project implementation, helps explain the overall positive outcome ratings trends.
Beyond the project level, ratings for country program outcomes reached 72 percent MS+ in FY19, up from 51 percent MS+ in FY09. This increase occurred in International Bank for Reconstruction and Development countries, while country program outcome ratings stayed flat in International Development Association (IDA) and FCV-affected countries.
Country program performance was particularly low in FCV-affected countries because of external challenges, including large shocks for which country programs are often not sufficiently prepared. Weak political and technical capacity of governments in FCV-affected countries also explains the lower performance rating for projects focused on institutional and governance reform compared with those focused on service delivery.
Responding to COVID-19
Bank Group teams are preparing COVID-19 response projects under tight deadlines amid complex economic and public health contexts. Projects’ quality at entry could suffer because the teams have less time and opportunity to conduct foundational work, client dialogues, and relationship building. Consequently, more frequent project and country program course corrections might be needed during implementation to respond to shocks and unforeseen circumstances and to mitigate issues associated with shorter project preparation time. Simpler procedures for restructuring and canceling projects could enable course corrections. Additionally, in low-capacity settings, teams could consider reducing country program scope when adding COVID-19 response components to avoid overtaxing countries’ low implementation capacity.
IFC Investment and Advisory Projects
IFC investment project ratings for calendar year (CY)18 are the first to show a slight improvement after a 10-year decline. The CY18 data show that 43 percent of IFC investment projects were rated mostly successful or better on development outcome, down from a peak of 75 percent in CY08 but slightly up from 40 percent in CY17. Measured by net commitment volumes and three-year moving averages, IFC’s development outcome ratings declined from 83 percent rated mostly successful or better in CY07–09 to 43 percent in CY16–18 and 48 percent in CY17–19. Over this longer period, performance declined for all Regions, industry groups, country categories, and equity and loan instruments.
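A three-year moving average weighted by net commitment volume can be sketched as follows. The sketch assumes that projects from three consecutive calendar years are pooled into one window and that the share is computed over the pooled commitments; the report does not spell out the exact calculation, so this is one plausible reading. The records and the function name are hypothetical.

```python
def rolling_three_year_share(records, end_year):
    """Volume-weighted share (in percent) of successful projects in the
    three calendar years ending in end_year.

    records: iterable of (year, net_commitment, is_successful) tuples.
    Returns None when the window contains no commitments.
    """
    window = [r for r in records if end_year - 2 <= r[0] <= end_year]
    total = sum(volume for _, volume, _ in window)
    if total == 0:
        return None
    good = sum(volume for _, volume, ok in window if ok)
    return 100.0 * good / total


# Hypothetical records, for illustration only (not report data):
# (calendar year, net commitment, rated mostly successful or better)
records = [
    (2016, 40.0, True),
    (2016, 60.0, False),
    (2017, 50.0, False),
    (2018, 30.0, True),
    (2018, 20.0, True),
    (2019, 100.0, True),
]

print(rolling_three_year_share(records, 2018))  # 45.0 for the 2016-2018 window
print(rolling_three_year_share(records, 2019))  # 75.0 for the 2017-2019 window
```

Pooling across years smooths year-to-year swings that arise when few projects are evaluated in a single year. Averaging the three annual shares instead would give a different figure whenever annual volumes differ.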
A combination of internal work quality issues, external risk factors, and broader market trends helps explain IFC’s performance trends. Issues with IFC staffing, incentives, accountability, and focus on volume targets over development results affected work quality. Market, country, and sponsor risks often distinguished higher-rated projects from lower-rated projects. Projects with strong sponsors and business fundamentals coped better with market risks than projects without those characteristics. Additionally, projects that were better prepared to cope with currency devaluations and political and regulatory risks were more likely to receive higher ratings. Broader market trends may have made IFC’s business model more exposed to risk and weakened the pool of available projects with attractive risk-reward profiles. IFC has taken steps to improve work quality, focus on development results, grow the pool of bankable investment projects, and identify risks and market opportunities better.
Development effectiveness ratings began to improve for IFC advisory services projects evaluated in FY17–19, when 50 percent of them were mostly successful or better. The share rated mostly successful or better declined from 65 percent in FY12–14 to 38 percent in FY15–17. Measured by funding amounts, development effectiveness ratings declined from 70 percent mostly successful or better in FY12–14 to 33 percent in FY15–17 but then increased to 49 percent in FY17–19. Successful advisory projects often had strong client commitment, flexible and proactive supervision, and robust project monitoring and evaluation.
Multilateral Investment Guarantee Agency Projects
Multilateral Investment Guarantee Agency (MIGA) projects’ development outcome ratings have continued on an increasing trend. These ratings increased from 64 percent satisfactory or better (S+) in FY07–12 to 69 percent S+ in FY13–18 when calculated by number of projects, and from 61 percent to 75 percent S+ when calculated by gross issuance amounts. MIGA projects in IDA and FCV-affected countries achieved high ratings: in FY13–18, for example, 77 percent of MIGA projects in IDA countries were S+ compared with 63 percent in non-IDA countries. An analysis of MIGA projects in IDA countries found that MIGA promoted private sector investment by deterring political risks and resolving issues such as arrears payments by governments.
Part II: Assessing Outcome Levels
This RAP uses a theory of change framework to classify outcome levels, thus providing new information on the most common types of Bank Group project outcomes. The framework captures the intended and achieved outcomes of World Bank projects and the intended outcomes of IFC projects. The four outcome levels are the following:
- Level 1: Outputs from Bank Group projects and activities
- Level 2: Early outcomes such as a new capacity or better access to public services
- Level 3: Intermediate outcomes such as a meaningful change in policy outcomes or beneficiaries’ lives
- Level 4: Long-term outcomes with systemic effects nationally or across sectors that contribute to general well-being
Project objectives cluster in clear outcome patterns depending on the sector and lending instrument. The patterns show that most IPF objectives focus on quality and access to services and cluster at level 2. However, IPF objectives in a few sectors (most notably agriculture and environment) have a clearer focus on end beneficiaries and cluster at level 3. Most DPFs, which focus on policy reform objectives, cluster at level 3, as do recently approved IFC projects, which often focus on market creation objectives.
The relationship between projects’ outcome levels and their performance rating is only modest and becomes insignificant when controlling for other factors. Ratings for projects with level 3 and 4 outcomes are modestly lower than for projects with level 2 outcomes, but the difference in ratings is insignificant when controlling for instrument and monitoring and evaluation quality. Many projects with higher-level objectives manage to achieve good Independent Evaluation Group ratings, in part by having strong results frameworks to measure outcome achievement. This finding suggests that there is no systematic trade-off between projects’ outcome level and ratings, though it would not be realistic or desirable to expect all World Bank projects to have objectives at outcome level 3 or 4.
Differences in rating performance between IPFs and DPFs and between the lowest-rated Global Practice and other Global Practices appear more closely associated with levels of risk and the inherent difficulty in achieving policy and institutional reforms compared with service delivery improvements. In practice, evaluation methods also differ between IPFs and DPFs, which may play a role.
Thematic Area Outcomes
This RAP finds that the Bank Group clearly articulates higher-level outcomes for its global and thematic work in key thematic areas such as FCV, gender, and climate change. Results measurement systems in these thematic areas serve an essential accountability function by assuring that business units meet output and process targets, which are under the Bank Group’s direct control. Yet a strong focus on monitoring targets can foster a risk-averse corporate culture and lead to box-checking behavior, meaning perfunctory rather than substantive compliance. Overall, systems that measure thematic area results do little to orient the Bank Group toward achieving higher-level outcomes.
Conclusions: Getting to Outcomes
This RAP concludes that the Bank Group often has limited evidence of its higher-level outcomes and can improve how its incentives and results measurement systems support outcome orientation. Projects’ objectives need to balance realism and ambition, and therefore, one should not expect all projects to have higher outcome levels. There are more opportunities to gather evidence on broader outcomes at the country program level.
Confronting trade-offs related to the purposes of the Bank Group’s results measurement systems is necessary for improving outcome orientation. The Bank Group’s results measurement systems collect evidence needed for ratings and for process and compliance monitoring. Systems collect little evidence on the Bank Group’s contributions to higher-level outcomes, partly because such outcomes are hard to monitor and combine. At the project level, setting objectives and assessing achievements that can be attributed to Bank Group support continue to be important for the institution’s accountability. Beyond the project level, there is a need to rethink the approach to collecting outcome evidence. A suitable approach would downplay ratings-based accountability, focus on contribution rather than attribution, and help stakeholders understand how different projects and types of Bank Group engagements collectively contribute to country-level outcomes over a longer period.