Are measurement practices delivering an accurate picture of World Bank performance?
A closer look at project-level targets and indicators reveals room for improvement in results measurement.
In view of their importance for understanding how effectively a project is achieving development results, the choice of targets and indicators receives surprisingly little scrutiny. The recent Results and Performance of the World Bank Group 2021, also known as the RAP, found that targets and indicators differed widely across World Bank projects in their adequacy for measuring development results directly and objectively. Yet even when targets and indicators were not fully adequate for measuring results, projects could still achieve high outcome ratings.
The RAP found that the steady increase in project outcome ratings and general improvements in the quality of monitoring and evaluation were not always matched by higher-quality indicators or more ambitious targets. In fiscal year 2020, almost one-third of World Bank projects with outcomes rated moderately satisfactory or above had only modest or negligible monitoring and evaluation quality ratings. For the sample studied in depth, the report further found that not all individual project objectives had indicators to measure them, and many relied on weak evidence, such as anecdotes or indicators that measured outputs only.
Ratings and Outcomes
Ratings are rubrics for assessing performance relative to a project or program’s objectives.
For World Bank investment projects, the “outcome” rating brings together three underlying dimensions: relevance, efficacy (achievement of objectives), and efficiency. Independent Evaluation Group (IEG) validations assign ratings for a project’s efficacy in achieving each of its individual objectives and for overall efficacy in achieving the project development objective. Other key ratings are quality at entry (which, together with the quality of supervision rating, determines the Bank performance rating) and monitoring and evaluation quality.
To dig deeper into how measurement practices affect ratings and potentially skew the understanding of performance, the RAP studied the choice of targets and indicators in World Bank projects whose objectives included support for ‘institutional strengthening’, an objective that spans all sectors and is critical for both achieving and sustaining development results. The study found several examples of robust approaches to measurement, in which indicators appropriately measured the outcome of strengthening institutions. One example is a project to improve the sustainability of municipal services in Mozambique’s Maputo Municipality, which systematically measured the change in public perceptions of municipal services using a citizen report card. There were also examples of less robust approaches to measurement.
The RAP found that 7% of the projects reviewed had no defined indicators to measure whether their institutional strengthening objectives had been achieved, and a little over half had indicators that measured only outputs or relied on anecdotal evidence. Output indicators measure whether teams completed actions toward achieving an outcome, not whether the outcome itself has been achieved. Anecdotal evidence is considered a weak form of evidence when it relies on personal observations collected in a non-systematic manner. A project that meets its output targets could be rated highly even if it has not achieved the development outcomes it was aiming for, and the use of weak evidence leaves room for arbitrary judgments about whether intended results were achieved.
The study also found that for almost one-third of all institutional strengthening–related objectives, the corresponding project did not directly measure institutional strengthening. Rather, these projects measured the potential ultimate consequences of having stronger institutions, such as reduced travel times and reduced emissions, or decreased rates of maternal deaths and improved health behaviors. It is possible that institutional strengthening activities did contribute to these outcomes, but this contribution was not measured. A project with indicators that only measure institutional strengthening indirectly could achieve high ratings, even if the changes measured had no direct link to its activities.
This contrasts with the many projects that adequately measure institutional strengthening: another one-third of objectives had indicators that used either a direct or a plausible measurement approach. A direct approach measures the performance of the institutions that the project strengthened, with direct indicators such as increased ministry revenues or expenditures, or decreased time for the institution to process licenses or disseminate annual statistics. A plausible approach measures results that were plausibly attributable to institutional strengthening activities. This approach can measure demand-side factors, with indicators such as beneficiary satisfaction with an institution or training program, or supply-side factors by assessing a ministry’s capacity or training participants’ skill levels.
A closer look at targets revealed that many were set in absolute rather than relative terms, and many had a zero baseline. This approach is problematic because setting a target without reference to country context and needs means that achievement of the target does not demonstrate achievement of context-specific objectives. For example, about 10% of project development objective indicators measured the number of direct project beneficiaries, and all but 13 of the 60 indicators measuring beneficiaries had baselines of zero. In these cases, the information on the absolute number of beneficiaries measures the size of the project’s activities but does not measure how well the project met development needs. The indicator for the proportion of female beneficiaries was also often set to zero at baseline, rather than being expressed as progress in reducing gender inequalities. A zero baseline can signal that an indicator measures only outputs or that the data needed to establish a baseline were lacking.
These findings suggest room for improvement in ensuring adequate attention to results measurement and to the quality and appropriateness of indicators. In validating World Bank project self-evaluations according to the objectives-based approach at the foundation of the self-evaluation system for World Bank projects, the Independent Evaluation Group does not assess the quality of targets, only whether they have been achieved. Both the Implementation Completion and Results Reports (ICRs) and IEG ICR Reviews (ICRRs) for projects rarely discuss or justify how targets are set or revised. Yet if ratings strongly depend on targets being achieved, it is legitimate to question whether the achievement of targets is being measured appropriately and whether the targets represent a sufficient improvement over baselines. Closer attention to target setting and the choice of indicators could help the World Bank and the Independent Evaluation Group better assess the actual achievements of projects and provide information about how well projects are meeting country needs.