The World Bank Group has a proud history of empirically-based decision-making and advice to client countries, leadership in results measurement and aid effectiveness, ground-breaking work in impact evaluation and Open Data, and the longest standing independent evaluation office in international development.

Against this backdrop, it is surprising that IEG finds that results frameworks and monitoring and evaluation in WBG operations are inadequate more often than not. Are the results frameworks to blame, or is something else causing the problem? And how will the 2013 World Bank Group Strategy succeed in embracing evidence, measurement, real-time learning, and evaluation to ensure the transition to a “solutions bank”?

Here are some highlights from recent IEG evaluations about lessons learned – or not – from self-evaluation systems.

At the Intervention and Project Level. The World Bank Group is the largest producer of impact evaluations (IEs) among all development institutions – by the end of 2011, 460 were either completed or under way. Our 2012 evaluation of IEs highlighted positive trends: increasingly, IEs are incorporated into project design and used to test variations in project components and how they produce different outcomes, or to assess distributional and welfare impacts. However, the evaluation also showed that only between 22 and 33 percent of IEs were used in operational decisions, and only around 50 percent had an influence on policy dialogue. Why? Several factors determine whether an IE gets used: asking the right questions, actively involving local stakeholders, applying good practice quality standards, delivering results on time, and disseminating them appropriately. The quality dimension is particularly important: our 2013 Systematic Review of Mother and Child Mortality highlighted that only 68 of some 7,000 studies met international standards, among them 18 in which the World Bank had some involvement. A culture of evidence-based decision-making and a functioning system to link evaluation results to decision-making processes help as well.

At the Country Level. Any of you who have read country program evaluations and reviews of country strategy completion reports know that producing high-quality results frameworks in country strategies is difficult. Together with indicators and measurement systems, these frameworks often lead to downgraded ratings for performance and results. In a number of cases, country strategy progress reports did not result in changes, even when they flagged that all expected risks had materialized. To address any problems that might result from using different yardsticks, IEG agreed with Bank management on a common rating system. In addition, we are deepening our analyses to explain why results frameworks are weak and how these shortcomings can be overcome in the new Country Partnership Frameworks, and are working on a prototype learning product that will integrate lessons from independent evaluations into country strategy formulation.

At the System Level. The 2013 evaluation of the International Finance Corporation’s (IFC) monitoring and evaluation systems traced poor results to poor quality at entry. In particular, it observed that in 60 percent of unsuccessful IFC investment projects, appropriate and relevant lessons had been identified at the early review stage but had not been integrated into project design. IFC advisory services likewise suffer from design shortcomings such as a lack of proper indicators and baseline data, unclear objectives, and unrealistic expectations for outputs and outcomes. This occurs despite a wealth of lessons from project completion reports, external impact evaluations, and IFC’s SmartLessons. To incentivize behaviors that focus on development outcomes and learning, in 2004 IFC developed a long-term performance awards program, and incorporated IFC’s Development Goals into the corporate scorecard. The evaluation also flagged tensions between corporate indicators and those that make sense in the context of a country or project.

At the Corporate Level. The 2012 Results and Performance (RAP) report commented on the Bank’s corporate scorecard, recognizing it as a significant improvement in capturing corporate results. The RAP nonetheless flagged a number of areas for attention, including capturing the results of knowledge work. It also suggested that indicators needed to be expanded to capture the Bank’s results at the country level and that the system needed to allow disaggregation to generate meaningful information and lessons.

The WBG’s change process involves developing one integrated scorecard and results framework – an exciting opportunity to overcome these challenges and demonstrate global leadership in measuring and reporting development results.


Submitted by Niraj Kumar; e… on Fri, 03/07/2014 - 21:28

On the second point raised, At the Country Level, I would like to point out some discrepancies. If a project in a poor country with a nonexistent state and skeletal infrastructure delivers the same quantifiable result as one in a developing nation with a proper democracy and well-developed infrastructure, is it justifiable to equate the two evaluations (outcome and impact studies) on similar numerical data? Do we not need a more qualitative appraisal, based on the baseline surveys and the other socio-cultural factors for a specific nation, which could later be "shuffled" in a unified manner by giving proper weightage to the various differentials? An old man dying of hunger in Los Angeles and a pauper dying in some suburb of India: can both be equated as mere numbers?

Submitted by Caroline Heider on Mon, 03/10/2014 - 04:40

In reply to by Niraj Kumar; e…

Niraj, you are right: if we just boil it all down to aggregate numbers we will lose a lot of the richness we need in order to understand change processes and, I would argue, enjoy life. You might have seen my blog "Embrace all evaluation methods – can we end the debate?" (Nov 2013), in which I argue for using a variety of methods to understand complex development phenomena better and hence be able to make better choices. The other point of my blog above is that if policy-makers make the right choices, namely those appropriate for the country at a given point in time (something country strategies are supposed to do), and progress is measured against those yardsticks, we measure the degree of progress that each country makes in achieving its own goals and compare that with the degree of progress of other countries. This is rather different to ranking countries in terms of their absolute outcomes. Does this make sense?
