We are less than a decade from delivering on the Sustainable Development Goals (SDGs). The SDGs touch on some of the most profound challenges facing humanity, such as ending poverty and reducing inequality. What does it mean to assess results on this massive agenda? Senior evaluator Estelle Raimondo and host Brenda Barbour talk about what they’ve learned about results measurement in global development. How do we rethink Results Based Management to better help countries achieve their development outcomes? And what can the COVID-19 pandemic tell us about current results measurement practices?

Listen on Spotify, Apple Podcasts, Google Podcasts, or Stitcher.

Brenda: The world is less than a decade from the 2030 target for delivering on the Sustainable Development Goals. The SDGs are complex. They span all sectors and touch on some of the most profound challenges facing humanity, like ending poverty in all its forms and reducing inequality. What does it really mean to assess the result of work on this massive agenda?

We'll explore that question in this episode of What Have We Learned? The Evaluation Podcast. I'm Brenda Barbour, your host from the World Bank Group's independent evaluation arm, IEG. Welcome.

Today, we'll be talking to Estelle Raimondo, a senior evaluator with IEG. She has been looking at evaluation and results systems for many years and is grappling with how to find better methods to measure the results of complex interventions.

She is the co-author of the book, “Dealing with Complexity in Development Evaluation.” Welcome Estelle.

Estelle: Hi Brenda. Great to be here.

Brenda: Let's begin at the beginning. Why does results measurement matter?

Estelle: Great question. So, results measurement matters because without it, we don't really know where we're going, and we don't really know whether what we've tried, what we've experimented with, has worked or not worked and why. So, without that we would be navigating without a map, basically.

Brenda: And in results measurements, results based management has been a big pillar of this system. Why should we change?

Estelle: Results based management has indeed been an anchor of how we go about really thinking through what we are doing, how it's working, how we learn from it and how we keep ourselves accountable for it. It has a long history. It really started in the early 2000s and since then the world has changed quite a lot. And development has changed a lot.

There are, I would say, two main reasons for why RBM needs to evolve. The first is that we don't pursue the same goals, the same objectives, as when RBM was born.

RBM came to fruition in the early 2000s in the era of the Millennium Development Goals, the MDGs, where what we were pursuing wasn't easy, but was fairly clear. The objectives were measurable. They were mostly being achieved within specific sectors. And it was doable to track whether we were making progress towards them.

Today we are in the era of the Sustainable Development Goals, the SDGs, and these goals look very different from the MDGs. They are complex. They require interventions that go beyond one sector, that require coordinating multi-sectoral approaches.

The second main reason I would say is that we've also evolved a lot in how we pursue these goals, the pathways we take to achieve them.

So, in the era of the MDGs and even before that, the model was primarily the project. It was a very well-defined entity with clear boundaries. It was a lot revolving around financing, clear outputs or services, which we were providing in a kind of direct manner to our clients. This remains a model and it hasn't disappeared. It's still very important in certain sectors and certain outcomes we are pursuing.

But for the most part, we are taking more indirect pathways. We are helping our clients, and development organizations are really engaging in more institution building and capacity building, trying to support government entities and the private sector in being able to deliver services and outputs themselves.

Brenda: So, the nature of support that development agencies provide has changed. But how about the systems in place to measure the results of this support? Have our ways of assessing results adapted?

Estelle: So, I think there has been incremental change in results measurement, but not to the magnitude that we need it to in order to really rise up to the challenge of informing pretty complex stuff in some ways.

I would say we still have blind spots around these indirect pathways. We know very little about the effect of knowledge work, capacity building. We know even less about the combination of multiple approaches, over a pretty long period of time.

Our results systems remain quite instrument specific. So, they are still designed around the project as the main unit of analysis. They are still, you know, trying to understand change in a fairly short time span, a time span that's dictated by administrative timelines: when a project ends, when a country strategy ends, not necessarily the timeline of the change patterns themselves. So, you know, it might take two or three country strategy periods to make progress on a goal like education quality. And we would want to know across these periods whether things are going well, whether we've adjusted, et cetera.

The third blind spot, I would say, is that most of our systems, not only at the World Bank but among donors and development actors in general, have a bias towards what can be quantified easily and what can be measured easily.

And so we retain quite a lot of knowledge gaps around key interventions that don't lend themselves to measurement.

Brenda: So, let's talk a little bit about these indirect pathways, because I think getting governments to improve their functioning is critically important for all the other work we're doing to help achieve the outcomes. So how do you measure the results of these indirect pathways… of the work on these indirect pathways?

Estelle: Yeah, that's a tricky task. Especially given the type of systems we currently have in place, which are a little bit inadequate to do that.

Just to give you a simple example – because what I've talked about might be a little bit abstract – in the era of the MDGs, we were pursuing access to education for all. In the era of the SDGs, we are pursuing learning outcomes and education quality.

And to put students in the classroom, have teachers teach them and build schools is not easy, especially in certain settings such as fragile states, but there is a fairly clear path. A logic model can grasp this particular outcome. And measuring, you know, the number of children attending school or the number of teachers who show up is doable, and these indicators are good proxies for the goals that we are trying to achieve.

On the other hand, education quality is a very different goal, and it requires a lot more coordination between different actors. Measuring learning is doable but is itself more difficult. It is more costly because it requires going beyond administrative systems. It requires understanding, you know, the behavioral side, but also the institutional barriers: whether there are good systems for teacher training and what these systems need in order to be well funded and efficient.

So, as you can see, it becomes much more complex, and measurement will need to adjust.

Brenda: Well, let me ask two questions because you raised some interesting points. So, I get that measuring the number of teachers trained is a lot easier than measuring learning outcomes. But one of the things you mentioned is that for these more complex development goals, multiple actors will have to contribute to achieving the outcome. So then how do you envision those systems working across multiple development actors?

Estelle: Yeah, that's an important point. I would say that's one area where we've actually maybe gone backwards. We had a big effort around the time of the 2008 Paris agreements* where the concept of mutual accountability for development outcomes came to fruition and the idea of setting up evaluation systems and measurement systems that would serve the clients primarily and not the donors.

So, where the goal was to set up a system that would be able to measure outcome, and the subsidiary question was who was contributing, but we've actually not really delivered on that. Then we've fallen back into a model where each agency wants to attribute the change to their own efforts but then as a result, we have a much more fragmented system.

So, in order to really understand whether change has taken place, whether the trend on outcome is positive or not, and then understand how each contributes, we almost have to kind of set the system on its head and change quite fundamentally the way we organize it.

And we need to embrace this concept of “contribution is enough” rather than trying to really establish whether a particular change can be attributed to a particular actor, which when we think in terms of complex system is not even a relevant question in some ways.

Brenda: Thank you. So, I'll come back to that second part, but first I want to hear more about setting the system on its head. How do we need to set the system on its head?

Estelle: I would say that our first objective as, you know, development actors who are trying to support a particular government or set of actors in countries to achieve outcomes that matter for them, that is country outcomes, we need to think about whether they have the systems to measure progress on their outcome.

And so that means, you know, thinking about whether they have the right health measurement system, education measurement system… whether they have enough capacity to conduct evaluations of their own development programs and, also very importantly, the capacity to use the data to make decisions and inform their pathway. Only after that should we worry about our own contribution, you know, how we learn from our own contribution.

And so, if we had strong country systems, then it would be much easier for each agency to determine whether they've contributed, how they can learn from their own, you know, comparative advantage and niche.

And so trying to have, for instance, a system that tells you exactly what part of this education quality improvement in – I don't know, I'm picking a country – in Zambia can be attributed to the Bank versus to the other bilateral donors versus this particular foundation is not really a helpful construct.

If you are trying to think about a collective endeavor to improve education quality, and then seek to understand whether on the whole we're making progress and finally understand, you know, in a much more qualitative way I would say, what has been the specific contribution of a particular entity, this is where I think we need to go.

So that the knowledge we generate with our results systems is actually useful to the people implementing programs. And utility to different actors means different features. I would say it's useful, maybe, to step into the shoes of people whose position I haven't been in.

So, if I were, you know, a country team trying to deliver a country program, what I would want from a results system is to give me feedback. Both in the short-term and in the long-term. So, in the short term, it would be a system that can tell me if I'm on track and if there is progress towards the outcomes that I am pursuing, and whether, you know, I’m meeting my goals along the way, which is a much more rapid set of feedback than we have now.

The second thing is I would want a results system to help me step back and look back to inform the future. And that means having moments where I can reflect on my strategy over the past 5, 6, 7 years: “We've tried this particular strategy to achieve an outcome, to improve education quality – has it really been the right pathway?”

Brenda: So, in a perfect world, I think what I hear you saying is that the country system would gather the data and have the capacity to not only allow the country officials to make decisions, but maybe also to feed the result systems of the various other development actors that are working in that country. Have you seen that be the case?

Estelle: So absolutely, that's the model I think we should aim towards. I would say it's the case in some specific sectors in specific countries that have already managed to build a really good data repository, something that can collect data frequently.

And I would say instruments like the Program for Results, for instance, which the World Bank Group is now using quite often, are geared towards doing that. So, you would see that the monitoring and evaluation framework of a P for R really tries to rely on government data, and then tease out, you know, a contribution story based on data that don't necessarily need to be collected for these particular interventions but are routinely collected by the government. So, I would say by and large this is still not the norm, but there are good practices. And I think we should continue to aim for that. The vision is that data and evaluation information can be fed in at the time when the decision needs to be made.

So, when there is a major reform attempt, the decision makers would want to know: What has been tried out before? How has it worked? If there were some failures or challenges, why was that the case? What can I learn from this experience? And so having evaluation that can speak to that reform, rather than to whether a particular donor has hit their targets, would be much more informative.

Brenda: Thank you.

So, we can't talk about development without talking about the COVID-19 pandemic. I mean, this pandemic has been such a sad reminder of how quickly contexts can change. And so many countries are facing more uncertainty than ever now. So how can the results measurement system address that kind of huge uncertainty?

Estelle: The word uncertainty is really key. And I think that's where, again, we need the big paradigm shift in results measurements. Because COVID, I think has expanded our awareness of uncertainty. Even without COVID we operate in certain environments where uncertainty and risk mitigation is the day to day.

A plan can be really nice and drafted and in the next day or two will not be relevant at all. And so, some of our more rigid result systems, that almost assume that in any context you can measure and in any context you can achieve specific indicators, need to embrace the fact that this is not true.

So, in fragile states, for instance, or even in some particular sectors where we're doing something completely new and we are uncertain, and you know, there can be turnaround in agencies, et cetera, uncertainty is there. With COVID I would say we've reached an almost different kind of uncertainty where there are more “unknown unknowns” than “known knowns” or any of the other categories.

And so here, the key for results measurement is to be, again, much more attuned to experimentation, to fast data collection and a lot of adjustments along the way. And maybe trying to forecast too far or to have really clear... I go back to this issue of targets because it can be kind of the instrument that helps us measure. And I would say that being close to operations is even more important in those times where “navigation by judgment” is the key, just to cite a good friend, Dan Honig.

The other thing I would mention is this idea of iteration, because for the goals that we are pursuing, for the most part, we don't have a blueprint. There is no recipe. And even if something has worked many times in different places, it's not sure that it will work in this particular context.

So, you know, evaluation systems or results systems that can embrace that and embed feedback loops, both for the teams implementing and for the government who might be interested in scaling up or understanding whether something can be replicated, are very important as well.

Brenda: Yes, more adaptive management. But now I'm going to take you to the other side. And so, donors still want to know is their money well spent. They still want to know is my investment having impact. So how do we balance that need?

Estelle: And so, I would say, and maybe I'm a lone voice on this, but this is not antithetical in some ways. I think again, that's why I went, at the beginning when we started talking, I mentioned this concept of mutual accountability. I would say, performance auditing and safeguards on efficiency of spending and making sure that what we fund is going in the right place is very, very important. But what we expect from results systems is not that.

Evaluation systems are distinct from audit systems and they are supposed to answer different kinds of questions, which in my opinion should be more around: Are we working on the most relevant and most important challenges that these countries are facing? Are we doing it in a way that ensures we are informed by the most up-to-date evidence? Have we embedded learning and, you know, monitoring and course-correction processes and behaviors in what we are doing? And are we, you know, being very open about our successes and our failures? And a system that is geared towards hitting targets, meeting a hundred percent of indicators, does not necessarily deliver that accountability.

Brenda: So, are you advocating, throwing out indicators and targets altogether?

Estelle: No, not at all. No, but I can see that, you know, my discourse could lead us to this conclusion. No, I really think that indicators and targets have their place. Just to give you other examples, there are specific sectors or specific outcomes where measurable indicators are very good proxies for whether we've actually succeeded. I'm not an expert in energy, but, you know, my colleagues who are experts keep telling me that energy access and energy efficiency, if they are very well measured, can really tell us whether we've achieved those country outcomes.

In other sectors and other contexts where there is much more instability, or where what we can quantify is not a good proxy for whether we've succeeded, then, you know, indicators need to be enriched by other things. And so, again, institutional reform, capacity building, the governance areas that are so cross-cutting to so many things we do. I have a colleague who keeps saying, even in the water sector, you know, we don't build the pipes. We build the systems that build the pipes. And so that is much more difficult to quantify with simple indicators. And there we need to expand that toolkit and that measurement framework beyond metrics.

So, to wrap up, I would say: more opportunities to collect fast data and get feedback, data that can be quantitative or qualitative; more capacity to stop and go, stop and reflect on what the data tells us; being willing to change plans quite quickly; and keeping your eyes on the ultimate outcomes, but with the understanding that the path is not going to be straight, and it might be much more two steps back and one step forward. And then evaluation needs to be much more embedded in that life, that daily routine, of implementing programs and projects.

Brenda: Many thanks Estelle for laying out how monitoring and evaluation systems can better serve international development.

Check out her report that delves deeper into how those working in development can better learn from experience. It's IEG's evaluation, The World Bank Group Outcome Orientation at the Country Level.

Thank you for listening to What Have We Learned? The Evaluation Podcast brought to you by the World Bank’s independent evaluation arm, IEG. What did you think of the episode? What would you like to hear independent evaluators unpack about lessons in global development? Let us know, by going to ieg.worldbankgroup.org/contact.


* The Paris Declaration on Aid Effectiveness (2005) and the Accra Agenda for Action (2008)
