This is an edited and condensed version of a recent discussion between Caroline Heider, Director General, Evaluation of the World Bank Group and Rakesh Nangia, Evaluator General of the African Development Bank (the full video is also available below). In an engaging and wide-ranging conversation, Caroline and Rakesh explore the role of independent evaluation in their respective institutions, and some of the key issues they have encountered – from how to best engage the Boards of Directors, Management and other stakeholders, to determining what to evaluate and when, how to build evaluation capacity in client countries, and the pros and cons of assigning performance ratings, among others.
Rakesh: What I thought would be very helpful is to have us go into your brain and pull out some of the nuggets of evaluation that you've been gathering all these years, and therefore benefit all of the evaluation community, rather than just the World Bank.
Caroline: Thank you very much, Rakesh. We've been talking for many years now about evaluation issues that matter to both of our institutions, the African Development Bank, the World Bank Group here, and I think this exchange will be an excellent way of sharing both of our experiences and challenges.
Rakesh: Great. One of the more recent issues that came up was our executive directors asking for real-time evaluations, so essentially what they want us to do is not just go ex-post, but also say, "Okay, here is a project or a country strategy program ongoing, and we want you to check it out during the process, mid-term, for instance, and then help if it needs tweaking so that it can achieve its overall objectives." What has been your experience?
Caroline: The term real-time evaluation actually comes from the humanitarian sector, where the response goes in, does what it needs to do, and then it gets pulled out. Say, for example, the earthquake in Haiti. Suddenly there were thousands of people on the ground distributing food, putting up tents, et cetera, et cetera, and once that response is over, they go back to their regular jobs. If an evaluation comes after that, they don't really find a lot of people to talk to. That's where the concept of real-time evaluation during the response to a disaster sort of came up.
From my point of view, it does blur the line with the independence, because if you are in the process you're reviewing, you're giving feedback and advising on what needs to change, you automatically then have a stake in the project when it comes to closure and you cannot do an impartial evaluation of it afterward. My preferred option for the MDBs is that they have their self-evaluation capacity where they review, they take course corrections, et cetera, et cetera, whereas the Independent Evaluation Office continues to do the job at the end.
Rakesh: Is there a role for independent evaluation to take a look at self-evaluation? Self-evaluations usually work well at the project level, but what we find is that at a sectoral or thematic level, and even at a country strategy level, they don't seem to be working as well.
Caroline: In terms of self-evaluation, we in IEG actually evaluate the self-evaluation system as a whole to see whether it functions well. The most recent evaluations point much more to the incentives for doing good self-evaluation and taking course corrections rather than the mechanics of whether the form is right and the processes are functioning. We do that for projects, we do that also for country strategies. With that, what we're hoping to do is to instill an evaluative thinking and the incentives for evaluation within the institutions.
I'm a strong believer that institutions need to have some degree of analytical evaluative thinking so that they can course correct without anybody else having to tell them. This is even more important nowadays when we talk about resilience, about all of these calamities that are happening, where we constantly have to have adaptable management. If you don't have the ability to know why things are going wrong, you don't react maybe in time and in the right way.
Rakesh: Very true. On country strategy and program evaluations, each MDB calls them something different, but the overarching objective is pretty much the same. We've done perhaps the largest number of country strategy and program evaluations in the last two years - more than we have in the 30-year history of the independent development evaluation at the African Development Bank. There are two things that strike us. One: the similarity of the findings. In the end, we're saying, okay, we've done 15, 16 country strategy program evaluations in the last two years, and we find the findings very similar. So, is there much value added in going forward?
The second thing we find is the timelines. We always take a 10-year timeline to allow for multiple cycles of the program, and what we find is that this is now getting to be a problem because in many countries in Africa, the more recent periods, in terms of the last two or three years, have been the more emerging in terms of growth or even shifts that are happening in both the economy as well as in the social and political arena.
Caroline: This is actually really interesting and valuable because what you're saying is that you have found systemic issues with the evaluations that keep coming up, or turning up the same thing. If I were in your situation, I would actually have a dialogue both with the board and with management and say, "Look, here is the pattern, and what is behind this? What are the systemic causes for seeing these repeat problems?" If that's clear, then maybe management can actually move forward and start addressing some of those things. If the root causes are not clear, you might want to actually do a thematic evaluation to do a deep dive into, "Well, why do we keep seeing this type of stuff?" Right? That's sort of where a country-level evaluation can be complemented with a more systemic, strategic, or thematic evaluation as well.
Rakesh: We're also struggling with the issue of using ratings for our evaluations because we thought that putting in ratings would actually focus the mind of management, right? Unfortunately, it seems to detract from the learning message, if you like.
Caroline: It focused the mind very much onto the number.
Rakesh: Yes, exactly. It's a case of did I get an A or did I get a C, and everybody then starts to focus on how to make the C a B or a B+, rather than what lessons are emerging from it. We're having this interesting debate in our board, where for instance, projects are rated on a four-point scale, while country strategy and program evaluations and thematic and sector evaluations are rated on a six-point scale. We've reached a deadlock in a sense, because the board is saying, "We do not like this moderately-satisfactory or moderately-unsatisfactory terminology of yours. We don't like this gradation or the gray area. We'd like to know the black and white. Is it satisfactory, or is it not? Please tell us in so many words," and us going back and saying, "Well, the program achieved parts of it, and parts it did not, and therefore, it is moderately-satisfactory or unsatisfactory."
My fear is that if we go to a four-point scale, the focusing of the mind, as you rightly said, will then be on not the learning but really the moving of the scale. What should we do in situations like this?
Caroline: Yeah. This has a long history, this rating thing. I've worked about half of my professional life in institutions that rate, and half that don't rate. There are some advocates that say as soon as you take the rating away, this problem will go away. I can assure you, it doesn't, unless you start using language that does not indicate whether something worked or it didn't work, so then you eliminate the entire evaluative content, and then you don't have the argument anymore. To me, it's not a matter of taking away the rating scale. The second thing is that when I started at the Asian Development Bank in 1995 in the evaluation office, we had a three-point rating scale. And guess what?
Rakesh: Everybody went to the middle.
Caroline: Exactly. Then we went to the four-point scale. At a certain point we were on a six-point scale then took it back to the four-point scale.
Rakesh: I see this myself, and I'm sure you do too, that our recommendations range from being rather prescriptive to so broad and at such a high level that you can barely see the treetops. The frustration for us, therefore, is how to find that right balance where you allow management some flexibility in the types of actions they would take, which would then address the underlying issue that has been uncovered. How has your experience been in these areas?
Caroline: This is a really interesting area where we're right now experimenting quite a bit. One is that yes, we've always had the demand to keep recommendations at a level that is more strategic and not prescriptive. Whenever we've had very prescriptive recommendations, even if we only said, "We recommend you do this. For example, it could look like this", that would trigger a partial agreement because they would agree with the overall broader, strategic recommendation, but not with our example.
What we're doing right now is we have a couple of pilots where we're taking some of these big evaluations and working with management in a workshop scenario, discussing and debating together with them what would be a meaningful wording of recommendations. Right now we have one where management has to write the recommendations, so we had to be really careful to craft the report in a way that the conclusion section makes it much clearer what some of the actions are that are needed. If you leave that more vague, then actually, almost anything goes. This was actually a very good exercise to make us think more carefully about how we conclude on an evaluation, and what are some of the lessons, the insights, the actions, without spelling them out.
Rakesh: Well, that's an exciting part. I'd love to go deeper into it, to learn how you're going through this because I do very much agree with you. At some point, we just need to provide what the causality is, if you like, and what the findings are, and then management must have some degree of freedom, because in the end, they need to have the ownership of the actions, they need to be doing the implementation. We've had many discussions; should our role finish at findings and let there be no recommendations, to the other extreme, which is: no, no, no, you need to have clear recommendations so management can then implement. But I like the balance you're creating here, because I think that is exactly what will come in.
One other thing I'd like to get your thoughts on is we created this management action record system, which keeps track of all the actions. These actions now get implemented and we're going to be presenting the status of recommendations over the last five years to our board next month. One of the things that our board members are asking is, "Yes, we see that X percent have been implemented, but did it resolve the underlying issue that you guys uncovered?" Do you go ex-post in a sense saying, "Okay, let's take a look at these actions that were implemented, and on a random basis, let's dig deeper and see if those actions address the underlying problem"?
Caroline: This is exactly what this strategic dialogue with management is aiming to address to a certain extent, because just like in your case, we do track this, we report every year X percent of recommendations have been implemented or not. We have added a little bit more content to that over the last couple of years, but really not to any depth or real substance because a lot of the actions and the reporting on what has happened to them gets virtually passed down to a staff member who may be working on something, but it lacks the bigger picture. It lacks an understanding of whether this really makes a difference to how the World Bank will function. Having this synthesis, not only of our recommendations, but also of the responses that we have received, and then having that dialogue, will sort of start stimulating a different conversation.
Rakesh: I think that's an excellent idea. Those are the kinds of things that are on the cutting edge, which bring together much of these actions and seeing how we are helping the institution itself on achieving development objectives.
With this thinking, our board agreed on a pilot basis to let us go into a couple of countries to start to build the evaluation capacity of the countries, in a sense. Do you see a role for MDBs in here, and how should we be playing it, possibly jointly, if that's possible?
Caroline: This is really, really important, and here is where I think an incredible change has happened over the last 15 years. When I started in evaluation 30 years ago, evaluation was a donor-driven activity. Donors wanted to know what happened to our money. That's where the accountability side has come in. Over the last 10, 15 years, the partner countries have increased their demand, their appetite, their hunger for evaluation, and they want their own capacities. One thing that IEG does, and you are a part of this, is the CLEAR Initiative to create a network of institutions in the partner countries that sort of promote training and capacity development in evaluation methodologies. The MDBs, the bilaterals, everybody has a role to play.
Rakesh: Everybody has a different idea of what impact evaluation means, but they somehow think that this is a better instrument than anything else. I find it expensive and of limited broader value. If you do an impact evaluation, let's say for all the water in a particular province, it may not even be applicable to another province in the same country. How do you see this, and how much effort and resources are you putting into impact evaluations and other instruments, which have such narrow scope, in a sense?
Caroline: Here at the World Bank Group, impact evaluations, as you know, are not really done by IEG. Impact evaluations are with a research department in the Development Impact Evaluation (DIME) Initiative and in the human development vice presidency with the Strategic Impact Evaluation Fund (SIEF). These are again largely donor-funded initiatives, and they generate a lot of information and a lot of impact evaluations. I think impact evaluations are very important as one of many instruments. In IEG, we've become a consumer of impact evaluations, whether the individual ones, or even more importantly, the systematic reviews, which take a bundle of impact evaluations, bring them together, analyze how robust the findings are, and then summarize what are some of the transferable findings.
Rakesh: Well, I think that sounds like a smart strategy. SIEF and DIME are broader and they're in a separate part of the institution, perhaps. That's something that we should look at as a model also, to look at the broader implications for our own programming and resource-allocation systems as well.
Caroline: Or, for example, partner with 3ie because they are really stimulating the production of a lot of impact evaluations. Very often, these are collaborations with local institutions that partner with maybe an international actor to do an impact evaluation. There is also the Corcoran Lab that has a repository of impact evaluations that one can tap into. I think it's really a matter for us as a development community to tap into what's there and identify where the gaps are, so that one can really add value rather than repeat what's out there anyway.
Rakesh: And they bring a breadth of experience as well, which could be applicable across several subsectors.