The growing interest in strengthening development outcomes has stirred increasing debate about evaluation effectiveness. Today, many development institutions subscribe to what has come to be known as the DAC evaluation criteria: five criteria – relevance, effectiveness, efficiency, impact, and sustainability (R/E/E/I/S for short) – that underpin most evaluations in international development.
In a recent blog series read by over 12,000 people, Caroline Heider, Director-General, Evaluation at the World Bank Group, suggested that it was time to review the criteria. Over 100 readers shared their comments and questions. The Conversations team invited Ms. Heider and Hans Lundgren, Manager of the OECD/DAC Network on Development Evaluation, to respond to some of the questions that were submitted online, and to share their thoughts on the state of development evaluation today.
1) Let’s start with you, Caroline. For many years, the development community has used a shared set of evaluation criteria, commonly known as the DAC evaluation criteria. In one of your recent blogs, you suggested that now is a good time to revisit them, and that we may be at a "Copernican" moment. Why do you think so?
"It is (in my view) time to move to development models – theories of change – that are less linear, more representative of complex realities, and build on adaptive management. These approaches require evaluation to become more dynamic as well, adopt methods that capture complexity and unintended effects. In addition, there is a need to assess the adaptitiveness of project management." - Caroline Heider |
Caroline Heider: Copernicus is a famous symbol for rethinking how we see the world. For a long time, models have been developed that made assumptions or simplifications. These assumptions were necessary to make the models work, but removed them from the complexity of reality. Today, we are increasingly able to cope with complexity, at least in our thinking and in our modelling capacity. Therefore, it is (in my view) time to move to development models – theories of change – that are less linear, more representative of complex realities, and build on adaptive management. These approaches require evaluation to become more dynamic as well, and to adopt methods that capture complexity and unintended effects. In addition, there is a need to assess the adaptiveness of project management: for instance, are adaptations happening at the right time, what causes them, and so on.
2) Hans - you were involved in the process that led to the DAC evaluation criteria. Tell us about that experience and how these criteria came to be adopted so widely by the development community.
Hans Lundgren: The DAC evaluation criteria have their origin in the DAC principles for evaluation, one of the first tasks I was responsible for when I assumed responsibility for the DAC Evaluation Network back in 1989. The criteria were then updated in 2002 with the Glossary of evaluation terms, which was developed in collaboration with IEG. Both processes involved extensive consultations and consensus-building efforts, and the results were finally agreed to by all member countries and agencies. The criteria are part of a broader package of principles, guidance, and standards developed by the DAC Evaluation Network. They were conceived to help evaluation managers reflect upon and structure the key questions in an evaluation. I think one reason behind their widespread use is that they are relatively easy to understand and to use when framing evaluation questions. Moreover, they relate to some key issues in assessing the success or failure of a programme.
Caroline: I agree with Hans that the criteria have been useful to shape overall questions about what we aim to assess. But in practice I have seen too many evaluations that ask these questions without thinking. They use standardized questions – What made the program effective? How efficient was the project? – without asking whether these are the most important and useful questions. There are many other ways of asking questions that are more responsive to program managers, less jargonistic, and that will still lead to an assessment – or evaluative conclusion – of the relevance, effectiveness, efficiency, sustainability, and impact of the programs being evaluated.
3) Are all five criteria in the R/E/E/I/S framework still relevant? Is it time to review or replace all or some of them?
"I am personally open to look again at the criteria and see how they can be refreshed. But before throwing the adolescent out with the bathwater – the criteria have been in place for fifteen years now and not a baby anymore – we should reflect on what we can build on and the fact that since they have such a wide-spread use many consider them useful in practical work.." - Hans Lundgren |
Hans: Since your question asks if they are still relevant, I guess the criterion of relevance at least is still relevant! More seriously, I am personally open to looking again at the criteria and seeing how they can be refreshed. But before throwing the adolescent out with the bathwater – the criteria have been in place for fifteen years now and are not a baby anymore – we should reflect on what we can build on, and on the fact that their widespread use suggests many consider them useful in practical work.
Caroline: True. It is not a matter of throwing the criteria out and starting all over. But, as evaluators, we should take stock of how well they have worked and how they can be improved. I have made a number of suggestions in my recent blog series, and we will take stock of all of the comments to think through the next steps.
4) Do you see some criteria as being more relevant for some types of programs/projects than others, or are they applicable to most cases?
Hans: The five criteria should not necessarily be used in all evaluations. The application of the criteria – or any other criteria – depends on the evaluation questions and the objectives of the evaluation. Furthermore, we have developed additional criteria for evaluating humanitarian aid and for peacebuilding activities in settings of conflict and fragility. I am in favour of a thoughtful application of these or other criteria, not a mechanical application.
5) Revising the evaluation criteria is likely to be messy and difficult. Is it worth it? Can’t we just work with what we have?
Caroline: On the messiness of the process, Hans has a lot of experience in negotiating consensus among different parties. In addition to the challenges he points out, I would say that the tent has become bigger: there are more actors involved in development, which means there are more involved in evaluation. I would hope that a body like the OECD/DAC remains a standard setter and the legitimate convener for building consensus even with an enlarged group of players. But in response to whether it is worth it? Yes, I do think so! The widespread use of the criteria demonstrates how important they – and the consensus around them – have been. For evaluation – as a profession or practice – to adapt to modern times, it has to redefine itself periodically. Research into evaluation methods and their practical application is leading the way, but eventually we will have to update and redefine the norms.
Hans: It is true that developing and building consensus around internationally agreed norms and standards is not a simple process, and I have spent years of my career facilitating such consensus-building processes. It is not only because of the number of actors, but because some countries and agencies may hold very firm positions. For instance, the DAC evaluation standards took three years to develop, test, revise, and reach consensus on. An alternative to agreed, common approaches is of course for each agency and development bank to develop its own criteria, norms, and standards. However, this would limit the possibilities for collaboration and reduce comparability.
6) One unintended consequence is that the criteria have potentially become somewhat of a straitjacket and lack the necessary flexibility. In other words, they foster a rigid structure that produces the same old reports spat out to the same old formula. Is this a fair criticism?
Caroline: This critique is not new to me, and it often takes the shape of complaints about jargon that only evaluators can understand. I don’t think this is a problem of the criteria as such; it has to do with their use, that is, the practice of evaluation. As I mentioned before, I have found evaluators – in many of the institutions I have worked for – who have stuck rigidly to the criteria and were unable to use them as the tool they were meant to be.
Hans: I am not sure which agency or development bank you have in mind when you say that they produce the same old reports spat out to the same old formula. The application of the criteria has not blocked innovation: new methods and approaches have been developed over the last 15 years for both qualitative and quantitative evaluations. The criteria do not prescribe a specific evaluation method; rather, they help evaluators think about and structure the evaluation questions.
7) Are some criteria more important than others? Some have argued, for instance, that impact and sustainability matter more than efficiency, relevance, and effectiveness.
Hans: Which criteria are most important depends on the focus of the evaluation. There is obviously some interdependence between the criteria – if you get a number of positive effects, it is also likely that your programme was implemented effectively. One way of dealing with complexity and interdependence would be to merge criteria, as mentioned in the blog series. At the same time, any changes need to be clear and practical in order to be applied.
8) In reviewing the criteria, how do we avoid the danger of being trapped in an even more elaborate box-ticking approach to evaluation?
Caroline: The problem you raise is real, and not just for evaluation. I have seen it happen in many circumstances in the development field, and have commented on it in evaluations I have written. I have not yet found the answer to why this behaviour occurs: is it the normal course of bureaucracies, or a natural response to ever more demanding agendas that ask more than people can handle? At least initially, I do hope that we can keep the discussion of evaluation criteria and methods sufficiently "charged" to hold off the more standardized responses or practices that you describe as "box-ticking". In addition, my hope is that the increasing number of evaluators who have dedicated their studies, research, and professional practice to evaluation will carry the banner of renewing practices, including methods and criteria, to counter any risk of falling into stale routines.
Hans: As I am not in favour of a box ticking approach with the current set of criteria, I would not be in favour of a box ticking approach with a different set either.
9) There is a risk that incorporating a new criterion into evaluations will add complexity to what some already see as a complex endeavor, and entail a new learning curve. Is this where the development community should be spending its resources?
Hans: In my view, to get widespread use, any new criterion needs to be clear and not overly complex. I think there are other issues around “re-thinking evaluation” that the community needs to reflect on. An important one is whether evaluation in its current form really provides policy makers with the evidence they need to make decisions on trade-offs between choices. Policy makers need to take decisions on alternative options, involving uncertainty and sometimes limited information. Perhaps evaluation work needs to become more exploratory in nature, rather than generating a historic record for accountability. Moreover, current evaluation and knowledge systems do not always function optimally, and work remains to be done to improve the use of evaluation findings and promote learning.
Caroline: Indeed, there are many things we need to work on, and the criteria are only one of them. And while Hans is right that decision-makers have to have evidence to weigh trade-offs between choices, this should not be limited to, or even be primarily, the responsibility of evaluation. In development banks, the appraisal of projects should include a comparison of the proposed solution with alternative options. Only, in practice that hardly ever happens. And I do believe that an update to the evaluation criteria could incentivize evaluation practice to address issues of importance to decision-makers. For instance, an evaluation that evolved from assessing project relevance in its policy context to producing evidence on whether the most impactful development challenge was addressed – as suggested in our blog series – would be a step towards answering questions in a more complex and uncertain world.
10) Complexity, agility, coherence, sustainability, and equity are examples of emerging areas in evaluation. How are evaluators addressing these and other emerging issues?
Hans: I think new approaches, new methods, and new evaluation thinking are all to be welcomed. Evaluation research is leading the way and increasingly finding its way into practical evaluation work on issues such as complexity and equity. But it would be good to see more experimentation and broader uptake of a variety of methods. For instance, the use of big data in evaluation still seems to be in its infancy, at least in development evaluation work. Unintended effects would also seem to warrant more attention. Re-thinking evaluation, however, goes far beyond the discussion of criteria.
Caroline: Hans is right to say that rethinking evaluation goes beyond the criteria. As the past has shown, the criteria have incentivized a focus on certain aspects of development practice and can therefore be transformative if they are defined in line with current needs. That is not to replace the development and testing of new methods, but to stimulate and support these developments and keep up with the times.
11) Are we keeping up with trends outside the world of development evaluation? There is a vibrant and much larger universe of evaluation, beyond that of the development industry, that is continuously evolving and flourishing, and for which "rethink, reframe, revalue, relearn, retool and engage" is an embedded and ongoing process.
Caroline: By all means: we are open to new ideas and improved practices. At IEG, we have hired a number of evaluation experts with the vision of upgrading our methodologies and evaluation practices. In addition, we are drawing on expertise and literature from across the field of evaluation to continuously grow.
Hans: I don’t have the impression that the development evaluation field has gone stale and is inward-looking. New articles in evaluation journals and books are being published constantly. And I am certainly in favour of promoting cross-fertilization from other areas.
12) Do the SDGs present an opportunity to reframe the evaluation dialogue and build the foundations for a more embracing, resilient, inclusive, and sustainable world? What other drivers do you see pushing the need for change?
Hans: The Sustainable Development Goals, as a vision for 2030, are certainly both an opportunity and a challenge. One lesson from the MDG era was that monitoring took the main role while evaluation took a backseat. The implementation of the ambitious 17 goals and 169 targets, and the monitoring of 230 indicators, certainly pose a number of challenges. From an evaluation perspective, I would like to see some more critical thinking: What is the theory of change? What about the assumptions in reaching the goals and targets? What steps need to be taken to enable evaluation to play a useful role in supporting implementation? A number of factors are driving change and disruption in our societies, including technology, violent extremism, and competition – not only collaboration – between private firms and states. Evaluators need to think outside the box.
Caroline: In addition, the SDGs include some targets on consumption patterns. If all countries aimed for consumption levels like those in OECD countries, the world overall would face considerable constraints and not achieve sustainability. Everyone needs to rethink consumption, including how we evaluate progress towards new consumption patterns. For instance, the efficiency criterion asks whether project resources were used as efficiently as possible, but not whether the project (by design and in its final implemented state) contributes to wasteful or sustainable consumption patterns. This is the most difficult part of the SDG agenda; it is uncomfortable and falls under no one's mandate in particular, which are the ingredients of a "forgotten" agenda that will be revived far too late, that is, close to the 2030 target year.
13) Given the amount of interest that this topic has generated, how and where can stakeholders engage with you to build on the existing R/E/E/I/S framework going forward?
Hans: The stakeholder group I am most involved with is the DAC Evaluation Network, which consists of some 40 evaluation departments from ministries, development agencies, and banks. I believe there is an openness to discussing issues around “re-thinking evaluation”. If a process of revisiting the criteria is launched, it would be important to reach out widely to partners, civil society, and evaluators in a consultative mode of engagement.
Caroline: Our first step will be to review the many comments and contributions we received on the blog series, and then discuss with stakeholders, like Hans, whether and where to take this discussion. I agree with Hans that such a process would need to be open to wide-ranging consultation.
Read the #WhatWorks series Rethinking Evaluation:
Have we had enough of R/E/E/I/S?
Is Relevance Still Relevant?
Agility and Responsiveness are Key to Success
Efficiency, Efficiency, Efficiency
What is Wrong with Development Effectiveness?
Assessing Design Quality
Impact, the Reason to Exist
Sustaining a Focus on Sustainability
Also:
Conversations: Making Evaluation Work for the World Bank, the African Development Bank, and Beyond
Caroline Heider and Rakesh Nangia, Evaluator General of the African Development Bank, explore the role of independent evaluation in their respective institutions, and some of the key issues they have encountered.
Comments
Thanks for sharing such a great conversation! I am not a very pessimistic person, but I have huge doubts about the effectiveness of development cooperation. Before we think about the effectiveness of the DAC evaluation framework, I am convinced that we need to rethink the whole development cooperation delivery mechanism. This is because, at least in my lifetime, I have not seen any country that has been transformed as a result of development cooperation. Development cooperation has instead become part of the problem, particularly in creating a vicious circle of dependency, endemic corruption, and a sophisticated bureaucracy that sucks up the money of hard-working taxpayers. Many of our problems are entirely political, and I think we need to judge political and social commitments before we think of engaging in development cooperation with any country!
Over some 45 years with the Bank, and I forget how many with IEG, the main change I have seen in projects is a shift from simple, top-down planned projects – setting objectives and ensuring that activities are designed to meet them efficiently – to messier projects that attempted to follow a more organic approach. Community development projects are the obvious case. The need for such a shift was best summed up in a book called Redesigning Rural Development, written long ago, in 1982, by Johnston and Clark. They believed that rural development projects should be handled more like an Eskimo/Inuit whittling a piece of bone: "He carves a little not quite knowing what he is making, exploring the bone for its texture, faults and potential. He pauses, examines, carves a bit more ... Finally, a smile of recognition: 'Hello seal, I wondered if it might be you'. Problem posed and resolved all in the same process ..." This was a fine concept, but how to make it work? And how would one evaluate it?
I see no merit in abandoning the three main criteria of relevance, efficacy, and efficiency. But the way Bank operations plan and implement against those criteria, and the way we evaluate them, needs to accommodate such iterative, interactive approaches and be more generally flexible. But this is easier set out in such platitudes than turned into a manageable practice. One cannot have criteria so malleable through alternative interpretations that nobody can be held accountable.
We also need to focus on the highest priorities in evaluation. In my view, the biggest weakness in evaluation in all agencies is, and always has been, the tricky process of turning the evidence into useful written lessons. And part of that problem is that we have never developed any sets of evolving lessons for different sectors that might be built upon with the findings of each new evaluated project and used as a measuring stick. We rarely, if ever, say, "This is what was found in this project case, and it is similar to projects x, y and z, and in countries of types a and b, under circumstances 1, 2 and 3." In other words, we have no constructed learning pyramids, just a flat expanse of plain onto which we randomly throw the odd interesting rock or two.
I was involved in several recent exercises to review the OECD-DAC criteria and their applicability to economic and social development endeavors. These adaptations were timely and needed. Note that I use the word "endeavors" to encompass not only the donor-financed "traditional projects" for which the OECD-DAC criteria were originally designed, but also longer-term humanitarian efforts, multidonor partnership programs, social entrepreneurship ventures, advocacy campaigns, networking efforts, communities of practice, etc.

The adaptation of the original OECD-DAC criteria to humanitarian efforts was needed, but some of them are still hard to apply (those related to human rights); more work can be done there. The adaptation of the criteria to global and regional partnership programs (GRPPs), in which I was involved with a team at the World Bank Independent Evaluation Group (IEG) in 2007, was an important addition. It added the importance of evaluating governance, and it refined the "sustainability" concept, which is too limiting – addressing as it does mainly traditional donor-funded projects – and leaves out other legitimate long-term objectives such as "sustainability plus" or scale-up, replicability in different contexts, "building back better" to incorporate environmental improvements or resilience, scale-down (if a program is not working), devolution of implementation responsibility to local implementers, etc.

The other exercise in which I was involved was funded by the Faster Forward Fund, to look with a fresh eye both at general critiques of the OECD-DAC criteria and at how they might need to be adapted to apply to social entrepreneurship ventures. My findings and suggestions are available on request. The critique captures some of Caroline Heider's points about complexity: social entrepreneurship ventures are dynamic enterprises, often with conflicting dual objectives – profit or financial sustainability, and social purpose – which means the optimal economic equilibrium may shift over time. How can this dynamic be captured?

To sum up, I believe the OECD-DAC criteria served evaluations well for a while, serving as they did mainly as a guide for more in-depth and flexible evaluation questions, but there are flaws that, when addressed, would strengthen them further. One of the most obvious – and one that most evaluations naturally overcame by adding the appropriate evaluation question – concerns "reach" and "results": there was no requirement for a description of actual results, whether outputs or outcomes, or of the factors that improve results or subtract from them. (You could say that impact covered this, but in fact there were many cases where intermediate outcomes were the most visible and measurable.) Another flaw often mentioned is that one doesn't have to base an evaluation on pre-determined objectives (as efficacy calls for); one can have a "goal-less evaluation". A similar flaw is that different stakeholders often have different objectives, or "stakes", in the program. With the increased emphasis on stakeholder participation, the assessment of "efficacy" (and of the other criteria, which implicitly also rely on objectives) is limiting.

Finally, the concept of sustainability was often misunderstood, even when applied in the context for which it was meant – donor-financed projects. The correct interpretation was "sustainability of outcomes beyond the project period, after donor financing ceased" (many mistakenly thought it referred to sustainability of the project activities). But even when applied correctly, the time frame was not well-defined, and the most important question – how is exit to be managed? – was not always asked. Nowadays, with so much emphasis on environmental sustainability, the term simply creates more confusion than it is worth. And for the many other "endeavors" we find in the world of economic development today, sustainability is indeed only one of many long-term objectives.
It is a very good discussion, with many elements for reflection. The challenge is how people take advantage of the tools as a means and not an end. Changing how the DAC criteria are used, and reflecting on how to improve them, is always necessary, since in some cases projects treat the criteria and evaluations like a school report card: the evaluation is seen not as an opportunity for improvement but as a requirement within the project. The results of an evaluation are not put to adequate use, the adjustments that should be made are not made, and the positive changes generated by the intervention are not reinforced or scaled up in the model. Another important point is how to involve different actors, for example the independent consultants who carry out the evaluations, and how to bring these discussions to different regions and into different languages.