Rethinking Evaluation – Have we had enough of R/E/E/I/S?
After nearly 15 years of adhering to the DAC evaluation criteria, is it time for a rethink?
By: Caroline Heider
Evaluators must ask if DAC criteria are inclusive enough and respect under-privileged groups.
Evaluation must take into account new thinking that questions escalating consumption patterns.
Need to develop evaluation models that capture complexity to inform policy.
Over the past 30 years, evaluation in the development field has gone through multiple cycles of questioning which method is better than another. But few in the development circles in which I have operated have questioned the standard evaluation criteria that we use.
Many development institutions – including the World Bank, the regional development banks, the UN, and bilateral aid agencies – subscribe to what has come to be known as the DAC evaluation criteria. Specifically, there are five criteria – relevance, effectiveness, efficiency, impact, and sustainability, or R/E/E/I/S for short – that underpin most evaluation systems in international development.
Evaluation questions get framed around these criteria, and reports get written up using this language. But many an evaluation struggles to apply these criteria in earnest. Others are accused of using too much jargon as they report faithfully on them. And often, the evaluations leave readers with unanswered questions.
After nearly 15 years of adhering to the DAC evaluation criteria, is it time for a rethink? Have we reached a Copernican moment where we realize the “earth isn’t flat”, and our definitions and “understanding of the world” need to be reset? Leaving aside jargon and methodological challenges, there are other good reasons to revisit the evaluation criteria we use.
As our societies develop, norms and values shift. While the evaluation criteria appear to be neutral and should be applied as such, they were formed by a particular set of values. The post-2015 agenda has declared its intention to be more inclusive, respecting under-privileged groups of people, which means we as evaluators need to reflect on whether the criteria represent such diverse views. Being able to shape norms that are more inclusive of diversity, rather than judging everyone through more limiting ones, will be a necessity if 2030 is to become the world we want.
The adoption of the Sustainable Development Goals (SDGs) signals that we need to shift our understanding of development outcomes. Our development and economic models are premised on ever-increasing consumption. By contrast, the SDGs recognize that such consumption levels are unsustainable from an environmental, economic, and social point of view. This new commitment should lead to a paradigm shift towards desirable development pathways that are not premised on escalating consumption patterns. Evaluation tools that unpack impacts on consumption patterns will be needed to determine whether the world is evolving in the desired ways.
The world has become more complex – or rather, our ability to accept and understand complexity has increased. International development has relied on often linear and simplified logical frameworks or results chains that string inputs-activities-outputs-outcomes-impacts into a straight causal path. Development practitioners as much as evaluators know that development processes do not follow such linear assumptions. Instead, one action might cause a number of reactions that have effects in rather diverse ways. Hence, we need to develop evaluation models that capture the effects of complexity to inform policy-makers and practitioners about the actual effects of the choices they make and the actions they take (see the excellent book on this topic by Jos Vaessen et al.).
The pace at which technology develops and influences lives has far-reaching effects on societies. Solutions to complex problems can be generated in previously unthought-of ways, and often through unconventional networks of people. Information travels, is demanded, and influences large groups of people at a faster and more inter-connected pace than ever before. We are faced with an avalanche of data, a dearth of facts, and an unprecedented ease of spreading (mis)information. Evaluation can benefit from technology, be it to construct with greater ease models that reflect theories of change, to help with data collection and processing, or to share evaluation evidence with a much wider audience than before. But it does so in an environment of multiple realities that may or may not lead to evidence-based decision-making, especially if a "post-fact" era proves inevitable.
Current considerations of efficiency, cost savings, or cost-benefit analyses struggle to take long-term impacts into account. Something that appears efficient today might have inadvertent, devastating long-term effects on natural resources or the social capital of communities. Likewise, the distribution of costs and benefits has been uneven, as witnessed by those who bear the brunt of eroded natural resources, or of development outcomes that benefit some groups in society and not others.
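As a purely illustrative sketch – the figures and discount rates below are assumptions made for the sake of the argument, not data from any actual project or evaluation – a few lines of Python show how conventional discounting in a cost-benefit calculation can make distant costs all but disappear:

def present_value(amount, rate, years):
    # Discount a future cost or benefit back to today's terms.
    return amount / (1 + rate) ** years

future_damage = 100.0  # hypothetical cost arising 50 years from now
for rate in (0.03, 0.08):
    pv = present_value(future_damage, rate, 50)
    print(f"discount rate {rate:.0%}: present value = {pv:.1f}")

# At a 3% rate the future damage still registers (about 22.8);
# at 8% it almost vanishes (about 2.1) - a project can look "efficient"
# today while large costs are pushed onto future generations.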
Do these issues really necessitate a Copernican shift in the evaluation field that would require questioning the established five evaluation criteria? Are the criteria so inflexible that they cannot, as they stand, be adapted to address these challenges? Does this even matter for anyone else, other than the nerdy evaluators and their jargon-filled reports?
I say yes to all three questions. And particularly so, in a world that lives by the mantra “what gets measured, gets done”.
The Rethinking Evaluation series is dedicated to unpacking and debating evaluation criteria by which we judge success and failure, and whether they are fit for the future. Stay tuned and contribute your views.
Is Relevance Still Relevant?, Agility and Responsiveness are Key to Success, Efficiency, Efficiency, Efficiency, What is Wrong with Development Effectiveness?, Assessing Design Quality, Impact: The Reason to Exist, and Sustaining a Focus on Sustainability
Comments
A "Copernican" moment in…
A "Copernican" moment in time or the beginning of an Archimedial "Eureka" and a revolutionary movement for change? Are the SDG's an opportunity to reframe the Evaluation dialogue and build the foundations for a more embracing, resilient, inclusive and sustainable world? A time for more innovation and creativity in both approaches and methods and a recognition of the longer term trajectory of change - and dare I say it - behavioural science and economics - it's a messy and risky business we're in!
Thanks, Mark. Good questions. And, yes, it's a messy business, but that's even more a reason to ask the difficult questions and find answers to them.
Thanks Caroline for such a thought-provoking message. There's debate and concern about the suitability and appropriateness of choosing and applying evaluation methods in various contexts; perhaps, similarly, how those ‘gold standard’ DAC evaluation criteria get used in reality is worth reviewing as well? Are they simply used as a ‘checklist’ to fulfil bureaucratic requirements, or are they applied to really reflect and capture the change stories? Are those criteria more relevant for some types of programmes/projects than others, or are they applicable to most cases? This seems an interesting research area.
Thanks, Ting, for your point about asking also whether the criteria are being used -- at all, as a checklist, or otherwise. A number of blogs speak to exactly this point. I hope you will comment and add your experience to the discussion.
The R/E/E/I/S framework can capture diversity and unintended consequences if we allow it to. The problem is not R/E/E/I/S but the way we use the Results Framework, like a racehorse's blinkers, to focus our attention on our original intentions.
Thanks, Alex. Yes, there is a question about application. Like any tool: it can be designed with the best intention and ideas in mind, but when used badly even the best tool cannot produce good results. But, I do think and will argue in the blog series that a number of the tools need a face lift. Hope you stay engaged and post your contributions as we go along.
Caroline and Alex thank you so much for your comments. Not only do we need to rethink evaluation but we need to rethink the aims of development. Too often it is we who design development projects with what we think they should have rather than consulting local partners and participants in-country from the outset on what they think they should have. Far too often we evaluate how well they fulfilled our plans rather than them evaluating us on how well we have helped them have sustainable health, livelihoods, environmental conditions (as your article alludes to). Do we design for sustained impact? Not yet ;). Thank you.
More at www.ValuingVoices.com
Point taken. And, by rethinking evaluation -- including our purpose, approaches, tools -- we can stimulate a conversation and generate evidence to contribute to the dialogue that the development community is undergoing as well.
Readers will probably have many candidates for additional or replacement criteria, but one which I think quite a few people would like to see included is: equity
Many thanks, Rick. Very useful. We are right now in the process of evaluating what the World Bank Group calls "shared prosperity", which is all about distributional effects and equity. There will be a lot to learn in terms of evaluation methods. Looking forward to your suggestions.
There is a trend towards technicist instrumentalism, a kind of elaborated tick-box approach to evaluation, that answers the questions of the evaluation criteria but in the end often leaves me feeling flat, dulled by the blandness of reports that lack sharply observed engagement with the complex realities of the programme. Evaluation criteria and the associated norms, standards and quality assurance measures were needed to support the many new evaluators who came into the field as it expanded in the 2000s. That helped make sense of evaluation, but now we are in danger of losing its real meaning. The criteria have done their time, bravo, but let's now loosen the straightjacket.
Thanks Caroline for a stimulating blog post. What we need are evaluative omnivores and more flexibility - a willingness to try new approaches and methods (developing the menu, not sticking to it) that are driven by context, questions, and the demands of those wanting evaluation (who, at least outside of the bilaterals and multilaterals, are not always aware of R/E/E/I/S or do not want it when they are made aware). We use it for some evaluations (when appropriate), and in others we are trying out feedback (constituent voice - with social entrepreneurs) and exploring the use of developmental evaluation for impact investing - where flexibility, embedding evaluation and the evaluator into operations, quick feedback and learning are needed in interventions that are not neatly conceptualised and are subject to change. R/E/E/I/S in the way it is presently applied does not offer much flexibility. It has probably contributed to the mass of impenetrable evaluation reports being left on the shelf - the same old reports spat out to the same old formula. We are in the business of evidence, learning, influence and change - can we always get there with the OECD DAC criteria? No. So I agree, time for a rethink and a fun and interesting debate, to make evaluation relevant for the challenges of the SDGs, impact investing, and corporations who want to be 'a force for good'.
Many thanks, Lee, for your comment and contribution. It is exactly this -- opening our tool box and asking ourselves what else do we need to be most useful to policy-makers and practitioners so that they can make better informed decisions -- that we aim to stimulate with this discussion. Thanks for sharing your experience.
Thanks Caroline for a succinct but very powerful message on the need to rethink the current practices in evaluation. I am relatively new to evaluation. A couple of observations: (a) Evaluation criteria are set to measure the "project mode" of work. There is a need to adapt evaluations/use flexible approaches in several areas where global organizations work, which involve supporting political processes and aim at policy changes. As you rightly pointed out, evaluations in such cases are complex in nature, and attribution is often difficult; (b) The impact of the normative work of global agencies is not easy to measure, as the decision to use and the methods of using the normative tools/products/instruments involve political decisions.
Look forward to more thought provoking posts.
Thanks for sharing these thoughts. The evaluation tools have often been used well beyond the project level, for larger evaluations where they make some sense, but they might have overpowered other important questions that need to be asked. For the normative work, the UN Evaluation Group discussed the evaluation of normative work some time back. Hopefully some of them will contribute to the discussion here.
Thank you, I enjoyed reading the ideas here. Doing “complexity aware evaluation” or one that has “equity” at its core is a lens through which evaluation projects could take place, regardless of what criteria are used. But I do agree that developing these concepts into explicit evaluation criteria, or adapting the current ones to better represent these ideas, helps keep them at the forefront of evaluation projects. It does matter to non-evaluators, absolutely, particularly to program designers, but also to anyone who genuinely wants to help put data into action.
Good points. Thanks for pointing out that users of evaluation also feel the need for this information and clarity about what gets evaluated and how.
It is time to improve the evaluation practice by adopting better and more robust approaches to the assessment of program impacts and of their relative contributions to observed results in the normal context in which these programs operate. Politicians need more reliable assessments of the cost-effectiveness of program interventions to support (re)allocation decisions. Unfortunately, because of capacity issues, the evaluation community has been focusing too much on efficiency and has generally failed to deliver critical information on effectiveness and cost-effectiveness. Economists have been able to develop complex monitoring systems to measure and manage the impact of fiscal and monetary programs/policies. The technology and methodology have been there for some time. The evaluation community should start pushing for equivalent systems and capabilities to monitor and manage the impact of public programs that are more oriented towards social development, health and environmental issues.
Michel, interesting point. First of all: thank you for recognizing that decision-makers need evaluation inputs, and for pointing to some areas where more work would be helpful. The only question I have: where have evaluators focused more on efficiency? I am not sure that this is the case where I have worked and would like to learn more about examples and practices.
Thank you for challenging the status quo. The five DAC criteria indeed have been very helpful in bringing discipline, comparability and reliability into the evaluation world, but they often need to be adapted, stretched or accompanied by other criteria to better analyse and unpack the complexity of the world we try to evaluate – for instance by including other criteria such as equity, accessibility, sustainability, policy coherence, etc. The new SDGs indeed offer us a great opportunity to stop and rethink how best we can meet the increased demand and desire to unpack and analyse our reality to guide future policy interventions.
Interesting discussion. I agree that diversity, social distribution, environment, and unintended effects are increasingly important to consider in evaluations, but in my opinion the five criteria will still remain valid and can cover these issues, with enlarged and rethought methods though. In the Council of Europe we often address the political framework conditions in legal and political terms, providing advice for changes of constitutions and laws in various fields, but also through projects ranging from policy advice to training and technical assistance in many areas. They are not growth oriented, nor do they affect the environment or the income distribution, but nevertheless they can be and are evaluated by us using the five criteria and mostly qualitative methods of data collection. Our main problem is documenting impact, as this is long term and the causal chain is difficult to isolate.
Ansgar, thank you. I agree, there are a lot of things that can be addressed with the criteria, but that will depend on individual evaluators and lead to rather uneven practices. While there is value in interpretation and flexibility, if there are areas where consistently we need to interpret and adapt, maybe it is time to rethink a little, without throwing the baby out with the bathwater.
Good to discuss, but I think that you can build most of the social, equity, and environment-related issues into the initial set of objectives and then they will show up in the evaluation, even if using the five DAC criteria. Afterwards it is rather a question of how to get the data, including on unintended effects.
Very interesting discussion. From my perspective, the big five remain a valid basis for discussing evaluation objectives. What I’ve found frustrating is when these criteria are put in ToRs as a sort of shopping list where implementers or donors (more often implementers) ask for everything without really focusing on specific, realistic and useful key research questions.
Agreed
This is an interesting discussion. I would like to add some points related to the origins of the DAC Evaluation Criteria, provide a suggested explanation for the widespread use of the criteria, and comment on the perception that they have been conceived as a "straightjacket".
First on the origins. In 1989, and quite freshly recruited to the OECD, I took on responsibility for the DAC Expert Group on Aid Evaluation (now the DAC Network on Development Evaluation). One of my first major tasks was to lead and co-ordinate the drafting of the DAC Principles for Aid Evaluation, which were developed over a two-year period with member countries and partner organisations. The principles were adopted formally by the Expert Group and then by the DAC in 1991. The principles spelled out the role and purpose of evaluation in development co-operation and clarified important principles for the evaluation function in agencies – such as impartiality, independence and credibility – as well as on evaluation programming and management. The principles also state that it is essential to define the questions which will be addressed in an evaluation – these were referred to as the ‘issues’ of evaluation – and laid out a manageable framework with basic groups of evaluation issues. These issues are the origin of what then became the DAC Evaluation Criteria.
Why have these criteria become so successful in terms of their widespread use? Even I have been somewhat surprised by that, to tell the truth. While we expected them to be used by our members, it is clear that they are used by many more actors in development, including partners, civil society, etc. I believe that a key reason for their success is that the DAC criteria are a manageable and relatively easy framework to understand and to use for grouping your key evaluation questions. Moreover, if each funder or organisation used their own specific criteria, it would be difficult to compare and collaborate on evaluations. We have worked to unify basic concepts, terminology, management of evaluations, and quality standards so that we can work together and communicate with a shared understanding. It should be noted that much of the normative work is available in a number of languages besides English – which has involved not only translation but adaptation to the linguistic and cultural setting in which these tools are to be used.
Finally, on the “straightjacket” argument. The criteria were never meant to be a straightjacket, but were conceived rather as a tool to help more or less experienced evaluation managers structure the key questions of the evaluation. We have recommended an application of the criteria that should be tailored to the purpose and use of the evaluation – this may mean that some criteria will be in focus while others may not be relevant for the evaluation. Moreover, we have developed additional criteria for use in evaluations of humanitarian assistance – such as coverage and coherence – and also in our guidance on evaluating in settings of conflict and fragility. The DAC Evaluation Standards, adopted in 2010, were developed through a three-year test phase followed by a consensus-building process led by the Network secretariat, with many development agencies, partners and ministries involved. The quality standards recommend the application of the agreed criteria, but it is also explicitly stated that the application of the five criteria and any additional criteria depends on the evaluation questions and the objectives of the evaluation. We are not in favour of a straightjacket or a mechanistic application. Rather, they should be used as a framework to help frame the key evaluation questions.
This said, a discussion on the criteria, their application, and potential new dimensions is certainly welcome. Should a change be needed we should be open to it – not a change for the sake of change, but only if we are convinced that the current approach does not help us in framing the right questions to assess an activity, programme or policy.
My apologies for reacting late to the last contribution, certainly a most valuable one to those who view evaluation practices in an historical perspective. Indeed, I concur with the view that establishing a set of uniform criteria has proved effective in structuring development evaluations, and thereby in improving their overall technical quality.
However, not all development agencies enjoy the ‘luxury’ of having independent evaluation units or departments. Even if they have, I wonder whether the REEIS criteria have really served as a kind of safeguard in ensuring the “impartiality, independence and credibility” of the evaluation function, as mentioned by Hans (Lundgren). If this were the case, individual professionals – all learning on the job – would have become more autonomous in exercising their evaluation role than some decades ago. This hypothesis deserves to be tested, so let me illustrate this below.
Last week I realized that, based in a private consultancy and research foundation (CDR in San José, Costa Rica), I had been involved in external evaluations for a period of exactly 20 years. The assignments mostly focused on rural development ‘interventions’ relating to financial services, value chains and (fair) trade networks, environmental governance, sector programmes and institutional development. Most of them were ex-post evaluations, carried out in the western hemisphere. Partly I undertook the evaluations alone, but to a significant extent it was teamwork.
I decided to assess the extent to which I had experienced the three above-mentioned elements in each of the 64 assignments in total, over the period 1997–2016. Such an exercise obviously relies on subjective judgement and is prone to some retrospective mental bias, so I should at least try to reduce that risk.
‘Professional autonomy’ may for this purpose be viewed in three dimensions. First, the quality of the technical exchange with the client concerning evaluation methods and tools for data collection. Second, the support received and latitude offered during the field phase in meeting with relevant actors. Third, the receptivity of the client in dealing with – i.e. ultimately accepting – the contents, conclusions and recommendations of the evaluation. On each of these indicators, I gave each assignment a score between 1 (lowest) and 4 (highest). In determining the overall score of an assignment, I decided to give the first two factors single weight, and the last one double weight. As a final step, I averaged the yearly averages into quinquennial (5-year) averages. So apart from an average 20-year score, I got a separate score for each half-decade.
The resulting average ‘professional autonomy’ score for the entire 1997–2016 period is 3.45, which may be interpreted as between sufficient and good. However, the five-year averages turned out to be 3.58, 3.59, 3.40 and 3.21, respectively. This in my view reflects my perception that, in spite of methodological upgrading, there has been a steady loss of impartiality, independence and credibility. This has less to do with the REEIS criteria (including the relevance of the R) and more with the institutional and policy context in which evaluation work has been carried out over the last ten years.
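For readers curious about the mechanics, here is a minimal sketch (in Python) of the weighted scoring and quinquennial averaging described above. The per-assignment scores are invented placeholders – only the 1-4 scale, the double weight on client receptivity, and the averaging of yearly averages follow the description in this comment:

from statistics import mean

# Each record: (year, technical_feedback, field_latitude, client_receptivity),
# all scored 1 (lowest) to 4 (highest). The values are invented placeholders.
assignments = [
    (1998, 4, 4, 3),
    (2003, 3, 4, 4),
    (2009, 3, 3, 3),
    (2015, 3, 3, 3),
]

def composite(tech, field, receptivity):
    # First two factors single weight, client receptivity double weight.
    return (tech + field + 2 * receptivity) / 4

# Yearly averages first, then average those within each five-year block.
by_year = {}
for year, tech, field, rec in assignments:
    by_year.setdefault(year, []).append(composite(tech, field, rec))
yearly = {year: mean(scores) for year, scores in by_year.items()}

for start in (1997, 2002, 2007, 2012):
    block = [v for y, v in yearly.items() if start <= y < start + 5]
    if block:
        print(f"{start}-{start + 4}: {mean(block):.2f}")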
At the time of writing, there is a debate going on at many levels about journalists, judges and other professionals, whose work – with unpleasant results for those at the top – has come under immense pressure. The work of development evaluators facing their clients has remained outside the public domain, but is to some extent certainly comparable. I fully agree with the overarching goals of respecting underprivileged groups, limiting overconsumption and dealing with complexity. However, the conditions for that include a minimum degree of autonomy for the profession, which in my impression has not enjoyed the same degree of global recognition as REEIS. Time for a rethink?
Hans, thanks for your very personal reflection and sharing your long-standing practice. I agree that the criteria -- REEIS -- do not automatically lead to independence, credibility, and utility. For that, some other dimensions are needed. But transparency and clarity about criteria, methods, and processes is an important factor in ensuring credibility and can lead to utility. In my view it even helps with independence, because it makes it more obvious whether, and if so where, interference happens in assembling and interpreting evidence.
Thanks, Hans, you are really the institutional memory on this topic. And, you are right: I shaved some 10 years off their existence and struggled to reconcile my knowledge and use of them well ahead of the compendium. Thanks for that clarification. I agree with you about the value of the criteria in shaping the profession. Equally, I hope this renewed debate about rethinking evaluation will help renew the profession in ways that the criteria did way back in the early 1990s.
You do not need contemporary reasons for ditching the REEIS - they have always been terrible: a bureaucratic solution that embedded the input-focused desires of those who came up with them. Given the high degree of failure in ODA projects (I am not being cynical here - failure should be expected as normal - in complex, dynamic environments in developing countries it may be higher than among private business start-ups in our own countries [where over 50% of new businesses close within two years]), what is crucial is understanding impact. Yet under REEIS that is just one of five criteria, and indeed - god knows how or why - is separated from sustainability. I suppose if we put them together we might have to take post-evaluations seriously. "Relevance" is the most irrelevant, certainly in practice, as it is a box ticked by a project supporting some vague sentences of a recipient government's 5-year plan (why do developing countries need these and ours do not?). "Efficiency" and "Effectiveness" are just arse-covering for ongoing funds by donor organisations: private firms would laugh at the whole value-for-money black hole that DFID has gone down. Who cares about these when projects have no impact - oh, sorry, there are no failures!
This is a beautiful and inspiring article (and comments!!!). REEIS are not terrible. The challenge for evaluators remains the operationalization and tailoring of the big-five criteria to the specific context and operation being evaluated at the time of preparation of the evaluation terms of reference. If we tailor, the straight jacket gets more comfortable…even if excessive tailoring comes at the price of reduced comparability. In the future, in a complex world where resources are scarce and the risk of social resentment is high, concepts such as social equity, social well-being, policy synergies, coherence and connectedness can become more and more dominant in our evaluations for the simple reason that they are key elements in determining the merit and worth of development operations. Not sure though whether these can be elevated to criteria. Similarly, as said in a more recent article, the challenge will be having the right instruments to draw conclusions on the sustainability….
Inspirational blog, thinking beyond the boxes! Conventionally, evaluation sounds like giving subject grades at high school, and REEIS echo the subjects (Maths, Science etc.). The evaluation focus is also on the linear relationship of inputs to impacts, and feels reactive. The question for me is how I can make my evaluation proactive in capturing expected and unexpected OUTCOMES on poverty, gender, health, environment, climate change (MDGs/SDGs) and learning. As an aspirant to evaluation, I was wondering if improving/changing the criteria would make evaluation a more energized process.
Indeed, it is reaffirming to read this blog and the resonance in the comments. It is a topic that we have also been discussing and will have a related panel presentation at the American Evaluation Association conference in DC this November called, “Beyond the DAC-gnificent evaluation criteria: From learning to action through methodological innovation.” (Panel Members: Michele Tarsilla, Steven Hansch, Sara Vaca, Riccardo Polastro, and myself.) “The panel will consist of an in-depth reflection on the use and misuse of criteria in contemporary evaluation practice. Based on the realization that evaluation commissioners often include OECD-DAC criteria in Terms of References (ToR) by default, this panel will encourage the audience to rethink the way evaluation criteria are selected during the development of either ToR or evaluation proposals. The first presentation will showcase instances where the integration of OECD criteria with others outside of the “conventional” paradigm, has proved particularly beneficial. The second presentation will focus on how the use and periodic revision of internationally agreed standards in humanitarian evaluation (e.g., Spheres) has generated operationally relevant learning. The third presentation will specifically engage the audience in a visualization exercise that will illustrate the linkages between the OECD-DAC criteria and a set of newer ones increasingly used in development and humanitarian evaluations.”
Good story Caroline. The five criteria could remain relevant to achieving the SDGs with a new approach to risk tolerance and management. LDCs will struggle to make progress towards middle-income security and growth without 'game-changing' or transformative strategies, policies and programmes, coupled with improved implementation capacity and resources to match. The five criteria, 'R/E/E/I/S', don't really deal with transformative impact, just impact. Ethiopia is one country that has explored the transformative strategy and achieved some success, but implementation has remained challenging. For development partners to remain 'relevant' in a 'Copernican moment' or transformation phase, they need to increase their tolerance for risk in taking on solutions to the serious binding constraints facing LDCs, and not just run away when an investment opportunity looks a little risky. We can expect some projects to fail with higher risk tolerance, but others will produce the high-level outcomes that are needed to have a transformative impact.