Over the past 30 years, evaluation in the development field has gone through multiple cycles of questioning which methods are better than others. But few in the development circles in which I have operated have questioned the standard evaluation criteria that we use.

Many development institutions, including the World Bank, regional development banks, the UN, and bilateral aid agencies, subscribe to what has come to be known as the DAC evaluation criteria. Specifically, there are five criteria – relevance, effectiveness, efficiency, impact, and sustainability, or R/E/E/I/S for short – that underpin most evaluation systems in international development.

Evaluation questions get framed around these criteria, and reports get written up using this language. But many an evaluation struggles to apply these criteria in earnest. Others are accused of using too much jargon as they report faithfully on each criterion. And too often, the evaluations leave readers with unanswered questions.

After nearly 15 years of adhering to the DAC evaluation criteria, is it time for a rethink? Have we reached a Copernican moment where we realize the “earth isn’t flat”, and our definitions and “understanding of the world” need to be reset? Leaving aside jargon and methodological challenges, there are other good reasons to revisit the evaluation criteria we use.

Values

As our societies develop, norms and values shift. While the evaluation criteria appear neutral and should be applied as such, they were formed by a particular set of values. The post-2015 agenda has declared its intention to be more inclusive and to respect under-privileged groups of people, which means we as evaluators need to reflect on whether the criteria represent such diverse views. Shaping norms that are more inclusive of diversity, rather than judging everyone through more limiting norms, will be a necessity if 2030 is to bring the world we want.

End Game

The adoption of the Sustainable Development Goals (SDGs) signals that we need to shift our understanding of development outcomes. Our development and economic models are premised on ever-increasing consumption. By contrast, the SDGs recognize that such consumption levels are unsustainable from an environmental, economic, and social point of view. This new commitment should lead to a paradigm shift toward desirable development pathways that are not premised on escalating consumption patterns. Evaluation tools that unpack impacts on consumption patterns will be needed to determine whether the world is evolving in the desired direction.

Complexity

The world has become more complex, or rather: our ability to accept and understand complexity has increased. International development has relied on often linear and simplified logical frameworks or results chains that string inputs-activities-outputs-outcomes-impacts into a straight causal path. Development practitioners as much as evaluators know that development processes do not follow such linear assumptions. Instead, one action might cause a number of reactions that play out in rather diverse ways. Hence, we need to develop evaluation models that capture the effects of complexity to inform policy-makers and practitioners about the actual effects of the choices they make and the actions they take (see the excellent book on this topic by Jos Vaessen et al.).

Technology 

The pace at which technology develops and influences lives has far-reaching effects on societies. Solutions to complex problems can be generated in previously unthought-of ways, often through unconventional networks of people. Information travels, is demanded, and influences large groups of people faster and in a more interconnected fashion than ever before. We are faced with an avalanche of data, a dearth of facts, and an unprecedented ease of spreading (mis)information. Evaluation can benefit from technology, be it to construct models that reflect theories of change with greater ease, to help with data collection and processing, or to share evaluation evidence with a much wider audience than before. But it does so in an environment of multiple realities that may or may not lead to evidence-based decision-making, especially if a “post-fact” era were to prove inevitable.

Cost & Benefits 

Current considerations of efficiency, cost savings, or cost-benefit analyses struggle to take long-term impacts into account. Something that appears efficient today might have inadvertent, devastating long-term effects on natural resources or the social capital of communities. Likewise, the distribution of costs and benefits has been uneven, as witnessed by those who bear the brunt of eroded natural resources, or of development outcomes that benefit some groups in society and not others.
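To see one reason why, consider a stylized discounting example (the figures are assumed, purely for illustration): in a standard cost-benefit analysis, a cost C incurred t years from now is valued today at C / (1 + r)^t. At a discount rate of r = 5%, an environmental damage of $100 arising 50 years from now counts for only about $8.70 in today's appraisal ($100 / 1.05^50 ≈ $8.70), small enough for an intervention to look “efficient” now despite substantial long-term harm.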

Do these issues really necessitate a Copernican shift in the evaluation field, one that would require questioning the established five evaluation criteria? Are the criteria so inflexible that they cannot be adapted to address these challenges? Does this even matter for anyone else, other than the nerdy evaluators and their jargon-filled reports?

I say yes to all three questions. And particularly so in a world that lives by the mantra “what gets measured, gets done”.


The Rethinking Evaluation series is dedicated to unpacking and debating evaluation criteria by which we judge success and failure, and whether they are fit for the future. Stay tuned and contribute your views.

Read Part II of this series: Rethinking Evaluation – Is Relevance Still Relevant?

Comments

Submitted by Mark Clayton on Tue, 01/10/2017 - 14:50


A "Copernican" moment in time or the beginning of an Archimedial "Eureka" and a revolutionary movement for change? Are the SDG's an opportunity to reframe the Evaluation dialogue and build the foundations for a more embracing, resilient, inclusive and sustainable world? A time for more innovation and creativity in both approaches and methods and a recognition of the longer term trajectory of change - and dare I say it - behavioural science and economics - it's a messy and risky business we're in!

Submitted by Ting on Tue, 01/10/2017 - 20:44


Thanks Caroline for such a thought-provoking message. There's debate and concern about the suitability and appropriateness of choosing and applying evaluation methods in various contexts; perhaps, similarly, how those ‘gold standard’ DAC evaluation criteria get used in reality is worth reviewing as well? Are they simply used as a ‘checklist’ to fulfil bureaucratic requirements, or are they applied to really reflect and capture the change stories? Are those criteria more relevant for some types of programmes/projects than others, or are they applicable to most cases? This seems an interesting research area.

Thanks, Ting, for your point about asking also whether the criteria are being used -- at all, as a checklist, or otherwise. A number of blogs speak to exactly this point. I hope you will comment and add your experience to the discussion.

Submitted by Alex Kremer on Wed, 01/11/2017 - 03:07


The R/E/E/I/S framework can capture diversity and unintended consequences if we allow it to. The problem is not R/E/E/I/S but the way we use the Results Framework, like a racehorse's blinkers, to focus our attention on our original intentions.

Thanks, Alex. Yes, there is a question about application. Like any tool, it can be designed with the best intentions and ideas in mind, but when used badly even the best tool cannot produce good results. But I do think, and will argue in this blog series, that a number of the tools need a facelift. Hope you stay engaged and post your contributions as we go along.

Caroline and Alex, thank you so much for your comments. Not only do we need to rethink evaluation, but we need to rethink the aims of development. Too often it is we who design development projects with what we think they should have, rather than consulting local partners and participants in-country from the outset on what they think they should have. Far too often we evaluate how well they fulfilled our plans, rather than them evaluating us on how well we have helped them achieve sustainable health, livelihoods, and environmental conditions (as your article alludes to). Do we design for sustained impact? Not yet ;). Thank you.
More at www.ValuingVoices.com

Submitted by rick davies on Wed, 01/11/2017 - 10:24


Readers will probably have many candidates for additional or replacement criteria, but one which I think quite a few people would like to see included is: equity

Many thanks, Rick. Very useful. We are right now in the process of evaluating what the World Bank Group calls "shared prosperity", which is all about distributional effects and equity. There will be a lot to learn in terms of evaluation methods. Looking forward to your suggestions.

Submitted by May Pettigrew on Thu, 01/12/2017 - 08:51


There is a trend towards technicist instrumentalism, a kind of elaborated tick-box approach to evaluation, that answers the questions of the evaluation criteria but in the end often leaves me feeling flat, dulled by the blandness of reports that lack sharply observed engagement with the complex realities of the programme. Evaluation criteria and the associated norms, standards and quality assurance measures were needed to support the many new evaluators who came into the field as it expanded in the 2000s. That helped make sense of evaluation, but now we are in danger of losing its real meaning. The criteria have done their time, bravo, but let's now loosen the straitjacket.

Submitted by Lee Alexander Risby on Thu, 01/12/2017 - 11:31


Thanks Caroline for a stimulating blog post. What we need are evaluative omnivores with more flexibility - willing to try new approaches and methods (developing the menu, not sticking to it) that are driven by context, questions, and the demands of those wanting evaluation (who, at least outside of the bilaterals and multilaterals, are not always aware of R/E/E/I/S, or do not want it when they are made aware). We use it for some evaluations (when appropriate); for others we are trying out feedback (constituent voice, with social entrepreneurs) and exploring the use of developmental evaluation for impact investing - where flexibility, embedding evaluation and the evaluator into operations, quick feedback, and learning are needed in interventions which are not neatly conceptualised and are subject to change. R/E/E/I/S in the way it is presently applied does not offer much flexibility. It has probably contributed to the mass of impenetrable evaluation reports left on the shelf - the same old reports spat out to the same old formula. We are in the business of evidence, learning, influence and change - can we always get that with the OECD DAC criteria? No. So I agree, time for a rethink, and a fun and interesting debate to make evaluation relevant for the challenges of the SDGs, impact investing, and corporations who want to be 'a force for good'.

Submitted by Anand on Thu, 01/12/2017 - 20:25


Thanks Caroline for a succinct but very powerful message on the need to rethink current practices in evaluation. I am relatively new to evaluation. A couple of observations: (a) Evaluation criteria are set to measure the "project mode" of work. There is a need to adapt evaluations and use flexible approaches in several areas where global organizations work that involve supporting political processes and aiming at policy changes. As you rightly pointed out, evaluation in such cases is complex in nature, and attribution is often difficult; (b) The impact of the normative work of global agencies is not easy to measure, as decisions on whether and how to use normative tools/products/instruments are political.

Look forward to more thought provoking posts.

Submitted by Roxana Salehi on Fri, 01/13/2017 - 09:50


Thank you, I enjoyed reading the ideas here. Doing “complexity-aware evaluation”, or one that has “equity” at its core, is a lens through which evaluation projects could take place regardless of what criteria are used. But I do agree that developing these concepts into explicit evaluation criteria, or adapting the current ones to better represent these ideas, helps keep them at the forefront of evaluation projects. It does matter to non-evaluators, absolutely, particularly to program designers, but also to anyone who genuinely wants to help put data into action.

Submitted by Michel Laurendeau on Sat, 01/14/2017 - 12:19


It is time to improve evaluation practice by adopting better and more robust approaches to the assessment of program impacts and of their relative contributions to observed results in the normal context in which these programs operate. Politicians need more reliable assessments of the cost-effectiveness of program interventions to support (re)allocation decisions. Unfortunately, because of capacity issues, the evaluation community has been focusing too much on efficiency and has generally failed to deliver critical information on effectiveness and cost-effectiveness. Economists have been able to develop complex monitoring systems to measure and manage the impact of fiscal and monetary programs/policies. The technology and methodology have been there for some time. The evaluation community should start pushing for equivalent systems and capabilities to monitor and manage the impact of public programs that are more oriented towards social development, health, and environmental issues.

Submitted by Benedetta on Sun, 01/15/2017 - 05:10


Thank you for challenging the status quo. The five DAC criteria have indeed been very helpful in bringing discipline, comparability and reliability into the evaluation world, but they often need to be adapted, stretched, or accompanied by other criteria to better analyse and unpack the complexity of the world we try to evaluate - for instance, by including criteria such as equity, accessibility, sustainability, policy coherence, etc. The new SDGs indeed offer us a great opportunity to stop and rethink how best we can meet the increased demand and desire to unpack and analyse our reality to guide future policy interventions.

Submitted by Ansgar Eussner on Tue, 01/17/2017 - 06:01


Interesting discussion. I agree that diversity, social distribution, environment, and unintended effects are increasingly important to consider in evaluations, but in my opinion the five criteria will remain valid and can cover these issues, albeit with enlarged and rethought methods. In the Council of Europe we often address the political framework conditions in legal and political terms, providing advice on changes to constitutions and laws in various fields, but also through projects ranging from policy advice to training and technical assistance in many areas. These are not growth-oriented, nor do they affect the environment or income distribution, but they nevertheless can be and are evaluated by us using the five criteria and mostly qualitative methods of data collection. Our main problem is documenting impact, as this is long-term and the causal chain is difficult to isolate.

Submitted by Ansgar Eussner on Tue, 01/17/2017 - 08:20


Good to discuss, but I think that you can build most of the social, equity, and environment-related issues into the initial set of objectives, and then they will show up in the evaluation, even when using the five DAC criteria. Afterwards it is rather a question of how to get the data, including on unintended effects.

Submitted by federico bastia on Tue, 01/17/2017 - 10:44


Very interesting discussion. From my perspective, the big five remain a valid framework for discussing evaluation objectives. What I’ve found frustrating is when these criteria are put in ToRs as a sort of shopping list, where implementers or donors (more often implementers) ask for everything without really focusing on specific, realistic, and useful key research questions.
