The world’s leading development economists and evaluators have been engaged in a passionate argument for years over Randomized Controlled Trials (RCTs) vs. observational studies. For those who want the four-minute version of the debate, check out the exchange last year at the NYU Development Research Institute between economist gurus Abhijit Banerjee and Angus Deaton.

Unfortunately, the tendency towards binary choices is common in many fields, not least the evaluation profession. I remember the debates way back in the early ’90s when adherents of quantitative and qualitative methods each argued that theirs was the only right methodology. Then, in 2006, the article “When Will We Ever Learn” was instrumental in launching rigorous impact evaluations based on RCTs. This, in turn, prompted a counter-movement advocating qualitative methods, pushing the envelope for participatory evaluations to go beyond focus groups and ask whose reality counts.

This was the good side: at each end of the spectrum, people challenged themselves to enhance methods and deepen debates. But how much time and energy did we have to invest to come out in a better place? Eventually the Network of Networks on Impact Evaluation – an initiative to bring the two sides together – helped reconcile positions, at least among evaluators. But many donors still demand RCTs, and universities churn out students who are sold on the idea that this is the only way to go.

IEG’s evaluation of impact evaluations undertaken by the World Bank Group flagged some of the weaknesses of this approach. The quality of impact evaluations is not uniform, the choice of topics is not strategic but rather clustered around a few subjects like conditional cash transfers, and – perhaps most importantly – the use of their results was, especially in the early years, negligible.

So, is it time to abandon ship on RCTs? Development economist Lant Pritchett delivered a searing critique of RCT Randomistas a few weeks ago at the fall meeting of the Evaluation Cooperation Group. His main concern was about external validity and the danger of extrapolating from one context, often at small scale, to another very different context. Instead, he called for more “structured experiential learning,” which allows implementing agencies to rigorously search across alternative project designs using monitoring data that provide real-time performance information with direct feedback into project design and implementation.

My view? What matters to me more than who is right or wrong is that we draw on each and every method to deepen our understanding of what happened, how, and why. And not just the independent evaluation folks but also those implementing projects, who monitor, observe, and evaluate – whether through RCTs, quasi-experimental designs, or qualitative methods – and can do so in real time for greater learning.

What this means for our work in IEG is that we are now combining systematic reviews of existing impact evaluations with portfolio analyses and findings from qualitative evaluations, and tapping into big data and social media to get more of the information that is out there. And we combine all of that with our own interviews and site visits. The range of data points and opportunities for triangulation is incredible, and each perspective enriches our understanding, not just of the “what” but more importantly the “why and how” that will help us – as development practitioners – replicate success, adapting from one situation to another, and avoid failure as much as possible.


Submitted by Sandra Gebhardt on Thu, 11/14/2013 - 06:00

In my own organization I drew many lessons from observing committee meetings in which new projects were being considered. Typically such committees had a long list of agenda items to cover and limited time to cover them. Out of, say, six key considerations of whether a given project deserved to move on to the next stage, only one or two would be discussed in detail. Which handful of issues were really mulled over was, I believe, important to the quality of the project and (to the extent such a project would become a precedent for future projects) of the institution’s ongoing operations. Who or what determined where the committee focused its time and attention? The chairperson’s predispositions, for one; and then there was a kind of unwritten rule about what types of input were culturally acceptable, generating either enthusiastic nods or impatient sighs around the room. I am sharing this aspect of my experience to jump-start the debate to which Caroline refers above: while I agree that all methods have their place, given the subjective nature of the factors I described and the complexity of the organization in which these decisions played out, I would place special value (at least for purposes of corporate and process evaluations) on a highly contextualized and participatory method.

Submitted by Tessie Tzavara… on Wed, 11/13/2013 - 22:49

Thank you for raising this important topic, Caroline. In fact, at the AEA Conference in Washington, DC this October, we were discussing how we might invite the RCTers to present, and how to promote cross learning. I would love a follow-up blog or longer publication showcasing specific experiences at IEG. We need more examples of what it means "to draw from RCTs and other methods." I.e. at IEG, do you COMBINE elements of RCTs and other methods, do you apply them SEPARATELY and draw lessons, or is there another way? It would enhance our discussion to be led through one such evaluation in terms of process, methods, and evaluation outcomes. I look forward to learning more from the IEG's experience.

Submitted by Nikhil on Thu, 11/14/2013 - 20:42

Delighted. Thank you.

Submitted by Jeff Tanner on Mon, 11/18/2013 - 20:47

Embrace all APPROPRIATE evaluation methods. The challenge with inclusiveness is that it can lead to a watered-down inability to make any useful conclusions: When all methods are equally valid, we are left with no way to discern between competing claims. Rather, we should endeavor to apply the right method to the right question. In doing so we must be clear about the explicit identification assumptions as well as the too-often implicit ontological and epistemological assumptions behind different methods of evaluation. Although there is no hierarchy of evaluation methods generally, there is for specific questions. So while an experimental or quasi-experimental design is probably not the right method to answer the question about how stakeholders viewed the rollout process of a particular type of intervention, it is the right method to answer the question of the causal, attributable effect of a defined intervention in raising a defined construct of welfare for a defined population at a particular place and time. The question of external validity is a valid one. As Lant points out, there are few “invariance laws” for social science. In practice, therefore, the general principle of “external validity” is not likely to be accurate—there is no such thing as general generalizability in social sciences. Moreover, the question of out of sample prediction, which is what critics usually mean by external validity, is not limited to any particular type of evaluation method; all have the same challenge. Rather, we must have a particular, defined context in mind when discussing the transferability of programmatic findings, regardless of the evaluation method that generated those findings. 
The advantage of systematic reviews is not that they prescribe universal policy, but that they present those interventions that tend to be robust to the many ways things can go wrong and describe the contexts in which they worked to allow policy makers to form their own judgments on the appropriateness of local application. Beyond the oft-repeated and as often agreed to bit of useless wisdom that “context matters”, the challenge for evaluation—of all stripes—moving forward will be understanding which are the important elements of context that matter. While the applicability of a finding from one intervention to another place, time, or scale is never 100%, neither is it zero. Discovering the key elements upon which transferability is likely to hinge may well rely on both social science theory and the arsenal of evaluation methods, appropriately applied.

Submitted by Caroline Heider on Wed, 11/20/2013 - 03:23

Sandra, thanks for flagging these important issues. Yes, culture matters, whether at entry, during implementation or at the end of a project's life. When you look at IEG evaluations you will often find a reference to "incentives," which are part of the culture that drives behaviors. We are in the midst of an evaluation that is trying to understand the conditions that stimulate or hinder learning in World Bank lending, which should help us generate very interesting insights.

Submitted by Caroline Heider on Wed, 11/20/2013 - 03:29

Tessie: you will see a new post today on a Systematic Review of Impact Evaluations of mother and child health interventions. This review used existing impact evaluations and analyzed them first for their quality and then for their findings, including reliability. Building on this experience we are now undertaking a study of gender and social protection, which combines a systematic review of existing impact evaluations with a review of the World Bank's portfolio, and with one of qualitative evaluations on the subject. The exercise is drawing on existing information rather than going out to collect new data. In addition, we are doing systematic reviews for other large-scale evaluations, for instance, on access to electricity, where the systematic review will be combined with literature and portfolio reviews, in-depth project evaluations and country case studies. We will make sure to share more information about how we went about these processes and not only what we found. Thanks for stimulating the discussion.

Submitted by Caroline Heider on Wed, 11/20/2013 - 06:16

Jeff, you are entirely right -- it is about choosing the appropriate evaluation method for the question. And, since our evaluations often cover a complex set of questions, it is about the right combination of methods within an evaluation to triangulate findings from different sources and through different methods to develop a better understanding of what happens, why and how.

Submitted by PAUDYAL, Dhruba P. on Sat, 12/07/2013 - 23:49

The selection of an appropriate method can be very influential in any evaluation. Comparing two idiosyncratic environments is by no means culturally justified, whether the method is experimental or quasi-experimental, quantitative or qualitative, a systematic review or something else. However, it is a way of learning, not of finding a truth. No two things behave in exactly the same way. Thus, it is better to be adaptive, based on the perspectives of rationality, relevance, effectiveness, equity, impact and sustainability, and, if resources permit, to verify the outcomes of one method with another and triangulate to get closer to reality. Every evaluation method has some merits and some demerits. Thus, the capacity to weigh the value of a particular method for a given context and find the best among the options is the ingenuity of an evaluator, and it matters a lot.

Submitted by Caroline Heider on Mon, 12/09/2013 - 05:20

In reply to PAUDYAL, Dhruba P.

Dhruba, well said! And this is what makes our profession exciting and dynamic: the many opportunities to grow and deepen our understanding about development as well as about evaluation, discovering patterns where none were expected, or overturning long-held assumptions when evidence tells us otherwise.

Submitted by Mathew Varghese on Wed, 01/08/2014 - 07:05

Very much enjoyed reading the discussion here. Evaluation in a complex environment of a complicated programme/policy requires a careful selection of methods and design to suit the question we have at hand. We cannot use the same tool for all purposes, but must try to select the best tools/methods for the issue at hand. The other variables that need to be taken into account in the final selection of methods are cost and the use of the findings. In the end, experience (not much literature as evidence) has shown that the simpler the method, the better the final quality of the evaluation report. There are many reasons for this. One is that, just as the context changes during a programme/policy intervention, so does the context change during the conduct of the evaluation. Another is that, despite good methods and design, the team conducting the evaluation may not have the capacity to conduct it as designed.
