Discussions about quality in evaluation are as old as the discipline of evaluation itself (see for example Schwartz and Mayne, 2005). They often become manifest when stakeholders disagree with the findings and recommendations of a particular evaluation. Quality is also at the core of recent (and not so recent) epistemological debates around what is considered to be ‘good evidence’ in evaluation. What most of these debates have in common, apart from passionate exchanges based on thought-provoking arguments, is a rather limited focus (either intended or unintended) on what constitutes quality in evaluation.


In this blog post I would like to present a broader take on quality in evaluation by offering five complementary perspectives that can help us better understand the construct and its underlying dimensions. Two introductory remarks are in order. First, in my discussion below I deliberately leave out the plethora of guidelines, checklists and templates associated with quality in evaluation that have been published over the last decades. I will mention a few below, but they are essentially the practical tools that emanate from the issues I discuss. Second, in presenting the different frameworks I deliberately do not rigorously distinguish between what constitutes quality as such and what are determinants of quality. Such a rigorous distinction would require a nuanced discussion, among other things on the unit of analysis (are we talking about, for example, quality in process, quality in data and evidence, or quality in reporting?). This would not only take up more space but, more importantly, would unnecessarily complicate the heuristics that I am putting on the table. The point is that if we allow ourselves some conceptual flexibility, the five frameworks can provide us with a more comprehensive understanding of the building blocks of quality and, by implication, some useful entry points into the question of how to go about enhancing quality in evaluation.

So here it goes. At the metaphorical top of the pyramid one could position the overarching principles that one would look at when assessing the overall merit and worth of the evaluation function (e.g. in an institution) as a whole.

Quality from an evaluation function perspective can be usefully summarized by the trinity of Independence, Utility and Credibility (see for example UNEG, 2011). Independence concerns the principle of evaluation being free from undue political pressure and organizational influence. It is useful to distinguish between different levels of independence (explained below: structural, functional and behavioral independence). Credibility refers to the evaluation process and output being as unbiased, ethical and professional as possible and grounded in a rigorous methodology. Finally, Utility is about the relevance and timeliness of evaluation processes and findings to organizational learning, decision-making and accountability for different types of stakeholders. Obviously, in a sense these are ‘meta-principles’ which need to be further unpacked. Several frameworks have been developed that do this with varying degrees of alignment to the abovementioned principles (see for example AEA, 2018; ECG, 2012; UNEG, 2016; OECD-DAC, 2010). In addition, the four other perspectives I describe below are also strongly aligned to one or more of these overarching principles.

Quality from an institutional enabling environment perspective is closely related to the construct of ‘evaluation culture’ in a particular institutional environment: to what extent do suppliers and users of evaluation understand and value evaluation as a source of evidence or a process that informs learning and accountability processes? It relates to such aspects as the incentives and attitudes of potential evaluation users toward using evaluations, as well as the incentives, resources and attitudes of evaluators toward conducting evaluations. It concerns both the enabling environment within the evaluation function (department, unit, office) and the broader enabling environment within the institution. First, structural independence (i.e. the evaluation function not directly reporting to management but to some overarching oversight body), with human and financial resource decisions taken independently from management, and functional independence (i.e. the ability of the evaluation function to decide on what to evaluate and how to go about the evaluation) are two important building blocks of an enabling environment for independent evaluation. Second, the budget, time, data and resource constraints that shape the opportunity space for individual evaluations constitute another important aspect (see Bamberger et al., 2011).

Quality from a human resource perspective is a key entry point for looking at quality in evaluation planning and management. Generic human resource-related aspects such as merit-based promotion linked to evaluation quality would best fit under the previous perspective. A key dimension here is team composition and the different knowledge funds that are brought to bear on an evaluation exercise (see for example Jacob, 2008). Potentially, each evaluation through its team composition should cover the following knowledge funds and skill sets: substantive knowledge (e.g. of the policy field, theme, or nature of the intervention); context-related knowledge (e.g. of the country, beneficiaries); institutional knowledge (e.g. of the operational and decision-making environment of the commissioning institution(s)); knowledge of evaluation methods; communication skills (especially pertinent in inter-cultural and politically sensitive environments); and project management skills. In addition, the underlying experience of team members in applying these different types of knowledge and skills in evaluation-related exercises is of key importance. Finally, adequate variation in disciplinary backgrounds and knowledge of relevant substantive theories in the behavioral and social sciences is likely to be valuable. A second core aspect is behavioral independence, which refers to the professional integrity and absence of bias in the attitude and behavioral conduct of the evaluator. This aspect also closely relates to the importance of an evaluator’s adherence to ethical norms of conduct and the ability to speak truth to power (Wildavsky, 1979).

Often, the default dimension when discussing quality in evaluation is quality from a methodological perspective. In IEG we use a framework for methodological quality that is based on a combination of common sense derived from evaluation experience and accepted principles from the social and behavioral sciences. There are multiple views in the social and behavioral sciences on assessing quality from a methodological perspective. In evaluation we often refer to the Campbellian framework on validity (i.e. internal, external, construct, statistical conclusion validity; Cook and Campbell, 1979). I prefer the recent, somewhat more eclectic interpretation by Hedges (2017), who refers to the principle of data analysis validity instead of the more narrowly defined statistical conclusion validity. Validity is a property of findings and each validity dimension is underpinned by a set of principles to guide the design, data collection and analysis of the evaluation. Reliability is another important dimension, which directly refers to the research process. In principle, research is reliable if repeating the analysis would lead to the same findings. Even though replicability would be too ambitious a goal in many (especially multi-level, multi-site) evaluative exercises, at the very least transparency and clarity on research design (e.g. methods choice, selection/sampling) should be ensured to enhance the verifiability and defensibility of knowledge claims. A third dimension is consistency, which refers to the need for a logical flow between evaluation rationale, questions, design and methods choice, actual data collection and analysis, findings and recommendations. A fourth and final dimension concerns the importance of focus in evaluation. A perennial challenge in evaluation is to balance breadth and depth of analysis. Evaluations often broaden their scope, usually for accountability purposes (e.g. by adding more evaluation questions or dimensions of interest), thereby sacrificing depth of analysis. One could argue that for accountability (and learning) purposes evaluations should be focused as much as possible by carefully managing the size and complexity of the evaluand and the number of questions posed, so as to concentrate the limited financial and human resources on in-depth analysis and assessment. The four concepts together (focus, consistency, reliability and validity) constitute a useful lens for looking at methodological quality in programmatic, thematic or corporate process evaluations, for example. Finally, the processes put in place to incentivize and guide quality evaluation are important. This includes the use of proper quality assurance mechanisms such as evaluation peer review, reference groups, stakeholder consultation and meta-evaluation with feedback loops.

Finally, there is an important strand in evaluative thinking and practice that perceives quality first and foremost as a property of being fit for purpose from a utilization perspective. In other words, quality is interpreted from the perspective of how and to what extent the evaluation meets the (information) needs of (different) groups of users (Patton, 2001). While the use of evaluation for learning and accountability purposes is often associated with the quality of the evaluation report, in many cases the quality of process (and the inclusion of stakeholders) can be of equal or higher importance to optimize ownership of and learning from evaluation. Utilization-focused evaluation and related types of participatory evaluation (Cousins and Whitmore, 1998) emphasize the importance of stakeholder involvement and iterative learning through evaluation as the foundation of utilization, and by implication quality. In the recently developed World Bank Group Evaluation Principles (WBG, forthcoming) we propose a whole-of-process approach to optimizing evaluation use. The basic premise is that throughout the entire evaluation process one can improve specific aspects to optimize the likelihood of effective use of evaluations. For example, in the planning and design phase, issues such as what to evaluate and which stakeholders to involve with what modalities of consultation/participation would be important to consider. In the implementation phase, the use of credible methods and adequate expertise are examples of aspects to strengthen the rigor and depth of analysis and consequently the learning potential from an evaluation. Finally, in the reporting and dissemination phase it would be important to consider, among other things, multiple channels targeted to specific audiences and to have clear ideas about follow-up trajectories. More fundamentally, the framework also helps in thinking about what types of evaluation (in terms of resources, scope, overall approach, timing, and so on) may be optimal for strengthening evaluation use among different stakeholder groups. While these principles may not significantly change the operations of an institution in the short term, they will help nudge the institution toward a more utilization-oriented approach to evaluation.

To conclude, an informed approach to strengthening quality in evaluation in any organization would benefit from a multi-dimensional perspective. To avoid debates that are dominated by some form of methodological reductionism or short-term fads, it is important to keep in mind the broader organizational, cultural, behavioral and process-related aspects when taking informed decisions about evaluation. High quality evaluation is not an elusive idea but does require some shared understanding on what it means and the changes that would be needed to achieve it.



AEA (2018) American Evaluation Association guiding principles for evaluators, https://www.eval.org/p/cm/ld/fid=51

Bamberger, M., J. Rugh and L. Mabry (2011) Realworld evaluation: Working under budget, time, data, and political constraints, second edition, Sage, Thousand Oaks.

Cook, T.D. and D.T. Campbell (1979) Quasi-experimentation: Design and analysis for field settings, Rand McNally, Chicago.

Cousins, J.B. and E. Whitmore (1998) Framing participatory evaluation, in: E. Whitmore (ed.) Understanding and practicing participatory evaluation, New Directions for Evaluation, 80, Jossey-Bass, San Francisco.

ECG (2012) Big book on evaluation good practice standards, https://www.ecgnet.org/document/ecg-big-book-good-practice-standards

Hedges, L.V. (2017) Design of empirical research, in: R. Coe, M. Waring, L.V. Hedges and J. Arthur (eds.) Research methods and methodologies in education, Sage, Thousand Oaks.

Jacob, S. (2008) Cross-disciplinarization: A new talisman for evaluation? American Journal of Evaluation, 29(2), 175-194.

OECD-DAC (2010) Quality standards for development evaluation, https://www.oecd.org/development/evaluation/qualitystandards.pdf

Patton, M.Q. (2001) Use as a criterion of quality in evaluation, in: A. Benson, D.M. Hinn and C. Lloyd (eds.) Visions of quality: How evaluators define, understand and represent program quality, Advances in Program Evaluation, Vol. 8, 155-180.

Schwartz, R. and J. Mayne (2005) Assuring the quality of evaluative information: Theory and practice, Evaluation and Program Planning, 28(1), 1-14.

UNEG (2011) UNEG framework for professional peer reviews of the evaluation function of UN organizations, United Nations Evaluation Group, New York.

UNEG (2016) Norms and standards for evaluation, http://www.uneval.org/document/detail/1914

WBG (forthcoming) World Bank Group evaluation principles, World Bank Group, Washington D.C.

Wildavsky, A. (1979) Speaking truth to power: The art and craft of policy analysis, Little Brown, Boston.





Very useful discussion Jos. Thank you.
I would be interested in discussing further with you, and with colleagues who may be interested, the concept of quality as it could be expressed when the purpose of the evaluation, i.e. its value proposition, is to make a positive change in the lives of people, which I would argue should always be its purpose (and one of its key distinctions from research). This is a conception of evaluation as a deliberate political and scientific endeavour, i.e. seeking to affect power relationships in favour of equity and social justice based on principles of sound science. This goes beyond a utilitarian valuing of evaluation to situate the discussion in a global frame (as MQP has done with blue marble evaluation). A few quick reactions: I suggest avoiding the term rigour as it is vague and loaded. I would distinguish between findings, i.e. what we saw, what we read and what we heard (the evidence), and conclusions, i.e. the result of evaluative judgment applied to the findings. As such, validity is not a property of findings but rather a property of evaluative judgment. I would move away from the "speaking truth to power" metaphor as I think it captures only a fraction of what evaluation is about; as Sandra Mathison has said, it's about "speaking truth to the powerless". "Fit for purpose" is another term I suggest avoiding as I find it also quite vague. Thanks again Jos.
