
Meta-Evaluation of IEG Evaluations

Chapter 2 | Framework

Evaluation question 1. Can the meta-evaluation appraise the quality and credibility of IEG evaluations according to a dedicated assessment framework? How would such a framework be operationalized?

An assessment framework was developed to delineate the scope of the meta-evaluation, focusing the analysis on relevant evaluation reports and Approach Papers and their methodological characteristics. Per IEG’s request, the meta-evaluation sought not only to look back on past evaluations but also to present IEG leadership with suggestions for improving the quality and credibility of its evaluations. A focus on innovative developments and approaches within evaluations was therefore deemed important. The assessment focused on the credibility of evaluations (excluding utility and independence), and more specifically on those aspects of credibility that could be gleaned from the reports and Approach Papers. The exercise did not cover attributes of credibility that could not be assessed on the basis of these documents, such as consultations between evaluators and counterparts, expertise and evaluation team composition, quality assurance processes, and peer review.1

Development of the framework began with a set of relevant Bank Group documents, notably World Bank Group Evaluation Principles (2019). That document describes the credibility of evaluations as “grounded in expertise, objectivity, transparency, and rigorous methodology [emphasis added]. Ensuring credibility requires that evaluations be conducted ethically and be managed by evaluators who exhibit professional and technical competence in working toward agreed dimensions of quality. Independence is a prerequisite for credibility” (World Bank Group 2019, 5). It also makes the point that the “rigor of evaluation design and of the corresponding data collection and analysis enhances the confidence with which conclusions can be drawn. Rigor is a prerequisite for the credibility of evaluation findings and, in turn, for evaluation use” (World Bank Group 2019, 13).

The meta-evaluation’s focus on the methodological attributes of evaluations thus links to the perspectives on quality and credibility elaborated above. The approach also builds on the definition of evaluation quality from a methodological perspective developed by Vaessen (2018).2 According to Vaessen, quality from a methodological perspective can be understood as a function of validity (internal, external, construct, and data analysis validity), reliability (the idea that the evaluation process can be verified and in part replicated), consistency (the need for a logical flow among the evaluation rationale, questions, design, data collection and analysis, and findings), and focus (balancing depth and breadth of analysis in evaluation).

In addition to the resources outlined above, the meta-evaluation also drew on the Big Book on Evaluation Good Practice Standards, published a decade ago by the Evaluation Cooperation Group (ECG 2012). This resource proved valuable to the development of the assessment framework as it provided guidelines on how to “organize the evaluation principles by type, i.e., general and specific, as well as to address overlaps noted in the good practice standards and to resolve differences in terminologies” (ECG 2012, 4). For the purposes of the meta-evaluation, chapter VI-A, “GPS on Self-Evaluation,” which sets out good practice standards on country strategy and program evaluations, provided the most relevant guidance.3 These good practice standards outline 16 principles on the process of evaluation and methodological best practices, supported by a corresponding set of operational principles, including “Guidance Note 1: Attributing Outcomes to the Project” (annex III.3).

The assessment framework further benefited from five other resources. First, the Organization for Economic Co-operation and Development–Development Assistance Committee (OECD-DAC) framework provided useful inspiration for assessing the rationale, purpose, and objectives of evaluations. It also offered useful guidance on scoping evaluations, developing an intervention logic, gauging the validity and reliability of information sources, and clearly linking evidence to evaluation questions.4 Second, attributes and operationalization schemes from the UN Evaluation Group’s Norms and Standards for Evaluation (2016) informed the development of the assessment framework; these were combined with checklists and approaches used by the evaluation functions of international organizations such as the United States Agency for International Development and the Norwegian Agency for Development Cooperation. Third, the framework drew on insights from three professional evaluation societies (the American, Canadian, and UK evaluation associations) to refine its assessment of methodological standards and quality. Fourth, criteria published by knowledge institutions and repositories such as the Campbell Collaboration and the International Initiative for Impact Evaluation (3ie) were used to refine the framework’s assessment of methodological quality. Finally, a number of guidance books, handbooks, and seminal papers were used to develop and operationalize the framework.5

The assessment framework was finalized after a series of meetings with the members of the meta-evaluation team (Frans Leeuw, Julian Gayfer, and Ariya Hagh) under the guidance of IEG’s methods adviser. The framework operationalized seven main attributes of methodological quality in evaluations: scope and focus, reliability, construct validity, internal validity, external validity, data analysis validity, and consistency.

The assessment framework was then applied to a stratified random sample of eight evaluations. Each evaluation was rated on each attribute using a four-point scale: adequate, inadequate, partial, or not applicable. The inventory of methods, by contrast, did not assign scores; it was devised as an objective means of gathering aggregate-level information from the full universe of evaluations completed between FY15 and FY19. Appendix C provides a full elaboration of the framework, its operationalization, and the various facets it incorporated.

Evaluation question 2. Which data are required for such an assessment framework?

The data used in the meta-evaluation were collected and analyzed in several steps. As noted earlier, the assessment included an inventory exercise covering the universe of 28 programmatic and corporate evaluations completed between FY15 and FY19, drawing on both their Approach Papers and their evaluation reports.6 The universe comprised both programmatic (N = 20) and corporate (N = 8) evaluations. Programmatic evaluations focus on activities, programs, and operations financed or implemented (or both) by the Bank Group to support clients in achieving their national development goals, the Sustainable Development Goals, and the Bank Group’s twin goals of reducing poverty and boosting shared prosperity. Corporate evaluations focus on the Bank Group’s internal processes, systems, and behaviors, which are designed to improve the organization’s efficiency and effectiveness.

The full universe of evaluations was used in an inventory exercise of methodological aspects referenced in both Approach Papers and evaluation reports. First, automated content analysis was used to provide preliminary insights into the prevalence and distribution of the methodological approaches cited. Next, manual coding was used to generate a more granular measure of these attributes. Finally, the output data were aggregated and broken down by type of method, the range of methods employed, and the level of congruence between the methods proposed in Approach Papers and those delivered in the evaluation reports.
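To make these steps concrete, the following sketch illustrates the kind of keyword-based content analysis and congruence check described above. It is only an illustration under simplified assumptions: the keyword dictionary, category labels, and text snippets are invented placeholders, not the coding scheme or corpus actually used by the meta-evaluation team.

```python
import re
from collections import Counter

# Illustrative keyword dictionary mapping method terms to categories.
# These terms and labels are placeholders, not the actual coding scheme.
METHOD_KEYWORDS = {
    "portfolio review": "conventional",
    "case study": "conventional",
    "semi-structured interview": "conventional",
    "survey": "conventional",
    "machine learning": "broadened",
    "network modeling": "broadened",
    "geospatial": "broadened",
    "qualitative comparative analysis": "broadened",
}

def count_method_mentions(text: str) -> Counter:
    """Count how often each method keyword appears in a document."""
    lowered = text.lower()
    return Counter({kw: len(re.findall(re.escape(kw), lowered))
                    for kw in METHOD_KEYWORDS})

def methods_cited(text: str) -> set:
    """Methods mentioned at least once in a document."""
    return {kw for kw, n in count_method_mentions(text).items() if n > 0}

def congruence(approach_paper: str, report: str) -> float:
    """Share of methods proposed in the Approach Paper that also appear in the report."""
    proposed, delivered = methods_cited(approach_paper), methods_cited(report)
    return len(proposed & delivered) / len(proposed) if proposed else float("nan")

# Placeholder snippets standing in for the full document texts.
ap_text = "The evaluation will combine a portfolio review, surveys, and geospatial analysis."
report_text = "Findings draw on a portfolio review and semi-structured interviews."
print(count_method_mentions(report_text))
print(f"Congruence between proposed and delivered methods: {congruence(ap_text, report_text):.2f}")
```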

The inventory of evaluation methods was conducted according to a coding scheme classifying research methods as conventional or innovative, with the latter emphasizing the use of approaches such as machine learning, network modeling, geospatial methods, and qualitative comparative analysis.7 The assessment of conventional methods included both qualitative and quantitative approaches commonly used in evaluation reports. After coding the range of methods used in both Approach Papers and evaluation reports, the full sample was then disaggregated according to the type of evaluation (corporate versus programmatic) and the prevalence of innovative or conventional methodological approaches. The results from this exercise were converted into a matrix (table 2.1).
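The roll-up from coded methods to the matrix in table 2.1 can be sketched as follows: each evaluation is labeled as broadened or conventional depending on whether any of its coded methods falls in the innovative category, and the labels are then cross-tabulated by evaluation type. The evaluation records and method terms below are invented for illustration only.

```python
from collections import Counter

# Placeholder list of method terms treated as "broadened or innovative".
INNOVATIVE = {"machine learning", "network modeling",
              "geospatial methods", "qualitative comparative analysis"}

# Hypothetical coded records: evaluation identifier -> (type, methods cited).
coded = {
    "eval_01": ("corporate", {"portfolio review", "machine learning"}),
    "eval_02": ("corporate", {"case study", "survey"}),
    "eval_03": ("programmatic", {"portfolio review", "geospatial methods"}),
    "eval_04": ("programmatic", {"case study", "semi-structured interviews"}),
}

def method_profile(methods: set) -> str:
    """Label an evaluation by whether any of its methods is innovative."""
    return "broadened or innovative" if methods & INNOVATIVE else "conventional or standard"

# Cross-tabulate evaluation type by method profile (the 2x2 structure of table 2.1).
matrix = Counter((etype, method_profile(methods)) for etype, methods in coded.values())
for (etype, profile), n in sorted(matrix.items()):
    print(f"{etype:13} | {profile:24} | {n}")
```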

This matrix was used to generate a sample of reports for in-depth review. To preserve both methodological diversity and variation among evaluation types, reports were randomly selected from each of the four cells in line with the proportional distribution of evaluations in the evaluation universe. The reports selected for in-depth review are shown in bold in table 2.1. Stratified randomization ensured that at least one report was selected from each cell, so that the review covered corporate and programmatic evaluations employing both conventional and more innovative evaluative methods. Given the disparity between the number of corporate and programmatic evaluations, two reports were chosen from the former category and six from the latter. The results of the in-depth review are explored in chapter 4.8

Table 2.1. Division Matrix of Evaluation Reports

Report Type | Method Type: Broadened or innovative | Method Type: Conventional or standard

Corporate
  • Learning and results
  • Self-evaluation systems
  • Engaging citizens
  • Knowledge flow and collaboration
  • Convening power
  • Program-for-Results
  • SCD/CPF process
  • IFC client engagement

Source: Independent Evaluation Group.

Note: Bolded text represents reports selected for in-depth review. This table provides the topics of the reviewed evaluations. For the full titles and information, see table FM.1.
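
The stratified selection described before table 2.1 can be sketched as follows. The cell membership, evaluation identifiers, and random seed are placeholders; the actual allocation and the reports selected are documented in table 2.1 and appendix A.

```python
import random

# Hypothetical cell membership: (report type, method type) -> evaluation identifiers.
cells = {
    ("corporate", "broadened"): ["C1", "C2", "C3"],
    ("corporate", "conventional"): ["C4", "C5", "C6", "C7", "C8"],
    ("programmatic", "broadened"): [f"P{i}" for i in range(1, 8)],
    ("programmatic", "conventional"): [f"P{i}" for i in range(8, 21)],
}

def stratified_sample(cells: dict, total: int, seed: int = 1) -> dict:
    """Proportional allocation across cells with a floor of one report per cell.

    Rounding can push the allocation slightly above or below the target total,
    in which case a small manual adjustment would be needed.
    """
    rng = random.Random(seed)
    universe = sum(len(members) for members in cells.values())
    allocation = {cell: max(1, round(total * len(members) / universe))
                  for cell, members in cells.items() if members}
    return {cell: sorted(rng.sample(members, min(allocation[cell], len(members))))
            for cell, members in cells.items() if members}

for cell, reports in stratified_sample(cells, total=8).items():
    print(cell, reports)
```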

Next, the in-depth review (including coding and scoring) of the eight sampled evaluations was conducted in several stages by Frans Leeuw and Julian Gayfer, on the basis of the reports and Approach Papers. The first stage was a test to gauge the workability of the framework’s operationalization guidance: two IEG reports and their corresponding Approach Papers were selected for this purpose. Leeuw and Gayfer coded the selected reports independently and then compared scores in a meeting to evaluate the consistency of ratings and ensure intercoder reliability. The results of this test indicated that the operationalization of the assessment framework was consistent, relevant, and reliable. Having established this, Leeuw and Gayfer independently analyzed all eight evaluations in the sample, assigning scores to each according to the seven attributes under consideration.9 These results were again compared, and final scores were assigned after adjudication between Leeuw and Gayfer. Finally, nine interviews with IEG task team leaders and senior evaluators were conducted to complement the findings.10
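A minimal sketch of the intercoder comparison step is shown below, using simple percent agreement across the seven attributes as an indicator. The ratings are invented placeholders, and the actual review relied on discussion and adjudication between the reviewers rather than on any single statistic.

```python
# The seven attributes rated in the in-depth review.
ATTRIBUTES = ["scope and focus", "reliability", "construct validity",
              "internal validity", "external validity",
              "data analysis validity", "consistency"]

# Hypothetical ratings from the two reviewers for one pilot evaluation.
coder_a = {"scope and focus": "adequate", "reliability": "partial",
           "construct validity": "adequate", "internal validity": "partial",
           "external validity": "inadequate", "data analysis validity": "adequate",
           "consistency": "adequate"}
coder_b = dict(coder_a, **{"internal validity": "adequate"})  # one divergent rating

def percent_agreement(a: dict, b: dict, attributes: list) -> float:
    """Share of attributes on which both reviewers assigned the same rating."""
    return sum(a[attr] == b[attr] for attr in attributes) / len(attributes)

def disagreements(a: dict, b: dict, attributes: list) -> list:
    """Attributes flagged for adjudication between the reviewers."""
    return [attr for attr in attributes if a[attr] != b[attr]]

print(f"Agreement: {percent_agreement(coder_a, coder_b, ATTRIBUTES):.0%}")
print("Flagged for adjudication:", disagreements(coder_a, coder_b, ATTRIBUTES))
```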

  1. This is a common limitation of meta-evaluations.
  2. We use the term programmatic evaluations in this report.
  3. The Big Book also pays attention to self-evaluations in chapter VI-B.
  4. The meta-evaluation specifically drew on a number of the elements listed in sections 2 and 3 (OECD-DAC 2010, 2, 3, 11–14).
  5. Among others, see Farrington 2003; DFID 2012; NONIE 2009; Bamberger, Rugh, and Mabry 2011; Cook and Campbell 1979; Leeuw and Schmeets 2016; and Hedges 2017.
  6. Note that no Approach Paper was available for the ending poverty (FY15) evaluation. As such, this evaluation was excluded from some of the analyses conducted.
  7. These methods are also referred to as “broadened” in the meta-evaluation. See appendix E for more details.
  8. See appendix A for a full list of selected reports and the procedure used to draw the sample of evaluations for in-depth assessment.
  9. Output from this scoring exercise can be found in appendix D. Discussions surrounding the revision of attribute scores can be shared by request.
  10. To ensure adequate confidentiality standards, notes from the interviews were made available only to the external experts conducting the meta-evaluation. These notes will be destroyed one year after the finalization of the meta-evaluation.