In our previous blog we laid out a number of ethical, conceptual, and methodological challenges affecting our programmatic evaluation work during the pandemic, with the promise to share a framework for thinking through some of these challenges. This framework is organized around four questions and multiple decision points as we work our way through a fairly standard programmatic evaluation design (around a particular theme, corporate issue, or sectoral area of work) as applied at the Independent Evaluation Group (IEG) and in the similar evaluation functions of other Development Partners. To make things as practical as possible, we have also created a decision tree (shown below) that synthesizes key questions we can ask ourselves as we think through various options.
Making Choices about Evaluation Design in times of COVID-19: A Decision Tree

1. Should we adapt our evaluation questions and scope?
One key question ahead of every evaluation is whether key decision makers within the organization that we are trying to influence (and other potential audiences) are likely to listen and be able to act on the evaluation’s findings. In times of COVID-19, institutional priorities and decision-makers’ knowledge and accountability needs are shifting. We must consider what evaluation scope or angle can bring the most value at this time.
We also need to consider whether the objectives and activities of an intervention are clear and mature enough for a meaningful assessment of their effectiveness. In a ‘business as usual’ scenario, the early stages of the COVID-19 response may not meet this threshold. As this is clearly not ‘business as usual’, we should consider tackling questions that are often understudied in our evaluations but would bring particularly useful evidence right now. For example, development partners’ coordination is critical to tackling the pandemic and its aftermath, significant knowledge gaps persist on this issue, and we are more likely to have access to development partners than to beneficiaries. We can therefore put more emphasis on answering questions around coordination, coherence, and partnership management.
Even as we adapt and redirect our focus, another overarching question remains about feasibility. We need to consider whether we have the capacity or the resources to collect and analyze the data that are needed to respond to our evaluation questions of interest. This leads to a central question in times of COVID-19: Given the key practical and ethical constraints that influence our capacity to collect information, are we likely to generate well-substantiated evaluation findings?
2. Can we improve what remains feasible?
Programmatic evaluations tend to have at least two main levels of analysis: the global level and the country level. At the global level, we study patterns of regularity in the portfolio, assess the magnitude of efforts, classify and describe the types of interventions, take stock of overall performance, and build a basis for the ‘generalizability’ of our findings. We typically do this by conducting a thorough “portfolio review and analysis” that consists of assembling a database of project- and sector-level indicators and extracting and coding textual data from hundreds of project documents at the design and completion stages. We also sometimes conduct statistical analysis using secondary datasets or carry out large surveys of agency staff. Many of these methods remain feasible in COVID-19 times. We have an opportunity to strengthen them, for instance by:
Frontloading (smart) desk reviews. Evaluators are increasingly reviewers and synthesizers of existing knowledge. Portfolio reviews, strategy reviews, structured academic and institutional literature reviews (including the use of existing knowledge repositories from 3ie, Campbell, Cochrane and others), the development of evidence gap maps, and so on, can be frontloaded and conducted in more rigorous and smart ways.
Strengthening theory-based content analysis. Sometimes we guide portfolio analysis with a theory of change that makes explicit the key causal steps and assumptions regarding how an intervention is intended to work. In an iterative process, we extract information from project documents and, in combination with the existing literature, develop a causal narrative on how these interventions work in practice. There is scope to apply this type of approach more widely and to explore how the interplay between the existing literature, project documents, and an evolving causal framework can help us better understand how interventions contribute to development outcomes. This is also a timely opportunity to invest in a more multidisciplinary approach to theory-based evaluation: involving different sector experts can help improve the analysis of more complex and innovative projects.
Experimenting with text analytics using Artificial Intelligence (AI). Machine Learning and other variations of AI can be applied to existing project documents to help delimit the evaluand (e.g. identifying a multisectoral portfolio of interventions under a specific theme), and conduct evaluative analysis showing the linkages among interventions, outcomes and contextual factors (e.g. extracting text information on the basis of a simple taxonomy or a more elaborate conceptual framework or theory of change). There are several factors to consider when experimenting with AI, including: (1) whether the nature of the intervention lends itself to it; (2) whether the size of the portfolio, the potential for replicability, or the possibility of generating new insight justifies the investment; (3) whether the conceptual framework is robust enough to enable a mix of supervised and unsupervised learning; (4) whether you have the right partnership between the evaluation team and the data scientists to create a robust platform for experimenting, adapting, and learning. For instance, at IEG we are currently piloting the use of AI for portfolio delimitation and theory-based content analysis on a range of interventions that can contribute to reducing stunting. At design, this pilot ticks all the boxes: the multisectoral approach to reducing stunting is a perfect candidate for testing the capacity of Machine Learning to go beyond the standard use of sector codes and indicators; the well-established theory of change provides a great training backbone for AI; and the IEG evaluators, with domain and methodological expertise, are teaming up with a consortium of data scientists versatile in several types of AI and knowledgeable about evaluation.
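To make this more concrete, the sketch below shows what a first, supervised pass at portfolio delimitation might look like in Python with scikit-learn: a handful of hand-labeled project abstracts train a simple TF-IDF plus logistic regression baseline, which then flags candidate projects for analyst review. The documents, labels, and model choice are purely illustrative assumptions, not the actual pipeline used in the stunting pilot.

```python
# Minimal supervised text-classification sketch for portfolio delimitation.
# Assumes a small hand-labeled sample of project documents is available;
# the abstracts and labels below are placeholders for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical hand-coded training sample: project abstracts + analyst labels.
documents = [
    "Community nutrition and micronutrient supplementation for children under five.",
    "Rural road rehabilitation to improve market access.",
    "Water, sanitation and hygiene promotion in peri-urban settlements.",
    "Financial sector reform and capital market deepening.",
]
labels = [1, 0, 1, 0]  # 1 = plausibly contributes to reducing stunting, 0 = not

# TF-IDF features + a linear classifier: a transparent baseline to compare
# against more elaborate models later.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1),
                      LogisticRegression(max_iter=1000))

# Cross-validate on the labeled sample before scoring the full (unlabeled) portfolio.
scores = cross_val_score(model, documents, labels, cv=2)
print(f"Cross-validated accuracy: {scores.mean():.2f}")

model.fit(documents, labels)
new_docs = ["Maternal and child health services with growth monitoring components."]
print(model.predict(new_docs))  # flag candidate projects for analyst review
```

In practice, the labeled sample would come from the evaluation team's own manual coding against the theory of change, and model predictions would be reviewed by analysts rather than taken at face value.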
3. Can we find ways around what is infeasible?
During the pandemic, country-level evaluative analysis is likely to be the most affected. Travel restrictions, shifting institutional priorities, and reduced institutional access (because of those restrictions or imposed “lockdown” conditions) require rethinking the use of empirical methods such as in-country interviews, focus groups, or direct observation. The rich case studies that we tend to use to ground-truth other findings, to provide qualitative evidence of impact, and to explain the factors driving patterns that we observe at the portfolio level are no longer possible. Can we find ways around this?
“Desk-based” case studies, including virtual (or phone) interviews to collect data at the institutional level (for example, among different groups of operations colleagues, ministries, sub-national government entities, or development partners), can be feasible. However, we would lose the possibility of meeting certain stakeholders; in some cases interview data would be of lower quality (e.g., because it is harder to build rapport with the interviewee, explore sensitive topics, or “read the air”); and we would forgo unobtrusive observation of projects and institutions, on-site inductive analysis, on-site snowball sampling, and so on.
One way around this is to rely more heavily on (local) consultants with the right substantive and contextual expertise in their respective countries. The use of in-country expertise (already a key aspect of the “business as usual” scenario) can become an even more essential building block of our evaluations. However, local consultants will need to follow health and safety guidelines and abide by ethical principles when reaching out to key informants. In addition to relying more heavily on local expertise as a short-term measure, there is an opportunity for a long-term investment in more context-sensitive evaluation, anchored in solid ethical principles. For example, faced with blanket travel restrictions, CLEAR in South Africa is training local consultants to conduct institutional M&E diagnoses virtually. Thinking a bit ‘out of the box’, one could even argue that the current crisis presents an opportunity to strengthen country-led evaluation initiatives over the currently crowded space of donor-driven evaluation.
When it comes to reaching out to front-line workers, local officials and administrators, and especially beneficiaries, we should use “an abundance of caution” as prescribed by our colleagues at 3ie. We should also follow best practices for phone surveys as laid out for example by colleagues at the Poverty Action Lab and the World Bank.
4. Can we tap into alternative sources of evidence?
For some evaluations, we also have an opportunity to capitalize on existing sources of data that we do not typically tap into, including geospatial, financial or social media data. While these data sources can be considered “big data” and could potentially serve evaluative analysis at the global level, this type of more in-depth analytical work is often not possible for all cases and may lead to bias and limitations in comparability. On the other hand, evaluators should take the advice of evaluation scholars such as Albert Hirschman and Ray Pawson seriously: one should (and can) ask big questions of “small-scale” interventions. These techniques are indeed often more suitable to be applied within the framework of a case study at the country level (or in nested case studies of interventions in countries). Although many of these data sources require some type of ground-truthing to strengthen the analysis, they can still help generate rigorous evaluative evidence in the absence of such triangulation.
Some recent examples in IEG include: the use of geospatial budgetary data from Boost and other sources to assess Bank Group targeting of its funding (in relation to national public spending) in the framework of the Mexico Country Program Evaluation; the use of drone imagery to assess land use patterns in rural communities in Niger in the framework of the ongoing evaluation on Bank Group support to reverse natural resources degradation; the use of satellite imagery data to assess the effectiveness of road improvements in Mozambique in the framework of the ongoing evaluation on Urban Spatial Growth; and the use of Twitter data to assess the Bank Group’s influence in online debates on the Sustainable Development Goals in the framework of the evaluation of the Bank Group’s Global Convening.
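To illustrate the mechanics behind an analysis like the Mexico targeting example, the sketch below uses geopandas to join hypothetical geocoded project locations to district boundaries carrying a poverty-rate attribute. The file names, column names, and coordinate reference systems are assumptions for illustration, not the actual data used in that evaluation.

```python
import geopandas as gpd

# Hypothetical inputs: geocoded project locations (points) and district
# boundaries (polygons) carrying a poverty-rate estimate from a household survey.
projects = gpd.read_file("project_locations.geojson")    # project_id, commitment_usd, geometry
districts = gpd.read_file("district_boundaries.geojson") # district_id, poverty_rate, geometry

# Ensure both layers share the same coordinate reference system before joining.
projects = projects.to_crs(districts.crs)

# Spatial join: attach to each project the district it falls within.
# (The `predicate` keyword requires geopandas >= 0.10; older versions use `op`.)
joined = gpd.sjoin(projects, districts, how="inner", predicate="within")

# A simple targeting check: how do commitments line up with district poverty rates?
summary = (joined.groupby(["district_id", "poverty_rate"], as_index=False)["commitment_usd"]
                 .sum()
                 .sort_values("poverty_rate", ascending=False))
print(summary.head())
```

The same pattern extends to other geospatial layers, with the usual caveat that some form of ground-truthing strengthens the interpretation.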
Examples of tapping into new data for evaluative analysis abound, and several areas of inquiry merit further attention from evaluators. Apart from text analytics and AI (discussed above), we briefly highlight two other areas. First, new sources of geospatial data can help link geolocation data of interventions (e.g., see the work by AidData) to all sorts of new and existing geospatial data of evaluative interest. See also the example of the Geo-Enabling initiative for Monitoring and Supervision (GEMS), which aims to leverage field-appropriate and low-cost ICT for digital real-time data collection and analysis. Second, social media data allow us, among many other things, to gauge the sentiment of specific groups on particular topics where relevant segments of the population have high access to social media. Many social science disciplines have been working with social media analytics for some time and have useful guidance to share (e.g., here).
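As a simple illustration of the social media angle, the sketch below scores the sentiment of a few invented posts with NLTK's VADER analyzer, which is designed for short, informal social media text. In a real analysis, the posts would come from a deliberate sampling and topic-filtering strategy (for example, SDG-related hashtags), and the scores would be only one input into the evaluative argument.

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-off download of the VADER lexicon
analyzer = SentimentIntensityAnalyzer()

# Invented example posts; a real corpus would be sampled and filtered by topic.
posts = [
    "Great to see new financing announced for SDG progress on health and nutrition.",
    "Another summit, another communique, and still no funding reaching local clinics.",
]

for post in posts:
    scores = analyzer.polarity_scores(post)  # neg / neu / pos / compound in [-1, 1]
    print(f"{scores['compound']:+.2f}  {post}")
```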
To conclude, as we think through these questions and the underlying options available to evaluators in the unprecedented circumstances created by COVID-19, we will have to navigate multiple trade-offs, put ethics front and center, be willing to get out of our methodological comfort zones, and be ready to try and fail.
image credit: Shutterstock/vichie81