In our previous blog we laid out a number of ethical, conceptual, and methodological challenges affecting our programmatic evaluation work during the pandemic, and promised to share a framework for thinking through some of them. The framework is organized around four questions and multiple decision points that arise as we work through a fairly standard programmatic evaluation design (around a particular theme, corporate issue, or sectoral area of work) as applied at the Independent Evaluation Group (IEG) and in the evaluation functions of other Development Partners. To make things as practical as possible, we have also created a decision tree (shown below) that synthesizes the key questions we can ask ourselves as we think through the various options.

Making Choices about Evaluation Design in times of COVID-19: A Decision Tree



1. Should we adapt our evaluation questions and scope?

One key question ahead of every evaluation is whether the decision-makers within the organization we are trying to influence (and other potential audiences) are likely to listen to, and be able to act on, the evaluation’s findings. In times of COVID-19, institutional priorities and decision-makers’ knowledge and accountability needs are shifting. We must consider what evaluation scope or angle can bring the most value at this time.

We also need to consider whether the objectives and activities of an intervention are clear and mature enough for a meaningful assessment of their effectiveness. In a ‘business as usual’ scenario, the early stages of the COVID-19 response may not meet this threshold. As this is clearly not ‘business as usual’, we should consider tackling questions that are often understudied in our evaluations but would bring particularly useful evidence right now. For example, development partners’ coordination is critical to tackling the pandemic and its aftermath, significant knowledge gaps persist on this issue, and we are more likely to have access to development partners than to beneficiaries; we can therefore consider putting more emphasis on answering questions around coordination, coherence, and partnership management.

Even as we adapt and redirect our focus, another overarching question remains about feasibility. We need to consider whether we have the capacity or the resources to collect and analyze the data that are needed to respond to our evaluation questions of interest.  This leads to a central question in times of COVID-19: Given the key practical and ethical constraints that influence our capacity to collect information, are we likely to generate well-substantiated evaluation findings?

2. Can we improve what remains feasible?

Programmatic evaluations tend to have at least two main levels of analysis: the global level and the country level. At the global level, we study patterns of regularity in the portfolio, assess the magnitude of efforts, classify and describe the types of interventions, take stock of overall performance, and build a basis for the ‘generalizability’ of our findings. We typically do this by conducting a thorough “Portfolio review and analysis” that consists of assembling a database of project and sector-level indicators as well as extracting and coding textual data from hundreds of project documents at the design and completion stages. We also sometimes conduct statistical analysis using secondary datasets or carry out large surveys of agency staff. Many of these methods remain feasible in COVID-19 times. We have an opportunity to strengthen these methods, for instance by:
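As a simplified illustration of what such a coded portfolio database can look like, the sketch below assembles a tiny project table and codes objective statements against a keyword taxonomy. The project records, column names, and taxonomy are invented for illustration, not IEG's actual schema.

```python
import pandas as pd

# Toy portfolio of projects (invented data, illustrative column names)
projects = pd.DataFrame({
    "project_id": ["P001", "P002", "P003"],
    "sector": ["Health", "Education", "Health"],
    "commitment_usd_m": [120.0, 45.5, 80.0],
    "pdo_text": [
        "Strengthen pandemic preparedness and surveillance systems",
        "Improve learning outcomes through teacher training",
        "Expand access to primary health care services",
    ],
})

# Simple keyword-based coding of project objectives against an
# illustrative taxonomy, mimicking manual portfolio coding at scale
taxonomy = {
    "preparedness": ["preparedness", "surveillance"],
    "service_delivery": ["access", "care", "services"],
}

for theme, keywords in taxonomy.items():
    projects[theme] = projects["pdo_text"].str.lower().apply(
        lambda text: any(k in text for k in keywords))

# Aggregate: commitments coded to each theme
summary = {t: projects.loc[projects[t], "commitment_usd_m"].sum()
           for t in taxonomy}
print(summary)  # {'preparedness': 120.0, 'service_delivery': 80.0}
```

In practice, such keyword coding is only a first screen; a real portfolio review would validate the codes manually on a sample.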

Frontloading (smart) desk reviews. Evaluators are increasingly reviewers and synthesizers of existing knowledge. Portfolio reviews, strategy reviews, structured academic and institutional literature reviews (including the use of existing knowledge repositories from 3ie, Campbell, Cochrane, and others), the development of evidence gap maps, and so on can be frontloaded and conducted in smarter, more rigorous ways.

Strengthening theory-based content analysis. Sometimes we guide portfolio analysis with a theory of change that makes explicit the key causal steps and assumptions regarding how an intervention is intended to work. In an iterative process, we extract information from project documents and, in combination with the existing literature, develop a causal narrative on how these interventions work in practice. There is scope to apply this type of approach more widely and to explore how the interplay between existing literature, project documents, and an evolving causal framework can help us better understand how interventions contribute to development outcomes. This is also a timely opportunity to invest in a more multidisciplinary approach to theory-based evaluation: involving different sector experts can help improve the analysis of more complex and innovative projects.

Experimenting with text analytics using Artificial Intelligence (AI). Machine Learning and other variations of AI can be applied to existing project documents to help delimit the evaluand (e.g. identifying a multisectoral portfolio of interventions under a specific theme), and to conduct evaluative analysis showing the linkages among interventions, outcomes, and contextual factors (e.g. extracting text information on the basis of a simple taxonomy or a more elaborate conceptual framework or theory of change). There are several factors to consider when experimenting with AI, including: (1) whether the nature of the intervention lends itself to automated text analysis; (2) whether the size of the portfolio, the potential for replicability, or the possibility of generating new insight justifies the investment; (3) whether the conceptual framework is robust enough to enable a mix of supervised and unsupervised learning; (4) whether there is the right partnership between the evaluation team and the data scientists to create a robust platform for experimenting, adapting, and learning. For instance, at IEG we are currently piloting the use of AI for portfolio delimitation and theory-based content analysis on a range of interventions that can contribute to reducing stunting. At design, this pilot ticks all the boxes: the multisectoral approach to reducing stunting is a perfect candidate for testing the capacity of Machine Learning to go beyond the standard use of sector codes and indicators; the well-established theory of change provides a great training backbone for AI; and the IEG evaluators, with domain and methodological expertise, are teaming up with a consortium of data scientists versatile in several types of AI and knowledgeable about evaluation.
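To give a flavor of the supervised-learning side of such work (in a deliberately simplified form, not IEG's actual pipeline), the sketch below trains a tiny Naive Bayes text classifier on a handful of invented project descriptions and uses it to screen new descriptions for relevance to a theme such as stunting reduction. All texts and labels are toy data; a real application would train on thousands of coded documents with a mature ML library.

```python
import math
from collections import Counter

# Invented labelled project descriptions: 1 = relevant to the theme
train = [
    ("improve child nutrition and reduce stunting in rural areas", 1),
    ("micronutrient supplementation and maternal health services", 1),
    ("rehabilitate highways and upgrade the national road network", 0),
    ("expand electricity transmission and improve grid reliability", 0),
]

# Count words per class for a multinomial Naive Bayes model
word_counts = {0: Counter(), 1: Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = set(word_counts[0]) | set(word_counts[1])

def score(text, label):
    """Log-probability of `text` under `label`, with add-one smoothing."""
    total = sum(word_counts[label].values())
    log_p = math.log(class_counts[label] / len(train))
    for w in text.split():
        log_p += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return log_p

def predict(text):
    return 1 if score(text, 1) > score(text, 0) else 0

print(predict("community nutrition program for children"))  # 1 (relevant)
print(predict("upgrade the national road corridor"))        # 0 (not relevant)
```

The same idea scales to portfolio delimitation: documents scored as relevant form a candidate evaluand, which evaluators then validate manually.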

3. Can we find ways around what is infeasible?

During the pandemic, country-level evaluative analysis is likely to be the most affected. Travel restrictions, shifting institutional priorities, and reduced institutional access (due to those restrictions or to “lockdown” conditions) require rethinking the use of empirical methods such as conducting interviews, focus groups, or direct observation in the country. The rich case studies that we tend to use to ground-truth other findings, to provide qualitative evidence of impact, and to explain the factors driving patterns that we observe at the portfolio level are no longer possible. Can we find ways around this?

“Desk-based” case studies, including virtual (or phone) interviewing to collect data at the institutional level (for example among different groups of operations colleagues, ministries, sub-national government entities, or development partners), can be feasible. However, we would lose the possibility of meeting certain stakeholders; in some cases the quality of interview data would suffer (it is harder to build rapport with the interviewee, explore sensitive topics, or “read the air”); and we would lose options such as unobtrusive observation of projects and institutions, inductive analysis on site, snowball sampling on site, and so on.

One way around this is to rely more heavily on (local) consultants with the right substantive and contextual expertise in their respective countries. The use of in-country expertise (already a key aspect of the “business as usual” scenario) can become an even more essential building block of our evaluations. However, local consultants will need to follow health and safety guidelines and abide by ethical principles for reaching out to key informants. In addition to relying more heavily on local expertise as a short-term measure, there is an opportunity for a long-term investment in more context-sensitive evaluation, anchored in solid ethical principles. For example, faced with blanket travel restrictions, CLEAR in South Africa is training local consultants to conduct institutional M&E diagnoses virtually. Thinking a bit ‘out of the box’, one could even argue that the current crisis presents an opportunity to strengthen country-led evaluation initiatives as an alternative to the currently crowded space of donor-driven evaluation.

When it comes to reaching out to front-line workers, local officials and administrators, and especially beneficiaries, we should use “an abundance of caution” as prescribed by our colleagues at 3ie. We should also follow best practices for phone surveys as laid out for example by colleagues at the Poverty Action Lab and the World Bank.

4. Can we tap into alternative sources of evidence?

For some evaluations, we also have an opportunity to capitalize on existing sources of data that we do not typically tap into, including geospatial, financial, or social media data. While these data sources can be considered “big data” and could potentially serve evaluative analysis at the global level, this type of in-depth analytical work is often not possible for all cases and may introduce bias and limit comparability. On the other hand, evaluators should take seriously the advice of scholars such as Albert Hirschman and Ray Pawson: one should (and can) ask big questions of “small-scale” interventions. These techniques are indeed often better suited to a case study at the country level (or to nested case studies of interventions within countries). Although many of these data sources require some type of ground-truthing to strengthen the analysis, they can still help generate useful evaluative evidence even where such triangulation is not possible.

Some recent examples in IEG include: the use of geospatial budgetary data from BOOST and other sources to assess Bank Group targeting of its funding (in relation to national public spending) in the framework of the Mexico Country Program Evaluation; the use of drone imagery to assess land use patterns in rural communities in Niger in the framework of the ongoing evaluation of Bank Group support to reverse natural resource degradation; the use of satellite imagery data to assess the effectiveness of road improvements in Mozambique in the framework of the ongoing evaluation on Urban Spatial Growth; and the use of Twitter data to assess the Bank Group’s influence in online debates on the Sustainable Development Goals in the framework of the evaluation of the Bank Group’s Global Convening.
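At its simplest, geospatial linking of this kind amounts to matching intervention locations to other spatially referenced data. The sketch below assigns each (invented) project site to its nearest monitoring station using the haversine great-circle distance; the coordinates and names are illustrative, and a real analysis would use proper GIS tooling such as geopandas.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Invented project sites and monitoring stations (lat, lon)
project_sites = {"site_A": (13.51, 2.11), "site_B": (-25.97, 32.58)}
stations = {"niamey": (13.512, 2.112), "maputo": (-25.966, 32.583)}

# Link each project site to its nearest station
nearest = {name: min(stations,
                     key=lambda s: haversine_km(*coords, *stations[s]))
           for name, coords in project_sites.items()}
print(nearest)  # {'site_A': 'niamey', 'site_B': 'maputo'}
```

The same nearest-neighbor logic underpins joining intervention geolocations to, say, satellite-derived land use cells or spatially tagged budget data.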

Examples of tapping into new data for evaluative analysis abound, and several areas of inquiry merit further attention from evaluators. Apart from text analytics and AI (discussed above), we briefly highlight two other areas. First, new sources of geospatial data can help link geolocation data of interventions (e.g., see the work by AidData) to all sorts of new and existing geospatial data of evaluative interest. See also the example of the Geo-Enabling Initiative for Monitoring and Supervision (GEMS), which aims to leverage field-appropriate, low-cost ICT for digital real-time data collection and analysis. Second, social media data, which, among many other things, allow us to gauge the sentiment of specific groups on particular topics where relevant segments of the population have high access to social media. Many social science disciplines have been working with social media analytics for some time and have useful guidance to share.
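As a flavor of what lexicon-based sentiment analysis involves, here is a minimal sketch. The lexicon and posts are invented toy data; real work would use an established lexicon (such as VADER) and posts obtained through a platform's API, with due attention to ethics and representativeness.

```python
# Toy sentiment lexicon: word -> polarity score (invented, illustrative)
LEXICON = {"effective": 1, "helpful": 1, "progress": 1,
           "failing": -1, "waste": -1, "slow": -1}

def sentiment(post):
    """Net sentiment of a post: sum of word scores, ignoring punctuation."""
    return sum(LEXICON.get(word.strip(".,!?").lower(), 0)
               for word in post.split())

posts = [
    "The new program is effective and helpful!",
    "Disbursement is slow and feels like a waste.",
    "Mixed progress, but implementation is failing in places.",
]

scores = [sentiment(p) for p in posts]
print(scores)  # [2, -2, 0]
```

Aggregating such scores over time or by group is what allows analysts to track shifts in sentiment around a topic, subject to the usual caveats about who is (and is not) on social media.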

To conclude, as we think through these questions and the underlying options available to evaluators in the unprecedented circumstances created by COVID-19, we will have to navigate multiple trade-offs, put ethics front and center, be willing to get out of our methodological comfort zones, and be ready to try and fail.


image credit: Shutterstock/ vichie81



Thanks for putting up this thoughtful blog, which helps many evaluators like me across the globe to think out of the box and adapt our evaluations and reviews in a pandemic situation like this one.


This is very helpful from a technical point of view and points to the immense tools well-resourced evaluators have at their disposal. This article does, however, miss the human element in some ways, from the perspective of stakeholders who need to be engaged in any evaluation process. In the midst of the pandemic, many 'frontline' public service or NGO counterparts will be otherwise occupied with responding to the pandemic with limited resources and time, and may even be ill themselves. Many programme beneficiaries will be preoccupied with their own health, wellbeing, and economic survival, and may be less than willing to engage with either local or overseas evaluation consultants, even remotely where possible. From an empathetic point of view, is it appropriate to engage with stakeholders at this time at all for an exercise with no immediate benefit for them? Technological approaches are great; however, I think additional 'stakeholder-focused' questions need to be asked where the 'consider postponing' option is concerned.


Apparently the work assumed the situation before COVID-19 was normal and acceptable. When we look at the progress achieved toward the objectives of the 2030 Agenda for the SDGs, the related evaluations have shown poor results that anticipated the non-accomplishment of the goals. To the extent that this context is shared, when we evaluate we should assume that the COVID-19 crisis exploded in the middle of an ongoing crisis of financial capitalism. Although this theme wasn't the purpose of the study, it should have been mentioned, because under the circumstances we risk building on a wrong basis. Take for instance the desk-based work, which will be carried out drawing on a financial narrative that, in my view, is homologated to the standard of the Establishment but no longer accepted by ordinary people, professionals, and entrepreneurs. At the end of the day, an unwritten but vital question is whether we have to re-start or to reconstruct.
In my opinion, the post-coronavirus period is an unexpected and good opportunity to review and revise the ways to reach the development goals. On this matter, a useful reference is the theory of change as we have discussed in THE THEORY OF CHANGE APPLIED TO FINANCE FOR DEVELOPMENT http://reader.ilmiolibro.kata


Referring to the above comment, I have been asked: what to do? Neither a revolution nor changing the rules of the finance game, but reviewing and revising the approach to the market by the big financial actors, which will then be followed by the other financial intermediaries. The above-mentioned book provides the algorithm (Development Model via Business Approach) to operate, which can be summarised as follows: re-examining the development finance agencies' approach, because changing the development objectives without updating the toolbox to achieve them has left things as before. The promotion of the private sector also means facing capitalists' risk aversion to investing in the real economy, which seems to be an important factor in Africa: "African investors remain risk-averse" and "… the funding requests lack speaking the language required for investment". How to promote capitalists' investment behaviour? For example, taking the IFC Programme for MENA countries, it means either (1) getting them into a development scheme (learning by doing), or (2) providing the seed capital to set up a national fund in each country.


One thing the COVID pandemic has done is enable us to explore areas we might not have explored before, as we try to navigate how evaluation can be conducted effectively. While the post explored various ways data collection and analysis can be conducted to generate results at these times, I am curious to find out how evaluation methodologies such as randomized assignment, difference-in-differences, or regression discontinuity can be adapted to the various methods highlighted above. Also, I believe the use of on-the-ground evaluators will strengthen evaluation activities in individual countries. This approach will lead to more researchers being trained to conduct in-country evaluations, given the travel restrictions in some countries.
