Conducting evaluations in times of COVID-19 (Coronavirus)

In these uncertain times, when a pandemic is affecting our professional and personal lives, we are faced with many “unknown unknowns”. There are risks and causal relationships that we have yet to imagine, let alone conceptualize. Yet, important decisions must be made now, with imperfect information. This is an uncomfortable space to be in. It is especially uncomfortable for those of us working in evaluation functions in the field of international development—where we typically have the luxury of time, operate within clear theories of change, and do our best work when we can mix methods, use multiple sources of data, and conduct in-depth interviews with a range of stakeholders. These circumstances push us to rethink how to go about our evaluative work. In this blog post we discuss three main challenges—ethical, conceptual, and methodological—that will affect our capacity to conduct evaluations. In a subsequent blog post we will discuss in more detail some methodological options that we can consider tapping into.

This discussion focuses on the evaluation context of (independent) evaluation functions established in international organizations or national ministries that mostly work on higher-level programmatic evaluations of sectoral, thematic, or country programs. It does not consider COVID-19 related interventions that are being designed, fast-tracked, and implemented right now—our evaluation colleagues in the field of humanitarian assistance and emergency responses are better placed to provide advice on this topic than we are. (Useful resources for this topic are available, for example, via the ALNAP library of resources on humanitarian evaluation, learning and performance.) We also do not focus on evaluative work that is embedded in projects and is relatively “close to the ground” in terms of collecting data in communities from citizens. The impact evaluation community has been quite proactive in discussing the ethical and methodological challenges of doing fieldwork in the framework of intervention-specific impact evaluations in times of COVID-19 (see, for example, advice on this subject from some of our colleagues in the World Bank). Programmatic evaluations face different kinds of challenges as they primarily collect data at the institutional level, for example amongst ministries, sub-national government entities, or development partners.

From an ethical point of view, evaluation work plans will inevitably need adjustments. Careful consideration of the risk-reward ratio of evaluation activities will become more pressing than usual. First and foremost, where evaluation is a key component for assessing whether and how public health interventions and other priority interventions (e.g. social protection and social safety nets for (poor) citizens) work, evaluators need to be in the fray. They are needed not only to collect the best possible data and conduct the best possible assessments to inform decision makers during the crisis, but also to substantiate the critical debates that will take place once the crisis is winding down. Let us also remind ourselves that in several of the countries where our organizations operate, especially in fragile and conflict-affected states, COVID-19 is only one amongst many critical challenges. Critical operational and evaluative work should thus also continue. Extending the United Nations Programme Criticality Framework to our evaluation prioritization might be worth considering. Needless to say, if evaluation teams need to go out and collect data, necessary precautions should be taken to protect staff and respondents.

For other sectors and for other types of evaluative work (the majority of the programmatic evaluations conducted by evaluation functions fall into this category), evaluators should first and foremost be concerned about not putting unnecessary pressure on an overextended public system. The best approach would therefore be to reduce (and prioritize) direct interactions with operational colleagues working in crisis-related sectors, while stepping up efforts to conduct evaluative work differently given the constraints. A more subtle ethical consideration is to ask whether the necessary conditions can be met for evaluations to be useful, and used. Can we expect our audience to listen? Evaluators should not be tone-deaf, relying on the well-oiled institutional mechanisms that regulate the production of evaluations to preserve “business as usual”.

Moving forward with evaluation work plans and evaluation design, a conceptual shift needs to take place. In a global pandemic of the proportions that we are currently experiencing, the effects reverberate well beyond the public health sector and the (in-)direct health effects of COVID-19 on citizens. The global pandemic, the associated government-imposed containment measures, and behavioral changes of businesses and citizens during the crisis may have significant and lasting effects on a whole range of issues that matter to society and to the well-being of citizens, including the size and structure of the economy, employment rates, food security, poverty levels, and so on. Now and in the near future, evaluators will have to reflect on, and factor in, both the direct and indirect causal effects of the pandemic in any sector or thematic issue that is subject to planned (and ongoing) evaluations. In addition, because many public resources are being diverted to addressing crisis-related needs, ongoing interventions may not be implemented as designed. This has implications for how evaluators should look at such interventions.

From a methodological perspective, we identify four main challenges for evaluators.

The first challenge concerns the current restrictions on empirical data collection at the institutional level. Because of travel restrictions, shifting institutional priorities, and limited institutional access (as a result of imposed “lockdown” conditions), some of the key stakeholders are not available for interviews. As a result, evaluators may resort to convenience sampling, which leaves their findings prone to selection bias.

Second, any evaluative exercise will be significantly constrained by the inability to conduct on-site data collection. Evaluators will struggle to develop a rich and contextualized perspective of the evaluand. Data collection strategies such as unobtrusive observation, building rapport with stakeholders (observing local customs and cultural norms), as well as all sorts of inductive inquiry (including in-situ snowball sampling of interviewees) will not be possible. Remote interviewing (by phone or teleconferencing) is only a partial solution to this challenge: it alleviates the access problem only in part and is itself prone to bias (especially when interviews cover complex or sensitive topics).

The third challenge is closely related to the first two. It is fair to say that even before COVID-19 some programmatic evaluations may have been subject to some sort of “central government bias” in their data collection. Depending on the nature of the evaluation, many interviews are likely to involve stakeholders in national government who are directly involved in the planning, financing, and implementation of interventions. Interviews with stakeholders at lower levels of government (e.g. sub-national) and especially in rural areas may be more difficult to plan in the current circumstances, which reinforces this bias.

Finally, a fourth challenge concerns unlocking the potential of desk-based review and analysis. Evaluators are increasingly reviewers and synthesizers of existing knowledge and data. Portfolio reviews, strategy reviews, structured reviews of the academic and institutional literature, and so on can, to the extent possible, be frontloaded and conducted in more rigorous and smart ways. This also goes for the analysis of existing data, both conventional (e.g. corporate data, survey data) and “big” data (e.g. imagery data, social media data). An example of the latter is text analytics in combination with (un)supervised machine learning techniques, which offers new ways of conducting evaluative content analysis of existing textual information from documents and the internet.

In our next blog post, we will discuss some of these aspects in more detail and offer a framework that can help evaluators think through methodological issues when designing and conducting evaluations in the current circumstances.