
Process Tracing Method in Program Evaluation

Chapter 3 | Learning Lessons from Process-Tracing Case Studies

A process-tracing evaluation can stand alone, with the findings confined to the bounds of the case studied. If the evaluators can find relatively strong confirmatory evidence for a disaggregated pToC, the process-tracing evaluation has relatively high internal validity. However, depending on the purpose of an evaluation, it can be relevant to ask whether the lessons learned from a process-tracing evaluation are externally valid—that is, can these lessons tell us anything about how similar contributions might be produced in other programs or policies found in similar or different contexts? One way evaluators can achieve this type of external validity is through conducting two or more process-tracing case studies of similar interventions in parallel, using the emerging pToC for each as a source of the parameters for a process-level comparison. If the evaluators find similar processes in the cases being compared, this constitutes evidence that the pToC can be applied more broadly, at least to the cases studied. Assessing similarities and differences in relevant contextual conditions between the cases can enable the evaluators to make further generalizations—albeit cautious ones—to other cases in similar contexts (Camacho and Beach 2023).

If evaluators are able to study only one case, they can still use their findings to make cautious generalizations to other cases. To determine whether an intervention that worked in one instance might also work in others, they must gather several pieces of information, each answering a different question about generalizability, and process tracing has a comparative advantage in answering such questions (Woolcock 2022). For example, the Abdul Latif Jameel Poverty Action Lab at the Massachusetts Institute of Technology employs a generalizability framework for its impact evaluations that seeks to explain (i) the detailed theory behind a type of intervention drawn from particular evaluation case studies, (ii) what local conditions must hold for the theory to apply, (iii) how strong the evidence is for the hypothesized behavioral change, and (iv) the evidence that the implementation process for the intervention can be carried out well (Bates and Glennerster 2017). Process tracing can help evaluators answer all these questions, but the broader lessons that can be drawn from a process-tracing evaluation of an intervention depend on three considerations: (i) case selection, (ii) the extent to which the inner workings of the intervention are typical or unique in a given context, and (iii) the extent to which the process tracing has unpacked the contextual factors (key facts) at play in the case studied.

Consideration 1: Case Selection

If evaluators have chosen a typical (that is, representative) case to examine, based on comparisons with other cases, then their findings provide some evidence that the intervention studied might work in similar ways in other cases. However, an individual process-tracing case study provides evidence only of how an intervention worked within the case studied. This means that without some form of processual comparison across multiple cases, we would not know whether a selected case was “typical” or not. Another strategy can be to select cases based on a comparative assessment of the contribution involved, with evaluators selecting either “success” or “failure” cases.

We selected our case as a potentially successful case based on preliminary knowledge of the extent to which World Bank work had influenced the client government’s developmental policy. We also selected it for its potential to drive lesson learning: it could plausibly be considered a typical case of the World Bank pairing its core diagnostic work with policy dialogue and convening to inform major reforms in middle-income countries. Confirming that the case was actually “typical” would require comparing it with how similar interventions played out in other cases in similar contexts.

If evaluators study no additional cases using process tracing, they cannot make a firm evidence-based conclusion that the intervention they have studied will work in similar ways in other cases. For instance, although a case might look similar to other cases based on the intervention involved and its potential contribution to the results, the cases might have different inner workings, or the intervention might play out differently under different circumstances. When dealing with complex interventions whose workings depend very much on context, it can therefore be risky to assume that the interventions will work in similar ways in cases other than those studied (Woolcock 2022). A single case study can, however, act as a plausibility probe in relation to a type of intervention by providing evidence that it at least worked in one case (Falleti and Lynch 2009; Woolcock 2022), making it more plausible that the intervention studied might work in other cases, subject to how sensitive it is to even slight differences in context.

Consideration 2: Details of the Inner Workings of the Intervention

For all but the simplest of interventions, how things work in practice will differ to some degree across cases. How, then, can an evaluation conclude that processes were similar enough in two or more cases to confirm that the intervention will work in a similar way in other cases? A good pToC should enable evaluators to make what can be termed “process comparisons,” in which they use the pToC’s description of the actors, actions, and causal principles in key episodes to assess whether the actors in a given case, and the particular actions they performed, were functionally equivalent to what other actors were doing in another case.
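To make the idea of a process comparison concrete, the sketch below encodes key episodes as structured records and checks for functional equivalence across two cases. It is a minimal illustration only: the structure, field names, and exact-match logic are our assumptions, not part of the IEG method.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Episode:
    """One key episode in a case's pToC (hypothetical structure)."""
    label: str                    # e.g., "diagnostic gets noticed"
    actor_roles: frozenset        # functional roles, not named individuals
    action_types: frozenset       # abstracted types of action those roles took
    causal_principle: str         # why the actions plausibly produced the outcome

def functionally_equivalent(a: Episode, b: Episode) -> bool:
    """Two episodes match when the same functional roles perform the same
    types of action under the same causal principle, even if the concrete
    actors and events differ between cases."""
    return (a.actor_roles == b.actor_roles
            and a.action_types == b.action_types
            and a.causal_principle == b.causal_principle)

def compare_processes(case_a: list, case_b: list) -> dict:
    """For each episode in case A, report whether case B contains a
    functionally equivalent episode."""
    return {ep.label: any(functionally_equivalent(ep, other) for other in case_b)
            for ep in case_a}
```

In practice, the equality checks would be replaced by an evaluator’s qualitative judgment of equivalence; the point is only that a well-specified pToC supplies the parameters against which such a comparison can be made.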

In the example IEG evaluation, we lightly tested the lessons emerging from the process tracing about the actors, actions, and causal principles present in the case, examining other cases within the framework of the World Bank’s country program evaluation. We did this to check whether we could find functionally equivalent instances that had created similar results. We found, for example, that leveraging good data optics (benchmarking and relative rankings) to get World Bank diagnostics noticed operated similarly in no fewer than four other cases. Similarly, in-depth and prolonged engagement on finding the right framing for reforms, one that speaks hard truths while avoiding red lines, operated similarly in several other cases within the same country, across sectors and types of diagnostic. As a third example, the cultivation and use of domestic champions for the reforms and solutions proposed by the World Bank worked in similar ways in other cases within the same country.

Here, process tracing was useful in unearthing in detail how various mechanisms played out in a case and in helping identify and name the contextual conditions under which they operated. It was then much easier to evaluate whether our pToC worked in a similar way in other instances and whether it yielded similar outcomes. Process tracing also revealed key lessons on how to draft a tailored knowledge product that questions the status quo without being so controversial that decision makers will not hear the message. For example, building trust and rapport required engaging both strategically and tactically with key stakeholders over a longer period than is typical when drafting knowledge products. This engagement required an astute understanding of the client country’s political economy, notably a thorough reading of the stakes and power of various individuals and of how to choose and cultivate relationships with champions.

Consideration 3: Knowledge of Contextual Factors

The crux of the external validity challenge in process tracing is the following: When can generalizations about how things might work be made to other cases that have not been directly studied? Even when other cases are not examined, looking at key episodes sheds light on contextual conditions that might be required for the actions and links in a particular pToC to function. Contextual conditions are factors that determine whether particular pathways within key episodes function in causally analogous ways in other cases. Making the contextual conditions and the inner workings of interventions explicit enables readers of an evaluation to determine whether these contextual conditions are also present in other cases of interest to them.

In the example IEG evaluation, the relevant contextual conditions operated at three levels: macro, meso, and micro. At the macro level, client country authorities had a window of opportunity to rethink the development framework within which the country had operated for decades because the country’s growth had noticeably declined despite continued high levels of public investment. Lingering episodes of civil discontent, which pressured the authorities to envision an alternative, also contributed to opening this window. At the meso level, a long-standing, trusted partnership between the country’s authorities and the World Bank, along with a tradition of “challenging each other to learn together” built through decades of engagement, enabled the World Bank to have an impact through its analytical and data work. At the micro level, the inner workings that the World Bank team activated (long-term engagement on one knowledge product, cultivation of champions, and co-creation of a diagnostic framework, among others) were possible only because World Bank management granted the team a high degree of project autonomy and because of the entrepreneurial attitude of the project’s team leader.
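As a hypothetical illustration of how making such conditions explicit supports readers’ judgments of portability, the sketch below encodes the macro, meso, and micro conditions as sets and reports which of them are missing in a candidate case. The labels paraphrase the evaluation’s findings; the encoding itself is an illustrative assumption.

```python
# Contextual conditions from the studied case, grouped by level.
STUDIED_CASE_CONDITIONS = {
    "macro": {"window_of_opportunity", "pressure_from_civil_discontent"},
    "meso":  {"trusted_long_term_partnership", "mutual_learning_tradition"},
    "micro": {"team_autonomy", "entrepreneurial_team_leader"},
}

def condition_gaps(candidate: dict) -> dict:
    """For each level, list the studied case's conditions that are absent
    in a candidate case; empty sets suggest the pToC's pathways are more
    likely to travel."""
    return {level: required - candidate.get(level, set())
            for level, required in STUDIED_CASE_CONDITIONS.items()}

# Example: a candidate case that matches except at the micro level.
print(condition_gaps({
    "macro": {"window_of_opportunity", "pressure_from_civil_discontent"},
    "meso":  {"trusted_long_term_partnership", "mutual_learning_tradition"},
    "micro": set(),
}))
# {'macro': set(), 'meso': set(),
#  'micro': {'team_autonomy', 'entrepreneurial_team_leader'}}
```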

An evaluation can also probe a population of cases in a larger class (for example, all low-capacity states) more systematically using follow-up studies that trace only the most critical episodes of a pToC. Evaluators can use comparative methods, such as qualitative comparative analysis, to map key similarities and differences in contextual and other conditions across cases, enabling them to select diverse cases strategically within a population of potential cases (Raimondo 2023).
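The sketch below illustrates the truth-table step at the core of crisp-set qualitative comparative analysis, using entirely hypothetical condition names and case scores: each case is scored 0/1 on a few contextual conditions and on the outcome, cases are grouped by configuration, and each configuration’s consistency with the outcome is computed.

```python
from collections import defaultdict

CONDITIONS = ("trusted_partnership", "policy_window", "domestic_champion")

# Hypothetical crisp-set scores: (condition memberships, outcome).
cases = {
    "case_1": ((1, 1, 1), 1),
    "case_2": ((1, 1, 0), 1),
    "case_3": ((0, 1, 1), 0),
    "case_4": ((1, 0, 1), 0),
    "case_5": ((1, 1, 1), 1),
}

def truth_table(cases):
    """Group cases by configuration and compute each row's consistency,
    i.e., the share of its cases that exhibit the outcome."""
    rows = defaultdict(list)
    for config, outcome in cases.values():
        rows[config].append(outcome)
    return {config: (len(outs), sum(outs) / len(outs))
            for config, outs in rows.items()}

for config, (n, consistency) in truth_table(cases).items():
    present = [c for c, v in zip(CONDITIONS, config) if v] or ["(none)"]
    print(f"{' & '.join(present)}: n={n}, consistency={consistency:.2f}")
```

Fully consistent rows (here, those in which both trusted_partnership and policy_window are present) would then be candidates for Boolean minimization, the QCA step this sketch omits; the output can also guide the strategic selection of diverse cases for follow-up tracing.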

Although each of these more superficial follow-up studies may have relatively low internal validity on its own, finding confirmatory evidence of how things work across a strategically selected set of cases can increase evaluators’ confidence in the external validity of the pToC employed to study them (Beach and Pedersen 2019). If different processes are found to be operative in a particular case, evaluators should compare that case with the one previously studied to identify what differences might explain why things worked differently in the two.
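Beach and Pedersen frame process-tracing inference in Bayesian terms; under that reading, this logic of aggregation can be sketched as repeated Bayesian updating, in which each follow-up case contributes only weakly confirmatory evidence but several concordant cases still shift confidence substantially. All probabilities below are illustrative assumptions, not values from the evaluation.

```python
def update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """One application of Bayes' rule: posterior confidence in the pToC
    after observing confirmatory evidence in one follow-up case."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior))

# Each superficial follow-up study offers only weak evidence (likelihood
# ratio 0.6 / 0.4 = 1.5), yet four concordant cases move a neutral 0.5
# prior to roughly 0.84.
confidence = 0.5
for _ in range(4):
    confidence = update(confidence, p_e_given_h=0.6, p_e_given_not_h=0.4)
print(round(confidence, 3))  # 0.835
```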