
The Rigor of Case-Based Causal Analysis

Chapter 1 | Designing for Causal Inference and Generalizability

How to Infer Causality

To a certain extent, the myths discussed in the introduction persist because of a lack of awareness of the various causal theories that underlie different techniques for and approaches to impact assessment (Cartwright 2022). Answering critical evaluation questions regarding what works in interventions, for whom, under what circumstances, how, and why (which is the crux of the impact evaluation enterprise) requires a combination of these techniques and approaches to causal inference. A brief recap is in order.

Traditional accounts of causality are based on the idea that cause and effect are regularly found together, and that causality can be identified by observing patterns of regularity (Befani 2012). In practice, methods of statistical causal modeling are used to check the presence of both intervention and outcome in a large number of cases, under the assumption that causal association grows stronger as the number of cases in which both are present increases. However, because it is impossible to consider all possible cases that would share only cause and effect (referred to as Mill’s method of agreement), researchers resort to an alternative solution: comparing cases that are by all accounts similar except for cause and effect (Mill’s method of difference, also known as counterfactual thinking), in which the causal inference concentrates on establishing a singular cause-effect relationship between an independent variable (an intervention) and an outcome (the dependent variable). The main causal question of interest focuses on the attribution of a marginal (net) effect to the intervention. Although the preferred methodology for establishing these types of causal relationships is (quasi-)experiments that seek to create a counterfactual situation (Befani 2012), thought experiments such as those in rapid impact assessments or the emerging application of virtual reality also involve counterfactual thinking (Rowe 2019; Gürerk et al. 2014). These techniques are best suited to measuring the average effect of an intervention on an outcome of interest. However, they often black-box the steps that logically link the intervention to the observed outcome.
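To make the counterfactual logic concrete, here is a minimal sketch (with invented outcome data) of estimating the marginal (net) effect of an intervention as a difference in mean outcomes between treated cases and comparable untreated cases; the function name and numbers are illustrative, not from the study:

```python
# Hypothetical sketch of counterfactual-style causal inference as used in
# (quasi-)experiments: compare otherwise-similar treated and control cases
# and attribute the difference in average outcomes to the intervention.
# All outcome values below are invented for illustration.

def average_treatment_effect(treated_outcomes, control_outcomes):
    """Difference in mean outcomes between treated and control groups."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(treated_outcomes) - mean(control_outcomes)

# Outcomes for cases that received the intervention vs. comparable cases that did not.
treated = [7.0, 6.5, 8.0, 7.5]
control = [5.0, 5.5, 6.0, 5.5]

print(average_treatment_effect(treated, control))  # the marginal (net) effect
```

Note that this estimate says nothing about the steps linking intervention to outcome; that black box is what the mechanism-based approaches discussed below are meant to open.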

An alternative causal theory, equally relevant to evaluation, treats causation as attributable, by its nature, to multiple different but plausible combinations of factors. According to this conception of causality, an effect most often has no single cause; rather, a combination of causally relevant factors generates a particular outcome, and multiple causal pathways can lead to the same outcome. Moreover, a given outcome may result from either the presence or the absence of a particular factor or set of factors, depending on the context (Rihoux and Ragin 2009, 8). This theory is best leveraged to answer causal questions that are particularly salient in evaluation, such as, For whom and under what circumstances did a particular intervention work or not work? and What role did the intervention play, among other factors, in producing a particular outcome? Cross-case research and evaluation methods are best suited to answering these types of questions.

Finally, generative or mechanisms-based causal theories, inspired by scientific realism (Bhaskar 1975; Glennan 1996; Pawson 2013; Schmitt 2020), seek to identify the intervening causal process (or causal chain), made up of interlocking causal mechanisms, between an independent variable (the intervention) and the outcome of interest. The key causal question of interest in approaches based on these theories is, why and how has the intervention made a difference in the outcome of interest? Approaches such as process tracing, realist evaluation, contribution analysis, impact pathway analysis, and causal mediation have a comparative advantage in answering causal questions of this type (for more details, see Beach and Pedersen 2019; Befani 2012, 2021; and Raimondo and Beach, forthcoming).

In the last two of these approaches, the strength of the causal inference obtained depends critically on the specificity of the causal theory posited as underlying the intervention and on an explicit, detailed description of the theorized causal packages and mechanisms that explain why a particular intervention would contribute to a given outcome (Raimondo and Beach, forthcoming). For approaches of these two types, typical causal theories consisting of a few boxes and arrows do not suffice as support for strong causal inferences. Instead, explicitly laying out the underlying causal assumptions and mechanisms is a fundamental part of the design of such approaches, as the next section shows.

How to Generalize from Case-Based Evidence

Another myth that continues to inhibit the adoption of case-based methods involves the issue of generalizability of findings from case studies. A common argument against using case-based approaches in evaluation is that one cannot validly generalize from a single case. Yet as Flyvbjerg (2006, 219) argued early on, this argument “if not directly wrong, is so oversimplified as to be grossly misleading.” Woolcock (2013, 2022) and others argue that case studies have a comparative advantage in providing key facts necessary to determine a causal claim’s level of generalizability, especially in instances of complex interventions. Case studies help elucidate (i) the contextual factors that help explain whether an intervention that works in one instance will also work in another; (ii) the process mechanisms that help establish what parts of an intervention worked or proved to be broken; (iii) the ingredients needed for a particular intervention to work, including resources, skills, capacities, and laws; and (iv) an intervention’s trajectory of change, including how long it takes for change to materialize, progress, or deteriorate. In turn, Bennett’s (2022) answer to whether case studies generalize is that it depends on prior knowledge of causal mechanisms, understanding of populations of cases, and patterns in contextual factors that enable or disable causal mechanisms. Generalization here relies on the soundness and power of the causal explanation as opposed to laws of large numbers.

In addition, three principles can be usefully leveraged for determining whether findings emerging from case-based methods of causal analysis are generalizable. The first of these should inform the design of the methodology. Specifically, at the outset of an evaluation, what Rihoux and Ragin (2009) call an area of homogeneity must be defined to establish the boundaries and scope within which cases will be selected for analysis. Then, when selecting cases for inclusion in the analysis, the evaluator should strive to maximize variation within this particular homogeneity space. Cases included in the analysis must be sufficiently similar to one another to be comparable along certain dimensions (that is, they must have enough in common to be productively compared); as the saying goes, comparing apples and oranges is not useful. However, within this well-defined area of homogeneity, it is important that evaluators maximize the diversity of cases across the minimum established number of cases that can feasibly be studied given the time and resource constraints bounding the evaluation. This will enhance the potential for (modest) generalizability to cases that belong to the same population as the cases examined and that share sufficient contextual elements to enable their shared elements to be used to explain variations in outcome (for more details, see Rihoux and Ragin 2009, 21–25, and Bennett 2022).
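This first principle can be sketched in code: restrict the pool to an area of homogeneity, then greedily pick a small set of cases that maximizes diversity on the remaining attributes. The case attributes, values, and greedy heuristic below are invented illustrations, not the study's actual selection procedure:

```python
# Hypothetical sketch of the two-step selection principle: (1) keep only
# comparable cases (the area of homogeneity), then (2) maximize variation
# within that pool. Attribute names and values are invented.

def dissimilarity(a, b, keys):
    """Count the attributes on which two cases differ."""
    return sum(a[k] != b[k] for k in keys)

def select_diverse(cases, k, keys):
    """Greedy max-min selection: repeatedly add the case farthest from those chosen."""
    chosen = [cases[0]]
    while len(chosen) < k:
        best = max((c for c in cases if c not in chosen),
                   key=lambda c: min(dissimilarity(c, s, keys) for s in chosen))
        chosen.append(best)
    return chosen

cases = [
    {"id": 1, "instrument": "ERPA", "country": "A", "technology": "hydro"},
    {"id": 2, "instrument": "ERPA", "country": "A", "technology": "waste"},
    {"id": 3, "instrument": "ERPA", "country": "B", "technology": "hydro"},
    {"id": 4, "instrument": "loan", "country": "B", "technology": "waste"},
]

# Step 1: area of homogeneity -- only comparable ERPA cases are eligible.
pool = [c for c in cases if c["instrument"] == "ERPA"]
# Step 2: maximize variation on country and technology within that pool.
sample = select_diverse(pool, k=2, keys=["country", "technology"])
print([c["id"] for c in sample])
```

The homogeneity filter guarantees comparability (no apples-to-oranges comparisons), while the greedy step spreads the small sample across the diversity that remains.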

Second, the strength of the generalizability of findings that emerge from case-based methods of causal analysis will depend on the patterns of convergence of evidence identified in the cases studied across different contexts (different countries, projects, and so on). If patterns are found to recur across cases within a sample, despite variability in the cases selected, the likelihood of finding similar patterns elsewhere increases.

Third, as explained by Rihoux and Ragin (2009, 12), a “good index of the quality of causal explanations could be precisely their ability to withstand refutation when confronted with new cases.” If patterns of findings in a particular causal analysis converge with existing evidence present in the related literature, for instance, or if they converge with one or more out-of-sample cases, then the credibility of claims regarding the generalizability of those findings increases.

These three considerations were central to the design of the Emission Reduction Purchase Agreement (ERPA) causal analysis conducted here, as the next section explains.

Overarching Design

The overarching design of the case-based causal analysis used in the current study followed the logic of theory testing that Trochim (1985, 1989) popularized as pattern matching. As figure 1.1 illustrates, pattern matching involves an attempt to connect two patterns: a theoretical pattern and an observed (or empirical) pattern. The bottom part of the figure shows the theoretical realm. In this study, the causal theory (or theory of change) originated from a review of the existing literature (consisting of both a structured literature review and a lighter review of specific themes related to carbon finance).1 These were supplemented with consultations with carbon finance experts and validation from the World Bank’s Carbon Finance Unit. The conceptualization task involved transforming the ideas that emerged from the literature review and consultations into a graphical representation, ultimately generating a set of propositions for each part of the theorized causal process.

The top part of the figure depicts the empirical realm. The empirical strategy used in the study applied two different case-based methods (with a comparative advantage in providing evidence for causal analysis) to the 16 ERPA cases. These consisted of a within-case causal analysis, following the logic of process tracing, and a cross-case causal analysis, following the logic of qualitative comparative analysis (QCA). For each ERPA case, the study team traced the contribution of the World Bank and other critical actors and variables throughout the process of intervention development, implementation, and follow-through. Data collection broadly included review of documents related to the intervention, field visits, and a series of interviews with key stakeholders engaged throughout the ERPA cycle and beyond. Patterns of convergence and divergence that emerged across cases were systematically analyzed using the logic of QCA, ultimately generating a robust empirical base.

The middle part of the figure represents the inferential task, which attempts to link or match the theoretical and empirical patterns. To the extent to which the theoretical and empirical patterns match one another, the posited causal theory is validated, and the same empirically observed pattern can be predicted to exist in cases similar to those studied. A causal analysis of this type uses techniques imbued with the idea that causality is complex (Cartwright 2004, 2007). These techniques accept the premise that different causal pathways, made up of different combinations of variables (or causal packages), can lead to the same outcome. This notion, known as multiple conjunctural causation in the literature (for example, De Meur and Rihoux 2002), considers causality as context- and conjuncture-specific and refutes a number of assumptions at the core of common statistical approaches to causal inference. Notably, the following are not assumed:

  • Linearity and permanent causality
  • Homogeneity of the unit of analysis
  • Additivity
  • Causal symmetry (Rihoux and Ragin 2009, 9)

In the current study, this inferential task was performed for each part of the causal process through QCA, with formalization using Boolean logic for a few links of the process in which causality was theorized to be particularly complex. The overall design of the analysis had four key elements: a detailed causal theory, a defensible case selection, identification of a select number of variables to be systematically scrutinized and investigated in the empirical inquiry, and a detailed plan for systematic collection of qualitative information within and across cases. The next section details these four elements.
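The crisp-set logic behind this QCA step can be sketched as follows; the condition names, codings, and toy cases are invented for illustration and are not the study's actual data. The sketch builds a truth table from binary-coded cases and reads off the configurations consistently associated with the outcome, surfacing more than one sufficient "causal package" for the same outcome (multiple conjunctural causation):

```python
# Hypothetical sketch of crisp-set QCA: cases coded 0/1 on conditions are
# grouped into a truth table, and configurations whose cases all show the
# outcome are treated as candidate sufficient combinations.
from collections import defaultdict

cases = [
    # (world_bank_support, entity_capacity, favorable_market) -> outcome
    ((1, 1, 0), 1),
    ((1, 1, 0), 1),
    ((0, 1, 1), 1),   # a second, different pathway to the same outcome
    ((1, 0, 0), 0),
    ((0, 0, 1), 0),
]

# Truth table: each observed configuration with the outcomes it produced.
table = defaultdict(list)
for config, outcome in cases:
    table[config].append(outcome)

# Keep configurations whose cases *all* show the outcome (consistency = 1).
sufficient = [cfg for cfg, outs in table.items() if all(outs)]
print(sorted(sufficient))  # more than one sufficient "causal package"
```

Note the causal asymmetry: the configurations that explain the outcome's presence are analyzed separately and do not simply mirror those that explain its absence.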

Figure 1.1. Overall Evaluation Design: Pattern Matching


Source: Independent Evaluation Group.

Structure of the Causal Theory

The causal theory developed for the current study, composed of 15 causal steps, was constructed iteratively, drawing from a structured review of the literature on the design and effectiveness of carbon finance interventions and from consultations with key experts within and outside the Bank Group. It diverges from more traditional representations of causal theories, which follow an input-activity-output-outcome logic. Instead, the causal theory's structure encompasses three different elements: the theory's 15 causal steps, the World Bank's expected contribution at each step, and assumptions about other contributing factors that mediate the causal relationships among the parts of the theorized causal process (figure 1.2).

Figure 1.2. Structure of the Causal Theory


Source: Independent Evaluation Group.

The causal theory developed for the current study is rather complex and seeks to generate unique theoretical patterns. More complex theoretical patterns usually make it more difficult to construct sensible alternative explanations that would predict the same result (Befani 2021; Beach and Pedersen 2019). For each link in the theorized causal process, the causal theory makes explicit the Bank Group's expected contribution and the specific causal assumptions that define the circumstances under which a particular causal process is more or less likely to occur.

Defensible Case Selection

Case selection is a critical part of designing case-based causal analysis and must be executed carefully to maximize the chances of both valid causal inference and (modest) generalizability of the analysis's findings (Bennett 2022; Rihoux and Ragin 2009). Several considerations must be weighed carefully in choosing a case selection strategy. As explained earlier, cases must be selected within a specific area of homogeneity, so that they have enough in common to be compared and so that characteristics shared across cases can be used to explain the variability in outcomes. For that reason, in the current analysis, a most-similar-different-outcome selection strategy was applied first. Two major elements were homogeneous across the ERPA cases selected for the study. First, the process of asset creation was almost identical across cases because the United Nations had codified it in the Clean Development Mechanism (CDM). To be eligible for World Bank support, projects in all of the ERPAs studied had to abide by a few rules and complete a number of steps to generate emission reduction credits. Second, the type of support, that is, the intervention itself, was rather homogeneous across cases. World Bank support consisted of advocating for carbon finance with governments and specific entities involved in ERPA-related projects, providing technical assistance to those entities at various steps in the process of creating assets (in the form of greenhouse gas reductions that could be bought by high-polluting countries), and promoting due diligence to ensure compliance with the CDM process. Additionally, cases representing different degrees of success in carbon finance were selected, relying on external databases that captured whether specific ERPAs had achieved their emission reduction targets and on a preliminary screening of the entire portfolio of World Bank–supported ERPAs. Moreover, to increase the likelihood that the findings from the analysis would have internal validity, multiple cases in the same country and involving the same category of technology were included.

However, the potential generalizability of the findings also relies in part on how well the cases included in the study represent the broader universe of World Bank ERPA interventions. Therefore, case selection in this study sought to reach a maximum degree of heterogeneity over a minimum number of cases and was informed by a preliminary review of the World Bank's entire portfolio of ERPAs. An additional consideration was the need to accommodate other components of the evaluation, notably the inclusion of country-level case studies for which the countries had already been selected (based on other relevant selection criteria). Within the constraints of these preselected countries, the following additional selection criteria for ERPA cases were used:

  • Ensuring representation of the four primary categories of technologies used in ERPAs, so that the cases selected included at least one case from each category: afforestation or reforestation, hydropower, other (nonhydro) renewable energy, and waste management
  • Ensuring representation of various levels of country capacity for carbon finance, so that the cases selected involved countries with at least four different levels of capacity
  • Ensuring representation of various levels of maturity of the CDM process and carbon market, so that the cases selected spanned at least a 20-year horizon
  • Considering the need to keep the number of case studies selected manageable for in-depth analysis
  • Considering practical challenges in organizing data collection (for example, selecting among cases in China that were in geographic proximity)

As table 1.1 illustrates, the final case selection ensured that the unfolding of the causal process within the cases selected could be compared (i) within countries and across technologies; (ii) within technologies and across countries; and (iii) within technologies and within countries, across both positive and negative outcomes.

Table 1.1. Case Selection

Technology | Chile | China | Colombia | Ethiopia | Uganda | Total
Afforestation or reforestation |  |  |  |  |  | 6
Hydropower |  |  |  |  |  | 4
Other (nonhydro) renewable energy |  |  |  |  |  | 3
Waste management |  |  |  |  |  | 3

Source: Independent Evaluation Group.

Note: Dots represent presence of specific technologies in different country cases.

Contributory Factors Selection

In keeping with the set-theoretic research tradition and QCA in particular, the choice of variables (also known as conditions) for inclusion in the study was both theoretically and inductively informed, drawing on insights from three pilot cases to identify the key elements that needed to be considered in the cases studied. The imperative of avoiding the “many variables, few cases” dilemma, common in approaches of the type used here, also guided the selection of variables (Befani 2016; Rihoux and Ragin 2009).

Selection of variables for study proceeded in three stages. First, all possible explanatory variables and assumptions that could have influenced the likelihood that a particular ERPA project would move from one step to the next along the theorized causal process were listed and embedded within the causal theory developed for the study. These variables and assumptions were then categorized into a broad typology of variables and assumptions recurrent in several parts of the causal process. Next, this typology was pilot tested in three cases in Chile. After completion of the pilot, the selection process for variables was revised and systematized, resulting in the ultimate choice of five variables for study, grouped into two main categories:

Contribution of key players in the process:

  1. Efficacy of Bank Group contribution (main intervention of interest)
  2. Capacity of project entities (implementing agency or project owners)
  3. Support of external players (government entities, third parties, trader associations, other donors)

Enabling environment:

  1. Conduciveness of policy environment (for example, regulations, other carbon-related policies, government subsidies)
  2. Conduciveness of market environment (for example, carbon market conditions)

Selection of this subset of variables increased the likelihood that the study would identify the core elements of the causal mechanisms at work in the cases studied while preserving the parsimony the approach requires. As described in more detail later in the paper, case data collection was thus primarily deductive, aiming to identify the presence or absence of the five variables selected for study. However, inductive inquiry was also incorporated to ensure that additional explanatory variables were not missed, and all case authors were instructed to tease out additional explanations not included in the causal theory selected for the study. For instance, semistructured interviews followed a sequential purchase of information approach, starting with broad, open-ended questions on the factors that facilitated or hindered the process of asset creation and the outcomes of interest, followed by structured questioning on the subset of variables of interest.

Systematic Qualitative Data Collection

The robustness of the findings in studies like the current one depends greatly on the quality, consistency, and reliability of the data collected. Case-based approaches are particularly useful because of the granularity and context specificity of the findings they generate, and evaluative case studies thus generally involve thick descriptions of the interventions studied, with rich examples. In the current study, however, the comparative nature of the approach and the relatively large sample of 16 cases demanded a consistent approach to data collection across cases. To strike the right balance between the granularity of any resulting causal explanations and the consistency and reliability of the data collected across cases, the study team developed a structured case study template made up of a set of questions to be answered in a detailed case narrative and a matrix for synthesizing and structuring the qualitative data collected, as presented in figure 1.3. The template was piloted in Chile and then refined based on the pilot experience.

For each case included in the study, investigators gathered evidence through reviewing project documents, interviewing key stakeholders, and making site visits during a field mission lasting one to two weeks. Data collection involved eight local investigators and four case leaders. The case leaders identified and selected local investigators for the cases in the four remaining countries included in the study based on their knowledge of the CDM process and of the specific technologies used within each country. A methods expert and a study coordinator trained all investigators. The training sought to ensure that investigators had a good understanding of the study objective and the case study template and to advise them on how to conduct the tracing work involved in the investigative process, including how to identify the right stakeholders and documents to consult, how to look for “fingerprints” of the process, and how to judge the probative value of any evidence obtained (for example, by seeking access to the full evidentiary record and gauging the trustworthiness of sources; Beach and Pedersen 2019; Raimondo and Beach, forthcoming). The training was recorded and shared with the investigators for future reference during data collection.

Two levels of quality assurance were put in place to ensure the completeness, consistency, and accuracy of data collected. First, each case leader reviewed the work of investigators working on the same country. Second, the study coordinator checked all the cases for quality, comparing the level of evidence gathered across cases to ensure comparability.

Figure 1.3. Template for Qualitative Data Collection for Each Case


Source: Independent Evaluation Group.

  1. For a more detailed explanation of the differences between various forms of literature review, see the Independent Evaluation Group’s Methods Paper Series publication on conducting structured literature reviews in evaluation (Fenton Villar 2022).