
Machine Learning in Evaluative Synthesis


Faced with an ever-growing pool of evidence-rich text reports, evaluators are increasingly interested in extracting and synthesizing insights from these reports in a more efficient and reliable manner. A shift from manual identification and extraction of information to a more automated process is warranted in many cases, specifically in an institutional environment with a steady accumulation of reports that follow fairly standardized formats and types of content.

Three issues necessitate such a shift. First, manual categorization can be time-consuming, which can limit evaluators to classifying either a smaller number of evaluation documents or a smaller number of factors and issues within the documents than they otherwise would. Second, differences among evaluators’ backgrounds and individual classification decisions can introduce inconsistencies in how insights of the same type are classified, leading to over- or underestimation of the prevalence of certain factors and issues and biasing the resulting output. Third, manual classification does not readily lend itself to updating existing data sets with new documents and inputs that might become available after the initial classification has been completed. Machine learning for text classification provides an intuitive solution to these problems.

The automation of information extraction and classification opens up exciting avenues for streamlining evaluative synthesis, enabling evaluators to render in seconds what would otherwise require hours or even days of labor-intensive manual identification and coding. Machine learning methods can accelerate content extraction, provided that practitioners train the extraction tool properly. In the context of the text analytics explored in this paper, machine learning involves a combination of unsupervised and supervised text-mining techniques that transform raw text data into a matrix of terms, which is then classified according to a taxonomy of issues pertinent to the analysis at hand. Integration of existing theoretical priors and evaluator experiences can ensure an appropriate balance between the granularity and generalizability of the insights extracted from project documents. Such an approach offers evaluators a powerful analytical tool for better understanding the various determinants of project success, potential challenges to project implementation, and practical lessons for future projects, among other matters.
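To make the pipeline described above concrete, the following is a minimal, illustrative sketch in Python. The labeled excerpts and taxonomy labels are invented for illustration, and the bag-of-words term matrix with nearest-centroid classification is a deliberate simplification: a production pipeline would typically use TF-IDF weighting and a properly trained and validated classifier.

```python
from collections import Counter
import math

# Hypothetical labeled excerpts from project reports; the taxonomy
# labels here are illustrative, not the paper's actual taxonomy.
labeled_docs = [
    ("procurement delays slowed project implementation", "implementation_challenge"),
    ("delays in procurement affected the implementation schedule", "implementation_challenge"),
    ("strong client ownership contributed to project success", "success_factor"),
    ("project success was driven by government ownership and commitment", "success_factor"),
]

def to_vector(text):
    """Transform raw text into a term-frequency vector (one row of a term matrix)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Training": build one centroid term vector per taxonomy category
# by summing the term vectors of that category's labeled excerpts.
centroids = {}
for text, label in labeled_docs:
    centroids.setdefault(label, Counter()).update(to_vector(text))

def classify(text):
    """Assign a new excerpt to the most similar taxonomy category."""
    vec = to_vector(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

print(classify("implementation suffered from procurement delays"))
```

Even this toy version shows the key property the paper relies on: once the term matrix and category definitions are in place, classifying a new excerpt is a mechanical lookup that takes milliseconds, so an updated batch of reports can be folded into an existing analysis without repeating the manual coding.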

Automated methods provide three main advantages over conventional approaches. First, they permit faster and more systematic analysis of a set of documents than manual coding alone can achieve. Machine learning does not invalidate systematic manual review; rather, automatic classification and extraction of knowledge can provide a first step to inform further analysis. Second, automated methods can place a larger quantity of relevant data at the disposal of evaluation officers, who can then draw insights from a broader set of inputs than would have been available had manual approaches alone been used. Third, once properly trained, classification algorithms can form the underlying infrastructure for real-time or just-in-time analysis to inform decision-making, whereas a purely manual process would not produce the required analysis for weeks or even months. Such algorithms allow faster, customized manipulation of the elements included in an analysis, based on user needs. In fact, providing real-time insights (for example, to the chair of an investment review meeting) could be the next use for this approach. The integration of machine learning into evaluative synthesis would represent a relatively low-cost intervention that would provide economies of scale for both current and future evaluations. As an investment, the approach would offer a tool that can be reused and modified for future analyses.1 Machine learning can catalyze positive feedback loops, translating insights from identified project challenges into lessons that feed into project design and improve the quality of project implementation and project performance in the long term.

This paper builds on these points as follows. Chapter 1 provides an overview of machine learning and discusses relevant applications in the field of evaluation, briefly outlining previous work and potential future applications. In chapter 2, we use the case of the Finance and Private Sector Evaluation Unit of the Independent Evaluation Group as an example to illustrate the benefits of machine learning for text classification in evaluation. A summary of the results of this experiment and a brief discussion of potential next steps conclude the paper.

1 The diagnosis of delivery challenges, their rank-ordering by salience, and viable strategies for iterative amelioration of future projects are some examples of ways in which multiuse machine learning applications can be employed.