Machine Learning in Evaluative Synthesis


The analysis of implementation challenges faced by private sector projects has traditionally relied on evaluation officers manually identifying and categorizing project documents. This approach offers nuance, but at a significant cost in time and effort. The labor required to manually classify project performance parameters, and to assess the factors that explain why a particular project did (or did not) achieve its intended development outcomes, is both intensive and extensive, calling for a more efficient approach. Such an approach should draw on both evaluators’ established experience in diagnosing critical challenges and impediments to project performance and recent advances in machine learning, which make it possible to extract and classify quantities of text that would be prohibitively laborious to process by hand. As a demonstration of this concept, we apply automated content analysis to identify factors and issues commonly encountered in implementing private sector projects and to sort them according to a curated taxonomy. Our approach began with a taxonomy of project factors and issues developed by subject area experts, which then served as the basis for using a combination of machine learning algorithms to fine-tune the taxonomy iteratively. The factors and issues were then grouped into 5 overarching categories and 51 subcategories. We show that, once sufficiently well trained, the machine learning models correctly identify the majority of the factors and issues in the taxonomy, estimating not only the probability that each occurs in a given paragraph but also whether it affected the project positively or negatively.
The experiment suggests new avenues for machine-assisted classification of large corpora of documents for use in portfolio analysis and evaluative synthesis.
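To make the paragraph-level output concrete, the Python sketch below scores a paragraph against a taxonomy and for polarity. It is an illustration under stated assumptions, not the authors' method: the abstract says only that "a combination of machine learning algorithms" was used, and here a one-vs-rest multinomial naive Bayes classifier with Laplace smoothing stands in for them. The labels `sponsor_capacity` and `market_conditions`, the four training paragraphs, the stopword list, and the names `BinaryNaiveBayes`, `build_models`, and `classify_paragraph` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list and taxonomy labels, for illustration only.
STOPWORDS = {"the", "and", "in", "on", "of", "a", "to"}
TAXONOMY = ["sponsor_capacity", "market_conditions"]

# Hypothetical hand-labeled paragraphs: (text, taxonomy labels, positive effect?).
TRAIN = [
    ("The sponsor brought strong management experience and delivered on time.",
     {"sponsor_capacity"}, True),
    ("Weak sponsor governance delayed construction and raised costs.",
     {"sponsor_capacity"}, False),
    ("Currency depreciation and falling demand hurt project revenues.",
     {"market_conditions"}, False),
    ("Strong demand growth in the local market boosted sales.",
     {"market_conditions"}, True),
]


def tokenize(text):
    """Lowercase a paragraph, split it into words, and drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


class BinaryNaiveBayes:
    """Multinomial naive Bayes for a single yes/no decision, Laplace-smoothed."""

    def fit(self, docs, labels):
        self.counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def predict_proba(self, doc):
        """Return P(decision is True | doc); words unseen in training are ignored."""
        n_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (True, False):
            # Smoothed log prior plus smoothed log likelihood of each known token.
            score = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
            denom = sum(self.counts[label].values()) + len(self.vocab)
            for token in tokenize(doc):
                if token in self.vocab:
                    score += math.log((self.counts[label][token] + 1) / denom)
            log_scores[label] = score
        # Convert the two log scores to a probability without underflow.
        top = max(log_scores.values())
        yes = math.exp(log_scores[True] - top)
        no = math.exp(log_scores[False] - top)
        return yes / (yes + no)


def build_models(train):
    """Fit one yes/no model per taxonomy label (one-vs-rest) plus a polarity model."""
    texts = [text for text, _, _ in train]
    label_models = {
        label: BinaryNaiveBayes().fit(texts, [label in labels for _, labels, _ in train])
        for label in TAXONOMY
    }
    polarity_model = BinaryNaiveBayes().fit(texts, [positive for _, _, positive in train])
    return label_models, polarity_model


def classify_paragraph(label_models, polarity_model, paragraph):
    """Score a paragraph for every taxonomy label and for a positive effect."""
    return {
        "labels": {label: m.predict_proba(paragraph) for label, m in label_models.items()},
        "p_positive": polarity_model.predict_proba(paragraph),
    }


if __name__ == "__main__":
    models, polarity = build_models(TRAIN)
    print(classify_paragraph(models, polarity, "Falling demand in the market hurt revenues."))
```

On this toy data the demo paragraph scores about 0.98 for `market_conditions`, about 0.02 for `sponsor_capacity`, and 0.2 for a positive effect (that is, the factor is read as negative). Fitting one independent model per label lets a single paragraph carry several factors at once, which matches the per-paragraph probability of occurrence described above.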