
Machine Learning in Evaluative Synthesis


This paper has discussed the advantages and challenges of using machine learning in evaluative synthesis; more specifically, it has looked at the identification and classification of project-level implementation factors and issues. Our analysis showed that with the right combination of manual and automated approaches, machine-learning-based information classification can lead to significant efficiency gains without loss of accuracy in information extraction and classification. Indeed, the incorporation of quality control practices can even improve accuracy in certain cases. We drew on the concrete experience of IEG’s Financial and Private Sector Micro Unit to examine this process systematically. We first laid out the principles for generating a classification taxonomy. We then applied a combination of unsupervised- and supervised-learning techniques to generate word clusters, keywords, and examples from evaluation documents as features for classification. These were integrated into the taxonomy and used to classify features into multiple categories of factors and issues.

Following several rounds of cross-validation and calibration, we were able to achieve classification accuracy rates comparable to those achieved by human coders in this field (about 70 percent accuracy) but at substantially greater efficiency, because our model performs the classification task far faster than human coders can. As expected, the model classified features into well-defined subcategories such as “legal or regulatory factors,” “political factors,” and “market pricing” with much higher accuracy (that is, fewer incorrect classifications) than into broader subcategories such as “commitment and motivation.” Where we specified subcategories imprecisely, the model had greater difficulty converging on the correct classifications. Overly broad keywords also initially produced misclassification errors. Subsequent refinements to the model and input from subject-matter experts improved the training data, enabling the model to efficiently generate more relevant tags for the features it classified.

Currently, the output of our extraction and classification process is captured in a data visualization tool (built on the Tableau platform), which generates descriptive statistics on implementation factors and issues disaggregated by geographic area and private sector industry. In addition, the output is used for writing evaluative syntheses. The inclusion of readily accessible and searchable parameters for factors and issues allows project practitioners in the Bank Group to observe commonalities and patterns across large numbers of successful and unsuccessful projects and to disaggregate the output by sectoral or regional factors where useful. This combination of features allows the model to leverage decades of institutional experience in project implementation and apply it to both evaluative synthesis and project design more efficiently and systematically than was previously possible.
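The disaggregation step can be illustrated with a small sketch. The field names and records below are hypothetical, and this stands in for the dashboard's descriptive statistics rather than reproducing it. Once factors are tagged, counting them by region or industry is a straightforward group-by operation.

```python
import pandas as pd

# Hypothetical tagged output from the classification process.
tagged = pd.DataFrame({
    "project_id": ["P1", "P2", "P3", "P4", "P5"],
    "region": ["Africa", "Africa", "Asia", "Asia", "Asia"],
    "industry": ["finance", "energy", "finance", "finance", "energy"],
    "factor": ["legal_regulatory", "political", "market_pricing",
               "legal_regulatory", "legal_regulatory"],
})

# Count factor occurrences disaggregated by geographic area.
by_region = tagged.groupby(["region", "factor"]).size().unstack(fill_value=0)
print(by_region)
```

Swapping `"region"` for `"industry"` (or grouping on both) yields the sectoral cuts described above; a dashboard simply renders these tables interactively.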

As with any other form of analysis, the accuracy of our model’s results is contingent on the quantity and quality of the input data, as well as the presence of adequate supervision and cross-validation. Given these conditions, automated parsing and tagging of project information shows promise as a clear improvement over a manual approach. The output from our taxonomy allows evaluators to access the entire universe of project insights from all available project evaluations and learn about salient factors influencing project performance. With future revisions and refinements to the taxonomy (particularly the inclusion of more examples in the training set), the classification accuracy rates achieved by the model will continue to improve. Taken together, the gains in efficiency and data accessibility that result from the use of machine learning techniques will allow evaluators and practitioners to better incorporate lessons from the past into future practice.