Brenda Barbour: Data are vital for understanding the progress and impact of development strategies. New technologies coupled with increased computing power are creating opportunities for gathering and analyzing ever larger amounts of data from a greater range of sources. In addition, remote data collection has played an important role in getting around the restrictions put in place to prevent the spread of the coronavirus. But innovative use of technology began before the pandemic.
Welcome to What Have We Learned? The Evaluation Podcast. I am your host, Brenda Barbour of the World Bank Group's independent evaluation arm, IEG.
In this episode, we will explore the influence of new technologies on how data are gathered and used, and what this means for learning from evidence and for evaluating development. My guest is Jos Vaessen, IEG's methods advisor. Jos has been working in evaluation research for over two decades. Since 2016, he has led IEG's methods advisory function, and he has presented and taught on evaluation research in global development in a range of national and global fora.
Welcome, Jos, to What Have We Learned?
Jos Vaessen: Thank you, Brenda.
BB: Jos, could you speak to how data can help generate actionable and relevant evaluative insights?
JV: As you are very much aware, in recent years we have seen a real increase in computational capacity. New algorithms and techniques have become available through the field of data science, and so have all sorts of new types of data, such as high-frequency satellite imagery data, financial transaction data, call records data, and so on. This has really opened up a whole new opportunity space for evaluation to broaden and deepen its analysis and, in some ways, to increase the rigor of evaluative analysis.
BB: How do you make sure that you're using an innovative approach to really answer a new or difficult question? There are skeptics who might think we're just applying these approaches because it's fun to apply them. So how do you determine when it's the right time to apply such an approach?
JV: Yeah, that's a very good question. When these data become available and there is a general sentiment that it's flashy or important to use them, there is, let's say, a strong temptation to fall into the trap of what we would call a data-driven approach. What we really need is to avoid that and to remain firmly wedded to what I would call a questions-driven approach, where our evaluation questions, as well as the conceptual frameworks and theories of change that we use, help us decide how and where particular data come in. That way we ensure that we respond to the questions we set out to ask and substantiate the findings and associated recommendations that really matter to our audiences.
BB: Please tell us about how IEG uses data in its evaluations. What are some of the more innovative ways to collect, analyze, and use data?
JV: Evaluation is applied, policy-oriented research under time, budget, data, and institutional constraints. In other words, when innovating we have to be careful and efficient about what data we collect and analyze and about how these data can help us respond to evaluative questions.
We need it to be efficient so that we can do the work within the time and resource constraints of the exercise, and we also need it to be rigorous so that, in the end, we can have confidence in the findings, defend them vis-à-vis various audiences, and be sure they can withstand scrutiny. This is quite a difficult equilibrium to attain, and I think the methods team has always tried to help evaluation teams find that balance.
It has helped evaluation teams think through their evaluation designs and has offered methodological solutions that we have found to work in other evaluations, in IEG or outside of IEG. Once the evaluation team has defined the scope and sharpened the evaluation questions, what usually follows, or should follow, is a more detailed dialogue around the choice and use of particular methods. That choice should reflect a good fit between the relevance of methods and the efficiency and rigor of data collection and analysis.
In that dialogue between the methods team and evaluation teams, the methods team has always encouraged teams to broaden their use of methods and data in different ways. One useful way to think about this is to distinguish between, on the one hand, methods that change the way we do things more fundamentally across evaluations and, on the other hand, what we could call boutique innovations.
BB: What's that? Tell me what a boutique innovation is.
JV: A boutique innovation, I would say, concerns the use of specific methods, whether they are innovative from the perspective of IEG or even for the evaluation field as a whole. They can help us address a specific question in more depth or in more detail. These boutique innovations are not necessarily part and parcel of all evaluations, but they can really help strengthen the rigor or depth of analysis in particular cases. Examples include process tracing, which we have used, among other things, to understand how citizen engagement in the World Bank Group works in a particular context, and sentiment analysis, which we have used to understand citizens' perceptions of the World Bank Group as a whole or of specific interventions supported by the World Bank Group.
Boutique innovations stand somewhat in contrast with what one could call more fundamental changes to our evaluative practice, a broader wave of innovation, if you like. Examples would be the use of more sophisticated text analytics and, under certain conditions, the use of machine learning in portfolio identification. Another example, which I think has to some extent changed evaluative practice across IEG more broadly, is the use of computer-assisted qualitative data analysis software, such as NVivo, which helps us organize and analyze our qualitative data in a much more systematic manner. So I think that distinction is an important one, and I think we have been able to make progress on both types of innovation.
BB: So on the boutique type, when you're doing sentiment analysis, for example, what are the data sources? I assume those are data sources outside of the World Bank Group. And then for machine learning, can you tell us about a specific evaluation that used it, how it was used, and what the result was?
JV: We have used sentiment analysis in a number of evaluations. We usually apply it to understand the perceptions of particular audiences regarding the World Bank Group, particular interventions supported by the World Bank Group, or particular topics, such as specific areas of institutional reform. For that analysis, we often use data from social media platforms such as Twitter, or we can use data from more conventional media platforms. Recently we acquired a license for a platform called Talkwalker, which allows us to do this work in a more integrated way.
BB: And on machine learning, can you give us an example of when you've used that?
JV: Yes. Take, for example, World Bank Group support for combating malnutrition. There we really had to understand which interventions address this problem and in what ways. The evaluation team that worked on this topic did an amazing job of developing and applying a very clear conceptual framework and translating it into a systematic approach for identifying lending and non-lending work under this topic.
They used exploratory techniques, such as unsupervised machine learning, to find associations between words or clusters of words and to see where text on these topics could be found. They also used a more supervised machine learning approach: on the basis of a clear taxonomy of words, they gradually trained an algorithm to identify the types of projects that fit under the framework they had developed. So it really was an elaborate exercise of identifying the relevant work of the Bank Group under this topic.
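The supervised step described above, seeding labels from a taxonomy and then training a classifier, can be sketched in a few lines of Python. This is a minimal illustration only, assuming a hypothetical seed taxonomy and made-up project descriptions; it uses a multinomial Naive Bayes classifier as a stand-in and is not the model the evaluation team actually used. The unsupervised word-clustering step is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical seed taxonomy for illustration only; a real framework would be far richer.
TAXONOMY = {"stunting", "wasting", "micronutrient", "malnutrition", "breastfeeding"}


def tokenize(text):
    """Lowercase a document and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def seed_label(text):
    """Weak label: 1 if the document mentions any taxonomy term, else 0."""
    return int(any(tok in TAXONOMY for tok in tokenize(text)))


def train_naive_bayes(docs, labels):
    """Fit a two-class multinomial Naive Bayes model with add-one smoothing.

    Assumes both classes (0 and 1) appear at least once in `labels`.
    """
    vocab = {tok for doc in docs for tok in tokenize(doc)}
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter(labels)
    for doc, y in zip(docs, labels):
        word_counts[y].update(tokenize(doc))
    model = {}
    for y in (0, 1):
        total = sum(word_counts[y].values())
        log_prior = math.log(doc_counts[y] / len(docs))
        log_lik = {
            w: math.log((word_counts[y][w] + 1) / (total + len(vocab))) for w in vocab
        }
        model[y] = (log_prior, log_lik)
    return model


def predict(model, text):
    """Return the class with the higher log-posterior; unseen words are ignored."""
    scores = {
        y: log_prior + sum(log_lik[tok] for tok in tokenize(text) if tok in log_lik)
        for y, (log_prior, log_lik) in model.items()
    }
    return max(scores, key=scores.get)


# Toy, made-up project descriptions.
docs = [
    "community program to reduce stunting and improve child feeding",
    "micronutrient supplements for mothers and child feeding counseling",
    "rural road rehabilitation and bridge maintenance",
    "power grid expansion and road transport",
]
labels = [seed_label(d) for d in docs]  # weak labels from the taxonomy
model = train_naive_bayes(docs, labels)

query = "school child feeding and nutrition counseling"
print(seed_label(query), predict(model, query))  # prints "0 1"
```

The query never mentions a taxonomy term, so keyword matching alone would miss it, but the trained model flags it as relevant because it shares vocabulary with the seed-labeled projects. That generalization beyond the seed terms is the reason to train a model rather than rely on keywords; presumably the "gradual" training mentioned above involved reviewing predictions and refining labels over several rounds.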
BB: How has artificial intelligence changed what we can get out of data?
JV: I see a number of potential benefits of using new data science applications, including what is generally referred to as artificial intelligence. The first is efficiency. We are increasingly dealing with big data, very large data sets of numerical or textual data, which we can no longer analyze manually. We really need something like a machine learning algorithm to help us go through these data and extract and classify knowledge that is useful for evaluation purposes.
So by training an algorithm, we can extract the right type of knowledge that we can use for evaluative purposes in a way that is much more efficient than if we do it manually.
A second potential benefit is the validity of findings, or the accuracy of the analysis. In many cases, a well-trained machine learning algorithm can extract data from a large data set more consistently than would be possible manually.
A final benefit would be the potential for data science techniques to help us enhance the breadth of analysis. We can now access imagery data much more easily than before. We have computational capacity to analyze these data. So imagine that we have a data set with millions of pixels. We can now use machine learning algorithms to classify these pixels into particular categories of meaning, for example, different land use patterns. And we can then use these data to actually measure changes over time and changes across different geographical places. And that can really help us to respond to evaluative questions.
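To make the pixel-classification idea concrete, here is a minimal Python sketch. It assumes each pixel is a pair of hypothetical band values (red and near-infrared reflectance) and uses a nearest-centroid classifier, a deliberately simple stand-in for the machine learning models used in real remote-sensing work. All training samples and pixel values are made up; the point is only to show how classified pixels can be compared across two dates.

```python
import math
from collections import Counter


def train_centroids(samples):
    """Compute the mean band values of labeled training pixels for each class."""
    sums, counts = {}, Counter()
    for bands, label in samples:
        acc = sums.setdefault(label, [0.0] * len(bands))
        for i, value in enumerate(bands):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(pixel, centroids):
    """Assign a pixel to the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(pixel, centroids[label]))


def class_shares(pixels, centroids):
    """Return the fraction of pixels assigned to each land-use class."""
    assigned = Counter(classify(p, centroids) for p in pixels)
    return {label: assigned[label] / len(pixels) for label in centroids}


# Made-up (red, near-infrared) values for labeled training pixels.
training = [
    ((0.05, 0.45), "vegetation"), ((0.07, 0.50), "vegetation"),
    ((0.30, 0.35), "built-up"), ((0.28, 0.32), "built-up"),
    ((0.03, 0.02), "water"), ((0.04, 0.03), "water"),
]
centroids = train_centroids(training)

# The same four hypothetical pixel locations observed at two dates.
before = [(0.06, 0.46), (0.05, 0.48), (0.29, 0.34), (0.04, 0.03)]
after = [(0.06, 0.46), (0.27, 0.33), (0.29, 0.34), (0.04, 0.03)]

change = class_shares(after, centroids)["built-up"] - class_shares(before, centroids)["built-up"]
print(f"Change in built-up share: {change:.2f}")  # prints 0.25
```

Computing the same class shares for different places, rather than different dates, gives the cross-geography comparison mentioned above.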
BB: That makes sense. I want to turn my attention to the COVID-19 pandemic because I imagine the restrictions on travel and the requirements for social distancing have amplified the need for innovative data collection. How have evaluation teams had to modify their research design to conduct some of their field work virtually or creatively reimagine their data collection?
JV: Basically, the issue is that as a result of the COVID-19 pandemic, it was no longer possible to collect data directly in the countries where the World Bank Group is active.
We lost the advantage of face-to-face interviewing. We lost the possibility of direct observation and of building rapport with stakeholders to develop a rich understanding of the institutional and intervention complexity of the World Bank Group's work. All of this created a gap that we as evaluators had to deal with. We tried to deal with this gap in at least three ways. First of all, by conducting remote interviews.
Remote interviews have had disadvantages as well as advantages. Sometimes they can enhance access to certain stakeholders; in other cases, stakeholders do not have access to the internet or do not respond to our outreach attempts. In some cases, remote interviews lead to candid and safe conversations; in other cases, they really don't. So the overall balance in terms of sampling and response bias is not entirely clear and depends on the context.
The second way of dealing with the COVID-19 data collection constraints is conducting desk reviews more smartly. We just talked about portfolio identification and how to do that more smartly. We have also had quite some experience using text analytics and machine learning not just for portfolio identification but also for content analysis, and that was very interesting. And now we are also focusing more on conducting evaluative synthesis systematically.
And finally, we have been tapping into new data sources and using conventional statistics as well as new data science applications to analyze these data. In some cases, this has helped us answer evaluative questions more efficiently, but also with rigor and depth. To give you an example, we can now use imagery data as proxies for some outcome variables of World Bank Group interventions, and we have done that in a number of cases. One was a geospatial impact evaluation of road improvements supported by the World Bank in Maputo, which was a case study for the urban spatial growth evaluation. Here we used imagery data to estimate changes in the built-up environment.
In addition to that, we also used imagery data to construct a spatial counterfactual. Basically, we delineated intervention zones and similar comparison zones. By combining this spatial counterfactual model with a more conventional statistical counterfactual analysis, we were able to estimate the effect of road improvements on economic activity in the area.
I think the overall conclusion to draw from this process of methodological adaptation in times of COVID-19 is twofold. First of all, nothing can fully substitute for data collection and analysis on the ground. In some cases we really need to be on the ground, talking, observing, triangulating, and learning very fast about the context-specific complexity of how the Bank Group operates in partnership with clients and other partners and how this leads to change.
At the same time, we have come a long way in discovering that we can use existing data, including existing evaluative evidence (for example, through synthesis), much more efficiently and effectively without actually going to the countries to collect original empirical data. So we now know much better what we can do with existing data, and we have really upped our game in using existing data to respond to evaluative questions.
BB: That is really exciting, but I wonder, are there risks in applying these techniques?
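One common way to combine a spatial counterfactual (intervention zones versus similar comparison zones) with a statistical counterfactual is a difference-in-differences calculation, sketched below. This is an assumption for illustration: the conversation does not say which estimator the Maputo case study used, and a real analysis would involve many more zones, covariates, and checks. The function name and all values are hypothetical.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, comparison_before, comparison_after):
    """Difference-in-differences estimate.

    The change observed in intervention zones, minus the change observed in
    comparison zones over the same period. The comparison-zone change stands in
    for what would have happened in the intervention zones without the roads.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    comparison_change = mean(comparison_after) - mean(comparison_before)
    return treated_change - comparison_change


# Made-up values of an outcome proxy (e.g., share of built-up area) per zone.
treated_before = [0.20, 0.30]
treated_after = [0.40, 0.50]
comparison_before = [0.20, 0.30]
comparison_after = [0.30, 0.40]

effect = diff_in_diff(treated_before, treated_after, comparison_before, comparison_after)
print(f"Estimated effect: {effect:.2f}")  # prints 0.10
```

The estimate rests on the assumption that, absent the road improvements, intervention and comparison zones would have followed parallel trends; delineating genuinely similar comparison zones is what makes that assumption plausible.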
JV: I think the best way to look at these innovations, data science techniques and the ability to analyze all the new data with them, is to see them as complementary tools for evaluative analysis. There are several reasons why I say this. First of all, there's the challenge of what we call construct validity. What we measure is not necessarily the same as the complex phenomenon that we are interested in. Think about a complex phenomenon like poverty, which has many dimensions, and yet we use very simple measures like the income people earn or the assets they have as proxies for it.
That also applies when we use data science to turn imagery data into a proxy for, let's say, people's assets. We use, for example, photos of the state of the roofs of houses as a proxy for the type of house people have, which in turn is a proxy for their asset base, which in turn is a proxy for their poverty level. So we are dealing with abstractions of complex phenomena, and we always have to be aware that what we measure is not necessarily the same as the phenomenon we're interested in. A second big limitation is that we do not always know the validity of certain types of data.
Textual data would be a good example here. Say we have a database of hundreds of thousands of project-related documents. We cannot just go and analyze these documents without knowing where they come from and what they are really about. In international organizations such as the World Bank Group, the other multilateral development banks, or United Nations agencies, we all know that documents are written in a certain style and with a certain use of language; they include certain information and leave out other information. So we have to be aware of the inherent biases in the original data before we start looking for broader patterns in them.
And finally, with all these beautiful data sets becoming available, financial transaction data, mobile phone use data, imagery data from drones or from satellites, we have to always keep in mind that these data can help us analyze parts of evaluative questions, but we should not lose focus here. It is the questions that determine how data come in and when we use certain data and it is not the data that determines what types of questions we are going to answer.
BB: So what advice would you give to an evaluator who's conceiving a new evaluation and trying to decide whether their evaluation questions would require going to the country, going to the client to gather data or could use a more in-house innovative approach?
JV: First of all, I would advise evaluation team leaders or evaluation team members to consult with colleagues. Many other evaluation colleagues are grappling with the same problems and they probably have found some solutions and it's always good to learn from experiences of others. I would also advise the evaluation team leader or evaluation team members to consult the various resources that we have developed.
For example, we have recently published the first papers in what we call the IEG Methods Paper Series. We have also developed a lot of guidance materials and course materials.
BB: I hope our listeners will check out the methods papers you mentioned and other resources. Let's turn to the future. How do you think the field of evaluation will evolve, and what new opportunities and challenges will the future bring?
JV: This is a really challenging question. For sure, the future of evaluative inquiry is uncertain, and I will not even begin to offer a comprehensive answer, but there are some points I would like to briefly touch upon. The first is that we have to acknowledge that the data revolution is here. From the perspective of evaluation, this means it is not a question of whether or not to get on the train. It's really a question of how we get on the train: how do we use old and new data, and how do we use new data science applications in a thoughtful and meaningful way?
We have to try out new things. We have to also be prepared to sometimes fail, learn from our failures and only retain those things that do work.
BB: Many thanks for joining me, Jos. It has been a fascinating conversation. To learn more about IEG's experience using data in innovative ways and to read any of the reports mentioned in this episode, please visit ieg.worldbank.org. Don't forget to subscribe to future podcasts. This has been What Have We Learned? The Evaluation Podcast. Thank you for listening.