We believe that evaluators are first and foremost truth-tellers. This fundamental principle has guided the scale-up of our use of AI. The accuracy, reliability, and validity of the evidence we use to formulate our findings are the foundations of our credibility. They should also be at the heart of how we use AI.

Our new AI strategy is the culmination of a multi-year journey of discovery into the use of data science and AI for rigorous evaluation design to answer increasingly complex questions of development effectiveness. The strategy captures what we have learned along the way about the potential of these new technologies and their risks, with guidance on how we will manage both. We hope that it can serve as a compass for the evaluation community and beyond.

Quality first

Our strategy puts quality first—before efficiency. 

IEG’s use of AI starts with an uncompromising commitment to high-quality data and evidence in all its applications. Data is the fuel for any AI system, and making it fit for purpose requires investment in curating and orchestrating data sources, testing and calibrating prompts, and ensuring that the model’s knowledge base is valid. Without this investment, AI applications will not distinguish between triangulated evidence and biased promotional information, giving both equal weight—the classic ‘garbage in, garbage out’ problem.

AI capabilities enable us to make sense of and integrate a wider range of data sources: structured and unstructured, qualitative and quantitative, stemming from community interviews as well as large-scale surveys. However, they should not give us license to forgo the pillars of rigorous evaluation, such as avoiding biases, triangulating our sources, and looking for the most plausible answers given the best available evidence.

At IEG we are entrusted with providing answers to consequential questions: “what works, for whom, under what circumstances”—whether it is on jobs and labor market reforms in IDA countries, gender equality, or how to embed learning in lending. If we skip these steps and blindly rely on AI for feedback or a sense of direction, we risk misguiding operational teams and leading astray World Bank Group decision-makers and clients.

Thoughtful experimentation

We have always prioritized thoughtful experimentation in our journey of discovery, and the advent of Generative AI (GenAI) has not changed our approach. Whether with text or images, we are dedicated to thoroughly testing our applications, and to embedding human judgement in our workflows. 

We are also building our own custom-made web applications to give us control over the knowledge base we query, the ability to build quality checks into the workflow, and clarity on the accuracy of the model.

Our semi-automatic portfolio identification tool iπ is a case in point. It has been trained, built, and programmed with human checks at its core. Our next tool, the coder, will put conceptual integrity at the core of its semi-automatic classification approach, enabling evaluators to leverage the speed of GenAI to classify project document extracts along literature-backed typologies and theories of change.

Because we are evaluators first and foremost, IEG develops AI applications for impact, with very clear use cases and users in mind. This means developing with users and for them, which takes time, consultations, and iterations. These tools are based on years of experience in incrementally embedding data science and AI at the core of our practice. 

Transparency, practical ethics, and peer governance

While we are intransigent in our commitment to quality, we also know that the use of AI changes how we must approach quality assurance and governance. The asymmetry of information between the doers (analysts, data scientists and evaluators) and the checkers (team leaders, managers, leadership team) is significant. The hierarchy can’t be omniscient. 

The IEG strategy proposes a novel way of governing AI adoption with peer governance principles and adherence to strong values and practical ethics at its core. 

Practical ethics means going beyond acknowledging the potential for biases and risks, referring to high-level norms and standards, or pretending to practice ‘responsible AI’. It means concretely and painstakingly testing for biases, or choosing not to use AI for highly sensitive use cases such as the analysis of interviews with vulnerable populations. It means adopting a tiered approach with specific protocols for applications that are new or high-risk.

Practical ethics also requires safe spaces for discussing controversial use cases, new applications or models, and exchanging amongst peers not just what is working but also what is not. IEG’s data science community of practice has long played this role. Our AI strategy elevates this group of analysts, data scientists and evaluators to play a critical role in our governance of AI, along with an AI Review Board which will play an equally important role in fostering a culture of responsible AI use.

As evaluators, we are wedded to transparency. We publish our reports with long methods appendixes that explain our design choices and their limitations. Our use of AI is no different: we explain what we use AI for, including the model and the validation techniques we employ, and we disclose our robustness checks and accuracy metrics.

IEG produces rigorous, triangulated, independent evidence about what works, for whom, under what circumstances and why, to inform the continual cycle of learning and improved development effectiveness. Our use of AI does not change this commitment; it helps us realize it.