Selfies in Evaluation: Improving the Project Self-evaluation System
How to make self-evaluation more candid and less burdensome
"Selfies" – self-portraits taken with cellphones, at arm’s length and posted on social media – are being generated at an astounding rate. This provides, for the sociologist or artist, an immense and diverse gallery of photos for study. It must be admitted, though, that the average quality of selfies is low. And the objectivity of these self-portrayals is questionable, with images skewing towards the vain, the lewd, and the blurred.
Here, then, is the dilemma of project self-evaluation, as practiced at the World Bank Group and other development agencies. These agencies need to track project experience, both for accountability and for learning. Self-evaluation, in principle, provides information about many more projects than could possibly be visited by a limited corps of evaluators. It also promotes the laudable practices of self-examination and lesson-learning by development practitioners. But of course it is difficult for anyone to be dispassionately objective about their own work. Individuals, teams, and organizations also face strong incentives to present their work in the best possible light.
To counteract these natural biases, the World Bank's self-evaluation system incorporates independent validation. The project team submits a completion report, which includes self-ratings of outcomes, Bank performance, and other dimensions of project experience. The Independent Evaluation Group (IEG) does a desk review of the completion report and, in about a third of the cases, downgrades the outcome rating (only 2% of outcome ratings are upgraded).
The status quo: a self-evaluation system that doesn’t rely on self-evaluation
There are two problems with this approach. First, IEG’s desk-based validation process is based just on an assessment of the appraisal document and the completion report. But completion reports rarely document the full evidence base that generated the reported results. So it is as if a financial auditor only checked a company’s balance sheet without even glancing at the underlying accounting data. When IEG does a detailed field-based evaluation, with access to more evidence, 23% of outcome ratings are downgraded from the validation stage, and 8% are upgraded. (Part of this, though, may relate to changes in performance since the project closed.)
The second problem is that, paradoxically, World Bank Group staff and managers ultimately do not rely on the self-evaluations. Because every self-rating is reviewed and subject to revision by IEG, the self-rating is essentially superfluous for the purpose of tracking project or Bank performance.
Paradigm change: validate the system, not the documents
Is there an alternative? One possibility would be a paradigm change: instead of validating each individual completion report, let’s validate the integrity of the self-evaluation system as a whole. The idea is to use a system of selective review, with incentives and disincentives, to both motivate and verify accurate self-reporting.
A familiar example is the way many tax agencies, including the US’s Internal Revenue Service, deal with tax returns. The agency does not do a full audit of every return. Rather, it does a quick check of each return for internal consistency and for “red flags,” which, in past experience, have been associated with differences between the taxpayer’s estimate of taxes due and the agency’s. Disincentives are put in place to discourage inaccurate reporting.
Another example is Indonesia's PROPER system for monitoring industrial pollution. Companies are rated on a five-point scale. The Ministry of Environment celebrates companies with exemplary performance that goes well beyond regulatory standards, and posts the names of those found to be deliberately contravening pollution laws (if they fail to respond to a warning). Companies that establish a record of good performance are visited by field inspectors with a lower frequency than those that consistently or egregiously do poorly.
Clear objectives and robust monitoring = less scope for disagreement
Could this work for project evaluation? The trick is first to put in place incentives that truly reward accurate self-reporting. Second, adopt procedures that better align the standards and criteria used by evaluators with those used by WBG staff.
Much of the scope for disagreement on ratings can be eliminated simply by ensuring that project objectives are unambiguously defined, indicators are clearly linked to objectives, and monitoring systems robustly track indicators. A sample-based audit could then verify the accuracy of the self-reports. If there is good agreement between self-ratings and independent ratings in the sample, then all the self-ratings would be considered validated.
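To make the sample-based audit concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any existing IEG procedure: the function name `audit_sample`, the exact-match definition of agreement, and the 90% threshold are all hypothetical choices made for the example.

```python
import random

def audit_sample(self_ratings, independent_rater, sample_size,
                 min_agreement=0.9, seed=0):
    """Re-rate a random sample of projects and report agreement.

    self_ratings: dict mapping project id -> self-assigned rating.
    independent_rater: callable returning the independent rating for a
        project id (a stand-in for a field-based evaluation).
    min_agreement: hypothetical threshold above which the whole set of
        self-ratings is treated as validated.
    """
    if not self_ratings or sample_size < 1:
        raise ValueError("need at least one project and a positive sample size")

    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    project_ids = sorted(self_ratings)
    sample = rng.sample(project_ids, min(sample_size, len(project_ids)))

    # Assumption: "agreement" means the two ratings match exactly.
    matches = sum(1 for pid in sample
                  if independent_rater(pid) == self_ratings[pid])
    agreement = matches / len(sample)
    return agreement, agreement >= min_agreement
```

In practice the threshold, the sample size, and whether a one-notch difference counts as agreement would all be policy decisions; the sketch only shows the shape of the check.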
Given proper incentives, clear objectives, and reliable evidence, perhaps we can move towards a self-evaluation system that produces higher-quality "selfies" and a more accurate snapshot of the organization.