Jos Vaessen, Maria Elena Pinglo, and Victor Vergara also contributed to this blog.
The availability of geospatial data can be vital to better understand development issues and to ensure development efforts are directed to the places where they are most needed. Geospatial data refer to any data containing information about a specific location on the Earth's surface. This encompasses a wide variety of data types such as project activities’ coordinates, political boundaries, crop patterns, road networks, and geolocated survey indicators.
Imagery data, such as satellite imagery, have also traditionally been used for geospatial analysis. However, until recently, their use remained mostly confined to military applications, given the vast computational resources needed to store and process these data. This has changed drastically thanks to advances in machine learning algorithms and increases in computational capabilities, which have made this data type more widely accessible. Satellite data are particularly relevant for geospatial analysis because they are often publicly available at a global scale, can be used to understand a broad range of phenomena, and are available over long periods of time, making them suitable for time-series analysis. Although less widespread than satellite imagery, digital photos (such as streetscape images of urban scenes) are also becoming an important data source for geospatial analysis, especially when combined with Artificial Intelligence (AI) techniques, which can assign meaning to features in these images.
The Independent Evaluation Group (IEG) has been exploring new techniques of geospatial analysis, including the use of satellite and digital images, to understand how spatial phenomena of interest change over time and to help answer questions on the relevance and effectiveness of development interventions.
Geospatial data can be used to describe how some spatial phenomena changed over a period of time, by creating a chronological series and quantifying the amount of change. This technique has many potential areas of application, such as understanding changes in weather phenomena or in deforestation patterns.
IEG conducted a study to evaluate an urban development project implemented in Bathore (Albania). The study aimed to ascertain the extent of urban growth in the upgraded neighborhoods. The analysis relied on publicly available satellite images covering the period 1999-2010. The team trained a supervised classification algorithm to assign each of the images’ pixels to one of four land cover classes: built-up environment, forest, water, and agricultural land. This allowed the team to trace the evolution of the land cover classes in the area and to detect a clear increase in the built-up environment. (See Fig. 1.)
Fig. 1. Land Use/Land Cover classification of the area of interest for the period 1999-2010
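The classification and change-quantification steps described above can be sketched in a few lines. This is a minimal illustration only: the blog does not say which algorithm the team used, so a simple nearest-centroid classifier stands in for it, and the two-band pixel values, the training samples, and all function names are invented for the example.

```python
from collections import Counter
from math import dist

# The four land cover classes named in the Bathore study.
CLASSES = ("built-up", "forest", "water", "agricultural")

def train_centroids(samples):
    """Average the feature vectors of the labeled training pixels per class."""
    sums, counts = {}, Counter()
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: tuple(v / counts[label] for v in acc)
            for label, acc in sums.items()}

def classify(pixel, centroids):
    """Assign a pixel to the class whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda label: dist(pixel, centroids[label]))

def class_shares(pixels, centroids):
    """Share of an image's pixels that falls into each land cover class."""
    counts = Counter(classify(p, centroids) for p in pixels)
    return {label: counts[label] / len(pixels) for label in CLASSES}

# Hypothetical training pixels: (band_1, band_2) reflectance plus a label.
training = [
    ((0.30, 0.30), "built-up"), ((0.34, 0.32), "built-up"),
    ((0.05, 0.50), "forest"), ((0.07, 0.54), "forest"),
    ((0.03, 0.02), "water"), ((0.05, 0.04), "water"),
    ((0.12, 0.35), "agricultural"), ((0.14, 0.37), "agricultural"),
]
centroids = train_centroids(training)

# Two tiny hypothetical "images" of the same eight locations.
image_1999 = [(0.31, 0.29), (0.06, 0.52), (0.13, 0.36), (0.12, 0.34),
              (0.04, 0.03), (0.06, 0.50), (0.13, 0.37), (0.14, 0.36)]
image_2010 = [(0.31, 0.29), (0.33, 0.31), (0.30, 0.32), (0.12, 0.34),
              (0.04, 0.03), (0.06, 0.50), (0.32, 0.30), (0.14, 0.36)]

shares_1999 = class_shares(image_1999, centroids)
shares_2010 = class_shares(image_2010, centroids)
built_up_change = shares_2010["built-up"] - shares_1999["built-up"]
print(shares_1999, shares_2010, built_up_change)
```

With these invented values, the built-up share rises from 12.5 percent of pixels in 1999 to 50 percent in 2010. A real analysis would work on full multi-band rasters and would validate the classifier against held-out labeled pixels before comparing years.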
IEG has been piloting an approach across its Country Program Evaluations (CPEs) to help ascertain whether the Bank (and other development partners) are targeting the areas - such as regions or provinces - where there is a greater need.
The analysis relies on building a customized dataset that combines geocoded data (such as project locations), macro variables (such as population and GDP per capita), and sector-specific variables (such as education and energy). These data are typically derived from a variety of sources, including geocoded survey data and official statistics, publicly available gridded datasets, and remote sensing data.
A key advantage of this approach is that it leverages both traditional and novel data sources, which are then combined to produce granular, subnational estimates. This is critical to move beyond national averages, which can hide, in many cases, regional disparities, for example, in terms of the level of access to basic services.
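To illustrate why subnational estimates matter, the sketch below compares each region's share of unmet need with its share of projects. Everything in it is hypothetical: the region names, the populations, the electricity access rates, the project counts, and the function names are invented, and the choice of "population without access" as the measure of need is an assumption made for the example, not IEG's method.

```python
# Hypothetical subnational dataset: one record per region.
regions = {
    "North":  {"population": 2_000_000, "access_rate": 0.90, "projects": 6},
    "Centre": {"population": 1_000_000, "access_rate": 0.60, "projects": 3},
    "South":  {"population": 1_000_000, "access_rate": 0.30, "projects": 1},
}

def national_access(regions):
    """Population-weighted national average of the access rate."""
    total_pop = sum(r["population"] for r in regions.values())
    served = sum(r["population"] * r["access_rate"] for r in regions.values())
    return served / total_pop

def need_vs_projects(regions):
    """Each region's share of unserved people next to its share of projects."""
    unserved = {name: r["population"] * (1 - r["access_rate"])
                for name, r in regions.items()}
    total_unserved = sum(unserved.values())
    total_projects = sum(r["projects"] for r in regions.values())
    return {
        name: {
            "need_share": unserved[name] / total_unserved,
            "project_share": r["projects"] / total_projects,
        }
        for name, r in regions.items()
    }

print(f"National access rate: {national_access(regions):.3f}")
for name, row in need_vs_projects(regions).items():
    print(f"{name}: need {row['need_share']:.2f}, "
          f"projects {row['project_share']:.2f}")
```

In this invented example the national access rate of 67.5 percent hides the fact that the South holds more than half of the unserved population but only a tenth of the projects.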
As geospatial data are typically available for locations beyond the specific project boundaries, it is also possible to use these data to build a spatial counterfactual, which estimates what would have happened in the absence of the intervention. This methodological design is based on the identification of both a “treatment” and a “control” area (i.e., areas that benefited from the program and those that did not). It requires sufficient relevant data on outcomes and on the key factors affecting those outcomes, including how the factors interact with one another, before and after the implementation of the project.
IEG conducted a geospatial impact assessment of the change in urban density over time around a road improvement project in Maputo (Mozambique) as part of the Managing Urban Spatial Growth evaluation. The study used a combination of machine learning techniques and econometrics. Applying a “difference-in-differences” approach, the team assigned as “treatment area” the plots that were within a buffer distance on both sides of the road improvement project, and as “comparison” or “control” area contiguous land to the north of the treatment area that was not included in any of the project’s road improvement activities. The team used a grid cell as the unit of analysis, and data sources included project locations, satellite images, digital elevation models, road networks, and points-of-interest. The study showed that horizontal density, i.e., building outwards through new urban areas, increased over time in the project area compared to the control area. At the same time, there were no statistically significant differences between the project and control areas in terms of changes in vertical density, i.e., building upwards and filling open spaces between existing buildings. (See Fig. 2.)
Fig. 2: Horizontal and vertical growth of Maputo (Mozambique)
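The core of a difference-in-differences comparison can be written down compactly. The sketch below is a simplified illustration, not the study's actual model: the grid cells and their built-up shares are invented, the function name is hypothetical, and the calculation omits the covariates and significance tests that an econometric analysis would include.

```python
from statistics import mean

def did_estimate(cells):
    """Difference-in-differences on grid cells.

    Each cell is a dict with:
      treated: True if the cell lies inside the project buffer
      before / after: outcome (e.g., built-up share) in each period
    Returns (mean change in treated cells) - (mean change in control cells).
    """
    def avg_change(is_treated):
        return mean(c["after"] - c["before"]
                    for c in cells if c["treated"] is is_treated)
    return avg_change(True) - avg_change(False)

# Hypothetical grid cells: built-up share before and after the project.
cells = [
    {"treated": True,  "before": 0.20, "after": 0.50},
    {"treated": True,  "before": 0.30, "after": 0.55},
    {"treated": True,  "before": 0.25, "after": 0.60},
    {"treated": False, "before": 0.20, "after": 0.30},
    {"treated": False, "before": 0.30, "after": 0.45},
    {"treated": False, "before": 0.25, "after": 0.35},
]

effect = did_estimate(cells)
print(f"Estimated effect on built-up share: {effect:.3f}")
```

With these invented numbers, treated cells gain 0.30 on average and control cells about 0.12, giving an estimate of roughly 0.18. Subtracting the control area's change removes growth that would have occurred anyway, which is only valid if both areas would have followed similar trends without the project.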
Geospatial analysis can be instrumental in identifying and understanding the geographical impact of interventions and in directing development efforts to where they are most needed.
Such analyses are, however, not devoid of limitations. Geospatial data are stored in specific formats, requiring specialized knowledge and expertise to manipulate the data. Additionally, although computational capabilities have greatly increased recently, some applications—especially those based on the manipulation of large amounts of image data—remain computationally intensive and might require access to additional computing resources. Finally, it is important to note that many geospatial data are essentially proxies of more complex phenomena (e.g., poverty, environmental degradation) of interest. Therefore, geospatial analysis is not a substitute for but rather a complement to on-the-ground (qualitative) data collection and analysis, with the latter still vital for strengthening the validity of findings.
This is super cool! As I was reading I thought the quasi-experimental set up lent itself to a regression discontinuity design, but the diff-in-diff makes sense too. Love it!
Congratulations, this was an interesting read, especially the exploration of vertical/horizontal growth of buildings. Do you think including socio-economic data with geospatial tools could also bring more insights?