Poverty Mapping: Innovative Approaches to Creating Poverty Maps with New Data Sources

Chapter 3 | Global System for Mobile Communications, Smartphone, and Wi-Fi Connectivity Data

This section discusses the methodological implications of using Global System for Mobile Communications, smartphone, and Wi-Fi connectivity indicators as a main data source for generating poverty maps.

Definition

The set of data sources examined here comprises indicators on connectivity (for example, internet speed and network coverage) and technology use (for example, the prevalence of high-end smartphones or certain mobile phone operating systems); this section explores their usefulness in creating poverty maps in different locations. The use of these indicators assumes that connectivity data provide strong predictive information on income levels and can be used to predict the socioeconomic situation in that location. For example, an area with fewer smartphones and lower Wi-Fi connectivity would suggest lower wealth levels relative to an area with a higher prevalence of high-end smartphone use and fourth generation (4G) internet connectivity.

Data Sources

The variables that are particularly useful for this type of analysis include network access (2G, 3G, or 4G networks, Wi-Fi connectivity, and so on), the mobile operating systems used (Android, iOS, Windows), and the brands of smartphone used (Apple, Samsung, Motorola, and so on). Some of this information is publicly available but at different levels of geographic and temporal disaggregation, depending on the country of interest.1 Additionally, technology companies such as Facebook tend to possess more granular data, which can be extremely useful for this type of analysis. Some of these data are publicly available for research purposes (for example, network coverage maps from Facebook), but data that might identify users remain proprietary and confidential. Such proprietary data may be accessible, however, through an agreement with the owner of the data and a clear statement on its intended use.

Methods

Methods used to generate poverty maps vary greatly depending on the type of data used.

The simplest method, typically applied to models that rely only on connectivity data, is ridge regression. Ridge regression extends ordinary least squares (OLS) linear modeling by adding a penalty on the size of the coefficients, which makes it particularly useful for multivariate regression problems in which the explanatory variables are suspected to be highly correlated (exhibiting multicollinearity; Hastie, Tibshirani, and Friedman 2009). The penalty shrinks the coefficients and thereby helps avoid overfitting when there are many predictors.
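As an illustration only, the following minimal sketch fits a ridge regression in closed form on synthetic data. The variable names (share_4g, share_wifi, share_high_end), the simulated wealth outcome, and the penalty value alpha are assumptions made for this example and do not come from any of the studies cited in this chapter. Two of the predictors are deliberately constructed to be highly correlated, which is the situation in which the ridge penalty is most helpful.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def ridge_coefficients(X_std, y, alpha):
    """Closed-form ridge solution on standardized predictors.

    Solves (X'X + alpha * I) beta = X'(y - mean(y)).
    With alpha = 0 this reduces to ordinary least squares.
    """
    y_centered = y - y.mean()
    n_features = X_std.shape[1]
    penalized_gram = X_std.T @ X_std + alpha * np.eye(n_features)
    return np.linalg.solve(penalized_gram, X_std.T @ y_centered)


# Synthetic data for 200 hypothetical locations (not real observations).
rng = np.random.default_rng(0)
n_locations = 200
share_4g = rng.uniform(0.1, 0.9, n_locations)
# Wi-Fi share is built to track 4G share closely (multicollinearity).
share_wifi = share_4g + rng.normal(0.0, 0.02, n_locations)
share_high_end = rng.uniform(0.0, 0.3, n_locations)

X = standardize(np.column_stack([share_4g, share_wifi, share_high_end]))
wealth = (share_4g + share_wifi + 2.0 * share_high_end
          + rng.normal(0.0, 0.1, n_locations))

ols_coefs = ridge_coefficients(X, wealth, alpha=0.0)     # no penalty
ridge_coefs = ridge_coefficients(X, wealth, alpha=10.0)  # assumed penalty

print("OLS coefficients:  ", ols_coefs)
print("Ridge coefficients:", ridge_coefs)
```

In practice, the penalty strength would normally be chosen by cross-validation rather than fixed in advance, and an established library implementation would usually be preferred over a hand-written solver.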

Other approaches apply more complex models, such as convolutional neural networks (CNNs) and transfer learning, to a combination of connectivity data and satellite imagery to derive micro-estimates of wealth (Chi et al. 2021). CNNs are deep-learning models that learn to detect features in an image (such as edges, textures, and shapes) and to weight them according to their relevance to the prediction task. Transfer learning is a machine-learning method in which a model developed and trained for one task is reused as the starting point for a model on a different task. Transfer learning approaches have the advantage of reducing the time and computing resources needed to train a new model.

Applicability Considerations

There is some potential for the use of connectivity data in the context of evaluation. Publicly available data (such as Facebook’s advertising data) can be used to create poverty maps through relatively simple techniques (such as regression analysis) that do not require specialized software or additional computing resources. Research on the use of connectivity data, however, is still at an early stage, and therefore the limitations of these models are not yet fully understood. Preliminary research suggests that this modeling approach performs better in urban areas and is highly dependent on penetration rates.2

Another method uses Facebook’s poverty estimates (derived using the method described under Example 2 below), which are publicly available in a tabular format for 135 countries at a very granular spatial resolution. These estimates, however, are available only for 2021. The method could be replicated for other years, but this would require access to Facebook’s proprietary data, substantial expertise in machine learning, and appropriate computing resources to run image-based models.3

Examples

Example 1: Poverty Maps Using Facebook’s Publicly Available Advertising Data

Fatehkia, Coles et al. (2020) developed a model using Facebook advertising data to estimate household wealth in India and the Philippines. The authors used the Facebook Marketing Application Programming Interface to query the number of Facebook users matching certain criteria to obtain insights into the spatial distribution of users by device type (for example, iOS, Windows, Android), access to connectivity (for example, 2G, 3G, 4G, Wi-Fi), and use of high-end devices (the latest releases of Apple iPhones and Samsung Galaxy phones).

The Facebook Marketing Application Programming Interface provides only an estimate of the monthly active users matching the specified criteria at different levels of geolocation (the most disaggregated level is the city level). From these estimates, the authors computed the fraction of users in each location with access to these different features and, together with Facebook penetration data, used these fractions as predictors in a ridge regression approach, on the assumption that these insights provide signals on the underlying distribution of poverty. For comparison, the authors also collected nighttime and daytime satellite data, which were processed using CNNs to extract relevant features. The estimates obtained by this model were validated against a wealth index, which was constructed using principal component analysis based on data from the Demographic and Health Surveys (DHS).
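The penetration rate and the user fractions described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name connectivity_features, the dictionary keys, and all counts and population figures are hypothetical, and the sketch does not call the Facebook Marketing Application Programming Interface or reproduce the authors’ code.

```python
def connectivity_features(user_counts, population):
    """Turn user-count estimates for one location into model features.

    user_counts: dict with an "all_users" entry (estimated monthly active
        users) and one entry per feature of interest (hypothetical keys).
    population: estimated population of the same location.
    """
    total_users = user_counts["all_users"]
    if total_users <= 0 or population <= 0:
        raise ValueError("user total and population must be positive")

    # Penetration: ratio of users to the estimated population.
    features = {"penetration": total_users / population}

    # Share of users in the location matching each criterion.
    for name, count in user_counts.items():
        if name != "all_users":
            features[f"share_{name}"] = count / total_users
    return features


# Hypothetical counts for a single city (illustrative values only).
example_counts = {
    "all_users": 40_000,
    "4g": 30_000,
    "wifi": 10_000,
    "ios": 4_000,
    "high_end": 2_000,
}
example_features = connectivity_features(example_counts, population=100_000)
print(example_features)
```

Features of this kind, computed for every location, would form the rows of the predictor matrix passed to the regression model.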

In the case of the Philippines, the authors concluded that a model featuring Facebook data performed roughly similarly to a model based only on satellite data (with slightly better performance in urban areas). This conclusion is important because models based on Facebook’s public data are considerably simpler to implement than models using satellite data. In India, however, where Facebook penetration is lower, satellite data performed better.

Example 2: Poverty Maps Using Facebook’s Proprietary Connectivity Data

Chi et al. (2021) developed the first micro-estimates of wealth that cover the populated surface of all 135 low- and middle-income countries at 2.4 kilometer resolution. The estimates were generated by applying machine-learning algorithms to vast and heterogeneous data from satellites, mobile phone networks, topographic maps, and aggregated and anonymized connectivity data from Facebook. Data sources included road density, land cover, elevation, slope, precipitation, population, nighttime lights, satellite imagery, and specific features derived from Facebook’s proprietary data.

The authors found the resulting estimates of wealth to be quite accurate. Depending on the method used to evaluate performance, the model explained 56–70 percent of the actual variation in household-level wealth in low- and middle-income countries. In particular, information on mobile connectivity was highly predictive of subregional wealth, with 5 of the 10 most important features in the model related to connectivity.
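One common way to quantify the share of variation explained by a model is the coefficient of determination (R squared). The sketch below assumes this metric for illustration; it does not reproduce the authors’ evaluation procedure, and the wealth values shown are invented for the example.

```python
import numpy as np


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual_ss = np.sum((actual - predicted) ** 2)
    total_ss = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - residual_ss / total_ss


# Invented survey-based and model-based wealth values for five villages.
survey_wealth = [-1.0, -0.5, 0.0, 0.5, 1.0]
model_wealth = [-0.8, -0.6, 0.1, 0.3, 0.9]

share_explained = r_squared(survey_wealth, model_wealth)
print(f"Share of variation explained: {share_explained:.3f}")
```

A value of 0.56 to 0.70 on this scale would correspond to the 56–70 percent range reported above, if the metric used is indeed R squared.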

This approach was further validated on “ground truth” measurements of wealth from DHS and local or regional surveys, where available. This validation was conducted using spatial markers in the survey data to link each village to the various data sources used in the study.4 Considering the impact of the COVID-19 pandemic on the launch of new development interventions and the importance of detailed wealth estimates for better targeting, Facebook has provided free access to these estimates for public use.5

  1. See the International Telecommunication Union’s Statistics page at https://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx; see Facebook’s Marketing Application Programming Interface web page at https://developers.facebook.com/docs/marketing-apis/; see Meta’s Data for Good webpage at https://dataforgood.facebook.com/dfg/tools.
  2. The fraction of users of each product (such as smartphones or Wi-Fi) varies greatly across countries and tends to be higher in urban areas. Penetration rates are typically computed as the ratio of users to the estimated population of the area of interest. If the penetration rate is low, the data might not be representative of the entire population. More important, the association with poverty levels is significantly weaker in such cases.
  3. The following geographically disaggregated data from Facebook require a license or are restricted: number of cell towers, number of Wi-Fi access points, number of mobile devices, number of Android devices, and number of iOS devices.
  4. Indeed, based on the strength of these results, the government of Nigeria is using these estimates as the basis for social protection programs. Likewise, the government of Togo is using these estimates to target mobile money transfers to hundreds of thousands of the country’s poorest mobile subscribers.
  5. The estimates can be found and downloaded from the Humanitarian Data Exchange website: https://data.humdata.org/dataset/relative-wealth-index.