Poverty Mapping: Innovative Approaches to Creating Poverty Maps with New Data Sources

Chapter 4 | Call Detail Record Data

This section discusses the methodological implications of using CDRs as a data source for generating poverty maps.

Definition

CDRs obtained from mobile network operators provide highly granular real-time data that can be used to assess socioeconomic behavior, including consumption, mobility, and social patterns. CDRs have been successfully used to predict poverty in some countries with (i) models that attempt to predict welfare based on call activity only, and (ii) combined models that use telephone data and remote sensing covariates.

Data Sources

CDR data include an encrypted user ID, location area code, cell ID, time stamp, and event ID. The location area code and cell ID jointly determine the geographical location (coordinates) of the cell tower. The event ID records the type of transaction: incoming call, outgoing call, text message, or web browsing. In addition, researchers can typically infer the type of phone in use (including the brand), the vendor, the model, and the system. This information can be used as a proxy for the user’s disposable income. For the purpose of poverty mapping, CDRs can be used in conjunction with other poverty estimates or satellite imagery (daytime, nighttime, or both).
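To make the record structure concrete, the following is a minimal Python sketch of how the fields listed above might be represented and aggregated into per-tower activity counts. The class name, field names, event labels, and sample records are all hypothetical and chosen for illustration; actual operator schemas will differ.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CDR:
    """One call detail record (field names are illustrative assumptions)."""
    user_id: str         # encrypted subscriber identifier
    lac: int             # location area code
    cell_id: int         # cell ID within the location area
    timestamp: datetime  # time of the event
    event: str           # "call_in", "call_out", "sms", or "data"


def tower_features(records):
    """Aggregate records into event counts and unique users per tower."""
    counts = defaultdict(Counter)
    users = defaultdict(set)
    for r in records:
        tower = (r.lac, r.cell_id)  # the pair jointly identifies a tower
        counts[tower][r.event] += 1
        users[tower].add(r.user_id)
    return {
        tower: {**counts[tower], "unique_users": len(users[tower])}
        for tower in counts
    }


# Made-up sample records, not real data.
records = [
    CDR("a1", 10, 1, datetime(2013, 8, 1, 9, 0), "call_out"),
    CDR("a1", 10, 1, datetime(2013, 8, 1, 9, 5), "sms"),
    CDR("b2", 10, 1, datetime(2013, 8, 1, 10, 0), "call_in"),
    CDR("b2", 10, 2, datetime(2013, 8, 2, 18, 30), "data"),
]
features = tower_features(records)
```

Counts of this kind, once joined to tower coordinates, are one plausible starting point for the area-level features used in modeling.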

Methods

Raw CDR data are typically noisy and require preprocessing before they can be analyzed. Once cleaned, CDRs can be used to create geographical segments by constructing Voronoi polygons or grids at the desired resolution level.¹ Estimating poverty rates from CDR features is a supervised machine-learning problem: a model is trained on inputs (here, CDR features) paired with known outputs (here, observed poverty rates). Once the model is built, it can be applied to new input data for which the corresponding output is unknown, in this case CDR features for other geographies or other points in time. Supervised machine-learning problems are either classification problems, in which the output variable is one of a discrete set of classes (for example, poor or not poor), or regression problems, in which the output variable is a continuous number, such as a poverty rate expressed as a decimal, ratio, or percentage.
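The two steps described above, geographic segmentation and supervised regression, can be sketched in a few lines of Python. This is an illustrative sketch only, not the method of any study cited here. It assumes planar coordinates (real latitude and longitude values would need projecting first), uses a single hypothetical CDR feature, and fits ordinary least squares in place of whatever learning algorithm an actual analysis would use. All tower locations and training values are made up.

```python
import math

# Hypothetical tower coordinates (x, y) in arbitrary planar units.
towers = {"T1": (0.0, 0.0), "T2": (4.0, 0.0), "T3": (0.0, 4.0)}


def voronoi_cell(point, towers):
    """Return the tower whose Voronoi cell contains `point`.

    A point lies in a tower's Voronoi cell exactly when that tower is
    the nearest one, so assignment needs no polygon geometry.
    """
    return min(towers, key=lambda tower_id: math.dist(point, towers[tower_id]))


def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Step 1: assign a location to its geographic segment.
cell = voronoi_cell((3.0, 1.0), towers)

# Step 2: train on regions where the poverty rate is known.
# train_x: a CDR feature per region (assumed: average daily calls per user)
# train_y: the poverty rate observed for that region (made-up values)
train_x = [2.0, 4.0, 6.0, 8.0]
train_y = [0.70, 0.60, 0.50, 0.40]
intercept, slope = fit_ols(train_x, train_y)

# Step 3: predict the rate for a region with no observed poverty rate.
new_region_x = 5.0
predicted_rate = intercept + slope * new_region_x
```

A classification variant would replace the continuous target with a class label (for example, poor or not poor) and use a classifier instead of a regression fit.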

Applicability Considerations

The use of CDRs for poverty mapping in the context of evaluation is fairly limited. Although these data can be obtained at zero or low cost, CDRs are largely proprietary and can be accessed only through an agreement with mobile network operators. Furthermore, to ensure the representativeness of data, agreements are needed with multiple mobile network operators with coverage in the area of interest. Some countries, however, have made anonymized CDR data sets publicly available (for example, Senegal); in these instances, there may be greater opportunities for using CDRs for poverty mapping.

Some specific expertise is needed to derive poverty maps from CDRs, including experience in advanced data cleaning and manipulation and in geospatial analysis. No advanced computing resources are likely to be needed. All parts of the analysis can be completed with a combination of Excel, Python or R (both open source), and geospatial software such as QGIS (open source).

Examples

Example 1: Guatemala Poverty Map

The World Bank conducted a CDR analysis focused on five administrative departments in the southwest region of Guatemala, using mobile phone data to predict observed poverty rates and generate poverty maps. The study used encrypted CDR data for August 2013, aggregated at the municipality level. To test the validity of the CDR analysis, the findings were compared with World Bank poverty estimates based on Guatemala’s National Living Conditions Survey for 2006 and 2011 and the 2002 Population and Housing Census.

The findings from the study indicate that CDR-based research methods may replicate poverty estimates obtained from traditional forms of data collection at a fraction of the cost. Although the poverty estimates produced by CDR analysis did not perfectly match those generated by surveys and censuses, the results show that more comprehensive data could greatly enhance their predictive power. CDR analysis has especially promising applications in low-income countries where limited fiscal and budgetary resources complicate the task of survey data collection.

Example 2: Rwanda Poverty Map

Blumenstock, Cadamuro, and On (2015) constructed a poverty map of Rwanda using an anonymized database containing records of billions of interactions on Rwanda’s largest mobile phone network. These data were complemented with follow-up phone surveys of a geographically stratified random sample of 856 individual subscribers, which included questions regarding asset ownership, housing characteristics, and several other basic welfare indicators. Given the geographic information contained in the CDR data, the authors were able to map each data point to small divisions created using Voronoi polygons. The data were analyzed using a supervised-learning algorithm to generate wealth predictions at a very fine degree of spatial granularity. Out-of-sample predictions were generated for the characteristics of the remaining 1.5 million Rwandan mobile phone users who did not participate in the survey. By comparing the model’s results with other sources of data, the study showed that CDR data were predictive of individual-level asset-based wealth in Rwanda.

In 2018, Blumenstock expanded this analysis by applying a simplified version of the previous model to Afghanistan. The objective was to demonstrate the accuracy of a model that could be replicated and generalized to different countries. The study relied on several rounds of interviews with 1,234 Afghan citizens. As in the case of Rwanda, each respondent’s survey responses were matched to their corresponding CDRs. The simplified model was applied to both Rwanda and Afghanistan with results similar to those observed in the original model. The author further investigated whether a model trained with data from one country (in this case, Rwanda) could be used to predict wealth in a different country (Afghanistan). However, the results were only slightly better than those that would be obtained by random guessing, indicating the need to retrain the model with country-specific data.

  1. Voronoi polygons partition a plane into regions, one for each point in a defined set, so that every location within a region is closer to that region’s point than to any other point in the set.