Dr. Johanna Choumert-Nkolo, Henry Cust, Callum Taylor


In this blog post series, we present the results from our recent paper:

Choumert-Nkolo J., Cust H., Taylor C. (2019) Using paradata to collect better survey data: Evidence from a household survey in Tanzania. Review of Development Economics

Our first post in this series, ‘What are paradata?’, can be found here, and our second post, ‘Timestamps – what are they good for?’, can be found here.

In this third post, we turn to another key element of paradata, namely Global Positioning System (GPS) coordinates processed using Geographic Information Systems (GIS). GIS is everywhere in the modern world: for example, if you’ve ever used a map on your mobile device to see where you are, GPS supplied your position and GIS made it possible to display it. It also has a wide range of other applications, including the analysis of road safety, geology, and natural disasters. GIS can also be a useful tool for researchers, particularly those deploying surveys in developing countries.

The collection of GPS coordinates is widely seen as a must for any serious fieldwork project. Whilst GPS coordinates are often used to calculate distances, for instance between households and local services, little attention has been paid to how GIS can be used to improve data collection practices and ultimately data quality. Such GIS data can be useful for assessing adherence to protocols and improving the overall accuracy and quality of the data collected.
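As an illustration of the distance calculation mentioned above, here is a minimal sketch using the haversine formula. It is not taken from our paper: the function name `haversine_km` and the example coordinates are hypothetical, and the Earth is approximated as a sphere with a mean radius of 6,371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical coordinates for a household and a local service (e.g. a clinic)
household = (-6.8000, 39.2800)
clinic = (-6.8100, 39.2900)
distance = haversine_km(*household, *clinic)
print(f"Distance to clinic: {distance:.2f} km")
```

For distances within a village, the spherical approximation is usually more than accurate enough; the GPS reading itself is likely to be the larger source of error.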

Specifically, GIS can be used to track the movements of interviewers during and between interviews and thereby ensure that they are following the agreed fieldwork protocols. In our paper, we show how GIS can be used to review random walk sampling methods and identify any idiosyncrasies that are taking hold in interviewers’ random walks and household selection. Any unexpected patterns, such as interviews taking place in the same perimeter or outside the village boundary, may reveal misbehaving field staff or other issues, such as the impassability of a road or the avoidance of certain areas of the village, which will ultimately increase the bias arising from the selection of respondents. In surveys which require a listing stage or a random walk to select respondents, GIS data can reveal in-field practices that could potentially lead to sampling bias, thereby bringing into question the overall validity of the data.
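One of the checks described above, flagging interviews recorded outside the village boundary, can be sketched with a simple ray-casting test. This is an illustrative sketch rather than the procedure used in our paper: the boundary polygon, the interview records and the names `point_in_polygon` and `flag_outside` are all hypothetical, and coordinates are treated as planar, which is a reasonable assumption only at village scale.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def flag_outside(interviews, boundary):
    """Return the ids of interviews whose GPS point lies outside the boundary."""
    return [rec["id"] for rec in interviews
            if not point_in_polygon(rec["lon"], rec["lat"], boundary)]


# Hypothetical village boundary (a simple rectangle) and interview locations
boundary = [(39.20, -6.85), (39.30, -6.85), (39.30, -6.75), (39.20, -6.75)]
interviews = [
    {"id": "A1", "lon": 39.25, "lat": -6.80},  # inside the boundary
    {"id": "A2", "lon": 39.35, "lat": -6.80},  # east of the boundary
]
flagged = flag_outside(interviews, boundary)
print(flagged)
```

A flagged interview is a prompt for follow-up with the field team rather than proof of misconduct, since GPS error or an outdated boundary file can also place a point outside the polygon.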

GPS paradata can also be combined with other forms of paradata such as timestamps to give a clearer picture of the activities and behaviours of field staff.

For instance, in our paper we analyse GPS coordinates taken at the start and end of each interview, and provide evidence of how mapping this data, and combining it with other paradata such as timestamps, can inform data quality practices.
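To give a flavour of how start and end coordinates can be combined with timestamps, the sketch below computes the implied travel speed between the end of one interview and the start of the next for a single interviewer. It is an assumption-laden illustration, not the analysis from our paper: the record layout, the field names and the function `implied_speeds` are hypothetical.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speeds(interviews):
    """For one interviewer's interviews, sorted by start time, return
    (previous id, next id, km travelled, km/h) for each consecutive pair.
    Speed is None when the gap between interviews is zero or negative."""
    results = []
    for prev, nxt in zip(interviews, interviews[1:]):
        hours = (nxt["start_time"] - prev["end_time"]).total_seconds() / 3600
        km = haversine_km(prev["end_lat"], prev["end_lon"],
                          nxt["start_lat"], nxt["start_lon"])
        speed = km / hours if hours > 0 else None
        results.append((prev["id"], nxt["id"], km, speed))
    return results


# Hypothetical records for one interviewer on one day
day = [
    {"id": "H1",
     "start_time": datetime(2019, 3, 1, 9, 0), "end_time": datetime(2019, 3, 1, 10, 0),
     "start_lat": 0.0, "start_lon": 0.0, "end_lat": 0.0, "end_lon": 0.0},
    {"id": "H2",
     "start_time": datetime(2019, 3, 1, 10, 30), "end_time": datetime(2019, 3, 1, 11, 15),
     "start_lat": 0.0, "start_lon": 0.01, "end_lat": 0.0, "end_lon": 0.01},
]
speeds = implied_speeds(day)
for prev_id, next_id, km, kmh in speeds:
    print(prev_id, next_id, round(km, 2), kmh)
```

Implausibly high speeds, or a missing speed because the next interview starts before the previous one ends, would be natural candidates for closer review.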

The figure below shows the paths taken by interviewers through a village in the sample area of our paper. The darker green area represents the main residential area; the points represent the locations of interviews; the dotted line, the route interviewers took through the village; and the numbers, arbitrary identifiers for interviewers. The order of interviews can be determined from the colour of the dots, with white being the first interview of the day, moving through grey to black, which is the final interview of the day. The white and black line represents the main road that runs through the middle of the village. This map was used to assess protocols for ensuring households are randomly selected, as well as to investigate patterns in interviewers’ behaviour in their household selection or adherence to field protocols.

With the growth of GIS and supplementary data in the developing world, the potential for geocoded data to be used during fieldwork and for analysis can soon be realised.