Dr. Johanna Choumert-Nkolo, Henry Cust, Callum Taylor

 

Researchers and policy-makers increasingly recognise the importance of data in the design, implementation, and evaluation of development programmes. However, relatively little research has examined the methods used to collect high-quality data.

In this mini blog post series, we present the results from our new paper in which we shed light on how to use paradata to improve survey data quality:

Choumert-Nkolo J., Cust H., Taylor C. (2019) Using paradata to collect better survey data: Evidence from a household survey in Tanzania. Review of Development Economics

In this first post, we present the concept of “paradata”. Although paradata are widely used in survey methodology research, they are still much less familiar to development economists. As a result, despite their potential as a powerful tool for improving data quality, paradata remain underused.

Survey paradata are data about the data collection process, such as survey timings, locations, and response rates. They are distinct from survey questionnaire data, which are the actual responses of the individuals interviewed, and from auxiliary data, which are complementary administrative or census data. Researchers typically focus on the questionnaire data they use for their analysis, yet paradata are an invaluable tool for improving the quality of questionnaire data and understanding potential measurement errors. Whilst paradata are not new, the advent of electronic data collection (Computer Assisted Personal Interviewing, CAPI) has made it easier to collect paradata systematically and to formalise their use.

Examples of paradata are provided below:

 

| Paradata | Measure |
| --- | --- |
| Timestamps | Date and time of contact; number of interviews per day, average interview length; time per question, time per section; interviewers’ performance; analysis of responses according to the day or time in the day; field teams’ workload (budgeting, human resources); time between interviews; measurement errors (respondents or interviewers who rush / low understanding of the questionnaire resulting in a long interview); interview interruptions (time gaps between sections / disturbing the flow of the questionnaire) |
| GPS coordinates | Track the movements of interviewers during and between interviews; identify coverage bias, e.g. in random walk sampling |
| Data correction, data entry, keystrokes | Navigation throughout the questionnaire (e.g. time, change of answers) |
| Counts of household visits/contact attempts | Level of effort among interviewers; cost / response rate analysis; inform on the best time to visit respondents for future surveys and follow-up surveys |
| Non-response rate | Acceptability of the survey overall or for specific populations; interviewer trends; non-response bias (completed interviews, reasons for refusal, interviewer’s observations, …) |
| Audio recording[1] | Audio audit, number of interruptions |
| Interviewers’ characteristics (gender, age, experience, etc.) | Interviewers’ trends on various outcomes |
| Random number generator | Respondent selection, order of response list, order of questions |

[1] Audio-recording should be used carefully and only with informed consent of respondents.
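
To make the GPS row of the table more concrete, here is a minimal Python sketch of how the distance an interviewer travels between consecutive interviews could be derived from recorded coordinates. It is an illustration only and not the procedure used in our paper: the function names and the input layout (a list of latitude/longitude pairs in interview order) are assumptions, and the haversine formula treats the Earth as a sphere.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against floating-point values marginally above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distances_between_interviews(points):
    """Distances (km) between consecutive interviews.

    `points` is a hypothetical list of (lat, lon) tuples, ordered by
    the time at which one interviewer conducted each interview.
    """
    return [haversine_km(*a, *b) for a, b in zip(points, points[1:])]
```

Unusually short or long distances between consecutive interviews can then be checked against the sampling protocol, for instance in a random walk design.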

Paradata are a valuable tool for monitoring data quality in real time and for managing data collection costs and resources. Using paradata to monitor fieldwork allows researchers to identify issues or idiosyncrasies developed by specific interviewers and to take action while fieldwork is ongoing, with minimal time lag.
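
As an illustration of this kind of monitoring, the Python sketch below compares each interviewer's median interview length with the overall median and flags interviewers who look unusually fast or slow. Everything in it is hypothetical: the records, the field names, and the 0.5 and 1.5 thresholds are assumptions chosen for the example, not values from our survey.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical start/end timestamps as they might be exported from a CAPI tool
interviews = [
    {"interviewer": "A", "start": "2019-03-04 09:00", "end": "2019-03-04 09:52"},
    {"interviewer": "A", "start": "2019-03-04 10:30", "end": "2019-03-04 11:18"},
    {"interviewer": "A", "start": "2019-03-04 13:00", "end": "2019-03-04 13:55"},
    {"interviewer": "B", "start": "2019-03-04 09:10", "end": "2019-03-04 09:30"},
    {"interviewer": "B", "start": "2019-03-04 09:45", "end": "2019-03-04 10:07"},
    {"interviewer": "B", "start": "2019-03-04 10:20", "end": "2019-03-04 10:38"},
    {"interviewer": "C", "start": "2019-03-04 09:05", "end": "2019-03-04 09:55"},
    {"interviewer": "C", "start": "2019-03-04 10:40", "end": "2019-03-04 11:38"},
    {"interviewer": "C", "start": "2019-03-04 13:10", "end": "2019-03-04 13:57"},
]

TIME_FORMAT = "%Y-%m-%d %H:%M"


def duration_minutes(record):
    """Interview length in minutes, from the start and end timestamps."""
    start = datetime.strptime(record["start"], TIME_FORMAT)
    end = datetime.strptime(record["end"], TIME_FORMAT)
    return (end - start).total_seconds() / 60


def flag_interviewers(records, low=0.5, high=1.5):
    """Flag interviewers whose median duration is far from the overall median.

    `low` and `high` are illustrative ratio thresholds (assumptions).
    """
    by_interviewer = defaultdict(list)
    for record in records:
        by_interviewer[record["interviewer"]].append(duration_minutes(record))

    all_durations = [d for ds in by_interviewer.values() for d in ds]
    overall = median(all_durations)

    flags = {}
    for name, durations in by_interviewer.items():
        ratio = median(durations) / overall
        if ratio < low:
            flags[name] = "short"  # possible rushing
        elif ratio > high:
            flags[name] = "long"   # possible comprehension problems
    return flags


print(flag_interviewers(interviews))
```

A flag is only a prompt for follow-up with the field team. Short or long interviews can have legitimate explanations, such as household size or skip patterns in the questionnaire.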

To conclude, paradata should be used to improve data quality by (i) testing and piloting questionnaires and field protocols to manage time and resources more effectively, (ii) monitoring fieldwork and data quality on a day-to-day basis, and (iii) cleaning data and assessing data quality.

In the next blog post, we will focus on timestamps and provide specific examples of how we have used them to monitor data quality for a field survey we conducted in Tanzania.