Data Quality Project

The Data Quality Project was a joint project between SALDRU and DataFirst intended to improve the quality of the analyses produced in South African household studies. The focus of the project was on documenting the strengths and weaknesses of the existing nationally representative datasets, improving the techniques used in the analysis of these datasets, and improving communication between researchers working on these issues.

The project undertook a number of activities:

  1. The construction of a web-based interface for disseminating information about datasets (the DataFirst Data Portal)
  2. Research on particular datasets and on the comparability of datasets
  3. Training workshops
  4. A national workshop on data quality issues
  5. Awarding student scholarships


The Time Use Survey
The Project organised several workshops. The first was a two-day hands-on workshop, run in 2008, introducing the Time Use Survey. This survey is an under-utilised resource because it is more complex and less thoroughly “cleaned” than some of the other Statistics South Africa datasets. The purpose of the workshop was to provide practical help in running analyses on the survey. The audience was largely postgraduate students who were considering using the dataset for their research projects.

A second workshop, held in 2008, dealt with the statistical technique of “bootstrapping” and attracted strong participation from academics from other parts of the country. It was re-run in 2009, largely for postgraduate students at the University of Cape Town.

Asset Indices
A workshop on asset indices, run in 2009, attracted interest from postgraduate students at UCT and from elsewhere in the region (Stellenbosch).

Data Quality Issues
The Data Quality Project hosted a one-day workshop in January 2008 dealing with data quality issues. The workshop attracted a strong delegation from Statistics South Africa, including two Deputy Directors-General as well as a number of other senior officials. The National Research Foundation (and the South African Data Archive in particular) was also represented, as were many key academic researchers.

One highlight of the meeting for the academic researchers was a presentation by Prof Stoker on sampling design issues in many of the earlier datasets (the October Household Surveys as well as the Labour Force Survey). Much of this information was simply not available in the official documentation supplied with the datasets.

The Statistics South Africa officials found some of the findings (and inconsistencies) arising from analyses of the data very interesting, because they provided feedback on both the usefulness and some of the limitations of their work. A detailed report from the meeting was circulated to all participants.


  1. Analyses of particular data sets
  2. Comparative analyses of different data sets
  3. Tools for the analysis of “noisy” data

Papers and publications

Reweighting the OHS and LFS National Household Survey Data to create a consistent series over time: A Cross Entropy Estimation Approach
Branson, N. (2008). Master’s dissertation, School of Economics, University of Cape Town.

The measurement of employment status using cohort analysis, 1994-2004
Branson, N. and Wittenberg, M. (2007). South African Journal of Economics, 75(2): 313-326.

Earnings Inequality in South Africa: Decomposing Changes Between 1995 and 2006
Heap, A. (2009). Master’s dissertation, School of Economics, University of Cape Town.

Household Transitions in Rural South Africa, 1996-2003
Wittenberg, M. and Collinson, M. (2007). Scandinavian Journal of Public Health, 35(Suppl 69): 130-137.

Research Note: Errors in the October Household Survey 1994 available from the South African Data Archive
Wittenberg, M. (2006). South African Journal of Economics, 74(4): 766-768.

Dissecting post-apartheid labour market developments: decomposing a discrete choice model while dealing with unobservables
Wittenberg, M. (2007). Economic Research Southern Africa Working Paper No. 46.

Testing for a common latent variable in a linear regression
Wittenberg, M. (2007). MPRA Working Paper 2550.

The October Household Survey 1994
Wittenberg, M. (2008). Data Quality Project, University of Cape Town.

Income in the October Household Survey 1994
Wittenberg, M. (2008). Data Quality Project, University of Cape Town.

Nonparametric estimation when income is reported in bands and at points
Wittenberg, M. (2008). Economic Research Southern Africa Working Paper No. 94.

Weighing the value of asset proxies: The case of the Body Mass Index in South Africa
Wittenberg, M. (2009). SALDRU Working Paper.

Estimating expenditure effects without expenditure data using asset proxies
Wittenberg, M. (2009). SALDRU and DataFirst Working Paper.

An introduction to maximum entropy and minimum cross-entropy estimation using Stata
Wittenberg, M. (2010). Stata Journal, 10(3): 315-330.

Sample Survey Calibration: An Information-theoretic perspective
Wittenberg, M. (2009). SALDRU and DataFirst Working Paper.