Datasets

Datasets produced by my coauthors and me, or that are otherwise not easily accessible to the public.



South African industry codes and concordance tables

Tables from my paper with Amina Ebrahim, "Industry classification in the South African tax microdata". The concordance tables combine official sources with our own hand-matching; see the paper for details. If you use these tables, please cite the paper.
[Industry codes: Stats SA SIC 7 | Stats SA SIC 5 | SARS Activity Codes | SARS Profit Codes]
[Concordance tables: SIC 5 to SIC 7 | SIC 7 to SIC 5 | Activity Codes to SIC 5 | Profit Codes to SIC 5]
[GitHub repo with CSV versions & additional resources]
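As a rough illustration of how a concordance table might be applied, here is a minimal sketch that recodes a list of source industry codes using a two-column CSV. The column names (sic5, sic7) and the codes (A, A1, and so on) are hypothetical placeholders, not the layout or contents of the actual files. Because a source code can map to several target codes, the sketch keeps every match as a separate row and flags unmatched codes with None.

```python
import csv
import io
from collections import defaultdict

# Hypothetical concordance excerpt: placeholder codes and assumed column names.
CONCORDANCE_CSV = """sic5,sic7
A,A1
A,A2
B,B1
"""


def load_concordance(text):
    """Map each source code to the list of target codes it corresponds to."""
    mapping = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        mapping[row["sic5"]].append(row["sic7"])
    return dict(mapping)


def recode(codes, mapping):
    """Return (source, target) pairs; a code with no match gets target None."""
    pairs = []
    for code in codes:
        targets = mapping.get(code)
        if not targets:
            pairs.append((code, None))
        else:
            # One-to-many matches produce one pair per target code.
            pairs.extend((code, target) for target in targets)
    return pairs


mapping = load_concordance(CONCORDANCE_CSV)
print(recode(["A", "B", "Z"], mapping))
# → [('A', 'A1'), ('A', 'A2'), ('B', 'B1'), ('Z', None)]
```

How to resolve one-to-many matches (splitting, weighting, or choosing a single target) is a judgment call that this sketch deliberately leaves to the user.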




Machine-readable CPI

Statistics South Africa’s Historic CPI series is very useful but is trapped inside a PDF. I wrote a small script to extract the data and reshape it into long-format .csv and .dta files.
[Code and data]
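To show what "reshaping into long format" means here, this is a minimal sketch, not the script linked above. It assumes a hypothetical wide table with one row per year and one column per month, filled with made-up index values, and turns it into one (year, month, cpi) row per observation. The real PDF layout and the script's output columns may differ.

```python
import csv
import io

# Hypothetical wide layout with made-up values: one row per year, one column per month.
WIDE_CSV = """year,Jan,Feb,Mar
2000,100.0,100.5,101.0
2001,104.0,104.5,105.0
"""


def wide_to_long(text, id_col="year"):
    """Reshape a wide table into (id, column, value) rows."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        key = int(record.pop(id_col))
        # Remaining columns are months, in their original order.
        for month, value in record.items():
            rows.append((key, month, float(value)))
    return rows


long_rows = wide_to_long(WIDE_CSV)

# Write the long table as CSV to an in-memory buffer (swap for a file to save to disk).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["year", "month", "cpi"])
writer.writerows(long_rows)
print(out.getvalue())
```

With pandas, the same reshape can be done with `DataFrame.melt`, and `DataFrame.to_stata` writes a .dta file.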




2011 census shapefiles

The published Stats SA 2011 census Small Area Layer (SAL) shapefiles (available from, e.g., DataFirst) do not include polygons for areas where a Small Area (SA) contains 10 or fewer individuals. This can make them frustrating to work with. Helene Verhoef kindly provided me with the following Stats SA shapefiles, which include (empty) polygons for these sparsely populated SAs. Please acknowledge Stats SA if you use them.
[Shapefiles]