Useful links

A collection of useful links, partly for my own reference.


A short note of tips I’ve put together for working with “large” datasets in Stata, with links to other resources.

Stata2R from Kyle Butts, Nick Huntington-Klein and Grant McDermott provides a very helpful introduction to data.table and fixest in R, primarily designed for Stata users.

Grant McDermott’s Data Science for Economists course is an excellent introduction to tools such as Git and tasks such as web scraping.


The single best source for South African microdata is the DataFirst Data Portal.

Statistics South Africa collects and produces an invaluable array of South African data. Unfortunately, their website can be difficult to navigate. A few particularly useful pages:

Catalogue of public data sources courtesy of Open Data South Africa. Their website, with data organised by theme, is easier to navigate than the (overwhelming) catalogue.

South African COVID-19 data:

Data on electricity supply/generation and load shedding (rolling blackouts) in South Africa:

Geospatial data:

  • If you’re working with South African census data, Adrian Frith’s census mapping website is very useful.
  • The SALDRU YouthExplorer is another very useful source of geolocated South African data. Apart from youth-focused socio-economic statistics, it provides downloadable point data with the coordinates of “service points” such as schools, police stations, post offices, healthcare facilities, SASSA offices, and a variety of other facilities.
  • Coordinates of South African police stations, together with station boundaries, are available from SAPS.
  • The Copernicus Climate Data Store is a definitive source of climate and weather data.
  • The Gridded Population of the World (GPW) data from SEDAC is very useful if you need the spatial distribution of human population on a continuous raster surface, rather than aggregated to administrative boundaries.