Courses and Events
Generating Synthetic Data for Statistical Disclosure Control
02/05/2017 - 03/05/2017
Birkbeck College, Malet Street, London
View in Google Maps (WC1E 7HX)
Course No. ADRCE-Training 037 Drechsler
Course places are limited and registration by 25th April is strongly recommended.
Summary of Course:
This short course provides a detailed overview of the synthetic data approach to statistical disclosure control and covers all of its important aspects. Starting with a short introduction to data confidentiality in general and synthetic data in particular, the workshop will discuss the different approaches to generating synthetic datasets in detail. Possible modelling strategies and methods for evaluating analytical validity will be discussed, and potential measures to quantify the remaining risk of disclosure will be presented. To give participants hands-on experience, the course includes practical sessions using R, in which they generate and evaluate synthetic data based on real data examples.
The course covers:
By the end of the course participants will:
Computer Software and Computer workshops:
This event includes computer workshops.
The practical implementation of the approach will be illustrated using the statistical software R. Participants will use R to generate and evaluate synthetic data.
Dr Jörg Drechsler is a distinguished researcher at the Department for Statistical Methods at the Institute for Employment Research in Nürnberg, Germany. He received his PhD in Social Science from the University of Bamberg in 2009 and his Habilitation in Statistics from the Ludwig-Maximilians-Universität in Munich in 2015. He is also an adjunct assistant professor in the Joint Program in Survey Methodology at the University of Maryland. His main research interests are data confidentiality and nonresponse in surveys. He has received several awards for his research on synthetic data and recently published a book on this topic.
The course aims to summarize the state of the art in synthetic data. The main focus will be on practical implementation rather than on the motivation of the underlying statistical theory. Participants may be academic researchers or practitioners from statistical agencies working in the area of data confidentiality and data access. Basic knowledge of R is expected. Some background in Bayesian statistics is helpful but not obligatory.
This is a two-day course. On day one, registration opens at 9.30; formal teaching starts at 10.00 and finishes at around 17.00. On day two, the course starts at 9.00 and finishes at around 16.00.
Event Outline (Programme):
1. A Brief History of Data Confidentiality
2. Some Basics Regarding Multiply Imputed Synthetic Datasets
3. Analyzing Synthetic Datasets
4. Generating Synthetic Datasets
5. Recent Extensions of the Synthetic Data Approach
6. Chances and Obstacles of the Approach
Some background in general linear modelling is expected. Familiarity with the concepts of Bayesian statistics is helpful but not required. The statistical software R will be used to illustrate the implementation of the approach.
Familiarity with the basics of R would be useful. Participants not familiar with the software can team up with experienced R users during the practical sessions.
The course is based on the following book:
Drechsler, J. (2011). Synthetic Datasets for Statistical Disclosure Control: Theory and Implementation. Lecture Notes in Statistics 201. New York: Springer.
Some useful papers are:
Karr, A. F., Kohnen, C. N., Oganian, A., Reiter, J. P., and Sanil, A. P. (2006). A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60, 224–232.
Kinney, S. K., Reiter, J. P., Reznek, A. P., Miranda, J., Jarmin, R. S., and Abowd, J. M. (2011). Towards unrestricted public use business microdata: The synthetic Longitudinal Business Database. International Statistical Review 79, 363–384.
Raghunathan, T. E., Reiter, J. P., and Rubin, D. B. (2003). Multiple imputation for statistical disclosure limitation. Journal of Official Statistics 19, 1–16.
Reiter, J. P. (2003). Inference for partially synthetic, public use microdata sets. Survey Methodology 29, 181–189.
Reiter, J. P. (2012). Statistical approaches to protecting confidentiality for microdata and their effects on the quality of statistical inferences. Public Opinion Quarterly 76, 163–181.
Rubin, D. B. (1993). Discussion: Statistical disclosure limitation. Journal of Official Statistics 9, 462–468.
Woo, M. J., Reiter, J. P., Oganian, A., and Karr, A. F. (2009). Global measures of data utility for microdata masked for disclosure limitation. Journal of Privacy and Confidentiality 1, 111–124.
Participants will receive written course notes.
Podcasts of some of our previous courses can be found at https://adrn.ac.uk/about/network/england/training-podcasts/
Our courses are very popular and are often oversubscribed. If you cannot attend a course you have registered for, please notify us at least 30 days in advance so that your place can be released to another attendee. Details of our cancellation policy are here: http://store.southampton.ac.uk/help/?HelpID=1 . Please see our full course list here: http://store.southampton.ac.uk/browse/product.asp?compid=1&modid=5&catid=113.
Dr Jörg Drechsler
University of Southampton/ADRC-E
Intermediate (some prior knowledge)
Thanks to ESRC funding we are able to offer this course at reduced rates. The fee per day is:
1. £30 for UK registered postgraduate students
2. £60 for staff at UK academic institutions, Research Council UK funded researchers, UK public sector staff and staff at UK registered charity organisations
3. £220 for all other participants
4. Free place for ADRC/ADRN/ADS staff
All fees include event materials, lunch, and morning and afternoon tea. They do not include travel and accommodation costs.