Session 2b, Confidentiality Issues

This session will be held in the Erskine Building, Room 031

13:10–13:30

Census tables: richness, structure and risk

Lisa Henley
Statistics New Zealand

Mike Camden
Statistics New Zealand

Detailed tables of counts from a census are highly valued by planners for their richness of information, but they carry a risk of disclosing details about individuals. Statistical agencies aim to sift the richness from the risk. We will look at the structures inside some tables and suggest ways to measure their important features. We will examine the risks that come with sparseness, and assess the effects of rounding and suppression methods on the richness of the tables.
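The two disclosure-control methods named above can be illustrated with a minimal Python sketch. It is not the authors' method. It assumes unbiased random rounding to base 3, where each count is rounded up or down to a multiple of 3 with probabilities that keep its expected value unchanged. It also assumes a hypothetical suppression threshold of 6 and uses an illustrative sparseness measure, the share of non-zero cells below 3. The function names, the thresholds and the 3x3 table are all hypothetical.

```python
import random


def random_round(count, base=3, rng=random):
    """Unbiased random rounding of a non-negative count to a multiple of `base`.

    A count with remainder r is rounded up with probability r/base and down
    otherwise, so its expected value is unchanged; multiples of `base` stay put.
    """
    remainder = count % base
    if remainder == 0:
        return count
    down = count - remainder
    return down + base if rng.random() < remainder / base else down


def suppress_small(table, threshold=6):
    """Blank out (None) non-zero cells below a hypothetical threshold; zeros are kept."""
    return [[None if 0 < c < threshold else c for c in row] for row in table]


def small_cell_share(table, small=3):
    """Illustrative sparseness measure: share of non-zero cells with count < `small`."""
    nonzero = [c for row in table for c in row if c > 0]
    return sum(c < small for c in nonzero) / len(nonzero) if nonzero else 0.0


# Hypothetical 3x3 table of counts, not census data.
table = [[12, 1, 0], [7, 2, 30], [4, 0, 5]]
rng = random.Random(2006)  # seeded so the example is reproducible
rounded = [[random_round(c, rng=rng) for c in row] for row in table]
suppressed = suppress_small(table)
risk = small_cell_share(table)
print(rounded, suppressed, risk)
```

Because each cell is rounded independently, the rounded cells need not add up to independently rounded totals. Suppression, by contrast, removes small cells outright. These are the two kinds of damage to richness that the talk proposes to assess.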

13:30–13:50

Generating Synthetic Unit-Record Data from Published Marginal Tables

Alan Lee
University of Auckland

We survey methods for generating synthetic data sets without making use of unit-record data. The methods we describe create unit-record data in the form of high-dimensional tables whose marginals match publicly available marginal tables. We consider methods based on integer and quadratic programming, which construct tables that match the public tables exactly, and methods based on iterative proportional fitting, which match the public tables approximately.

We describe a set of R functions which implement the methods under study, and apply the methods to data from the 2001 Census of Population and Dwellings.
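Iterative proportional fitting (IPF), one of the families of methods mentioned above, can be sketched in a few lines of Python. The talk's own implementation is in R, and this sketch is not that code. Starting from a uniform seed, so that no unit records are used, it repeatedly rescales a three-way table until its two-way margins match a set of "published" margins. The margins are assumed to be mutually consistent. The helpers `marginal` and `ipf` and the 2x2x2 counts are hypothetical and serve only to generate consistent margins.

```python
from itertools import product


def marginal(table, axes):
    """Sum a table (dict: cell tuple -> count) over all axes not listed in `axes`."""
    totals = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        totals[key] = totals.get(key, 0) + value
    return totals


def ipf(shape, margins, max_sweeps=1000, tol=1e-10):
    """Iterative proportional fitting from a uniform seed (no unit records used).

    `margins` maps a tuple of axes, e.g. (0, 1), to its target marginal table.
    The targets are assumed to be mutually consistent (same grand total and
    agreeing lower-order sub-margins).
    """
    cells = list(product(*(range(n) for n in shape)))
    fit = {cell: 1.0 for cell in cells}
    for _ in range(max_sweeps):
        largest_change = 0.0
        for axes, target in margins.items():
            current = marginal(fit, axes)
            for cell in cells:
                key = tuple(cell[a] for a in axes)
                if current[key] > 0:
                    # Rescale so that this margin is matched exactly.
                    new = fit[cell] * target.get(key, 0) / current[key]
                    largest_change = max(largest_change, abs(new - fit[cell]))
                    fit[cell] = new
        if largest_change < tol:
            break
    return fit


# Hypothetical 2x2x2 counts, used only to produce a consistent set of "published" margins.
hidden = {(0, 0, 0): 5, (0, 0, 1): 3, (0, 1, 0): 2, (0, 1, 1): 6,
          (1, 0, 0): 4, (1, 0, 1): 1, (1, 1, 0): 7, (1, 1, 1): 2}
published = {axes: marginal(hidden, axes) for axes in [(0, 1), (0, 2), (1, 2)]}
synthetic = ipf((2, 2, 2), published)
for axes, target in published.items():
    fitted = marginal(synthetic, axes)
    print(axes, max(abs(fitted[k] - target[k]) for k in target))
```

The fitted cells are generally not integers. The fit is the table implied by the seed and the margins (here the no-three-way-interaction fit), not the hidden table. Turning it into unit records therefore needs a further rounding step, which is one plausible reason such methods match the public tables only approximately.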

13:50–14:10

Global Recoding, Information Loss, and Confidentiality

Alistair Gray
Statistics Research Associates Ltd

Fraser Jackson
Victoria University of Wellington

Stephen Fienberg
Carnegie Mellon University

The problem of providing informative summaries of contingency tables has been addressed in many different ways. We regard categories as primary elements in the definition of the space over which a table is defined, and this talk suggests a method of exploring how well they match the information structure in the data. It develops a systematic procedure for collapsing single categories, based on finding the member of a class of restricted models that maximizes the likelihood of the data, and uses this to find a parsimonious representation of the table. The focus is on information rather than on statistical testing. A feature of the procedure is that it can easily be applied to tables with up to millions of cells, providing a new way of analysing large data sets in many disciplines. An obvious application is the confidentiality of Census tables, where there is a tradeoff between preserving the information in the table for the user and preserving nondisclosure for the respondent. This talk is based on the findings of the OSRDAC 2005/06 project “Impacts of global recoding to preserve confidentiality on information loss and statistical validity of subsequent data analysis”, to be published on the SNZ website.
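One step of likelihood-based category collapsing can be sketched in Python for a two-way table. This is a simplified illustration, not the authors' procedure. It assumes that the "class of restricted models" consists of the models in which two chosen row categories share a single column profile while all other rows are unrestricted. For such a model, the drop in maximised log-likelihood is half the G² statistic of the 2xJ subtable formed by the two rows. The best-fitting member is therefore the pair with the smallest G², and that pair is merged. The function names and the example table are hypothetical.

```python
import math
from itertools import combinations


def merge_loss(row_a, row_b):
    """Likelihood-ratio statistic G^2 for forcing two rows to share one column profile.

    This is twice the drop in the maximised log-likelihood when the two row
    categories are merged, all other rows being left unrestricted.
    """
    n_a, n_b = sum(row_a), sum(row_b)
    n = n_a + n_b
    g2 = 0.0
    for x, y in zip(row_a, row_b):
        col = x + y
        for observed, row_total in ((x, n_a), (y, n_b)):
            if observed > 0:  # zero cells contribute nothing to G^2
                expected = row_total * col / n
                g2 += 2 * observed * math.log(observed / expected)
    return g2


def collapse_once(table, labels):
    """Merge the pair of row categories whose merge loses the least likelihood."""
    i, j = min(combinations(range(len(table)), 2),
               key=lambda pair: merge_loss(table[pair[0]], table[pair[1]]))
    loss = merge_loss(table[i], table[j])
    merged = [a + b for a, b in zip(table[i], table[j])]
    new_table = [merged if k == i else row for k, row in enumerate(table) if k != j]
    new_labels = [labels[i] + "+" + labels[j] if k == i else lab
                  for k, lab in enumerate(labels) if k != j]
    return new_table, new_labels, loss


# Hypothetical 4x3 table: rows A and B have identical column profiles.
table = [[10, 20, 30], [5, 10, 15], [30, 5, 1], [12, 3, 40]]
labels = ["A", "B", "C", "D"]
new_table, new_labels, loss = collapse_once(table, labels)
print(new_labels, new_table, loss)
```

Calling `collapse_once` repeatedly gives a sequence of global recodings, each with its likelihood loss. A user could stop once the accumulated loss becomes too large. Restricting the candidate pairs to adjacent categories would be a natural variant for ordinal variables.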
