Global Recoding, Information Loss, and Confidentiality

Alistair Gray
Statistics Research Associates Ltd

Fraser Jackson
Victoria University of Wellington

Stephen Fienberg
Carnegie Mellon University

The problem of providing informative summaries of contingency tables has been addressed in many different ways. We regard categories as primary elements in the definition of the space over which the table is defined, and this talk suggests a method of exploring how they match the information structure in the data. It develops a systematic single-category collapsing procedure, based on finding the member of a class of restricted models that maximizes the likelihood of the data, and uses this to find a parsimonious representation of the table. The focus is on information rather than statistical testing. A feature of the procedure is that it can easily be applied to tables with up to millions of cells, providing a new way of analysing large data sets in many disciplines. An obvious application is the confidentiality of Census tables, where there is a tradeoff between preserving the information in the table for the user and protecting the respondent from disclosure. This talk is based on the findings of the OSRDAC 2005/06 project “Impacts of global recoding to preserve confidentiality on information loss and statistical validity of subsequent data analysis”, to be published on the SNZ website.
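
As a rough illustration of likelihood-based category collapsing, the sketch below merges the two row categories of a two-way table whose merger loses the least multinomial log-likelihood. It is not the procedure developed in the talk: the restriction to a two-way table, the greedy pairwise search, and the names `merge_loss` and `collapse_once` are all assumptions made for this example. The restricted model here forces the two merged rows to share a single column profile, so the loss is zero exactly when the rows are proportional.

```python
import math
from itertools import combinations


def xlogx_ratio(count, total):
    """Return count * log(count / total), treating 0 * log(0) as 0."""
    return count * math.log(count / total) if count > 0 else 0.0


def merge_loss(row_a, row_b):
    """Log-likelihood lost when two rows must share one column profile.

    Hypothetical helper: the difference between the maximized multinomial
    log-likelihood with separate row profiles and with a pooled profile.
    """
    total_a, total_b = sum(row_a), sum(row_b)
    pooled = [a + b for a, b in zip(row_a, row_b)]
    separate = (sum(xlogx_ratio(a, total_a) for a in row_a)
                + sum(xlogx_ratio(b, total_b) for b in row_b))
    together = sum(xlogx_ratio(p, total_a + total_b) for p in pooled)
    return separate - together


def collapse_once(table, labels):
    """Merge the pair of rows with the smallest loss (one greedy step)."""
    pairs = combinations(range(len(table)), 2)
    i, j = min(pairs, key=lambda ij: merge_loss(table[ij[0]], table[ij[1]]))
    loss = merge_loss(table[i], table[j])
    merged = [a + b for a, b in zip(table[i], table[j])]
    keep = [k for k in range(len(table)) if k not in (i, j)]
    new_table = [table[k] for k in keep] + [merged]
    new_labels = [labels[k] for k in keep] + [labels[i] + "+" + labels[j]]
    return new_table, new_labels, loss


# Made-up counts: rows A and B are proportional, so merging them is free.
example_table = [[10, 20], [20, 40], [30, 5]]
example_labels = ["A", "B", "C"]
collapsed_table, collapsed_labels, lost = collapse_once(example_table, example_labels)
```

Repeating `collapse_once` and tracking the cumulative loss gives one simple way to see how much information each successive recoding gives up. A search over all pairs at every step would not scale to tables with millions of cells without further restrictions.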

Session 2b, Confidentiality Issues: 13:50–14:10, Room 031

