Abstracts

Session 2a, Sample Surveys

This session will be held in the Erskine Building, Room 031


10:50 — 11:10

Assessment of Imputation Methods for Integrated Business Data

Ricardo Enrico Namay II
Statistics New Zealand

A comparison of three imputation methods for an integrated business dataset was carried out. Initially, multiple imputation, mean imputation and donor imputation were tested. Because of computational limitations, the study was subsequently restricted to mean and donor imputation. The paper demonstrates how imputation methods can be computationally compared with respect to several dimensions: order and distribution preservation, plausibility of individual values, preservation of correlations, and aggregate statistics. Through random sampling and linear programming, this paper also proposes a method to construct a rectangular subsample that replicates the pattern of missing values of the dataset to be imputed while at the same time preserving relative imputation class sizes.
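
As a rough illustration of the two methods the study retained, the sketch below imputes missing values within imputation classes, once by the class mean and once by copying a randomly chosen observed donor from the same class. The function names, the `None` coding for missing values and the toy figures are assumptions made for this example; the abstract does not describe the implementation actually used.

```python
import random
from statistics import mean


def _observed_by_class(values, classes):
    """Group the non-missing values by imputation class."""
    observed = {}
    for value, cls in zip(values, classes):
        if value is not None:
            observed.setdefault(cls, []).append(value)
    return observed


def mean_impute(values, classes):
    """Replace each missing value (None) with the mean of its class."""
    observed = _observed_by_class(values, classes)
    class_means = {cls: mean(vals) for cls, vals in observed.items()}
    # A class with no observed values raises KeyError: nothing to impute from.
    return [class_means[c] if v is None else v for v, c in zip(values, classes)]


def donor_impute(values, classes, seed=0):
    """Replace each missing value with a random observed value from its class."""
    rng = random.Random(seed)  # seeded so the illustration is reproducible
    observed = _observed_by_class(values, classes)
    return [rng.choice(observed[c]) if v is None else v
            for v, c in zip(values, classes)]


# Hypothetical toy data: turnover for six businesses in two size classes.
turnover = [10.0, None, 14.0, 200.0, None, 220.0]
size_class = ["small", "small", "small", "large", "large", "large"]

by_mean = mean_impute(turnover, size_class)
by_donor = donor_impute(turnover, size_class)
```

Mean imputation places every imputed value at the centre of its class, whereas donor imputation reuses values that were actually observed, which is one reason the two can differ on dimensions such as distribution preservation and plausibility of individual values.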


11:10 — 11:30

Stratified sampling for skewed populations: beyond the cumulative square root rule

Michael Hayward
University of Canterbury

Stratified sampling is a widely used sample selection technique, particularly for skewed populations encountered in business, agriculture, income, and wealth. An important consideration in stratification design is strata delineation, which is often based on the cumulative square root of frequencies work of Dalenius and Hodges (1959) and Cochran (1977). This talk will cover investigations into the success of the cumulative square root approach for a range of skewed populations, and comparisons with other recent methods.

References:

  • Cochran, W. G. (1977). Further aspects of stratified sampling. In Sampling techniques (3rd ed.) (pp. 115-149). New York, USA: John Wiley & Sons.
  • Dalenius, T., & Hodges, J. L. (1959). Minimum variance stratification. Journal of the American Statistical Association, 54, 88-101.
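
For readers unfamiliar with the cumulative square root of frequencies rule mentioned in this abstract, the following is a minimal sketch of its usual textbook form: bin the stratification variable, accumulate the square roots of the bin frequencies, and cut where the running total crosses equal fractions of its final value. The function name, the number of bins and the synthetic population are assumptions for illustration only and are not taken from the talk.

```python
from math import sqrt


def cum_sqrt_f_boundaries(values, n_strata, n_bins=50):
    """Approximate stratum boundaries by the cumulative sqrt(f) rule.

    Assumes max(values) > min(values). Boundaries fall on bin edges, so
    a coarse binning can yield repeated boundaries.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins

    # Frequency of the stratification variable in equal-width bins.
    freq = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # top value goes in last bin
        freq[idx] += 1

    # Running total of sqrt(frequency) across the bins.
    cum, total = [], 0.0
    for f in freq:
        total += sqrt(f)
        cum.append(total)

    # Cut at the upper edge of the first bin reaching each equal share.
    boundaries = []
    for k in range(1, n_strata):
        target = k * total / n_strata
        idx = next(i for i, c in enumerate(cum) if c >= target)
        boundaries.append(lo + (idx + 1) * width)
    return boundaries


# Hypothetical right-skewed population: many small units, few large ones.
population = [x ** 2 for x in range(1, 201)]
cuts = cum_sqrt_f_boundaries(population, n_strata=3)
```

The result depends on the initial choice of bins, which is one of the practical weaknesses of the rule.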

11:30 — 11:50

Optimal Survey Design When Nonrespondents are Subsampled for Followup

A. James O’Malley
Harvard Medical School

Healthcare surveys often first mail questionnaires to sampled members of health plans and then follow up mail nonrespondents by phone. The high unit costs of telephone interviews make it cost-effective to subsample the followup. We derive optimal subsampling rates for the phone subsample for comparison of health plans. Computations under design-based inference depart from the traditional formulae for Neyman allocation because the phone sample size at each plan is constrained by the number of mail nonrespondents and multiple plans are subject to a single cost constraint. Because plan means for mail respondents are highly correlated with those for phone respondents, more precise estimates (at fixed overall cost) for potential phone respondents are obtained by combining the direct estimates from phone followup with predictions from the mail survey using small-area estimation (SAE) models.
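
For context, the traditional Neyman allocation that this abstract takes as its point of departure assigns a fixed total sample across strata in proportion to the stratum size multiplied by the stratum standard deviation. The sketch below shows only that classical formula, not the constrained optimisation derived by the authors; the function name and the example figures are hypothetical.

```python
def neyman_allocation(total_n, sizes, std_devs):
    """Classical Neyman allocation: n_h proportional to N_h * S_h.

    Returns unrounded allocations and ignores any upper limit on n_h.
    """
    weights = [n_units * sd for n_units, sd in zip(sizes, std_devs)]
    total_weight = sum(weights)
    return [total_n * w / total_weight for w in weights]


# Hypothetical example: two plans with 20 and 500 mail nonrespondents,
# outcome standard deviations of 10 and 1, and 100 phone interviews.
nonrespondents = [20, 500]
allocation = neyman_allocation(100, nonrespondents, [10.0, 1.0])
```

In this made-up example the unconstrained formula gives the first plan roughly 28.6 interviews although only 20 nonrespondents are available there, which illustrates the kind of constraint the abstract says the traditional formulae do not handle.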


11:50 — 12:10

Estimation in Multiple Frame Surveys

Alastair Scott
University of Auckland

Patricia Metcalf
University of Auckland

In a multiple frame survey, independent samples are drawn from a number of frames whose union is equal to the population of interest, although individual frames might only give partial coverage. For example, the 4th Auckland Diabetes Heart and Health Survey combined samples from two frames, the standard Statistics New Zealand list of census mesh blocks and the Electoral Roll. In this case, the first frame would provide complete coverage of the population of interest (all Auckland residents between the ages of 18 and 74). However, the survey specifications stipulated that at least 1000 Maori and 1000 Pacific Islanders be included in the sample, and using the Electoral Roll provides a reasonably easy way of achieving this objective.

In this talk, we look at a class of estimation procedures that have two desirable properties. Firstly, they can be implemented using only standard software for single frame surveys and, secondly, the same set of weights is used for all variables. We examine the performance of several members of the class on data from the Auckland Diabetes Heart and Health Survey.
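
One simple estimator with both properties described above, sometimes called a single-frame estimator, pools the samples and weights each sampled unit by the inverse of its expected number of selections across all frames. The sketch below illustrates that idea for two frames; it is an assumption that this estimator resembles those examined in the talk, and the function name, data layout and figures are hypothetical.

```python
def single_frame_weights(sample_units, pi_a, pi_b):
    """Weight each sampled unit by 1 / (pi_A + pi_B).

    pi_a and pi_b map unit id -> inclusion probability under each frame's
    design; a unit not covered by a frame is simply absent from that map.
    The weight does not depend on the survey variable, so the same weights
    serve for every variable.
    """
    return [1.0 / (pi_a.get(u, 0.0) + pi_b.get(u, 0.0)) for u in sample_units]


# Hypothetical check with a census of both frames: frame A covers units 1-4,
# frame B covers only units 3 and 4, and every covered unit is selected.
pi_a = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}
pi_b = {3: 1.0, 4: 1.0}
y = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0}

pooled_sample = [1, 2, 3, 4] + [3, 4]  # units drawn from A, then from B
weights = single_frame_weights(pooled_sample, pi_a, pi_b)
estimated_total = sum(w * y[u] for w, u in zip(weights, pooled_sample))
```

Units in the overlap appear twice in the pooled sample with half the weight each time, so the weighted total recovers the population total of 100 in this census example. The pooled file can then be passed to standard single-frame survey software as one weighted sample.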

