Model-based Methods for Truncating Survey Weights

In unequal-probability-of-inclusion sample designs, correlations between the probability of inclusion and the sampled data can induce bias. Weights equal to the inverse of the probability of selection are often used to counteract this bias. Highly disproportional sample designs have large weights, which can introduce unnecessary variability in statistics such as the population mean estimate. Weight trimming or stratum collapsing methods reduce large weights to fixed cutpoint values to reduce variance, but these approaches are usually ad hoc, with little systematic attention to their effect on the mean squared error (MSE) of the resulting estimates.
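The effect of a fixed cutpoint can be illustrated with a minimal sketch. The data, the inclusion probabilities, and the cutpoint of 10 below are made-up values chosen only for illustration, and the function names are hypothetical; the estimator is the ordinary weighted (ratio) mean.

```python
def weighted_mean(y, w):
    """Weighted (ratio) estimator of the population mean."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def trim_weights(w, cutpoint):
    """Ad hoc trimming: cap every weight at a fixed cutpoint."""
    return [min(wi, cutpoint) for wi in w]


# Illustrative (made-up) sample: one unit has a very small inclusion probability.
incl_prob = [0.5, 0.5, 0.25, 0.25, 0.02]
y = [10.0, 12.0, 20.0, 22.0, 60.0]

# Case weights are the inverse of the inclusion probabilities.
weights = [1.0 / p for p in incl_prob]

full_estimate = weighted_mean(y, weights)
trimmed_estimate = weighted_mean(y, trim_weights(weights, cutpoint=10.0))

print(full_estimate, trimmed_estimate)
```

Capping the single large weight pulls the estimate toward the unweighted mean. That lowers variance but may introduce bias, and nothing in the procedure says whether the chosen cutpoint improves MSE.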

I have helped to develop model-based estimators for weight trimming using two broad approaches: Bayesian variable selection modeling, which we have termed ``weight pooling,'' and Bayesian hierarchical modeling, which we have termed ``weight smoothing.'' Both approaches begin by transforming the case weights into dummy variables that stratify the data by equal or approximately equal probabilities of inclusion. These ``inclusion strata'' may correspond to formal strata from a disproportional stratified sample design, or may be ``pseudo-strata'' based on collapsed or pooled weights derived from selection, poststratification, and/or non-response adjustments. These inclusion strata are ordered by the inverse of the probability of selection. Under this paradigm, a fully weighted data analysis can be viewed as the posterior predictive distribution of a population quantity under a model in which interaction terms are present between the weight stratum indicators and the underlying model parameters of interest.
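A minimal sketch of the inclusion-strata construction for a population mean, assuming units with identical case weights form one stratum. The data and function names are hypothetical. It shows that the fully weighted mean coincides with an estimator that gives each inclusion stratum its own mean and weights that mean by the estimated stratum population size.

```python
from collections import defaultdict


def inclusion_strata(y, w):
    """Group observations by case weight; return (weight, values) pairs
    ordered by increasing weight, i.e. decreasing inclusion probability."""
    groups = defaultdict(list)
    for yi, wi in zip(y, w):
        groups[wi].append(yi)
    return sorted(groups.items())


def stratified_mean(strata):
    """Separate mean per stratum, weighted by estimated stratum size w_h * n_h."""
    sizes = [wt * len(ys) for wt, ys in strata]
    means = [sum(ys) / len(ys) for _, ys in strata]
    return sum(s * m for s, m in zip(sizes, means)) / sum(sizes)


# Illustrative (made-up) data with three distinct weights.
y = [10.0, 12.0, 20.0, 22.0, 60.0]
w = [2.0, 2.0, 4.0, 4.0, 50.0]

strata = inclusion_strata(y, w)
weighted = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

print(stratified_mean(strata), weighted)
```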

``Weight pooling'' models collapse together these inclusion strata. Collapsing only the strata with the largest weights mimics weight trimming by assuming that the underlying data from these combined strata are exchangeable. In a regression setting, this model can be posed as a variable selection problem, where dummy variables for the inclusion strata interact with the regression parameters; subtracting from or adding to the inclusion strata design matrix allows for a greater or lesser degree of weight trimming. By averaging over all of these possible ``weight pooling'' models, we can compute an estimator of the population parameter of interest whose bias-variance tradeoff is data-driven.
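The pooling-and-averaging idea can be sketched for a population mean. This is a rough illustration and not the estimator from the papers listed below: the model weights use a BIC approximation as a stand-in for posterior model probabilities, stratum population sizes are treated as known, and the data and function names are hypothetical. Model s pools strata s through the last one, so the final model corresponds to the fully weighted analysis and the first to the unweighted one.

```python
import math


def pooling_models(strata_y, strata_N):
    """For each start index s, pool strata s..H-1 into one group with a common
    mean. Returns a list of (population mean estimate, BIC), one per model."""
    H = len(strata_y)
    n = sum(len(ys) for ys in strata_y)
    N = sum(strata_N)
    results = []
    for s in range(H):
        groups = [list(ys) for ys in strata_y[:s]]
        groups.append([v for ys in strata_y[s:] for v in ys])
        group_N = list(strata_N[:s]) + [sum(strata_N[s:])]
        means = [sum(g) / len(g) for g in groups]
        estimate = sum(Ng * m for Ng, m in zip(group_N, means)) / N
        rss = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
        # Assumption: Gaussian errors with a common variance, BIC approximation.
        bic = n * math.log(rss / n) + len(groups) * math.log(n)
        results.append((estimate, bic))
    return results


def model_average(results):
    """Average the estimates with weights proportional to exp(-BIC / 2)."""
    best = min(b for _, b in results)
    raw = [math.exp(-0.5 * (b - best)) for _, b in results]
    probs = [r / sum(raw) for r in raw]
    return sum(p * e for p, (e, _) in zip(probs, results)), probs


# Illustrative (made-up) strata, ordered from smallest to largest weight.
strata_y = [[10.1, 11.9, 9.8, 12.2], [11.0, 10.5, 12.5], [19.0, 23.0]]
strata_N = [8, 12, 100]

results = pooling_models(strata_y, strata_N)
averaged, probs = model_average(results)
print(averaged, probs)
```

When the data in the high-weight strata look similar to their neighbours, the pooled models receive more weight and the averaged estimate moves away from the fully weighted one; when they differ, the unpooled model dominates.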

``Weight smoothing'' models treat the underlying weight stratum means as random effects, and induce weight trimming by smoothing stratum means for which the data provide little evidence of difference, and separating means which the data suggest should be separated (Holt and Smith 1979, Ghosh and Meeden 1986, Little 1991, 1993, Lazzeroni and Little 1998, Rizzo 1992).
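A minimal sketch of the smoothing idea for a population mean, assuming the simplest exchangeable random-effects model: stratum sample means are normal around stratum means mu_h with variance sigma2 / n_h, and the mu_h are normal around a common mean with variance tau2. Both variance components are treated as known here purely for illustration; a full Bayesian analysis would place priors on them. The data and function name are hypothetical.

```python
def smoothed_mean(ybar, n, N, sigma2, tau2):
    """Posterior predictive mean of the population mean under
    ybar_h ~ N(mu_h, sigma2 / n_h), mu_h ~ N(mu, tau2), flat prior on mu,
    with sigma2 and tau2 assumed known."""
    # Marginal precision of each stratum sample mean about the common mean.
    precision = [1.0 / (tau2 + sigma2 / nh) for nh in n]
    mu = sum(p * yb for p, yb in zip(precision, ybar)) / sum(precision)
    # Shrinkage factor: near 1 keeps the stratum mean, near 0 pools it.
    shrink = [tau2 * p for p in precision]
    mu_h = [lam * yb + (1.0 - lam) * mu for lam, yb in zip(shrink, ybar)]
    # Observed units contribute their data; unobserved units are predicted.
    total = sum(nh * yb + (Nh - nh) * m
                for nh, yb, Nh, m in zip(n, ybar, N, mu_h))
    return total / sum(N)


# Illustrative (made-up) stratum summaries, ordered by increasing weight.
ybar = [11.0, 21.0, 60.0]
n = [2, 2, 1]
N = [4, 8, 50]

for tau2 in (0.0, 25.0, 1e12):
    print(tau2, smoothed_mean(ybar, n, N, sigma2=4.0, tau2=tau2))
```

In this sketch, tau2 = 0 reproduces the unweighted sample mean and a very large tau2 approaches the fully weighted estimate; intermediate values give a partially weighted compromise.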

Both methods allow for ``partially-weighted'' analyses that use the data themselves to modulate the bias-variance tradeoff. They also allow estimation and inference from data collected under disproportional probability-of-inclusion sample designs to be based on models common to other fields of statistical estimation and inference.

"Model-Based Alternatives to Trimming Survey Weights," Elliott, MR and Little, RJA (2000), Journal of Official Statistics, 16, 191-209 consider both weight pooling and weight smoothing models for the estimation of population means.
"Bayesian Weight Trimming for Generalized Linear Regression Models," (Elliott 2007), Survey Methodology, 33, 23-34 extends the weight smoothing models of Elliott and Little (2000) to accommodate linear and generalized linear models
"Model Averaging Methods for Weight Trimming," (Elliott 2008), Journal of Official Statistics, 24, 517-540 extends the weight pooling models of Elliott and Little (2000) to accommodate linear models and to allow for all contiguous inclusion strata to be considered for pooling. The latter induces a high degree of robustness into our model, protecting against "over-pooling" that simpler models suffered from in Elliott and Little (2000).
"Model Averaging Methods for Weight Trimming in Generalized Linear Regression Models," (Elliott 2009), Journal of Official Statistics, 25, 1-20 extends Elliott (2009) to allow for weight pooling of generalized linear models. R code used to for weight smoothing regression models