In unequal-probability-of-inclusion sample designs, correlations between the probability of inclusion and
the sampled data can induce bias in unweighted estimates.
Weights equal to the inverse of the probability of selection are often used to counteract this bias.
Highly disproportional sample designs produce some very large weights,
which can introduce unnecessary variability into statistics such as the estimated population mean.
Weight trimming and stratum-collapsing approaches reduce
large weights to fixed cutpoint values to lower variance, but these approaches are usually ad hoc, with
little systematic attention to their effect on
the mean squared error (MSE) of the resulting estimates.
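A minimal numerical sketch of the problem and of the ad hoc fix, assuming the ratio (Hájek) form of the weighted mean and a single fixed cutpoint; the data, the cutpoint of 5, and the function names are hypothetical. Kish's effective sample size, (Σw)²/Σw², is used here only as a rough gauge of how much unequal weights inflate variance.

```python
import numpy as np

def hajek_mean(y, w):
    """Weighted (Hajek) estimate of the population mean: sum(w * y) / sum(w)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return np.sum(w * y) / np.sum(w)

def trim_weights(w, cutpoint):
    """Ad hoc trimming: cap every weight at a fixed cutpoint value."""
    return np.minimum(np.asarray(w, dtype=float), cutpoint)

def kish_ess(w):
    """Kish's effective sample size, (sum w)^2 / sum w^2; smaller means more variable weights."""
    w = np.asarray(w, dtype=float)
    return np.sum(w) ** 2 / np.sum(w ** 2)

y = [1, 2, 3, 10]
w = [1, 1, 1, 20]
print(hajek_mean(y, w), kish_ess(w))                                    # fully weighted
print(hajek_mean(y, trim_weights(w, 5)), kish_ess(trim_weights(w, 5)))  # trimmed at 5
```

With these toy values the fully weighted mean is 206/23 ≈ 8.96, and trimming at 5 moves it to 7.0 while the effective sample size rises from about 1.31 to about 2.29. Whether that shift is a bias worth accepting is exactly the MSE question that a fixed cutpoint never asks.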
I have helped to develop model-based estimators for weight trimming using
two broad approaches: Bayesian variable selection modeling, which we have termed ``weight pooling,''
and Bayesian hierarchical modeling, which we have termed ``weight smoothing.''
Both approaches begin by transforming the case weights into dummy variables that stratify the sample
by equal or approximately equal probabilities of inclusion.
These ``inclusion strata'' may correspond to formal strata from a disproportional stratified sample
design, or may be ``pseudo-strata'' based on collapsed or pooled weights derived from selection,
poststratification, and/or non-response adjustments.
These inclusion strata are ordered by the inverse of the probability of selection.
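One way to build ordered inclusion strata from a vector of case weights is sketched below, assuming that cases with identical weights share a stratum. The optional `decimals` argument, which rounds weights before grouping, is a hypothetical rule for "approximately equal" pseudo-strata; the function names are likewise illustrative. Labels run from 0 for the smallest weight to H-1 for the largest, matching the ordering by inverse probability of selection.

```python
import numpy as np

def inclusion_strata(w, decimals=None):
    """Label each case with an inclusion stratum, ordered by weight.

    Cases with equal weights share a stratum; passing `decimals` first rounds the
    weights so that approximately equal weights fall into one pseudo-stratum.
    Label 0 is the stratum with the smallest weight, H-1 the largest.
    """
    w = np.asarray(w, dtype=float)
    if decimals is not None:
        w = np.round(w, decimals)
    # np.unique sorts the distinct weights; the inverse index is each case's stratum label
    _, labels = np.unique(w, return_inverse=True)
    return labels.ravel()

def stratum_dummies(labels):
    """n x H matrix of 0/1 inclusion-stratum indicators, one column per stratum."""
    labels = np.asarray(labels)
    H = labels.max() + 1
    return (labels[:, None] == np.arange(H)[None, :]).astype(int)

weights = [10, 2, 2, 5, 10.04]
print(inclusion_strata(weights))               # four distinct weights
print(inclusion_strata(weights, decimals=0))   # 10 and 10.04 pooled
print(stratum_dummies(inclusion_strata(weights, decimals=0)))
```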
Under this paradigm, a fully weighted data analysis
can be viewed as the posterior predictive distribution of a population quantity
under a model that includes interaction terms between the weight stratum indicators and the underlying model
parameters of interest.
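A small check of this stratum-level view for a population mean, assuming weights are constant within each inclusion stratum so that the stratum population size can be estimated as N_h = n_h w_h. A normal model with a separate mean for every stratum (full interaction with the stratum indicators) and flat priors on those means has posterior predictive mean Σ_h (N_h/N) ȳ_h for the population mean, and that quantity reproduces the weighted mean exactly. The data and function names are hypothetical.

```python
import numpy as np

def weighted_mean(y, w):
    """Fully weighted estimate of the population mean."""
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    return np.sum(w * y) / np.sum(w)

def full_interaction_estimate(y, w, labels):
    """Population-mean estimate from a model with a separate mean for every inclusion stratum.

    Assumes weights are constant within a stratum, so summing them gives N_h = n_h * w_h.
    Under flat priors on the stratum means this equals the posterior predictive mean.
    """
    y, w, labels = np.asarray(y, dtype=float), np.asarray(w, dtype=float), np.asarray(labels)
    strata = np.unique(labels)
    N_h = np.array([w[labels == h].sum() for h in strata])       # estimated stratum sizes
    ybar_h = np.array([y[labels == h].mean() for h in strata])   # stratum sample means
    return np.sum(N_h * ybar_h) / N_h.sum()

y = [1, 2, 3, 4, 10, 12]
w = [1, 1, 2, 2, 8, 8]
labels = [0, 0, 1, 1, 2, 2]
print(weighted_mean(y, w), full_interaction_estimate(y, w, labels))  # both 193/22
```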
``Weight pooling'' models collapse these inclusion strata together.
Collapsing only the strata with the largest weights mimics weight trimming,
because it assumes that the underlying data
from these combined strata are exchangeable.
In a regression setting, this model can be posed as a
variable selection problem, where dummy variables for the inclusion strata interact with the regression
parameters; removing columns from or adding columns to
the inclusion-strata design matrix allows for a greater or lesser degree of weight trimming, respectively.
By averaging over all possible ``weight pooling''
models, we obtain an estimator of the population parameter of interest whose bias-variance tradeoff
is data-driven.
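A sketch of weight pooling with model averaging for a population mean, assuming ordered stratum labels 0..H-1 (largest weights last) and considering only the simpler family of H models that pool the top strata cut, ..., H-1. The model probabilities come from a BIC approximation under a normal model with a common variance; that is a stand-in chosen for brevity, not the prior specification used in the published work, and the function names are hypothetical. Setting cut = H-1 reproduces the fully weighted mean and cut = 0 the unweighted mean.

```python
import numpy as np

def pooled_estimate(y, w, labels, cut):
    """Population-mean estimate when inclusion strata cut, cut+1, ..., H-1 are pooled.

    Labels must be 0..H-1 in increasing order of weight, with every stratum non-empty.
    Pooled cases share one mean (their data are treated as exchangeable); strata below
    `cut` keep their own means. Returns (estimate, residual sum of squares, #means).
    """
    y, w, labels = np.asarray(y, float), np.asarray(w, float), np.asarray(labels)
    group = np.minimum(labels, cut)              # strata >= cut collapse into group `cut`
    est, rss = 0.0, 0.0
    for g in range(cut + 1):
        in_g = group == g
        ybar_g = y[in_g].mean()                  # unweighted mean within the (pooled) group
        est += w[in_g].sum() * ybar_g            # estimated group population size times mean
        rss += np.sum((y[in_g] - ybar_g) ** 2)
    return est / w.sum(), rss, cut + 1

def model_averaged_estimate(y, w, labels):
    """Average the pooled estimates over the pooling points cut = 0, ..., H-1.

    Model probabilities use a BIC approximation under a normal model with a common
    variance (an assumption of this sketch, not the published prior specification).
    """
    y = np.asarray(y, float)
    n, H = len(y), int(np.max(labels)) + 1
    ests, bics = [], []
    for cut in range(H):
        est, rss, k = pooled_estimate(y, w, labels, cut)
        sigma2 = max(rss / n, 1e-12)             # MLE of the common variance, floored to avoid log(0)
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        bics.append(-2 * loglik + (k + 1) * np.log(n))  # k means + 1 variance parameter
        ests.append(est)
    bics = np.array(bics)
    post = np.exp(-0.5 * (bics - bics.min()))    # shift by the minimum for numerical stability
    post /= post.sum()
    return float(np.dot(post, ests)), post

y = [1, 2, 3, 4, 10, 12]
w = [1, 1, 2, 2, 8, 8]
labels = [0, 0, 1, 1, 2, 2]
print(model_averaged_estimate(y, w, labels))
```

Pooling only the top strata keeps the model space at H candidates; allowing every grouping of contiguous strata would enlarge it to 2^(H-1) candidates.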
``Weight smoothing'' models treat the underlying weight stratum means as random effects and induce weight trimming
by smoothing together stratum means that the data provide little evidence to distinguish, while keeping separate the means that the data
suggest differ (Holt and Smith 1979, Ghosh and
Meeden 1986, Little 1991, 1993, Lazzeroni and Little 1998, Rizzo 1992).
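A compact empirical-Bayes sketch of weight smoothing for a population mean, assuming normal data within strata and exchangeable normal random effects for the stratum means, with the variance components plugged in by crude moment estimates instead of being given priors as in a full Bayesian treatment. The shrinkage factor λ_h = τ²/(τ² + σ²/n_h) pulls each stratum mean toward a common mean. When τ² is estimated as 0 the estimate collapses to the unweighted mean; as τ² grows it approaches the fully weighted mean (for weights constant within strata). The function name and the moment estimators are illustrative choices, not those of the cited work.

```python
import numpy as np

def smoothed_estimate(y, w, labels):
    """Shrink inclusion-stratum means toward a common mean, then combine them with N_h.

    Assumes y_hi ~ N(mu_h, sigma^2) and mu_h ~ N(mu, tau^2) (exchangeable random effects),
    with sigma^2, tau^2 and mu replaced by simple moment estimates (empirical Bayes).
    Requires at least two strata and some within-stratum variation (sigma^2 > 0).
    Returns (estimate of the population mean, shrinkage factors lambda_h).
    """
    y, w, labels = np.asarray(y, float), np.asarray(w, float), np.asarray(labels)
    strata = np.unique(labels)
    n_h = np.array([np.sum(labels == h) for h in strata])
    N_h = np.array([w[labels == h].sum() for h in strata])       # estimated stratum sizes
    ybar_h = np.array([y[labels == h].mean() for h in strata])
    # pooled within-stratum variance estimate of sigma^2
    ss_within = sum(np.sum((y[labels == h] - m) ** 2) for h, m in zip(strata, ybar_h))
    sigma2 = ss_within / (len(y) - len(strata))
    v_h = sigma2 / n_h                                            # sampling variance of each stratum mean
    tau2 = max(np.var(ybar_h, ddof=1) - v_h.mean(), 0.0)          # crude moment estimate of tau^2
    prec = 1.0 / (tau2 + v_h)
    mu = np.sum(prec * ybar_h) / np.sum(prec)                     # precision-weighted common mean
    lam = tau2 / (tau2 + v_h)                                     # shrinkage factors in [0, 1]
    theta_h = lam * ybar_h + (1 - lam) * mu                       # smoothed stratum means
    return np.sum(N_h * theta_h) / N_h.sum(), lam

w = [1, 1, 2, 2, 8, 8]
labels = [0, 0, 1, 1, 2, 2]
print(smoothed_estimate([1.0, 1.1, 5.0, 5.1, 20.0, 20.1], w, labels))  # means clearly differ
print(smoothed_estimate([0.0, 4.0, 1.0, 3.0, 2.0, 2.2], w, labels))    # means indistinguishable
```

In the first call the stratum means are far apart relative to their sampling error, so λ_h is close to 1 and the estimate stays near the fully weighted mean; in the second the means are indistinguishable, λ_h is 0, and the estimate equals the unweighted mean.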
Both methods yield ``partially weighted'' analyses that use the data themselves to
modulate the bias-variance
tradeoff, and both allow estimation and inference from data collected under disproportional
probability-of-inclusion sample designs
to be based on models common to other fields of statistical estimation and inference.
Elliott, M.R., and Little, R.J.A. (2000), "Model-Based Alternatives to Trimming Survey Weights,"
Journal of Official Statistics, 16, 191-209,
considers both weight pooling and weight smoothing models for the estimation of population means.
Elliott, M.R. (2007), "Bayesian Weight Trimming for Generalized Linear Regression Models,"
Survey Methodology, 33, 23-34,
extends the weight smoothing models of Elliott and Little (2000) to accommodate linear and generalized
linear models.
Elliott, M.R. (2008), "Model Averaging Methods for Weight Trimming,"
Journal of Official Statistics, 24, 517-540,
extends the weight pooling models of Elliott and Little (2000) to accommodate linear models
and to allow any set of contiguous inclusion strata to be considered for pooling.
The latter makes our model considerably more robust,
protecting against the ``over-pooling'' from which the simpler models of Elliott and Little (2000) suffered.
Elliott, M.R. (2009), "Model Averaging Methods for Weight Trimming in Generalized Linear Regression Models,"
Journal of Official Statistics, 25, 1-20,
extends Elliott (2008) to allow weight pooling in generalized linear models.
R code used for weight smoothing regression models