Combining Data from Multiple Surveys

The methodology developed here assumes a "gold-standard" survey that lacks small-area identifiers, paired with one or more other surveys that contain small-area identifiers but are of lower quality because of a lower response rate, an inadequate sampling frame, and/or mode bias. We develop a novel extension of dual-frame estimation using propensity scores that allows the complementary strengths of each survey to compensate for the weaknesses of the other.

We apply this method to the National Health Interview Survey (NHIS) and the Behavioral Risk Factor Surveillance System (BRFSS). The NHIS is a nationally representative, face-to-face survey with a high response rate. However, it cannot produce state or sub-state estimates of risk factor prevalence: its sample sizes at those levels are too small, and small-area identifiers are not released to the public. The BRFSS is a state-level telephone survey that excludes non-telephone households and has a lower response rate. However, it provides reasonable sample sizes in all states and many counties and has publicly available small-area identifiers (counties).
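The propensity-score idea can be illustrated with a generic sketch. This is not the Elliott and Davis estimator. It shows only one common way to reweight respondents from a lower-quality survey so that their distribution of shared covariates resembles that of a gold-standard survey. The helper names (`fit_weighted_logistic`, `propensity_adjusted_weights`), the single pooled logistic model, and the final rescaling step are illustrative assumptions. The sketch also assumes NumPy is available. It does not implement the dual-frame treatment of non-telephone households, and it does not estimate variance.

```python
import numpy as np


def fit_weighted_logistic(X, y, weights, n_iter=50, tol=1e-10):
    """Weighted logistic regression fitted by Newton-Raphson.

    X must already contain an intercept column; weights are survey weights.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (weights * (y - p))                       # weighted score
        hess = (X * (weights * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def propensity_adjusted_weights(X_gold, w_gold, X_other, w_other):
    """Hypothetical helper: reweight the 'other' survey toward the gold standard.

    Pools both samples, models P(gold | x) with a weighted logistic regression,
    and multiplies each 'other' respondent's weight by the estimated odds
    p / (1 - p). The odds estimate the ratio of the two weighted covariate
    distributions up to a constant, which the final rescaling removes.
    """
    n_gold = len(X_gold)
    X = np.column_stack([np.ones(n_gold + len(X_other)),
                         np.vstack([X_gold, X_other])])
    y = np.concatenate([np.ones(n_gold), np.zeros(len(X_other))])
    w = np.concatenate([w_gold, w_other])
    beta = fit_weighted_logistic(X, y, w)

    p = 1.0 / (1.0 + np.exp(-X[n_gold:] @ beta))
    adj = w_other * p / (1.0 - p)
    # Rescale so the adjusted weights keep the other survey's weighted total.
    return adj * w_other.sum() / adj.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated data: the gold-standard sample is older on average.
    age_gold = rng.normal(50, 10, size=(3000, 1))
    age_other = rng.normal(45, 10, size=(3000, 1))
    w_new = propensity_adjusted_weights(age_gold, np.ones(3000),
                                        age_other, np.ones(3000))
    print("other survey, unadjusted mean age:", age_other.mean())
    print("other survey, adjusted mean age:  ", np.average(age_other[:, 0], weights=w_new))
    print("gold-standard mean age:           ", age_gold.mean())
```

In a small-area application, one would then compute weighted prevalence within each county from the reweighted sample, because only the lower-quality survey carries county identifiers. Reweighting can align only the covariates that both surveys measure. Bias that is not explained by those covariates is untouched by this step.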

We obtain NHIS-adjusted county-level estimates for 1999-2000 of smoking prevalence among men and of mammography rates among women aged 40 and older. Using data from the Current Population Survey Tobacco Use Supplement, we assess evidence that these NHIS-adjusted estimates reduce the effects of selection bias and non-telephone noncoverage in the BRFSS.

Elliott, M. R. and Davis, W. W. (2005). "Obtaining Cancer Risk Factor Prevalence Estimates in Small Areas: Combining Data from Two Surveys." Applied Statistics, 54, 595-610. Correction: Applied Statistics, 54, 958.