>> STUDY POPULATION
The study population for the 2000 Pre- and Post-Election Study is defined
to include all United States citizens of voting age on or before the 2000
Election Day. Eligible citizens must have resided in housing units in the
forty-eight coterminous states. This definition excludes persons living in
Alaska or Hawaii and requires eligible persons to have been both a United
States citizen and eighteen years of age on or before the 7th of November
2000.
>> DUAL FRAME SAMPLE DESIGN
The 2000 NES is a dual frame sample with both an area sample and an RDD
component. The RDD frame provides coverage of telephone households while the
area sample provides full coverage of all U.S. households including those
without telephones. Each of these sample designs will be described in the
following sections. The 2000 NES data set contains 1006 area sample cases
and 801 telephone sample cases.
>> FTF SAMPLE DESIGN - MULTI-STAGE AREA PROBABILITY
The area sample is based on a multi-stage area probability sample selected
from the Survey Research Center's (SRC) 1990 National Sample design.
The 2000 NES sample respondents were identified through a four-stage sampling
process: a primary stage sampling of U.S. Metropolitan Statistical Areas
(MSAs) or New England County Metropolitan Areas (NECMAs) and non-MSA
counties; a second stage sampling of area segments; a third stage sampling
of housing units within sampled area segments; and, finally, the random
selection of a single respondent from each selected housing unit. Detailed
documentation of the 1990 SRC National Sample, from which the 2000 NES sample
was drawn, is provided in the SRC publication titled 1990 SRC National
Sample: Design and Development.
The 2000 NES sample design called for an entirely new cross-section sample to
be drawn from the 1990 SRC National Sample; no panel component was included
in 2000. The 1990 SRC National Sample is a multi-stage area probability
sample. The 2000 NES sample was drawn from both the 1990 SRC National Sample
strata (MSA PSUs) and the 1980 SRC National Sample strata (non-MSA PSUs).
The modification of the 1990 design in which the 1980 strata definitions were
used for the non-MSA counties fully represents the non-MSA domain of the 48
contiguous states. This modification was made for cost and interviewing
efficiency reasons related to the availability of interviewers in these areas
who work on some of SRC's large panel studies. The following sections will
focus on the 1990 SRC National Sample design.
Selection Stages for the 2000 NES FTF Sample: 1990 SRC National Sample
------------------------------------------------------------------
Primary Stage Selection
The selection of primary stage sampling units (PSUs) for the 1990 SRC
National Sample, which depending on the sample stratum are either MSAs, New
England County Metropolitan Areas (NECMAs), single counties, independent
cities, county equivalents or groupings of small counties, is based on the
county-level 1990 Census Reports of Population and Housing (1). Primary stage
units were assigned to 108 explicit strata based on MSA/NECMA or non-
MSA/NECMA status, PSU size, Census Region and geographic location within
region. Twenty-eight of the 108 strata contain only a single self-
representing PSU, each of which is included with certainty in the primary
stage of sample selection. The remaining 80 nonself-representing strata
contain more than one PSU. From each of these nonself-representing strata,
one PSU was sampled with probability proportionate to its size (PPS) measured
in 1990 occupied housing units.
The full 1990 SRC National Sample of 108 primary stage selections was
designed to be optimal for surveys roughly three to five times the size of
the 2000 NES. To permit the flexibility needed for optimal design of smaller
survey samples, the primary stage of the SRC National Sample can be readily
partitioned into smaller subsamples of PSUs such as a one-half sample or a
three-quarter sample partition. Each of the partitions represents a
stratified subselection from the full 108 PSU design. The 2000 NES sample of
44 PSUs is a stratified random subsample of PSUs from the "A" half-sample
partition of the 1990 SRC National Sample. Because of the small size of this
NES sample, both the number of PSUs (selected primary areas) and the
secondary stage units (area segments) in the National half-sample were
reduced by subselection for the 2000 NES sample design. The 18 self-
representing areas in the 1990 SRC National half-sample were all retained for
the 2000 NES sample (8 of these remained self-representing in the 2000 NES
and 10 represent not only their own MSA but their "pair" among the twenty
additional self-representing primary areas of the full 1990 SRC National
Sample design). Nineteen of the 26 nonself-representing half-sample MSAs and
7 of the 14 half-sample non-MSAs were retained by the subselection for the
2000 NES sample (or 26 of 40 NSR PSUs).
Table 1 identifies the 44 PSUs in the 2000 NES sample by MSA status and
Region and also indicates the number of area segments used for the 2000 NES
sample (see next section on second stage selection).
Table 1: PSU Name and Number of Area Segments in the 2000 NES Sample
Showing 1990 SRC National-Sample Stratum and MSA Status.
==============================================================================
National Sample PSU   National Sample PSU Name   # of 2000 NES Segments
==============================================================================
Eight Largest Self-representing PSUs
------------------------------------
120 New York, NY MSA 12
190 Los Angeles-Long Beach, CA MSA 12
130 Chicago, IL MSA 9
121 Philadelphia, PA-NJ MSA 7
131 Detroit, MI MSA 6
150 Washington DC-MD-VA MSA 6
110 Boston, MA NECMA 6
171 Dallas and Ft Worth, TX CMSA 6
Ten Remaining Largest MSA PSUs
------------------------------
170 Houston, TX MSA 6
191 Seattle-Tacoma, WA CMSA 6
141 St Louis, MO-IL MSA 6
152 Baltimore, MD MSA 6
122 Nassau-Suffolk, NY MSA 6
194 Anaheim-Santa Ana, CA MSA 6
132 Cleveland, OH MSA 6
154 Miami-Hialeah, FL MSA 5(2)
181 Denver, CO MSA 6
196 San Francisco, CA MSA 6
Nonself-representing MSAs: Northeast
-------------------------------------
211 New Haven-Waterbury-Meriden, CT NECMA 6
213 Manchester-Nashua NH NECMA 6
220 Buffalo, NY MSA 6
226 Atlantic City, NJ MSA 6
Nonself-representing MSAs: Midwest
-----------------------------------
230 Milwaukee, WI MSA 6
434 Saginaw, MI MSA 6
239 Steubenville-Wheeling, OH (3) 6
240 Des Moines, IA MSA 6
Nonself-representing MSAs: South
---------------------------------
250 Richmond-Petersburg, VA MSA 6
255 Columbus, GA-AL MSA 6
257 Jacksonville, FL MSA 6
258 Lakeland, FL MSA 6
260 Knoxville TN MSA 6
262 Birmingham, AL MSA 6
273 Waco, TX MSA 6
274 McAllen-Edinburg-Mission, TX MSA 6
Nonself-representing MSAs: West
--------------------------------
280 Salt Lake City-Ogden etc, UT MSA 6
292 Fresno, CA MSA 6
293 Eugene-Springfield, OR MSA 6
Nonself-representing Non-MSAs: Northeast
-----------------------------------------
464 Gardner, MA 6
Nonself-representing Non-MSAs: Midwest
--------------------------------------
466 Decatur County, IN 6
470 Mower County, MN 6
Nonself-representing Non-MSAs: South
-------------------------------------
474 DeSoto Parish, LA 6
477 Chicot County, AR 6
480 Montgomery County, VA 6
Nonself-representing Non-MSAs: West
------------------------------------
482 ElDorado County, CA 6
Total Number of Segments 279
(1) Office of Management and Budget (OMB) June 1990 definitions of MSAs,
NECMAs, counties, parishes, independent cities. These, of course, differ in
some respects from the primary stage unit (PSU) definitions used in the 1980
SRC National Sample so will not be strictly comparable to the 1996 NES Panel
PSUs--particularly in New England where MSAs were used as PSUs in the 1980
National Sample and NECMAs were used as PSUs in the 1990 National Sample.
(2) One selected segment (023) was in a former trailer park that had no
housing units to be listed in January 1996. All had been destroyed in 1992 by
Hurricane Andrew and there were no plans to rebuild.
(3) In the 1990 SRC National Sample, U.S. Census Region boundaries were
maintained for purposes of stratification at the primary stage of selection.
Since some MSA definitions cross Region boundaries, such MSAs were split and
the MSA counties recombined in ways that maintained the Region boundary. This
PSU actually contains the Ohio counties from both the Steubenville-Weirton,
OH-WV MSA (Jefferson County, OH) and the Wheeling, WV-OH MSA (Belmont County,
OH); although it is made up of MSA counties, it is not a cohesive MSA by the
OMB 1990 definition.
Second Stage Selection - Area Segments
The second stage of the 1990 SRC National Sample, used for the 2000 NES
sample, was selected directly from computerized files that were extracted for
the selected PSUs from the 1990 U.S. Census summary file series STF1-B.
These files (on CD-ROM) contain the 1990 Census total population and housing
unit (HU) data at the census block level. The designated second-stage
sampling units (SSUs), termed "area segments", are composed of census blocks
in both the metropolitan (MSA) primary areas and in the rural areas of
non-MSA primary areas. Each SSU block or block combination was assigned a
measure of size equal to the total 1990 occupied housing unit count for the
area. SSU block(s) were assigned a minimum measure of 72 1990 total HUs per
MSA SSU and a minimum measure of 48 total HUs per non-MSA SSU. Second stage
sampling of area segments was performed with probabilities proportionate to
the assigned measures of size (PPS).
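The second-stage procedure can be sketched in Python as follows. This is a
minimal illustration, not the SRC selection program: the rule for combining
blocks (grouping consecutive blocks until the minimum measure is reached),
the function names, and the block counts are all assumptions made for the
example. Only the minimum measure of 72 HUs for an MSA segment comes from
the text.

```python
import random

def form_segments(block_hus, min_size):
    """Group consecutive census blocks until each group holds at least
    min_size housing units (hypothetical combination rule)."""
    segments, current = [], []
    for hu in block_hus:
        current.append(hu)
        if sum(current) >= min_size:
            segments.append(current)
            current = []
    if current and segments:
        segments[-1].extend(current)  # fold leftover blocks into the last segment
    elif current:
        segments.append(current)
    return segments

def pps_systematic(sizes, n, rng=None):
    """Systematic PPS selection: return the indices of n selections made
    with probability proportionate to the given measures of size."""
    rng = rng or random.Random()
    interval = sum(sizes) / n
    start = rng.random() * interval  # random start in [0, interval)
    selected, cum, k = [], 0.0, 0
    for idx, size in enumerate(sizes):
        cum += size
        # a unit is hit once for every selection point inside its size range
        while k < n and start + k * interval < cum:
            selected.append(idx)
            k += 1
    return selected

# Illustrative block housing-unit counts; 72 is the MSA minimum from the text.
blocks = [30, 50, 80, 10, 20, 95, 40, 41]
segments = form_segments(blocks, min_size=72)
sizes = [sum(seg) for seg in segments]
chosen = pps_systematic(sizes, n=2, rng=random.Random(2000))
```

A segment whose measure exceeds the selection interval can be hit more than
once under this scheme; production designs usually handle such units
separately.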
For the 2000 NES sample the number of area segments used in each PSU varies.
In the self-representing (SR) PSUs the number of area segments varies in
proportion to the size of the primary stage unit, from a high of 12 area
segments in the self-representing New York and Los Angeles MSA PSUs, to a low
of 6 area segments in the smaller self-representing PSUs such as Cleveland,
Miami-Hialeah or Nassau-Suffolk MSAs. All nonself-representing (NSR) PSUs
were represented by 6 area segments each. A total of 279 NES area segments
were selected as shown in Table 1.
Third Stage Selection - Housing Units
For each area segment selected in the second sampling stage, a listing had
been made of all housing units located within the physical boundaries of the
segment. For segments with a very large number of expected housing units,
all housing units in a subselected part of the segment were listed. The
final equal probability sample of housing units for the 2000 NES sample was
systematically selected from the housing unit listings for the sampled area
segments.
The 2000 NES sample design was selected from the 1990 SRC National Sample to
yield an equal probability sample of 2269 listed housing units. This total
included 1972 housing units for the main sample and three reserve replicates
of 99 cases each. Table 2 below shows the assumptions that were used to
determine the number of sample housing units. The overall probability of
selection for 2000 NES cross-section sample of households was f=0.00002116 or
0.2116 in 10,000. The equal probability sample of households was achieved
for the 2000 NES sample by using the standard multi-stage sampling technique
of setting the sampling rate for selecting housing units within area segments
to be inversely proportional to the PPS probabilities used to select the PSU
and area segment (Kish, 1965).
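The equal-probability property described above can be illustrated with a
short Python sketch. The overall rate f is taken from the text; the PSU and
segment probabilities in the example are hypothetical values chosen only to
show the arithmetic.

```python
OVERALL_F = 0.00002116  # overall household selection probability (from the text)

def within_segment_rate(p_psu, p_segment, f=OVERALL_F):
    """Housing-unit sampling rate within a segment, set inversely
    proportional to the PPS probabilities of the earlier stages so that
    p_psu * p_segment * rate equals f for every household."""
    rate = f / (p_psu * p_segment)
    if rate > 1:
        raise ValueError("earlier-stage probabilities are too small for this f")
    return rate

# Hypothetical earlier-stage probabilities for one segment.
example_rate = within_segment_rate(p_psu=0.05, p_segment=0.002)
```

Whatever the PPS probabilities of a given PSU and segment, multiplying them
by the resulting rate returns the same overall probability f.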
Fourth Stage Selection - Respondent Selection
Within each sampled 2000 NES occupied housing unit, the SRC interviewer
prepared a complete listing of all eligible household members. Using an
objective procedure described by Kish (1949) a single respondent was then
selected at random to be interviewed. Regardless of circumstances, no
substitutions were permitted for the designated respondent.
>> AREA SAMPLE DESIGN ASSUMPTIONS, SPECIFICATIONS AND OUTCOMES
The 2000 National Election Study sought a total of 1000 in-person interviews.
It was estimated that this would require a NES sample draw of 1972 housing
units. This assumed an occupancy/growth rate of 0.83, an eligibility rate of
0.94 and a response rate of 0.65. These assumptions were based on the 1998
NES field experience. The overall 2000 NES area sample design
specifications, assumptions and outcomes are set out in Table 2, below. A
sample of 2269 listed housing units was actually selected for the 2000 NES
study. This allowed for three reserve replicates of 99 cases each. There was
no panel component in 2000.
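The sample draw follows from dividing the interview target by the product of
the three assumed rates. The Python sketch below reproduces the figures
quoted above; rounding up to the next whole housing unit is an assumption,
as the text does not state the rounding rule.

```python
import math

def required_sample_lines(target, response, eligibility, occupancy):
    """Number of sample lines needed to yield `target` interviews under
    the assumed response, eligibility and occupancy/growth rates."""
    return math.ceil(target / (response * eligibility * occupancy))

main_sample = required_sample_lines(1000, response=0.65,
                                    eligibility=0.94, occupancy=0.83)
total_listed = main_sample + 3 * 99  # main sample plus three reserve replicates
```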
A comparison of the 2000 NES sample outcome figures to the design
specifications and assumptions in Table 2 shows that the actual occupancy,
eligibility, and response rates were very close to the expected rates. The
actual response rate for the Post-Election FTF sample was 0.86, which was
slightly higher than the assumed rate of 0.85.
Table 2: 2000 NES Area Sample Pre and Post-Election Design
Specifications and Assumptions Compared to Sample Outcome.
==============================================================================
                      2000 NES      2000 NES      2000 NES      2000 NES
                      Pre-Election  Pre-Election  Post-Election Post-Election
                      Design        Sample        Design        Sample
                      Specification Outcome       Specification Outcome
==============================================================================
Completed Interviews  1000          1006          847           693
Response Rate         0.65          0.64          0.85          0.86
Eligible Sample       1538          1564          1000          805 (4)
  Households
Eligibility Rate      0.94          0.95
Occupied Households   1634          1639
Occupancy/Growth Rate 0.83          0.82
Total Sample Lines    1972          1986
(4) Initial sample lines (FTF and Phone) are different from the Pre-Election
completed interviews because of the switch in mode for randomly selected
sample cases.
>> 2000 NES RDD (RANDOM DIGIT DIAL) SAMPLE
The RDD telephone component of the 2000 NES is a stratified equal
probability sample of telephone numbers. The sample is not clustered. The
telephone numbers were selected from a commercial listed one hundred series
sampling frame consisting of every possible phone number that can be
generated by appending the 2-digit numbers 00 - 99 to the set of hundred
banks that have at least two listed household telephone numbers. Hundred
banks are the first eight digits of a phone number - area code, exchange, and
the next two digits. Each hundred bank defines a set of 100 possible phone
numbers. Directory listings are used to define the set of listed hundred
series. However both listed and unlisted telephone numbers can be selected
from the sampling frame. A small amount of noncoverage of telephone numbers
results from household numbers that are in hundred banks with 0 or 1 listed
residential numbers. These telephone households as well as non-telephone
households are covered by the area sample component.
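The construction of a list-assisted hundred-bank frame can be sketched in
Python. This is an illustration under stated assumptions, not the vendor's
procedure: the function name and the example telephone numbers are
hypothetical, and numbers are represented as 10-digit strings.

```python
from collections import Counter

def build_frame(listed_numbers, min_listed=2):
    """Return every possible number in hundred banks that contain at least
    min_listed directory-listed household numbers."""
    counts = Counter(num[:8] for num in listed_numbers)  # first 8 digits = bank
    banks = sorted(bank for bank, c in counts.items() if c >= min_listed)
    return [f"{bank}{suffix:02d}" for bank in banks for suffix in range(100)]

# Hypothetical listings: two in bank 73455501, one in bank 73455502.
listed = ["7345550101", "7345550177", "7345550212"]
frame = build_frame(listed)
```

In this example only the first bank qualifies, so the frame holds its 100
possible numbers, listed and unlisted alike, while the single listed number
in the second bank falls outside the frame.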
An initial sample of 8500 telephone numbers was selected from the
listed frame for the coterminous 48 states. These numbers were pre-screened
by the vendor to remove most business and non-working phone numbers. After
pre-screening, 5760 or 67.8% of the 8500 telephone numbers were returned as
potentially working residential numbers. The potentially working phone
numbers were matched against a file of directory listings to append address
information so that Congressional Districts could be assigned. Before sample
selection, the telephone numbers were stratified by the competitiveness of
the Congressional race (5 levels), whether or not the race was open, and by
Census Division. A half sample was systematically selected from the
stratified file. An initial sample of 2349 cases was selected from the
random half sample and the remaining telephone numbers were assigned to 5
reserve replicates of 106-107 numbers each. The reserve replicates were
available for use in case the working rate or response rate were lower than
expected.
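The half-sample selection and the allocation to reserve replicates can be
sketched in Python as follows. The record layout, the random ordering used
to separate the initial sample from the reserve, and the function names are
assumptions for illustration; the text does not describe how the 2349 cases
were drawn from the half sample.

```python
import random

def systematic_half_sample(records, strata_key, rng):
    """Sort by the stratification variables, then keep every second record
    from a random start (implicit stratification)."""
    ordered = sorted(records, key=strata_key)
    return ordered[rng.randrange(2)::2]

def split_replicates(items, k):
    """Deal items into k replicates whose sizes differ by at most one."""
    return [items[i::k] for i in range(k)]

rng = random.Random(2000)
# Hypothetical records: (competitiveness level, open seat, Census Division, id)
frame = [(rng.randrange(5), rng.randrange(2), rng.randrange(1, 10), i)
         for i in range(5760)]
half = systematic_half_sample(frame, strata_key=lambda r: r[:3], rng=rng)
shuffled = rng.sample(half, len(half))  # assumption: random order within the half
main, reserve = shuffled[:2349], shuffled[2349:]
replicates = split_replicates(reserve, 5)
```

With 5760 pre-screened numbers the half sample holds 2880, leaving 531
numbers for the five reserve replicates, which matches the 106-107 numbers
per replicate reported above.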
>> 2000 NES RDD SAMPLE DESIGN ASSUMPTIONS, SPECIFICATIONS AND OUTCOMES
The 2000 National Election Study sought a total of 861 telephone interviews.
It was estimated that this would require a NES sample draw of 2349 telephone
numbers assuming a working rate (after pre-screening) of 0.65, an eligibility
rate of 0.94, and a response rate of 0.60. The eligibility rate was based on
the 1998 NES experience. Working rate and response rate assumptions were
based on the Survey Research Center's recent experience with RDD samples. The
overall 2000 NES RDD sample design specifications, assumptions and outcomes
are set out in Table 3, below. A comparison of the 2000 NES RDD sample
design specifications and assumptions to the outcome figures in Table 3
indicates that, although the actual eligibility rate was higher than assumed,
both the working rate and response rates were lower than specified in the
sample design assumptions. This resulted in fewer interviews being taken in
the Pre-Election study. The actual response rate for the Post-Election
telephone sample was 0.86, which was higher than the assumed rate of 0.75.
Table 3: 2000 NES Telephone Sample Design Specifications and
Assumptions Compared to Sample Outcome.
==============================================================================
                      2000 NES      2000 NES      2000 NES      2000 NES
                      Pre-Election  Pre-Election  Post-Election Post-Election
                      Design        Sample        Design        Sample
                      Specification Outcome       Specification Outcome
==============================================================================
Completed Interviews  861           801           645           862
Response Rate         0.60          0.56          0.75          0.86
Eligible Sample       1435          1418          861           1002 (5)
  Households
Eligibility Rate      0.94          0.96
Occupied Households   1527          1475
Working Rate          0.65          0.63
Total Sample Lines    2349          2349
(5) Initial sample lines (FTF and Phone) are different from the Pre-Election
completed interviews because of the switch in mode for randomly selected
sample cases.
>> 2000 NES POST-ELECTION STUDY SAMPLE OUTCOMES
Of the 1807 respondents interviewed in the Pre-Election Study, 1555
completed Post-Election interviews for an overall response rate of 0.86. FTF
interviews were attempted with 805 of the 1006 persons interviewed FTF in the
Pre-Election study and 693 FTF interviews were obtained for a FTF response
rate of 0.86. Approximately 200 FTF cases were transferred to telephone
interviewing for the Post-Election study in order to reduce field costs.
This was accomplished through a systematic random sample of approximately 20
percent of the area segments. Telephone interviews were attempted with 1002
(201 FTF in the Pre-Election study and 801 Telephone in Pre-Election study)
respondents in the Post-Election study. 862 telephone interviews were
obtained for a response rate of 0.86.
>> 2000 NES DATA - WEIGHTED ANALYSIS
The 2000 NES data set includes a person-level analysis weight, which
incorporates sampling, nonresponse and post-stratification factors. Analysts
interested in developing their own nonresponse or stratification adjustment
factors must request access to the necessary sample control data from the NES
Board.
>> 2000 NES ANALYSIS WEIGHTS - CONSTRUCTION
Household Selection Weight Component
------------------------------------
The joint household selection weight is the same for both the RDD and
the area sample. This weight is an inflation factor equal to 34195.298. It
is equal to the inverse of the joint probability of selection, which is the
sum of the RDD and the area sample probabilities minus their product. It was
not possible from the data available to reliably identify the area sample
respondents who did not have telephone service. The 2000 CPS March
Supplement estimates that 5.5% of U.S. households do not have telephone
service. The household selection weight component therefore slightly
under-represents respondents who live in households that cannot be reached
through the RDD sample frame.
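The relationship between the inflation factor and the joint probability can
be checked numerically. The Python sketch below is illustrative only: it
assumes that the area sample probability f=0.00002116 quoted earlier is the
area term of the joint probability, and it back-solves an implied RDD
probability, a value this document does not report.

```python
def joint_probability(p_area, p_rdd):
    """Probability that a household is selected by at least one frame."""
    return p_area + p_rdd - p_area * p_rdd

P_AREA = 0.00002116    # area sample probability (from the text)
HH_WEIGHT = 34195.298  # joint household selection weight (from the text)

p_joint = 1 / HH_WEIGHT
# Implied RDD probability, back-solved for illustration; not a published figure.
p_rdd_implied = (p_joint - P_AREA) / (1 - P_AREA)
```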
Person-Level Sample Selection Weight Component
----------------------------------------------
The dual frame sample design for the 2000 NES results in a probability sample
of U.S. households. Within sample households a single adult respondent is
chosen at random to be interviewed. Since the number of eligible adults
varies from one household to another, the random selection of a single adult
introduces inequality into respondents' selection probabilities. In
analysis, a respondent selection weight should be used to compensate for
these unequal selection probabilities. The person-level selection weight is
the product of the joint household selection weight and the within household
selection weight. The within household selection weight is equal to the
number of eligible persons in the household and is capped at 3. The use of
the respondent selection weight is strongly encouraged, despite past
evaluations that have shown these weights to have little significant impact
on the values of NES estimates of descriptive statistics.
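As a minimal Python sketch, the person-level selection weight can be computed
as shown below. The function and argument names are hypothetical; the
household weight and the cap of 3 are taken from the text.

```python
HH_WEIGHT = 34195.298  # joint household selection weight (from the text)
MAX_ELIGIBLE = 3       # cap on the within-household weight (from the text)

def person_selection_weight(n_eligible, hh_weight=HH_WEIGHT, cap=MAX_ELIGIBLE):
    """Household weight times the number of eligible persons, capped."""
    if n_eligible < 1:
        raise ValueError("a sampled household needs at least one eligible person")
    return hh_weight * min(n_eligible, cap)
```

Under the cap, a respondent from a household with five eligible adults
receives the same weight as one from a household with three.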
Nonresponse Adjusted Selection Weight
-------------------------------------
The base weight equals the product of the joint selection weight and the
household level nonresponse adjustment factors. Nonresponse adjustment
factors were constructed at the household level separately for the area
sample and the RDD sample. Nonresponse adjustment cells for the 2000 NES
sample were formed by crossing MSA status by the four Census regions
(Northeast, Midwest, South, and West). A nonresponse adjustment factor equal
to the inverse of the response rate in each cell was applied to the interview
cases. Tables 4 and 5 show the response rates and nonresponse adjustment
factors for the area and RDD samples.
Table 4. Computation of Nonresponse Adjustment Weights -- 2000 NES
Area Sample.
==============================================================================
PSU Type    Census Region    Response Rate (%)    Nonresponse
                                                  Adjustment Factor
==============================================================================
MSAs        Northeast        55.28                1.809
            Midwest          62.86                1.591
            South            61.87                1.616
            West             67.82                1.474
Non MSAs    Northeast        61.54                1.625
            Midwest          65.71                1.522
            South            79.55                1.257
            West             83.33                1.200
Table 5. Computation of Nonresponse Adjustment Weights -- 2000 NES RDD
Sample.
==============================================================================
PSU Type    Census Region    Response Rate (%)    Nonresponse
                                                  Adjustment Factor
==============================================================================
MSAs        Northeast        43.94                2.276
            Midwest          62.08                1.611
            South            58.72                1.703
            West             53.56                1.867
Non MSAs    Northeast        50.00                2.000
            Midwest          67.90                1.473
            South            62.70                1.595
            West             67.86                1.474
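The adjustment factors in Tables 4 and 5 are the inverses of the cell
response rates. The short Python sketch below recomputes the area sample
factors from the Table 4 response rates; the dictionary layout and names are
illustrative.

```python
def nonresponse_factor(response_rate_pct):
    """Inverse of the cell response rate, with the rate given in percent."""
    return 100.0 / response_rate_pct

# (PSU type, Census region) -> (response rate %, published factor), from Table 4
AREA_CELLS = {
    ("MSA", "Northeast"): (55.28, 1.809),
    ("MSA", "Midwest"): (62.86, 1.591),
    ("MSA", "South"): (61.87, 1.616),
    ("MSA", "West"): (67.82, 1.474),
    ("Non-MSA", "Northeast"): (61.54, 1.625),
    ("Non-MSA", "Midwest"): (65.71, 1.522),
    ("Non-MSA", "South"): (79.55, 1.257),
    ("Non-MSA", "West"): (83.33, 1.200),
}

computed = {cell: round(nonresponse_factor(rate), 3)
            for cell, (rate, _) in AREA_CELLS.items()}
```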
Post-stratification factor
--------------------------
The 2000 NES weights are post-stratified to 2000 CPS March Supplement
proportions for six (6) age groups by four (4) education categories. Table 6
shows the weighted estimates and proportions for the 24 cells for the 2000
CPS and the 2000 NES. The post-stratification adjustment is computed by
dividing the CPS weighted total by the 2000 NES total weighted by the
nonresponse adjusted selection weight. The final two columns show the NES
weighted totals using the final post-stratified analysis weight and the
resulting percents, which match the CPS percents.
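The post-stratification factor for a cell is a simple ratio, sketched below
in Python for three cells of Table 6. The published factors were presumably
computed from unrounded totals, so values recomputed from the rounded table
entries can differ in the third decimal place; the comparison therefore uses
a small tolerance. Names are illustrative.

```python
def poststrat_factor(cps_total, nes_weighted_total):
    """CPS weighted total divided by the NES total weighted by the
    nonresponse adjusted selection weight."""
    return cps_total / nes_weighted_total

# (age group, education) -> (CPS est. in 000s, prelim NES est. in 000s, published factor)
CELLS = {
    ("18-29", "<High School Graduation"): (6411.4, 2490.3, 2.574),
    ("18-29", "College Graduate"): (6666.9, 6990.0, 0.954),
    ("30-39", "College Graduate"): (10786.4, 14122.3, 0.764),
}

factors = {cell: poststrat_factor(cps, nes) for cell, (cps, nes, _) in CELLS.items()}
```

A factor above 1 indicates a cell that is under-represented in the NES
relative to the CPS, and a factor below 1 an over-represented cell.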
Final Analysis Weights
----------------------
The final analysis weight (FINAL_WT) is the product of the household level
non-response adjustment factor, the number of eligible persons, and a person-
level post-stratification factor. The final analysis weight for the 2000
NES sample (FINAL_WT) is scaled to sum to 1807, the total number of
respondents. This weight is trimmed at the 1st and 99th percentiles and then
re-scaled to match the 2000 CPS proportions for the 24 age by education
cells.
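The trimming and scaling steps can be sketched in Python as follows. This is
a simplified illustration: the percentile definition (linear interpolation)
is an assumption, the input weights are made up, and the final re-scaling to
the 24 CPS age by education cells is omitted.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def trim_and_scale(weights, lower_q=1, upper_q=99):
    """Trim weights at the given percentiles, then scale them so that
    they sum to the number of cases."""
    ordered = sorted(weights)
    lo, hi = percentile(ordered, lower_q), percentile(ordered, upper_q)
    trimmed = [min(max(w, lo), hi) for w in weights]
    factor = len(weights) / sum(trimmed)
    return [w * factor for w in trimmed]

# Made-up untrimmed weights for 200 hypothetical cases.
raw = [float(w) for w in range(1, 201)]
final = trim_and_scale(raw)
```

Trimming narrows the range of the weights, which limits the influence of a
few extreme cases on estimates at the cost of a small potential bias.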
Post-Election Attrition Weight
------------------------------
The 1555 Post-Election cases were post-stratified to 2000 CPS March
Supplement proportions for six (6) age groups by four (4) education categories
(the same categories used for post-stratifying the Pre-Election cases). The
post-stratification compensates for differential non-response by age group and
education level. Response rates for the Post-Election Study ranged from a
high of 100 percent for persons 70 or older with a college degree or higher
to a low of 76 percent for persons age 30 - 39 who did not graduate from high
school. The panel attrition weight for the Post-Election Study, POST_WT, is
the product of the Pre-Election FINAL_WT and the post-stratification factor
formed by dividing the CPS proportion by the weighted NES proportion for each
of the 24 age by education cells. The weight is scaled to sum to the number
of cases, 1555.
Table 6: 2000 NES Sample Weight: Post-stratification Factors.
==================================================================================
Age   Education               n    2000 CPS  2000  Prelim    Post-  NES wtd  Final
Group Level                        Est in    CPS % 2000 NES  strat  n        NES
                                   000s (6)        wtd Est   Adjust centered wtd %
                                                   in 000s
==================================================================================
18-29 <High School Graduation 22   6,411.4   3.438 2,490.3   2.574  62.08    3.44
      High School Graduate    88   12,223.7  6.555 9,628.2   1.270  118.53   6.56
      Some College            103  14,524.8  7.789 11,424.0  1.271  140.81   7.79
      College Graduate        68   6,666.9   3.575 6,990.0   0.954  64.73    3.58
30-39 <High School Graduation 21   3,242.8   1.739 1,780.1   1.822  31.48    1.74
      High School Graduate    108  12,543.8  6.727 10,873.1  1.154  121.56   6.73
      Some College            121  10,759.0  5.769 11,727.6  0.917  104.32   5.77
      College Graduate        146  10,786.4  5.784 14,122.3  0.764  104.36   5.78
40-49 <High School Graduation 22   3,478.8   1.865 2,277.5   1.527  33.74    1.87
      High School Graduate    101  13,087.2  7.018 9,899.0   1.322  126.84   7.02
      Some College            129  11,548.5  6.193 13,551.0  0.852  111.85   6.19
      College Graduate        137  11,327.1  6.074 14,505.2  0.781  109.74   6.07
50-59 <High School Graduation 23   3,300.1   1.770 2,192.9   1.505  32.04    1.77
      High School Graduate    93   9,364.1   5.022 9,558.1   0.980  90.70    5.02
      Some College            96   7,449.2   3.995 10,185.6  0.731  72.12    3.99
      College Graduate        110  7,984.6   4.282 11,542.5  0.716  77.40    4.28
60-69 <High School Graduation 35   4,136.4   2.218 3,429.9   1.206  40.20    2.22
      High School Graduate    61   7,201.9   3.862 6,060.7   1.188  69.77    3.86
      Some College            49   3,886.6   2.084 4,280.8   0.908  37.58    2.08
      College Graduate        49   3,880.8   2.081 4,688.9   0.828  37.53    2.08
70 +  <High School Graduation 58   7,298.9   3.914 5,033.8   1.450  70.63    3.91
      High School Graduate    73   7,994.7   4.287 6,327.7   1.263  77.51    4.29
      Some College            48   4,073.3   2.184 3,811.1   1.069  39.41    2.18
      College Graduate        46   3,303.4   1.771 4,071.8   0.811  32.07    1.77
Totals                        1807 186,470.0 100.0 180,100.0        1807.0   100.0
(6) Because U.S. citizenship is required for NES eligibility, the CPS counts
used for stratification include only U.S. citizens.
>> 2000 NES PROCEDURES FOR SAMPLING ERROR ESTIMATION
The 2000 NES sample design is based on a stratified multi-stage area
probability sample of United States households. Although smaller in scale,
the NES sample design is very similar in its basic structure to the
multi-stage designs used for major federal survey programs such as the Health
Interview Survey (HIS) or the Current Population Survey (CPS). The survey
literature refers to the NES, HIS and CPS samples as complex designs, a
loosely-used term meant to denote the fact that the sample incorporates
special design features such as stratification, clustering and differential
selection probabilities (i.e., weighting) that analysts must consider in
computing sampling errors for sample estimates of descriptive statistics and
model parameters. This section of the 2000 NES sample design description
focuses on sampling error estimation and construction of confidence intervals
for survey estimates of descriptive statistics such as means, proportions,
ratios, and coefficients for linear and logistic linear regression models.
Standard analysis software systems such as SAS and SPSS assume simple random
sampling (SRS) or equivalently independence of observations in computing
standard errors for sample estimates. In general, the SRS assumption results
in underestimation of variances of survey estimates of descriptive statistics
and model parameters. Confidence intervals based on computed variances that
assume independence of observations will be biased (generally too narrow) and
design-based inferences will be affected accordingly.
Sampling Error Computation Methods and Programs
-----------------------------------------------
Over the past 50 years, advances in survey sampling theory have guided the
development of a number of methods for correctly estimating variances from
complex sample data sets. A number of sampling error programs which implement
these complex sample variance estimation methods are available to NES data
analysts. The two most common approaches to the estimation of sampling
error for complex sample data are through the use of a Taylor Series
Linearization of the estimator (and corresponding approximation to its
variance) or through the use of resampling variance estimation procedures
such as Balanced Repeated Replication (BRR) or Jackknife Repeated Replication
(JRR). New Bootstrap methods for variance estimation can also be included
among the resampling approaches. See Rao and Wu (1988).
1. Taylor series linearization method:
When survey data are collected using a complex sample design with unequal
size clusters, most statistics of interest will not be simple linear
functions of the observed data. The linearization approach applies Taylor's
method to derive an approximate form of the estimator that is linear in
statistics for which variances and covariances can be directly and easily
estimated (Woodruff, 1971). SUDAAN and Stata are two commercially available
statistical software packages that include procedures that apply the Taylor
series method to estimation and inference for complex sample data.
SUDAAN (Shah et al., 1996) is a commercially available software system
developed and marketed by the Research Triangle Institute of Research
Triangle Park, North Carolina (USA). SUDAAN was developed as a stand-alone
software system with capabilities for the more important methods for
descriptive and multivariate analysis of survey data, including: estimation
and inference for means, proportions and rates (PROC DESCRIPT and PROC
RATIO); contingency table analysis (PROC CROSSTAB); linear regression (PROC
REGRESS); logistic regression (PROC LOGISTIC); log-linear models (PROC
CATAN); and survival analysis (PROC SURVIVAL). SUDAAN V7.0 and earlier
versions were designed to read directly from ASCII and SAS system data sets.
The latest versions of SUDAAN permit procedures to be called directly from
the SAS system. Information on SUDAAN is available at the following web site
address: http://www.rti.org.
Stata (StataCorp, 1997) is a more recent commercial entry to the available
software for analysis of complex sample survey data and has a growing body of
research users. Stata includes special versions of its standard analysis
routines that are designed for the analysis of complex sample survey data.
Special survey analysis programs are available for descriptive estimation of
means (SVYMEAN), ratios (SVYRATIO), proportions (SVYPROP) and population
totals (SVYTOTAL). Stata programs for multivariate analysis of survey data
currently include linear regression (SVYREG), logistic regression (SVYLOGIT)
and probit regression (SVYPROBT). Information on the Stata analysis software
system can be found on the Web at: http://www.stata.com.
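To make the linearization approach concrete, the Python sketch below
estimates the variance of a weighted mean, treated as a ratio of two weighted
totals. It is a minimal illustration and not the algorithm of SUDAAN or
Stata: it assumes sampling with replacement of clusters within strata (the
usual "ultimate cluster" approximation), and the record layout of (stratum,
cluster, weight, value) and the example data are hypothetical.

```python
from collections import defaultdict

def linearized_mean_variance(records):
    """Taylor-linearized variance of a weighted mean.
    records: iterable of (stratum, cluster, weight, y) tuples."""
    records = list(records)
    w_total = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / w_total
    # Cluster totals of the linearized values z = w * (y - mean) / sum(w).
    totals = defaultdict(lambda: defaultdict(float))
    for h, c, w, y in records:
        totals[h][c] += w * (y - mean) / w_total
    var = 0.0
    for clusters in totals.values():
        z = list(clusters.values())
        n_h = len(z)
        if n_h < 2:
            raise ValueError("each stratum needs at least two clusters")
        z_bar = sum(z) / n_h
        # Between-cluster variation within the stratum.
        var += n_h / (n_h - 1) * sum((v - z_bar) ** 2 for v in z)
    return mean, var

# Hypothetical data: two strata with two clusters each.
records = [(1, "a", 1.0, 0.0), (1, "b", 1.0, 1.0),
           (2, "a", 1.0, 0.0), (2, "b", 1.0, 1.0)]
mean, var = linearized_mean_variance(records)
```

The square root of the returned variance is the design-based standard error
of the weighted mean.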
2. Resampling methods:
BRR, JRR and the bootstrap comprise a second class of nonparametric methods
for conducting estimation and inference from complex sample data. As
suggested by the generic label for this class of methods, BRR, JRR and the
bootstrap utilize replicated subsampling of the sample database to develop
sampling variance estimates for linear and nonlinear statistics. WesVar PC
(Brick et al., 1996) is a publicly available software system for personal
computers that employs replicated variance estimation methods to conduct the
more common types of statistical analysis of complex sample survey data.
WesVar PC was developed by Westat, Inc. and is distributed along with
documentation free of charge to researchers from Westat's Web site:
http://www.westat.com/wesvarpc/. WesVar PC includes a Windows-based
application generator that enables the analyst to select the form of data
input (SAS data file, SPSS for Windows data base, dBase file, ASCII data set)
and the computation method (BRR or JRR methods). Analysis programs contained
in WesVar PC provide the capability for basic descriptive (means,
proportions, totals, cross tabulations) and regression (linear, logistic)
analysis of complex sample survey data. WesVar Complex Samples 3.0 is the
latest version of WesVar PC that is licensed and distributed by SPSS.
Information on the latest developments can be obtained at
http://www.spss.com.
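As a rough illustration of the replication idea (not a description of how
WesVar PC is implemented), the following Python sketch computes a
delete-one-unit jackknife variance estimate for a stratified design. The
record layout, the function names jackknife_variance and weighted_total, and
the reweighting factor n_h/(n_h - 1) are assumptions of this example, chosen
to match the standard stratified jackknife.

```python
from collections import defaultdict


def weighted_total(rows):
    """Weighted total of y for rows given as (weight, y) pairs."""
    return sum(w * y for w, y in rows)


def jackknife_variance(records, estimator):
    """Stratified delete-one-unit jackknife variance of `estimator`.

    records   -- list of (stratum, unit, weight, y) tuples (assumed layout)
    estimator -- function taking a list of (weight, y) pairs
    """
    full = estimator([(w, y) for _, _, w, y in records])

    # Collect the distinct computation units found in each stratum.
    units = defaultdict(set)
    for stratum, unit, _, _ in records:
        units[stratum].add(unit)

    variance = 0.0
    for stratum, unit_ids in units.items():
        n_h = len(unit_ids)
        if n_h < 2:
            raise ValueError(f"stratum {stratum} needs at least two units")
        for dropped in unit_ids:
            replicate = []
            for s, u, w, y in records:
                if s == stratum:
                    if u == dropped:
                        continue  # drop this unit from the replicate
                    w = w * n_h / (n_h - 1)  # reweight the units that remain
                replicate.append((w, y))
            variance += (n_h - 1) / n_h * (estimator(replicate) - full) ** 2
    return variance
```

For a weighted total with two units per stratum, this estimate reduces to the
sum over strata of the squared difference between the two unit totals.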
These new and updated software packages include an expanded set of
user-friendly, well-documented analysis procedures. Difficulties with sample
design specification, data preparation, and data input in the earlier
generations of survey analysis software created a barrier to use by analysts
who were not survey design specialists. The new software enables the user to
input data and output results in a variety of common formats, and the latest
versions accommodate direct input of data files from the major analysis
software systems. Readers who are interested in a more detailed comparison
of these and other survey analysis software alternatives are referred to
Cohen (1997).
Sampling Error Computation Models
---------------------------------
Regardless of whether linearization or a resampling approach is used,
estimation of variances for complex sample survey estimates requires the
specification of a sampling error computation model. NES data analysts who
are interested in performing sampling error computations should be aware that
the estimation programs identified in the preceding section assume a specific
sampling error computation model and will require special sampling error
codes. Individual records in the analysis data set must be assigned sampling
error codes that identify to the programs the complex structure of the sample
(stratification, clustering) and are compatible with the computation
algorithms of the various programs. To facilitate the computation of
sampling error for statistics based on 2000 NES data, design-specific
sampling error codes will be routinely included in all public-use versions of
the data set. Although minor recoding may be required to conform to the
input requirements of the individual programs, the sampling error codes that
are provided should enable analysts to conduct either Taylor Series or
Replicated estimation of sampling errors for survey statistics.
Table 7 defines the sampling error coding system for 2000 NES sample cases.
Two sampling error code variables are defined for each case based on the
sample design primary stage unit (PSU) and area segment in which the sample
household is located.
Sampling Error Stratum Code (Variable 000097). The Sampling Error Computation
Stratum Code is the variable that defines the sampling error computation
strata for all sampling error analysis of the NES data. Each self-
representing (SR) design stratum is represented by one sampling error
computation stratum. Pairs of similar nonself-representing (NSR) primary
stage design strata are "collapsed" (Kalton, 1977) to create NSR sampling
error computation strata. Since there was an odd number of nonself-
representing MSA strata and of non-MSA strata used in the 2000 NES, and since
it was felt that a nonself-representing MSA PSU should not be paired with a
non-MSA PSU, one of each of these PSUs stands alone within its Sampling Error
Stratum Code.
In the 1990 SRC National Sample design, controlled selection and a
"one-per-stratum" PSU allocation are used to select the primary stage of the
2000 NES national sample. The purpose of using controlled selection and the
"one-per-stratum" sample allocation is to reduce the between-PSU component of
sampling variation relative to a "two-per-stratum" primary stage design.
Despite the
expected improvement in sample precision, a drawback of the "one-per-stratum"
design is that two or more sample selection strata must be collapsed or
combined to form a sampling error computation stratum. Variances are then
estimated under the assumption that a multiple PSU per stratum design was
actually used for primary stage selection. The expected consequence of
collapsing design strata into sampling error computation strata is the
overestimation of the true sampling error; that is, the sampling error
computation model defined by the codes contained in Table 7 will yield
estimates of sampling errors which in expectation will be slightly greater
than the true sampling error of the statistic of interest.
SECU - Stratum-specific Sampling Error Computation Unit code (Variable 000097)
is a half sample code for analysis of sampling error using the BRR method or
approximate "two-per-stratum" Taylor Series method (Kish and Hess, 1959).
Within the SR sampling error strata, the SECU half sample units are created
by dividing sample cases into random halves, SECU=1 and SECU=2. The
assignment of cases to half-samples is designed to preserve the
stratification and second stage clustering properties of the sample within an
SR stratum. Sample cases are assigned to SECU half samples based on the area
segment in which they were selected. For this assignment, sample cases were
placed in original stratification order (area segment number order) and,
beginning with a random start, entire area segment clusters were
systematically assigned to either SECU=1 or SECU=2.
In the general case of nonself-representing (NSR) strata, the half sample
units are defined according to the PSU to which the respondent was assigned
at sample selection (with the exception of the two unpaired NSR strata
mentioned above). That is, the half samples for each NSR sampling error
computation stratum bear a one-to-one correspondence to the sample design NSR
PSUs. The particular sample coding provided on the NES public use data set
is consistent with the "ultimate cluster" approach to complex sample variance
estimation (Kish, 1965; Kalton, 1977). Individual stratum, PSU and segment
code variables may be needed by NES analysts interested in components of
variance analysis or estimation of hierarchical models in which PSU-level and
neighborhood-level effects are explicitly estimated.
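The paired "two-per-stratum" Taylor Series computation can be sketched in
Python as follows. This is a minimal illustration, not the algorithm of any
particular package: the record layout and the function name
paired_se_of_proportion are assumptions, every stratum is assumed to contain
exactly two SECUs coded 1 and 2 (so the two unpaired NSR strata would need
separate handling), and the SECUs are treated as ultimate clusters sampled
with replacement.

```python
import math
from collections import defaultdict


def paired_se_of_proportion(records):
    """Linearized standard error of a weighted proportion.

    records -- list of (stratum, secu, weight, y) tuples with y coded 0/1
               and exactly two SECUs (1 and 2) per stratum (assumed layout)
    """
    total_w = sum(w for _, _, w, _ in records)
    p_hat = sum(w * y for _, _, w, y in records) / total_w

    # Sum the linearized values w * (y - p_hat) / total_w within each SECU.
    z = defaultdict(float)
    for stratum, secu, w, y in records:
        z[(stratum, secu)] += w * (y - p_hat) / total_w

    strata = {stratum for stratum, _ in z}
    variance = 0.0
    for stratum in strata:
        # With two units per stratum the contribution is a squared difference.
        variance += (z[(stratum, 1)] - z[(stratum, 2)]) ** 2
    return p_hat, math.sqrt(variance)
```

The function returns the estimated proportion together with its standard
error, so the result can be compared directly with a simple random sampling
standard error to obtain a design effect.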
Table 7 shows the area sample sampling error stratum and SECU codes to be
used for the paired selection model for sampling error computations for any
2000 NES analyses. Strata 01 through 26 reflect the half sample 1990
National Sample design used for the 2000 NES area sample. As the table
shows, the three-digit 2000 SE code consists of the two-digit SE Stratum
code followed by the one-digit SECU code. The RDD sample
cases are assigned to Strata 27 through 66. The RDD sample is a stratified
unclustered design. In order to reflect the stratification of the RDD frame,
the sample was sorted by area code within metropolitan status within Census
Division prior to the assignment of sampling error stratum and SECU codes.
The sorted file was then divided into groups of 20 adjacent cases to form the
strata. Within each stratum, cases were assigned alternately to each of the
pair of SECUs, 10 cases per SECU. This assignment of sampling error stratum
and SECU codes allows for design effects to be estimated for the complete NES
data set as well as separately for the RDD and area sample components.
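The RDD coding rule described above can be sketched in Python. This is an
illustration under stated assumptions, not the procedure actually run for the
NES file: the function names assign_rdd_codes and split_se_code are
hypothetical, the input is assumed to be already sorted as described, and the
sketch does not address how a final group of fewer than 20 cases was handled.

```python
def assign_rdd_codes(n_cases, first_stratum=27, stratum_size=20):
    """Return (stratum, secu, se_code) for each case of a sorted file.

    Cases are grouped into consecutive strata of `stratum_size` adjacent
    cases and assigned alternately to SECU 1 and SECU 2 within each stratum.
    """
    codes = []
    for i in range(n_cases):
        stratum = first_stratum + i // stratum_size
        secu = 1 + (i % stratum_size) % 2  # alternate 1, 2, 1, 2, ...
        # Three-digit SE code: two-digit stratum followed by one-digit SECU.
        codes.append((stratum, secu, stratum * 10 + secu))
    return codes


def split_se_code(se_code):
    """Split a three-digit SE code into (stratum, secu)."""
    return divmod(se_code, 10)
```

With 800 sorted cases, the sketch yields strata 27 through 66 with 10 cases
in each SECU; split_se_code reverses the coding for either sample component.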
Table 7: 2000 NES Election Study Sampling Error Codes.
==============================================================================
SE       SECU  SE Code  PSU  Segment #s                      Total Rs
Stratum
==============================================================================
01       1     011      120  015, 031, 047, 063, 079, 099          11
         2     012      120  007, 023, 039, 055, 071, 087          11
02       1     021      190  007, 023, 039, 055, 071, 087          11
         2     022      190  016, 031, 047, 063, 079, 095          13
03       1     031      130  011, 028, 044, 060                     8
         2     032      130  004, 020, 036, 052, 068               15
04       1     041      121  002, 018, 034, 050                    10
         2     042      121  010, 026, 042                          6
05       1     051      131  016, 032, 047                         11
         2     052      131  008, 024, 040                         10
06       1     061      150  007, 023, 039                         11
         2     062      150  015, 031, 047                          8
07       1     071      171  010, 026, 042                          6
         2     072      171  002, 018, 034                          7
08       1     081      110  004, 020, 036                          6
         2     082      110  012, 028, 044                          5
09       1     091      170  011, 027, 031, 039                    17
         2     092      154  003, 007, 011, 015, 019               13
                        170  007, 019
10       1     101      122  008, 012, 015, 024, 028, 032          18
         2     102      152  004, 012, 016, 020, 028, 032          13
11       1     111      141  004, 008, 016, 020, 024, 032          12
         2     112      132  001, 005, 009, 013, 017, 021          18
12       1     121      191  001, 005, 009, 017, 021, 025          27
         2     122      181  001, 005, 009, 013, 017, 021          20
13       1     131      194  004, 008, 016, 020, 024, 032          17
         2     132      196  002, 006, 010, 014, 018, 022          15
14       1     141      220  001, 005, 009, 013, 017, 021          40
         2     142      226  002, 006, 010, 014, 018, 022          24
15       1     151      211  004, 007, 011, 015, 020, 023           9
         2     152      213  004, 008, 012, 016, 020, 024          17
16       1     161      230  002, 006, 010, 014, 018, 022          45
         2     162      434  002, 304, 306, 008, 010, 011          23
17       1     171      239  001, 005, 009, 013, 017, 021          14
         2     172      240  002, 006, 010, 014, 018, 022          20
18       1     181      262  002, 006, 010, 014, 018, 022          48
         2     182      255  004, 008, 012, 016, 020, 024          17
19       1     191      257  004, 008, 012, 016, 020, 024          23
         2     192      258  002, 006, 010, 014, 018, 022          15
20       1     201      273  003, 007, 011, 015, 019, 023          18
         2     202      274  002, 006, 010, 014, 018, 022          14
21       1     211      260  003, 007, 011, 015, 019, 023          14
         2     212      250  003, 007, 011, 015, 019, 023          21
22       1     221      292  001, 005, 009, 013, 017, 022          20
         2     222      293  003, 007, 011, 015, 019, 023          20
23       1     231      464  303, 305, 306, 309, 311, 312          32
         2     232      480  301, 302, 303, 305, 306, 307          39
24       1     241      466  301, 302, 304, 305, 306, 308          26
         2     242      470  301, 302, 303, 305, 306, 307          43
25       1     251      474  302, 303, 304, 306, 307, 308          40
         2     252      477  302, 303, 304, 306, 307, 308          26
26       1     261      280  002, 006, 010, 014, 018, 022          34
         2     262      482  301, 303, 304, 305, 307, 308          45
Total:                                                           1006
Generalized Sampling Error Results for the 2000 NES
---------------------------------------------------
To assist NES analysts, the PC SUDAAN program was used to compute sampling
errors for a wide-ranging example set of proportions estimated from the 2000
NES election Survey data set. Sampling errors were computed for the complete
NES data set as well as separately for the area sample and RDD sample
components. For each estimate, sampling errors were computed for the total
sample and for fifteen demographic and political affiliation subclasses of
the 2000 NES sample. The results of these sampling error computations were
then summarized and translated into the general usage sampling error tables
provided in Tables 8 - 10. The mean value of deft, the square root of the
design effect, was found to be 1.098 for the combined sample, 1.076 for the
area sample component, and 1.049 for the RDD sample component. The design
effects were primarily due to weighting effects (Kish, 1965) and did not vary
significantly by subclass size. Therefore the generalized variance tables
are produced by multiplying the simple random sampling standard error for
each proportion and sample size by the average deft for the set of sampling
error computations.
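The rule just described can be written as a short Python sketch: the
approximate standard error of a percentage is the simple random sampling
standard error multiplied by the average deft. The function name approx_se is
an assumption of this example; the deft values are those reported above.

```python
import math


def approx_se(percent, n, deft):
    """Approximate standard error (in percentage points) of a percentage.

    percent -- estimated percentage (0 to 100)
    n       -- sample size base (denominator of the proportion)
    deft    -- square root of the design effect
    """
    p = percent / 100.0
    srs_se = math.sqrt(p * (1.0 - p) / n)  # simple random sampling SE
    return 100.0 * deft * srs_se


# Average deft values reported for the 2000 NES.
DEFT_COMBINED = 1.098
DEFT_AREA = 1.076
DEFT_RDD = 1.049
```

For example, approx_se(50, 100, DEFT_COMBINED) gives about 5.49, the first
entry of the combined sample table.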
Incorporating the pattern of "design effects" observed in the extensive set
of example computations, Tables 8 - 10 provide approximate standard errors for
percentage estimates based on the 2000 NES. To use the tables, examine the
column heading to find the percentage value which best approximates the value
of the estimated percentage that is of interest. Next, locate the
approximate sample size base (denominator for the proportion) in the left-
hand row margin of the table. To find the approximate standard error of a
percentage estimate, simply cross-reference the appropriate column
(percentage) and row (sample size base). Note: the tabulated values
represent approximately one standard error for the percentage estimate. To
construct an approximate confidence interval, the analyst should multiply the
tabulated standard error by the appropriate critical point from the "z"
distribution (e.g., z=1.96 for a two-sided 95% confidence interval) to obtain
the interval half-width. Furthermore, the approximate standard errors in the
table apply only to single point estimates of percentages, not to the
difference between two percentage estimates.
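As a worked example of this lookup, the Python sketch below builds an
approximate 95% confidence interval for an estimate of 40% based on 500
combined sample cases, for which Table 8 gives a standard error of 2.40. The
function name confidence_interval is an assumption of this example.

```python
def confidence_interval(percent, std_error, z=1.96):
    """Approximate two-sided confidence interval for a single percentage.

    percent   -- estimated percentage
    std_error -- approximate standard error read from the table
    z         -- critical point of the "z" distribution (1.96 for 95%)
    """
    half_width = z * std_error
    return percent - half_width, percent + half_width


# Estimate of 40% on a base of 500 combined sample cases (SE = 2.40).
lower, upper = confidence_interval(40.0, 2.40)
```

Here the half-width is 1.96 x 2.40 = 4.704 percentage points, so the interval
runs from roughly 35.3% to 44.7%.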
The generalized variance results presented in Tables 8 - 10 are a useful tool
for initial, cursory examination of the NES survey results. For more in
depth analysis and reporting of critical estimates, analysts are encouraged
to compute exact estimates of standard errors using the appropriate choice of
a sampling error program and computation model.
Table 8: Generalized Variance Table.
2000 NES election Survey - Combined Sample.
APPROXIMATE STANDARD ERRORS FOR PERCENTAGES
==============================================================================
                  For percentage estimates near:
Sample n       50%       40%       30%       20%       10%
                      or 60%    or 70%    or 80%    or 90%
==============================================================================
     100      5.49      5.38      5.03      4.39      3.29
     200      3.88      3.80      3.56      3.10      2.33
     300      3.17      3.10      2.90      2.54      1.90
     400      2.74      2.69      2.52      2.20      1.65
     500      2.45      2.40      2.25      1.96      1.47
     600      2.24      2.20      2.05      1.79      1.34
     700      2.07      2.03      1.90      1.66      1.24
     800      1.94      1.90      1.78      1.55      1.16
     900      1.83      1.79      1.68      1.46      1.10
    1000      1.74      1.70      1.59      1.39      1.04
    1100      1.66      1.62      1.52      1.32      0.99
    1200      1.58      1.55      1.45      1.27      0.95
    1300      1.52      1.49      1.40      1.22      0.91
    1400      1.47      1.44      1.34      1.17      0.88
    1500      1.42      1.39      1.30      1.13      0.85
    1600      1.37      1.34      1.26      1.10      0.82
    1700      1.33      1.30      1.22      1.06      0.80
    1800      1.29      1.27      1.19      1.04      0.78
Table 9: Generalized Variance Table.
2000 NES election Survey - Area Sample.
APPROXIMATE STANDARD ERRORS FOR PERCENTAGES
==============================================================================
                  For percentage estimates near:
Sample n       50%       40%       30%       20%       10%
                      or 60%    or 70%    or 80%    or 90%
==============================================================================
     100      5.38      5.27      4.93      4.30      3.23
     200      3.80      3.73      3.48      3.04      2.28
     300      3.10      3.04      2.85      2.48      1.86
     400      2.69      2.63      2.46      2.15      1.61
     500      2.40      2.36      2.20      1.92      1.44
     600      2.20      2.15      2.01      1.76      1.32
     700      2.03      1.99      1.86      1.63      1.22
     800      1.90      1.86      1.74      1.52      1.14
     900      1.79      1.76      1.64      1.43      1.07
    1000      1.70      1.67      1.56      1.36      1.02
Table 10: Generalized Variance Table.
2000 NES election Survey - RDD Sample.
APPROXIMATE STANDARD ERRORS FOR PERCENTAGES
==============================================================================
                  For percentage estimates near:
Sample n       50%       40%       30%       20%       10%
                      or 60%    or 70%    or 80%    or 90%
==============================================================================
     100      5.24      5.14      4.80      4.19      3.14
     200      3.71      3.63      3.40      2.96      2.22
     300      3.03      2.96      2.77      2.42      1.82
     400      2.62      2.57      2.40      2.10      1.57
     500      2.34      2.30      2.15      1.88      1.41
     600      2.14      2.10      1.96      1.71      1.28
     700      1.98      1.94      1.82      1.58      1.19
     800      1.85      1.82      1.70      1.48      1.11
References
Alegria, M., Kessler, R., Bijl, R., Lin, E., Heeringa, S.G., Takeuchi, D.T.,
Kolody, B. (2000). To appear in The Unmet Need for Treatment. Proceedings
of a Symposium of the World Psychiatric Association, Sydney, Australia,
October, 1997.
Binder, D.A. (1983), "On the variances of asymptotically normal estimators
from complex surveys," International Statistical Review, Vol. 51, pp. 279-
292.
Brick, J.M., Broene, P., James, P., & Severynse, J. (1996). "A User's Guide
to WesVar PC." Rockville, MD: Westat, Inc.
Cochran, W.G. (1977). Sampling Techniques. New York: John Wiley & Sons.
Cohen, S.B. (1997). "An evaluation of alternative PC-based software packages
developed for the analysis of complex survey data," The American
Statistician, Vol. 51, No. 3, pp. 285-292.
Goldstein, H. (1987). Multi-level Models in Educational and Social Research.
London: Oxford University Press.
Kalton, G. (1977), "Practical methods for estimating survey sampling errors,"
Bulletin of the International Statistical Institute, Vol. 47, 3, pp. 495-514.
Kish, L. (1949). "A procedure for objective respondent selection within the
household," Journal of the American Statistical Association, Vol. 44, pp.
380-387.
Kish, L. (1965), Survey Sampling. New York: John Wiley & Sons, Inc.
Kish, L., & Frankel, M.R. (1974), "Inference from complex samples," Journal
of the Royal Statistical Society, B, Vol. 36, pp. 1-37.
Kish, L., Groves, R.M., & Krotki, K.P. (1975). "Sampling errors for
fertility surveys." Occasional Paper No. 17. Voorburg, Netherlands: World
Fertility Survey, International Statistical Institute.
Kish, L., & Hess, I. (1959), "On variances of ratios and their differences in
multi-stage samples," Journal of the American Statistical Association, 54,
pp. 416-446.
LePage, R., & Billard, L. (1992), Exploring the Limits of Bootstrap. New
York: John Wiley & Sons, Inc.
Mahalanobis, P.C. (1946), "Recent experiments in statistical sampling at the
Indian Statistical Institute," Journal of the Royal Statistical Society, Vol.
109, pp. 325-378.
McCullagh, P.M. & Nelder, J.A. (1989). Generalized Linear Models, 2nd
Edition. Chapman and Hall. London.
Rao, J.N.K. & Wu, C.F.J. (1988), "Resampling inference with complex sample
data," Journal of the American Statistical Association, 83, pp. 231-239.
Rosenstone, Steven J., Kinder, Donald R., Miller, Warren E., & the National
Election Studies. 1994 Sample Design: Technical Memoranda, 1994 Election
Study, pp. 882-905 in Rosenstone, Steven J., Kinder, Donald R., Miller,
Warren E., &
the National Election Studies, AMERICAN NATIONAL ELECTION STUDY, 1994:
ELECTION SURVEY (ENHANCED WITH 1992 AND 1993 DATA) (Computer file).
Conducted by University of Michigan Center for Political Studies. 2nd ICPSR
ed. Ann Arbor MI: University of Michigan, Center for Political Studies, and
Inter-university Consortium for Political and Social Research (producer),
1995. Ann Arbor MI: Inter-university Consortium for Political and Social
Research (distributor), 1995.
Rust, K. (1985). "Variance estimation for complex estimators in sample
surveys," Journal of Official Statistics, Vol. 1, No. 4.
SAS Institute, Inc. (1990). SAS/STAT User's Guide, Version 6, Fourth Ed.,
Vol. 2. Cary, NC: SAS Institute, Inc.
Shah, B.V., Barnwell, B.G., Biegler, G.S. (1996). SUDAAN User's Manual:
Software for Statistical Analysis of Correlated Data. Research Triangle
Park, NC: Research Triangle Institute.
Skinner, C.J., Holt, D., & Smith, T.M.F. (1989). Analysis of Complex
Surveys. New York: John Wiley & Sons.
SPSS, Inc. (1993). SPSS for Windows: BASE System User's Guide, Release 6.0.
Chicago, IL: SPSS Inc.
Stata Corp. (1997). Stata Statistical Software: Release 5.0. College
Station, TX: Stata Corporation.
Wolter, K.M. (1985). Introduction to Variance Estimation. New York:
Springer-Verlag.
Woodruff, R.S. (1971), "A simple method for approximating the variance of a
complicated estimate," Journal of the American Statistical Association, Vol.
66, pp. 411-414.
Yamaguchi, K. (1991). Event History Analysis. Applied Social Research
Methods Series, Vol. 28. Newbury Park, CA/London: Sage Publications.
Office of Management and Budget (OMB) June 1990 definitions of MSAs, NECMAs,
counties, parishes, independent cities.
Walter Mebane
Mon Nov 19 01:34:04 EST 2001