Further Notes on Survey Research
1. Introduction
2. Sampling
3. Measures and Indicators
4. Constructing Questionnaires
Urban Planning 504:  Quantitative Planning Methods
Prof. Campbell
University Of Michigan

last modified:  Monday, March 3, 2008 1:12 PM

Other Links
Institute for Social Research, Survey Research Center (U-M)
"Monitoring the Future" Study and its sampling methodology
The 2000 US Census
the 2000 Census questionnaires (both short and long forms)
2000 Census plans and operations

Some sources and readings on survey research:

course electronic reserve readings:
Babbie, Earl. "Ch. 4 Research Design, Ch. 5 Conceptualization, Operationalization and Measurement" in The Practice of Social Research. 9th edition, Belmont, CA: Wadsworth, 2003, pp. 90 - 147. [pdf]

Fowler, Floyd. "Sampling," in Survey Research Methods. 2nd edition, Sage, 1993, pp. 10-37. [pdf]

Nishikawa, Nancy. "Ch. 2: Survey Methods for Planners," in The Planner's Use of Information, edited by Hemalata C. Dandekar. Chicago, IL: American Planning Association, 2003, pp. 51-78. [pdf]

see also:
Babbie, Earl. 1994. The Practice of Social Research. 7th ed. Belmont, CA: Wadsworth.
Bailey, Kenneth D. 1987. Methods of Social Research. 3rd ed. New York: Free Press.
Dillman, Don A. 1978. Mail and Telephone Surveys. New York: Wiley.
Fowler, Floyd J. Jr. 1993. Survey Research Methods. Beverly Hills: Sage Publications.


 


1. Introduction to Survey Research
Survey Research -- a form of observation

Why sometimes seen as boring:

  • often associated with consumer attitudes, preferences, and other seemingly mundane topics
  • questions often very simple and "dumbed down" to fit into a questionnaire format
  • the design of survey research involves lots of minute details -- crucial when you are actually designing a survey, but boring in the abstract while sitting in a classroom 


Why surveys sometimes fail -- several common mistakes:

  • pick the wrong sampling strategy
  • low response rate
  • bad questionnaire:  too long, asked the wrong questions, etc.
  • lots of data but no easy way to analyze


 

Some basic terms:
primary data:  data that you collect (e.g., through a survey, an interview, etc.) -- that is, collecting original data.
secondary data:  data that has already been collected by someone else (e.g., the US Census) 

census:  "(1) a count of the population and a property evaluation in early Rome;  (2) a usually complete enumeration of a population..."  (Merriam Webster's 10th)

sample: [ME, fr. MF essample, fr. L exemplum -- more at EXAMPLE]  "(1) a representative part or a single item from a larger whole or group esp. when presented for inspection or shown as evidence of quality:  SPECIMEN  (2) a finite part of a statistical population whose properties are studied to gain information about the whole"   (Merriam Webster's 10th)

sample (N) vs. population (M)
sampling fraction = N/M
questionnaire
survey instrument
respondent
unit of analysis
longitudinal vs. cross-sectional
panel survey
Heisenberg Uncertainty Principle  -- for research, this implies that one cannot observe without in some way disturbing the observed.

 

The Culture of survey research:

Conducting a survey is a form of communicative inquiry -- of asking questions.
Both communication and inquisitiveness are shaped by culture.

  • therefore the questions we ask (survey questions)
  • the reasons we ask them (theory and hypotheses)
  • the way we ask them  (survey design -- e.g., over the telephone, in-person on the street)
  • whom we ask  (the sample)
  • and the way they are answered (responses)
are embedded in the culture where we live and work. 
As a result, survey research is both a science and a social activity (a dialogue), and cannot easily be exported from one place to another. 
 

Examples: 

  • questionnaires on environmental awareness in post Cold War Poland
  • interviews of U.S. defense contractors (and open- vs. closed-ended questions;  exploratory vs. more focused questioning)
Key Themes:
  1. surveys take a lot of time:  only engage in primary research if secondary data is not available (e.g., no data, data too old, not for the people / places of interest, not the right variables, etc.)
  2. keep the survey simple and very straightforward:   the challenge is to ask simple, clear questions (with easily tabulated answers) that help us understand complex, often messy social issues.
  3. surveys are the confluence of statistics and human nature:   be scientific, but also socially perceptive

  4. you want to know the framework of results before you start designing the questionnaire.  Avoid asking questions without knowing what you will do with the answers.  Surveys should start with clear research and policy questions.

 
Steps of Survey Research:
A Hypothetical Survey Research Schedule

Q     Questionnaire Construction
FC    Formatting and Coding
S     Sampling
PT    Pretest
A     Analysis
PR    Presentation

[schedule chart: a timeline showing when each of these stages occurs and how they overlap]

Ways to Administer Surveys:
This will shape your questionnaire, as well as your sample.
 
How the Questionnaire gets to the Respondent:
a. in-person survey
b. mail
c. telephone
d. internet
How they answer the questionnaire:
1.  self-administered
2.  administered by staff

see also Trochim on the pluses and minuses of survey methods
 
 

Some Advantages and Difficulties with Each Approach (ordered from low- to high-tech)

In-person
Advantages:
  • higher response rates
  • the most personable of the approaches
  • lots of control over the process
Difficulties:
  • can be costly and time-consuming, especially if covering a wide geographic area
  • requires good training of interviewers

Mail Survey
Advantages:
  • cost savings:  no interviewers
  • time savings
  • questionnaire can be done at the respondent's convenience
  • greater assurance of anonymity
  • standardized wording (no interviewer bias)
  • respondents can check records
  • accessibility:  can reach anyone with a mail address
Difficulties:
  • lack of flexibility:  no probing, no corrections, no soothing reluctant respondents
  • low response rate
  • no control over the environment, question order, who fills out the questionnaire, or date of response
  • cannot use complex questionnaire formats (e.g., complicated filtering)

Telephone
Advantages:
  • fast
  • lower cost than face-to-face
  • more anonymous than face-to-face
  • can cover a wide geographic area
  • random-digit dialing facilitates an unbiased sample
  • computer-assisted technology both accelerates the process and allows for complex filtering
Difficulties:
  • lots of resistance to telephone interviews
  • can be confused with marketing
  • needs to be fairly short:  not good for long questions or long lists of answer categories
  • can take many tries to reach someone at home
  • no visual materials possible

Internet
Advantages:
  • answers already in electronic format
  • can allow for complex filtering questions
  • very quick response time
  • can reach anyone with access to a computer worldwide
  • perhaps the quickest to fill out (especially if multiple choice)
Difficulties:
  • bias introduced -- only some people have access to computers (how do you determine the population?)
  • low response rate
  • hard to get long answers to open-ended questions (as compared to in-person or telephone)
  • a mistrust of the way the data might be used (e.g., for marketing?)

    Which way would you recommend administering each survey below?

    Population | Topic | Mail Survey | In-person | Telephone | Internet
    homeless in New York City | their diet and eating habits | no | yes | no | no
    New Yorkers | their views on homelessness | yes | yes | best? | ?
    U.S. city planners | their use of computers | yes | yes | yes | best?
    typical Americans | reactions to the recent bombings in Afghanistan | too slow? | yes | yes | no?
    prisoners | prison life | yes | yes | ? | no
    New York subway riders | crime on the subway | | | |
    Chicago residents | attitude about the Sears Tower | | | |
    Chicago residents | attitude about a new proposed skyscraper design | | | |
    8-year-old children in Ann Arbor | their favorite public places to play | | | |
    8-year-old children in Mozambique | their favorite public places to play | | | |
    college students | their feelings about racism | | | |
    Muscovites | their views on Putin and democracy in Russia | | | |
    U.S. Senators | their support of recycling laws | | | |
    married women | their fear of violence from their husbands | | | |
    college students | their recognition of 100 book titles | | | |

     
     



     
    2. Sampling

     
    Why Sample?
    1. can be almost as accurate as the whole population (sometimes even better:  a well done sample is more useful than a poorly done "census"/full-count).
    2. one can save money
    3. one can save time -- this allows for measurement at a single point in time (e.g., opinions about an election on a specific date)
    4. fewer questionnaires mean less paperwork, which means less chance of error

    5. fewer people interviewed means a lower profile for the research project (which can be useful).
    Terms:
    unit of analysis
    sample
    randomness
    sampling element -  the unit about which information is collected (similar to unit of analysis.   sampling element refers to sample selection;  unit of analysis refers to the analysis stage). 
    sampling frame - the list of units from the population from which a sample is selected.
    population - the theoretically specified aggregation of study elements.
    study population - the aggregation of elements from which the sample is selected.  (usually somewhat smaller than the population because not every element of the population has a chance of being selected.)
    sample size
    response rate  - the number of persons participating in the survey as a percent of the total sample size.   for example:  the percentage of questionnaires that are returned. 
    respondent - a person who responds to a survey, e.g., by completing a questionnaire.
    sampling error - inaccuracy in moving from sample observations to population parameters.
    weighted samples - sampling where there is an unequal but known chance of each element being selected.  (each case is then assigned a "weight" to adjust for these unequal chances).  The weight is proportional to the number of persons in the population that the individual case represents.
    bias  - "systematic error introduced into sampling or testing by selecting or encouraging one outcome or answer over another."   (Merriam Webster's 10th)

    What is Sampling Based on?   The Concepts of Probability and Randomness

    Sampling is based on probability -- that is, on the probability of each individual being included in the sample.
    We assume that randomness (random sampling) gives each element an equal chance of being selected.
     
    A sample will be representative if all members of the population have an equal chance of being picked.

     
     

    Sample Size
    depends on:


    Sampling Fraction -- or -- why do we usually just care about the sample size, not the population size?
    sample (N) vs. population (M)
    sampling fraction = N/M

    the actual formula for the standard error (the standard deviation of the sampling distribution) is:

        standard error (corrected)  =  sqrt(1 - f)  x  s / sqrt(N)

    where s = the standard deviation, N = the sample size, and f = sampling fraction = N / M

    but since typically M >> N,
    f --> 0,

    so (1 - f) becomes 1, and the formula for the standard error reduces to the familiar uncorrected form:

        standard error (uncorrected)  =  s / sqrt(N)

    Comparison of Corrected and Uncorrected Standard Error Calculations for a Hypothetical Population of 38,000 (standard deviation of 20,000)

    sample size (N) | population size (M) | sampling fraction (f) | std. deviation | std. error (corrected) | std. error (uncorrected) | % difference
          1 | 38,000 | 0.00003 | 20,000 | 19999.7 | 20000.0 |   0.00%
        100 | 38,000 | 0.00263 | 20,000 |  1997.4 |  2000.0 |   0.13%
        200 | 38,000 | 0.00526 | 20,000 |  1410.5 |  1414.2 |   0.26%
        400 | 38,000 | 0.01053 | 20,000 |   994.7 |  1000.0 |   0.53%
        800 | 38,000 | 0.02105 | 20,000 |   699.6 |   707.1 |   1.06%
      1,600 | 38,000 | 0.04211 | 20,000 |   489.4 |   500.0 |   2.13%
      3,200 | 38,000 | 0.08421 | 20,000 |   338.3 |   353.6 |   4.30%
      6,400 | 38,000 | 0.16842 | 20,000 |   228.0 |   250.0 |   8.81%
     12,800 | 38,000 | 0.33684 | 20,000 |   144.0 |   176.8 |  18.57%
     25,600 | 38,000 | 0.67368 | 20,000 |    71.4 |   125.0 |  42.88%
     37,999 | 38,000 | 0.99997 | 20,000 |     0.5 |   102.6 |  99.49%
     38,000 | 38,000 | 1.00000 | 20,000 |     0.0 |   102.6 | 100.00%

    Note that there is very little difference in using the corrected vs. uncorrected standard error until the sampling fraction gets large.  For example, even with a sample of 800 (out of a total population of 38,000), the difference is only 1 percent.  The two estimates of standard error only begin to deviate significantly when the sample size is more than several thousand (that is, when the sampling fraction approaches about 10% or more).

    Moral of the story:   it is fine -- and more conservative -- to use the uncorrected estimate, which is easier to calculate anyway.
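
    To make the arithmetic concrete, here is a minimal Python sketch of the corrected and uncorrected standard error formulas above (the function name is just illustrative); it reproduces a few rows of the table:

import math

def standard_errors(s, n, m):
    """Return (corrected, uncorrected) standard errors for a sample of size n
    drawn from a population of size m, where s is the standard deviation."""
    uncorrected = s / math.sqrt(n)
    f = n / m                                  # sampling fraction
    corrected = math.sqrt(1 - f) * uncorrected
    return corrected, uncorrected

# reproduce a few rows of the table above (population 38,000, std. dev. 20,000)
for n in (100, 800, 12800):
    corr, uncorr = standard_errors(20000, n, 38000)
    diff = (uncorr - corr) / uncorr * 100
    print(f"n={n:6d}  corrected={corr:8.1f}  uncorrected={uncorr:8.1f}  difference={diff:5.2f}%")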
     
     


    Constructing the Sampling Frame:  the actual list of units from the population from which a sample is selected (e.g., the membership list from the American Planning Association;  the New York City phone book)
     
     
    Level | Definition | Example
    Study Population | the aggregation of elements from which the sample is selected | American city planners
    Sampling Frame | the actual list of units from the population from which a sample is selected | the 1999 membership list of the American Planning Association
    Sample | the collection of elements selected | the 200 members selected from the list
    Sampling Elements (cases) | the individual elements | the individual APA member

     
     
    "A sample cannot be more accurate than the sampling frame from which it is drawn." (Bailey, p. 84)
    AND:  you CANNOT overcome this inaccuracy through increasing the sample size.

     

    EXAMPLE:   if interviewing 40 people on the street at midnight (the sampling frame) about their social life leads to a biased sample (over selection of people with an active social life -- or else insomniacs), then increasing the sample size to 200 people (at midnight) will not overcome this initial bias.
    The critical question to ask is:  WHO did NOT respond to the survey, and how are these people different from those who did answer?  The more the reasons why someone responded (or didn't) are correlated with the variables in your survey, the more your sampling strategy will introduce biases into the results.

      
    EXAMPLE:   doing an online survey about how comfortable and familiar people are with technology will clearly lead to a biased outcome (since the criterion for being included in the sampling frame is highly correlated to the questions -- and thus your variables -- in the survey). 

    HOWEVER:   you would likely have less bias with this sampling frame if your variables are less correlated with the selection criterion:  e.g., whether respondents own a dog or a cat.  Yet there may in fact be a relationship between pet ownership and computer use, which you would only know if you could also survey non-computer users.   The ideal is to use selection criteria that are orthogonal to (statistically independent of) the variables in the questionnaire.

    So:   it is always safest to start with an unbiased sampling frame.



    EXAMPLE:   A SURVEY ABOUT NEW URBANISM

    A door-to-door survey was conducted in various U.S. suburban areas.
    One survey question was:  "Would you prefer to live in a higher density town with more public transit, more front porches, more walkable space, even if it means smaller houses and lots?"

    In the population as a whole, 25 percent would have answered yes.

    Can we know how biased the respondent sample will be as compared to the population as a whole?
    Answer?   Only if we can know (or at least estimate) the preferences of the nonrespondents and of the population as a whole.

    Example:  Given a specific response rate (e.g., 80%) and knowledge of both the non-respondents' preferences (e.g., 40%) and the total population's preferences (25%), we can calculate what the respondent sample mean will be:

        respondent mean  =  (population mean - (1 - response rate) x non-respondent mean) / response rate
                         =  (25% - 0.2 x 40%) / 0.8  =  about 21%

    and thus how biased it is (about 4 percentage points too low).
     

    Respondent sample mean as a function of the response rate and the percent of non-respondents who would prefer the higher-density town (population mean µ = 25%)

                              Percent of non-respondents who would prefer the higher-density town
    Response rate             |  10% |  20% |  25% |  30% |  40% |  50% |  75%
    95% (high response rate)  |  26% |  25% |  25% |  25% |  24% |  24% |  22%
    90%                       |  27% |  26% |  25% |  24% |  23% |  22% |  19%
    80%                       |  29% |  26% |  25% |  24% |  21% |  19% |  13%
    70%                       |  31% |  27% |  25% |  23% |  19% |  14% |   4%
    60%                       |  35% |  28% |  25% |  22% |  15% |   8% |
    50%                       |  40% |  30% |  25% |  20% |  10% |   0% |
    40%                       |  48% |  33% |  25% |  18% |   3% |      |
    30% (low response rate)   |  60% |  37% |  25% |  13% |      |      |

    (The 25% column shows no bias:  when the non-respondents have the same preferences as the population as a whole, the respondent sample mean equals the population mean.)

    One can express this table as a graph (the respondent sample mean plotted against the response rate, with one curve for each level of non-respondent preference).

    for more information on non-responses, see papers from the International Conference on Survey Nonresponse 1999
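
    A small Python sketch of the arithmetic behind the nonresponse-bias table and graph above (the function name is illustrative, not part of the original notes):

def respondent_mean(pop_mean, response_rate, nonrespondent_mean):
    """Mean among respondents implied by the identity
       pop_mean = r * respondent_mean + (1 - r) * nonrespondent_mean,
    where r is the response rate."""
    r = response_rate
    return (pop_mean - (1 - r) * nonrespondent_mean) / r

# the New Urbanism example: population mean 25%, response rate 80%, non-respondents at 40%
print(respondent_mean(25, 0.80, 40))       # -> 21.25, i.e., about 21% (4 points too low)

# rebuild the 70% response-rate row of the table
print([round(respondent_mean(25, 0.70, p)) for p in (10, 20, 25, 30, 40, 50, 75)])
# -> [31, 27, 25, 23, 19, 14, 4]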
     
     
     


    Where might sample biases be introduced?  At several stages of the process
    the selection of the population (e.g., all adults in Ann Arbor)
    the selection of the sampling frame (e.g., a list of residential phone numbers in Ann Arbor)
    the creation of the sample (e.g., completed responses to a telephone survey) -- especially if a low response rate.
     
     

    Two Approaches to Sampling:

    examples of probability sampling   |  examples of nonprobability sampling
    simple random sampling             |  convenience sampling
    systematic sampling                |  quota sampling
    stratified random sampling         |  dimensional sampling
    cluster sampling                   |  purposive sampling
                                       |  snowball sampling

     


    probability sampling
    every individual must have a known probability of being included in the sample.

    simple random sample (sampling without replacement -- that is, no-one can be picked twice)
    Construct a list (a sampling frame) of the study population and randomly select, say, 100 individuals.

    Each person, regardless of groupings, has an equal chance of being selected.

    this is the ideal type, but often hard to achieve -- it sounds easier than it is.
    Why?  Can you directly obtain a sampling frame (an actual list of sampling units)?  Sometimes one doesn't exist, and creating it would be a massive, frustrating undertaking.

    Example:   how would you randomly sample and interview 50 of the people who shop at Kroger on a given day?
    or:   how would you randomly sample 100 Americans?  (This would require a list of some 274 million names.)
     

    A simple random sample of 20 cases
    [figure: a grid of 200 cases numbered 1 through 200, with 20 cases selected at random]
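
    As a sketch of how one might draw such a sample in practice -- assuming the sampling frame is simply a list of 200 case numbers:

import random

frame = list(range(1, 201))        # the sampling frame: cases numbered 1-200

# a simple random sample of 20 cases, without replacement (no one picked twice)
sample = random.sample(frame, k=20)
print(sorted(sample))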


    systematic sampling
    (a modest variation on simple random sampling)
    if the ordering of the sampling frame is itself essentially random, pick a random starting point within the first K elements and then select every Kth element from the frame (where K = population size / sample size).
    Examples:   write down the make and model of every 10th car on the freeway;  interview every 20th person who enters Kroger.
     

    A systematic random sample of 20 cases
    [figure: a grid of 200 cases numbered 1 through 200, with every 10th case selected (20 cases in all)]
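
    A comparable sketch for systematic sampling (random start, then every Kth case), again assuming a frame of 200 case numbers:

import random

frame = list(range(1, 201))        # the ordered sampling frame
n = 20                             # desired sample size
k = len(frame) // n                # sampling interval: every kth case

start = random.randrange(k)        # random starting point within the first k cases
sample = frame[start::k]           # then take every kth case
print(sample)                      # 20 cases, evenly spaced through the frame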

     


    stratified sampling
    (a more fundamental departure from simple random sampling)
    1. divide the population into strata (non-overlapping groups, ideally fairly homogeneous)
    2. select a random sample from each stratum (the strata need not be rank-ordered)
    3. aggregate (combine) the results from the strata to make inferences about the population
    You can have one or several dimensions of strata:
    e.g., the NSF Survey of Scientists and Engineers is stratified by occupation, race, and sex.  Why?  Because there are a large number of white male civil engineers in the US but far fewer black female physicists, the survey researchers OVER-SAMPLED the latter group (and under-sampled the former) so that there would be enough black female physicists in the sample to make statistically significant inferences.   (For example:  a simple random sample might lead to only 2 black female physicists being selected;  over-sampling might lead to 10 or more.)

    Over sampling leads to weighted samples, thus the use of weights.
    The weight is proportional to the number of individuals in the population as a whole that each case in the sample "represents."

    If you see weights for each case in a data set, that is an indication of a stratified survey with differential probabilities of each case being selected.
    (Don't ignore the weights -- otherwise you will create biased estimates of the population.)

    Stratified Sampling can also lower sampling error,
    but only when the variable(s) used to stratify the data (e.g., faculty rank) are statistically related to the variables in the sample (e.g., faculty salaries).
     
     

    Example:   use of weighted stratified sampling
    a survey of planners, doctors and robbers in New York City.

    Since planners and robbers are both small subgroups of the population, a simple random sample would include only about 15 planners and 60 robbers (too small to make accurate inferences about these subgroups -- especially planners).

    Solution?
    Divide into 3 occupational strata and over sample planners and robbers.   Since each strata has a different probability of being selected, use weights to adjust before aggregating to the population as a whole.

    Visually, the 3 strata can be shown as horizontal bars indicating the percent of each stratum included in the sample (from 0 to 100%); the chart shows the effect of over-sampling planners (and under-sampling doctors).

    In table form:
             | Population | Simple Random Sample (unstratified, n = 300) | Stratified Sample (n = 300) | Probability of selection in stratified sample | Weight = 1 / probability of selection
    Planners |  2,000  |  15  | 100 | 100/2,000 = .05     | 20
    Doctors  | 30,000  | 225  | 100 | 100/30,000 = .0033  | 300
    Robbers  |  8,000  |  60  | 100 | 100/8,000 = .0125   | 80
    TOTAL    | 40,000  | 300  | 300 | ---                 | ---

    How to interpret weights?   Each planner in the sample represents 20 planners in the population as a whole, while each sampled doctor represents 300 doctors in the population as a whole.
    If you see survey data where all the cases have weights = 1, then there was no disproportional sampling.
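
    A short Python sketch of how these selection probabilities and weights are computed, and of how the weights are then used to estimate a population total (the bicycle-ownership counts are invented purely for illustration):

# each stratum: (population size, number sampled), from the table above
strata = {"planners": (2000, 100), "doctors": (30000, 100), "robbers": (8000, 100)}

weights = {}
for name, (population, sampled) in strata.items():
    prob = sampled / population          # probability of selection within the stratum
    weights[name] = 1 / prob             # each sampled case "represents" this many people
    print(f"{name}: probability = {prob:.4f}, weight = {weights[name]:.0f}")

# to estimate a population total, multiply each stratum's sample count of some
# characteristic by its weight, then sum across strata.  (Made-up counts below:)
bike_owners_in_sample = {"planners": 40, "doctors": 10, "robbers": 70}
estimate = sum(bike_owners_in_sample[name] * weights[name] for name in strata)
print("estimated bicycle owners in the population:", round(estimate))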



    cluster sampling
    used where
    (1) no sampling frame directly available, and/or
    (2) simple random sampling would be expensive, complex, time-consuming and/or logistically difficult.
     
    1. determine one or more intermediate levels (sampling units) between the overall study population and the ultimate sampling elements.
    2. for each level (sampling unit), take a random sample, and then a random sample within that larger "cluster", etc.  (Since this process involves more than one stage or step of sampling, it is often called "Multistage Cluster Sampling".)


    Example:  Imagine that you wanted to conduct in-person interviews with neighborhood organizations.
    There are 9 cities scattered around the country with the relevant types of organizations, and 16 organizations within each of the 9 cities (144 organizations in total).  You need to interview 12 organizations.  A simple random sample would likely require interviews in (and thus travel to) all 9 of these distant cities:

    If you used multi-stage clustered sampling, you would first randomly select a certain number of cities (here three), and then randomly select four organizations within each of the three cities.  This saves travel time, and also makes it easier to assemble a sampling frame (a list of the ultimate sampling elements).
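
    A sketch of this two-stage selection; the city and organization labels below are invented for illustration:

import random

# stage-1 sampling units: 9 hypothetical cities, each containing 16 organizations
cities = {f"city_{i}": [f"city_{i}_org_{j}" for j in range(1, 17)]
          for i in range(1, 10)}

# stage 1: randomly select 3 of the 9 cities (the clusters)
chosen_cities = random.sample(list(cities), k=3)

# stage 2: within each chosen city, randomly select 4 organizations
sample = []
for city in chosen_cities:
    sample.extend(random.sample(cities[city], k=4))

print(chosen_cities)
print(sample)          # 3 cities x 4 organizations = 12 interviews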

     
     
    AN UNUSUALLY COMPLICATED EXAMPLE: 
    from United States to 
       8 states to 
          4 counties within each state to 
             4 school districts within each county to 
                4 schools within each district to 
                   3 classrooms within each school to
                      5 students within each classroom. 
    (TOTAL number of sampling elements = 8 x 4 x 4 x 4 x 3 x 5 = 7,680 students.)

    each stage of sampling leads to the selection of a sampling unit (e.g., states, cities, schools).  the final stage is the actual selection of the sampling elements (school students).

    Each stage also introduces sampling error (the chance that you might happen to pick the more urban states, the poorer counties, the richer schools, the brighter students, etc.).  With random sampling at 6 different stages, the chance for error is quite high.   To reduce this, you might reduce the number of stages.  For example:   go right from states to districts, and from districts to schools, and from schools to students.  Use only as many stages as needed to obtain the sampling frame.

    So:  a trade-off:  clustered sampling makes some kind of survey research possible or at least easier, but you sacrifice some accuracy.
     


    cluster and stratified compared:
    Is a sampling frame directly available?
       stratified sampling:  yes (always?)
       cluster sampling:  no (that is a key reason why you use it)

    Which groups are sampled from?
       stratified sampling:  ALL groups
       cluster sampling:  only selected groups

    Advantages
       stratified sampling:  allows for over-sampling (and thus greater accuracy) of underrepresented groups through weighting, keeping the overall sample size small;  can sometimes lower sampling error
       cluster sampling:  allows for the surveying of individuals even when there is no sampling frame available, and thus can save time and money

    Disadvantages
       stratified sampling:  more complex (use of weights)
       cluster sampling:  more complex (multiple stages);  each stage of random sampling introduces possible error

    Ideal situation
       stratified sampling:  high variation across strata;  low variation within strata
       cluster sampling:  low variation across clusters;  high variation within clusters

    Example
       stratified sampling:  a survey of U-M students:  first divide into strata (freshman, sophomore, junior, senior, master's students, doctoral students), then take a random sample from each stratum.
       cluster sampling:  a survey of college students nationwide:  (1) first randomly select, say, 6 states;  (2) then randomly select 5 colleges from each state;  (3) then construct sampling frames for each of the 5 x 6 = 30 colleges and randomly select 25 students from each list (total:  750 students).

    Advantages over simple random sampling
       stratified sampling:  lower sampling error;  greater representation of small strata.
       cluster sampling:  often no sampling frame is available for simple random sampling (e.g., a single list of college students nationwide);  and because cluster sampling can group the sampling units geographically, one would only need to travel to (in our example) 6 states, rather than the nearly 50 states that simple random sampling would likely require.

     
     


    nonprobability sampling
    downside:  since one doesn't know the probability of each element being selected, one can't accurately know the relationship between the sample and the population
    advantage:  often much easier, simpler,  faster and cheaper to do.  good for exploratory research.

    convenience
    simply ask who is around (people sitting next to you on the bus, in the laundromat as you do your clothes, etc.)
    as easy as it comes.


    quota sampling
    nonprobability equivalent of stratified sampling.
    each stratum is proportionate in size to that group's share of the population.

    e.g., if Democrats were 60% of the population and Republicans were 40%, then ask, say, 120 Democrats and 80 Republicans.

    danger:  the potential for biased selection within these quotas (that is, biases based on factors other than party).


    dimensional sampling
    like quota sampling, but with more than one dimension.

    No need to make the cells proportional to the population:  just have at least 1 entry per cell.

    EXAMPLE:
                      | Cities with low public transit use | Cities with high public transit use
    Growing cities    | Phoenix                            | San Francisco
    Declining cities  | Memphis                            | Philadelphia

    allows all pertinent combinations to be represented in a small sample, and hence allows for more in-depth study of each (such as in case studies -- to be covered later in the semester).
     
     
     


    purposive sampling
    discretion left to the wisdom of the researcher
    can be either typical cases (such as Peoria, Illinois)
    or exceptional cases (such as Berkeley, Madison, Ann Arbor, Ithaca, Palo Alto, Cambridge, Chapel Hill, etc.)

    one type:  reputational sampling.  select cases based on their reputation.
    EXAMPLE:  a study interested in educational reform.   First select states, then districts within state, and then select schools that have the reputation of being innovative.
     


    snowball sampling
    or:  chain referral sampling

    ask someone in the group, who then refers you to others in the group, etc.

    good for deviant subcultures.  e.g., interviewing Hells Angels members, etc.
     
     
     
     
     
     



     
    3. Measures and Indicators 

    Conceptualization: linking concepts to measures.

    Conceptual summaries are based on real observations. (And yet observation is shaped by our mental preconceptions, so there is a two-way interaction between concepts and measures.)

    face validity:  how well does the measure match the concept?

    Danger: reification. "the process of regarding as real things that are not" [Babbie, 1994:  120]

    Can everything be measured?  A philosophical question.  Babbie (110) argues that "... we can measure anything that exists. There are no exceptions. If it exists, we can measure it."  But others would say not.  There is often a difference between the ability to name something, to measure it, to understand it, and to affect it.   We sometimes falsely assume that just because we can name something and measure it, we therefore understand it.  (For example, it is far easier to measure crime, poverty, urban decline, and urban despair than to understand them.)
     
     
    EXAMPLE:  can everything be measured?

    Can we put a price on nature? (e.g., of a free flowing river)
    This raises a dilemma:  One may philosophically/ethically object to this, since it represents the commodification of nature -- the forced adaptation of nature into the human world of economic accounting and prices.  And yet the failure to price nature may lead to its undervaluation and exploitation. 
    So:  the economic value of a river may be a poor measure of its total "value" (ecological value, intrinsic value, etc.), but it may be the most practical solution.


     

    Why are measures important?


    Why the gap between the data results and what you want to know?


    Some strategies to select and develop measures:

    Example:  U.S. Census
    Variable | Concept | Comments
    "Percent of Housing Units built 1939 or before" (from US Census) | Age of housing (or, more generally, age of the city) | Multiple dimensions:  there is no simple measure of the age of a city, unless one specifies the founding date.
    "Race" (self-identified based on several categories) | Race | Controversial + multiple dimensions;  as of 2000, respondents can select more than one category.
    "Ancestry" (self-identified based on several categories) | Ancestry | Controversial + multiple dimensions.
    Density, etc. | Urban form | A complex concept with many qualitatively different dimensions.
    Highest education achieved | Educational level | Multiple dimensions:  does one simply look at grade level achieved (quantity or INPUT), or at a more substantive measure of quality (OUTCOME)?

     

    Other examples of complex concepts:
     
    Concept | Unit of Analysis
    Globalization | cities
    Sustainability | cities
    High tech | industry
    Level of economic development | city
    Economic class | person
    Urban social alienation | person
    Citizen participation | person
    Productivity | person

     

    interchangeability of indicators
    sometimes you may want to substitute one indicator for another -- e.g., when one is easier to collect than the other.
    The indicators should be highly correlated.  Ideally perfectly.

    But a danger: sometimes we think certain indicators are interchangeable, but in fact they capture different dimensions of a more complex concept (e.g., GPA and GRE scores as indicators of intelligence).
     
     
     
     



     
    4. Constructing Questionnaires

     
    Topics
    Reasons for bad or non answers
    Tactics/Tips
    Open-Ended vs. Closed-Ended Question
    Question Order
    Contingency Questions
    Cover Letter
    Instructions for Interviewers or Respondents
    Pretesting
    Examples of questionnaires
    the 2000 U.S. Census questionnaires (both short and long forms)
    2000 Census plans and operations
    American Community Survey
    National Science Foundation surveys
    Disneyland Paris Survey
    The Internet Questionnaire

    [REFER to hypothetical schedule to show placement of questionnaire draft, testing, revision, coding, etc.]
     
     
     

    Reasons for bad or non-answers:

    1. fear that survey is just marketing

    2. fear that the information will be used against the respondent (e.g., by the IRS)

    3. is tired of surveys

    4. hates being a guinea pig

    5. "sophisticated respondent" who knows what the survey wants to hear.

    6. gives answers that sound proper, rather than truthful.

    7. fear of sounding uneducated.

    8. my time is too valuable.

    9. I've never thought about that.

    10. too many questions or redundant questions.

    11. not the right questions asked.
     

    Relevance: 2 parts

    1. you are asking the right people (sample)

    2. you are asking the right questions (that match your concepts)
     

    Tactics/Tips:

    1. a good cover letter

    2. put threatening questions at the end, or drop them altogether.

    3. Assure anonymity.

    4. make the survey seem relevant

    5. have the survey be interesting and at the level of intelligence of the respondents.

    6. Avoid wasted or repetitious questions (this is the task of the "Analysis Review of Questionnaire" and the pretest): i.e., don't ask a question without knowing what you will do with it. (and don't be tempted to piggyback questions that really belong on another survey).

    7. If you are dealing with a heterogeneous group, a few techniques:

    a. use a few different questionnaires.

    b. multiple wording (e.g., wife, husband, girlfriend, etc.)

    c. skips or contingency questions (if yes, then on to Q 7, etc.)

    8. Avoid double-barreled questions (questions that ask two things at once): remember, each answer must be quantifiable, generally along one dimension, or as something as simple as a yes or no.

    9. Avoid ambiguous questions: e.g., questions whose terminology may be unknown to respondents.

    10. keep wording simple and sentences short.

    11. Factual questions are easier than abstract questions (e.g., "Are you happy?").

    12. Avoid leading questions: (i.e., leading the respondent to one answer or another): Do you condemn the violent activities of Saddam Hussein?

    13. Avoid sensitive or threatening questions. Sex, suicide, etc. Structure the question so that a socially-undesirable answer does not stand out. Make it easy for a truthful answer.
     
     

    Open-Ended vs. Closed Ended Questions:
     
    Closed ended positive:

    1. standard answers; can be compared

    2. easier to code

    3. clearer questions lead to clearer answers.

    4. avoids irrelevant information.

    5. some sensitive questions have a better response with predetermined categories (e.g., income)

    6. easier (technically) to answer: check a box rather than filling in a line.
     

    Open-ended positive:

    1. when you don't know all the categories.

    2. can give more detail

    3. when there are too many categories (e.g., place of birth)

    4. work better for complex issues.

    5. more room for creativity and self-expression. (i.e. put a few in an otherwise closed questionnaire to please the respondent, and to give them a chance to provide info. not expressed in closed ended answers).

    Closed ended negative:

    1. too easy to answer when one is guessing or just filling in boxes.

    2. categories are limiting, don't fit.

    3. categories too many or confusing (esp. a problem on the phone)

    4. info falls between the cracks (the categories)

    5. variations fall between the cracks (and appear to be identical answers)

    6. clerical errors (wrong answer checked/circled)

    Open ended negative:

    1. leads to worthless info.

    2. non-standardized info.

    3. hard to code

    4. requires more writing and thinking ability

    5. too general answers?

    6. take more time

    7. require more paper

    so.........

    Use closed ended when:

    1. discrete categories (exhaustive, mutually exclusive as well)

    2. few categories

    3. avoid interval variables that don't break into categories.

    4. better for mailed and self-administered surveys.

     

    Use open-ended when:

    1. complex questions

    2. better for interval scale variables

    3. good for early exploration

    You can easily mix both; and this is commonly done.

     

    Question Order:

    1. sensitive questions come late

    2. open-ended questions late

    3. easy stuff first (e.g., fact over opinion)

    4. ask information needed for subsequent questions first (think of the survey as a memory refreshing process)

    5. put questions in logical order (e.g., chronological)

    put related questions together; don't jump around.

    6. avoid establishing a "response set" -- i.e., repetition leads to a tendency to answer the questions in a particular way.

    7. separate reliability-check question-pairs.

    8. be careful about adjacent questions with scaling differences. (one approach: scatter them)

    9. vary questions by length and type.

    10. how about the funnel technique? from general to specific.

    (but Bailey recommends sticking with the easy stuff first)
     
     

    Contingency Questions: Very Important

    questions only answered by a subset of the sample, as determined by earlier screening or filter questions.

    Why used? relevancy; don't want to have respondents put too many NAs.

    (alternative: two questionnaires; but that is hard)

    KEY: keep the format simple; arrows can confuse.

    Of course, this will vary tremendously depending on whether it is an interview or a mail survey. You can be much more complex with interviews, since the trained interviewer is making the decisions about which contingency questions should be asked.
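
    As a rough illustration of filter-and-skip logic (the question wording below is invented):

def administer(ask):
    """Walk through a tiny questionnaire with one contingency question.
    `ask` is any function that takes a question string and returns an answer
    (e.g., the built-in input)."""
    answers = {}
    answers["q5_own_car"] = ask("Q5. Do you own a car? (yes/no) ")
    if answers["q5_own_car"].strip().lower() == "yes":
        # contingency question: asked only of those who answered yes to Q5
        answers["q6_miles"] = ask("Q6. About how many miles do you drive per week? ")
    else:
        answers["q6_miles"] = "NA"                 # skip straight to Q7
    answers["q7_transit"] = ask("Q7. How often do you ride public transit? ")
    return answers

# e.g., to administer interactively:  print(administer(input))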
     
     

    Cover Letter or Introductory Statement:

    gives legitimacy.

    explains why study is important.

    explains why each respondent is important

    says who is doing it, what will be done with the information, and whether responses are confidential.

    Also: how long it will take to complete.

    what to do with the questionnaire (if it is mailed)
     
     

    Instructions for Interviewers or Respondents:

    in mailed questionnaires, keep instructions simple, and perhaps repeat them for each question. (e.g., circle one)

    Face sheet for interviewer:

    can include specifics on interview: time, length, special problems,

    plus race, gender, and other characteristics (intentionally) not asked.
     
     

    Pretesting: very important

    how to work out the flaws.

    it is amazing how many problems jump out during the pretest; problems you could never see by yourself.

    You can always do two pretests, just to be sure.

    KEY: you want to know, before you do the survey, exactly how you will interpret answers. The pretest will largely eliminate the most glaring ambiguities and other problems that prevent this clear interpretation.

    Whom do you pretest? Anyone: other students, office workers, etc.

    or: you could pick a clustered sample of your target population and pretest them.

    This is also a good time to do a test run of your coding and see how you will analyze the data. What is relevant data? what is not? How will you input it? What have you left out?
     

    Options: what to do after the pretest........

    1. adding/subtracting questions.

    2. change wording

    3. changing categories (number, boundaries); even adding categories, if they are not exhaustive

    4. changing open-ended to closed ended, and vice versa

    5. changing the format

    6. adding filter questions and contingency questions

    7. change the ordering

    8. add more space for open-ended questions

    9. get rid of redundancies

    10. look for patterns of answers: response sets. (e.g. agree/disagree, etc.)

    11. change questions with little or no variation: e.g. 98% one answer.

    12. Think about how hard some of the open-ended questions will be to code (e.g. occupation; industry; favorite activity, etc.) How will you cluster or collapse?

    13. how long did it take to answer? Have the pretest respondents note this.

    14. Finally: "feel of the questionnaire": paper, look, cover letter, etc. aesthetics. (most important for mailed questionnaires: does it look like junk mail?)