UP504
Sampling Fraction
Sampling Fraction -- or -- why do we usually just care about the sample size, not the population size?
sample size (N) vs. population size (M)
sampling fraction = N/M
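The actual formula for the standard error (the standard deviation of the sampling distribution) is:
$$SE = \sqrt{1 - f}\,\frac{s}{\sqrt{N}}$$
where s is the standard deviation of the sample and f = sampling fraction = N / M.
But since typically M >> N, f --> 0, so 1 - f is approximately 1, and the formula for the standard error becomes:
$$SE = \frac{s}{\sqrt{N}}$$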
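A minimal sketch of the corrected and uncorrected standard error, assuming the correction is a factor of sqrt(1 - f) applied to s / sqrt(N) (this form reproduces the values tabulated on this page). The function names are illustrative and not part of the original notes.

```python
import math


def uncorrected_se(s, n):
    """Standard error ignoring population size: s / sqrt(N)."""
    return s / math.sqrt(n)


def corrected_se(s, n, m):
    """Standard error with the assumed finite population correction sqrt(1 - f)."""
    f = n / m  # sampling fraction
    return math.sqrt(1 - f) * uncorrected_se(s, n)


# Example: sample of 800 from a population of 38,000, standard deviation 20,000
print(round(uncorrected_se(20_000, 800), 1))        # 707.1
print(round(corrected_se(20_000, 800, 38_000), 1))  # 699.6
```

Because the correction factor is at most 1, the corrected value can only be equal to or smaller than the uncorrected one.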
Comparison of Corrected and Uncorrected Standard Error Calculations for a Hypothetical Population of 38,000 (with a standard deviation of 20,000).
| Sample size (N) | Population size (M) | Sampling fraction (f) | Standard deviation of the sample | Std error (corrected) | Std error (uncorrected) | Percent difference between corrected and uncorrected standard error |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 38,000 | 0.00003 | 20,000 | 19999.7 | 20000.0 | 0.00% |
| 100 | 38,000 | 0.00263 | 20,000 | 1997.4 | 2000.0 | 0.13% |
| 200 | 38,000 | 0.00526 | 20,000 | 1410.5 | 1414.2 | 0.26% |
| 400 | 38,000 | 0.01053 | 20,000 | 994.7 | 1000.0 | 0.53% |
| 800 | 38,000 | 0.02105 | 20,000 | 699.6 | 707.1 | 1.06% |
| 1,600 | 38,000 | 0.04211 | 20,000 | 489.4 | 500.0 | 2.13% |
| 3,200 | 38,000 | 0.08421 | 20,000 | 338.3 | 353.6 | 4.30% |
| 6,400 | 38,000 | 0.16842 | 20,000 | 228.0 | 250.0 | 8.81% |
| 12,800 | 38,000 | 0.33684 | 20,000 | 144.0 | 176.8 | 18.57% |
| 25,600 | 38,000 | 0.67368 | 20,000 | 71.4 | 125.0 | 42.88% |
| 37,999 | 38,000 | 0.99997 | 20,000 | 0.5 | 102.6 | 99.49% |
| 38,000 | 38,000 | 1.00000 | 20,000 | 0.0 | 102.6 | 100.00% |
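The table's figures can be regenerated with a short script. This is a sketch under two assumptions that are not stated in the original notes but match its numbers: the correction factor is sqrt(1 - f), and the percent difference is measured relative to the uncorrected standard error. The name `table_row` is hypothetical.

```python
import math

M = 38_000  # population size
S = 20_000  # standard deviation of the sample


def table_row(n, m=M, s=S):
    """Return (f, corrected SE, uncorrected SE, percent difference) for sample size n."""
    f = n / m                                   # sampling fraction
    uncorrected = s / math.sqrt(n)              # s / sqrt(N)
    corrected = math.sqrt(1 - f) * uncorrected  # assumed correction factor sqrt(1 - f)
    pct_diff = 100 * (uncorrected - corrected) / uncorrected
    return f, corrected, uncorrected, pct_diff


sample_sizes = [1, 100, 200, 400, 800, 1_600, 3_200,
                6_400, 12_800, 25_600, 37_999, 38_000]

for n in sample_sizes:
    f, corrected, uncorrected, pct_diff = table_row(n)
    print(f"{n:>6,}  {f:.5f}  {corrected:>9.1f}  {uncorrected:>9.1f}  {pct_diff:>6.2f}%")
```

Changing `M` or `S` shows how the same sample sizes behave for a different population.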
Note that there is very little difference between the corrected and uncorrected standard error until the sampling fraction gets large. For example, even with a sample of 800 (out of a total population of 38,000), the difference is only about 1 percent. The two estimates only begin to diverge noticeably when the sample size reaches several thousand (that is, when the sampling fraction approaches about 10% or more): the difference is 4.30% at a sample of 3,200 and 8.81% at 6,400.
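To see why the gap depends only on the sampling fraction, the relative difference can be worked out directly. This assumes, as the table's figures imply, that the correction factor is sqrt(1 - f) and that the percent difference is measured relative to the uncorrected standard error, with s the standard deviation of the sample:

$$\frac{SE_{\text{uncorrected}} - SE_{\text{corrected}}}{SE_{\text{uncorrected}}} = \frac{\frac{s}{\sqrt{N}} - \sqrt{1 - f}\,\frac{s}{\sqrt{N}}}{\frac{s}{\sqrt{N}}} = 1 - \sqrt{1 - f} \approx \frac{f}{2} \quad \text{for small } f$$

Both s and N cancel, so the percent difference is roughly half the sampling fraction. At f = 0.02105 (a sample of 800) it is 1 - sqrt(0.97895), or about 1.06%, matching the table; at f = 0.10 it would be 1 - sqrt(0.90), or about 5.1%.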
Moral of the story: it is fine to use the uncorrected estimate, which is easier to calculate anyway. It is also more conservative, because the uncorrected standard error is never smaller than the corrected one.