UP504 (Campbell)
Writing up the results of quantitative research
* Sometimes there is in fact no underlying relationship between the variables in question. (The null hypothesis cannot be rejected.) More often, however, a relationship likely exists but you cannot detect it. Therefore, before you conclude that there is no relationship between the variables (and that, e.g., an R-square of 0.12 proves this), consider the following possibilities (and also consider ways to alter your project to address them):
* incomplete data sets, with lots of missing values (this is a particularly frequent frustration of comparative international data).
* a unit of analysis that is the wrong scale, especially when it is too aggregated (e.g., national data, when the relationship you are looking for is best revealed with city or household data). This is a common problem, and the best solution is to collect data at a smaller scale if available. (One runs the risk here of committing an ecological fallacy: inferring individual-level relationships from aggregate data.)
* not enough cases (for statistical generalization) -- this can be linked to having too aggregated a unit of analysis (see above). For example, a regression on seven counties in SE Michigan will rarely yield statistically significant results, but a regression on the hundreds of census tracts in the region can potentially lead to a statistically significant model.
How about the opposite: too many cases? We
have spent time in this class examining different sampling strategies, allowing
you to statistically generalize from a relatively small sample to a much larger
population. However, if you already have access to a large data set (such as
3,000 cases), try first to work with the entire data set. There is no inherent
reason why you have to reduce your number of cases through further sampling.
If you have access to an enormous data set (e.g., 120,000 cases and 80 variables),
yes, you might want to first cut it back to make it easier to handle (e.g.,
in Excel). You might also be just interested in a specific subset of cases (e.g.,
only the Michigan counties in a data set of all U.S. counties). Otherwise, take
advantage of the large data set's full power by using all the cases.
* lots of variables, but not the ones you need (an under-specified model)
* Wrong level of measurement: for example, you have nominal or ordinal data, when you really need interval-scale data (remember that different levels of measurement require different statistical tests).
* not enough variation in your dependent variable. (Since a strong model explains most of the variation, with little variation, there is little to explain.)
* non-linear relationships, where there is no obvious transformation (such as taking logarithms) to make the relationship linear.
* Using secondary data from poorly constructed survey research (e.g., a questionnaire with ambiguously worded questions).
* the wrong format for your dependent variable (so try alternatives). For example, if you are using "average commute time (in minutes)" as a variable and getting weak results, try including more information about commute times: e.g., percent of people who commute less than 15 minutes/day; percent over 45 minutes; percent who work at home, etc. Means (averages) can be useful measures of central tendency, but they tend to strip the data of much of its richness and variation. (For example, two communities may both have mean commute times of 30 minutes but be otherwise quite different: in the first, everyone commutes 30 minutes a day; in the second, half the people commute zero minutes while the other half commute for an hour.)
* etc.
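The commute-time example above can be sketched in a few lines of Python. The two communities below are hypothetical (100 residents each), constructed only to match the example: identical means, very different distributions. The function names are illustrative assumptions, not part of any course software.

```python
from statistics import mean, pstdev

# Hypothetical commute times (minutes) for 100 residents in each community.
community_a = [30] * 100             # everyone commutes 30 minutes
community_b = [0] * 50 + [60] * 50   # half commute 0 minutes, half commute 60


def percent_where(values, condition):
    """Percent of values for which condition(value) is true."""
    return 100 * sum(1 for v in values if condition(v)) / len(values)


def commute_profile(minutes):
    """Describe a community with more than just its mean commute time."""
    return {
        "mean": mean(minutes),
        "std_dev": pstdev(minutes),
        "pct_under_15": percent_where(minutes, lambda m: m < 15),
        "pct_over_45": percent_where(minutes, lambda m: m > 45),
    }


profile_a = commute_profile(community_a)
profile_b = commute_profile(community_b)
print("Community A:", profile_a)
print("Community B:", profile_b)
```

Both profiles report a mean of 30 minutes, but the standard deviation and the percent-based measures separate the two communities. Any of these alternative measures could be tried as the dependent variable in place of the mean.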
2. What do you do in these situations?
If possible, fill in the missing values, add more variables or cases, disaggregate the data to a smaller scale, apply transformations that make relationships linear, convert absolute values to percentages, try dummy variables, etc. Or go the other direction: convert the data to nominal or ordinal scales and look for statistically significant patterns there (using chi-square, etc.).
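Three of these recoding steps (absolute values to percentages, dummy variables, and demoting an interval variable to an ordinal one) might look like the following in Python. This is only a sketch: the tract records, field names, and the 15- and 45-minute cutpoints are made-up assumptions for illustration.

```python
from bisect import bisect_right

# Hypothetical census-tract records; all names and numbers are invented.
tracts = [
    {"tract": "A", "population": 4000, "renters": 1000, "region": "city", "mean_commute": 12},
    {"tract": "B", "population": 2500, "renters": 1500, "region": "city", "mean_commute": 30},
    {"tract": "C", "population": 5000, "renters": 500, "region": "suburb", "mean_commute": 50},
]

COMMUTE_CUTPOINTS = [15, 45]                    # minutes (assumed cutpoints)
COMMUTE_LABELS = ["short", "medium", "long"]    # <15, 15 to <45, 45 or more


def to_percent(count, total):
    """Convert an absolute count to a percentage of the total."""
    return 100 * count / total


def dummy(value, category):
    """Dummy variable: 1 if the case falls in the category, else 0."""
    return 1 if value == category else 0


def to_ordinal(value, cutpoints=COMMUTE_CUTPOINTS, labels=COMMUTE_LABELS):
    """Demote an interval-scale value to an ordered category."""
    return labels[bisect_right(cutpoints, value)]


recoded = [
    {
        "tract": t["tract"],
        "pct_renters": to_percent(t["renters"], t["population"]),
        "is_city": dummy(t["region"], "city"),
        "commute_class": to_ordinal(t["mean_commute"]),
    }
    for t in tracts
]
for row in recoded:
    print(row)
```

The percentage and dummy variables stay usable in a regression, while the ordinal `commute_class` is suited to cross-tabs and chi-square.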
3. If that doesn't work, try explaining another dependent variable.
Be flexible: shift from a hypothetico-deductive mode to an exploratory mode.
4. If that fails, turn "failure" into the subject of your analysis.
You can still write up your analysis. Take a step back and reflect on the analytical process; you may have learned more from these apparent "failures" than you think. Explain why there were no obvious and statistically significant patterns found in the data. Write a treatise on the politics of data. Propose a different study that would better answer your research question. (I once had several students who were preparing a statistical compendium of data from apartheid South Africa; finding little useful data, they wrote a wonderful paper about the politics of data collection and dissemination in that divided country.)
5. Moral: all is not lost if your R-square is low.
That is, for this assignment, if the final results are less than breathtaking, write a thoughtful review and analysis of the steps you took in the research project. Refer to the concepts and skills from this class.
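A low R-square also does not, by itself, tell you whether a relationship is statistically significant; that depends heavily on the number of cases. As a rough sketch, assume a regression with a single independent variable, where the t statistic for the slope can be computed directly from R-square and the number of cases. The 300-tract figure is a hypothetical stand-in for "hundreds of census tracts," and the critical values are the usual two-tailed 5% values from a standard t table.

```python
from math import sqrt


def t_for_r_square(r_square, n_cases):
    """t statistic for the slope in a one-predictor regression.

    Uses t = sqrt(R^2 * (n - 2) / (1 - R^2)), with n - 2 degrees of freedom.
    """
    df = n_cases - 2
    return sqrt(r_square * df / (1 - r_square)), df


# Two-tailed 5% critical values of t, taken from a standard t table.
T_CRITICAL_05 = {5: 2.571, 298: 1.968}

results = {}
for n_cases in (7, 300):  # seven counties vs. a hypothetical 300 tracts
    t, df = t_for_r_square(0.12, n_cases)
    significant = t > T_CRITICAL_05[df]
    results[n_cases] = (t, significant)
    print(f"n = {n_cases:>3}: t = {t:.2f}, df = {df}, "
          f"significant at 5%: {significant}")
```

Under these assumptions, the same R-square of 0.12 falls well short of significance with seven cases but clears it easily with 300.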
FINALLY: Remember that the final product of this project is not simply data output, but also an intelligent, reflective discussion of the methodology, the results and their implications, the process of defining measures, selecting variables and cases, finding data, and the shortcomings of the research. Develop an overall narrative that places the research in a larger context.
-------------------------------------
SPECIAL CASE: INTERPRETING AND ANALYZING SURVEY DATA
Survey data present special potentials and difficulties for analysis. The data are often heavily nominal and ordinal, not what you need for interval-scale analysis (such as correlation and regression). You often don't have enough cases to reach statistical significance, especially if the survey is an exploratory one (e.g., 25 cases). You often have a large number of variables (e.g., 50-100 questions asked), and don't know where to begin the analysis. Here is one strategy:
1. Start with the simple task of profiling the respondents overall. Who answered the survey? Who didn't? How representative is the sample for the community in question? How representative is the sample for the larger world? How complete were the responses? How many complete vs. partial responses did you get? It can be very useful to provide a brief overview table or summary of the sample.
2. Then calculate the marginal totals for each question. That is, what share of respondents gave each answer? (Example: 27% "high," 42% "medium," and 31% "low" to your question about a sense of community.) You might find that incorporating the absolute and percent values for each question directly onto the questionnaire itself (either by hand or typing) is an effective way to communicate this information. (This also allows the reader to see the original format and wording of the questions.)
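If your responses are in a spreadsheet or a list, the marginal totals are a simple tally. Here is a minimal Python sketch; the 100 responses are hypothetical, constructed only to reproduce the percentages in the example above.

```python
from collections import Counter

# Hypothetical answers to the "sense of community" question.
responses = ["high"] * 27 + ["medium"] * 42 + ["low"] * 31


def marginal_totals(answers, categories):
    """Return {category: (count, rounded percent)} for one question."""
    counts = Counter(answers)
    n = len(answers)
    return {c: (counts[c], round(100 * counts[c] / n)) for c in categories}


totals = marginal_totals(responses, ["high", "medium", "low"])
for category, (count, percent) in totals.items():
    print(f"{category:<8}{count:>4}  ({percent}%)")
```

Listing the categories explicitly keeps them in their natural order and shows a zero for any answer nobody chose.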
3. Look for the most interesting variables with significant ranges of answers. (If 98% answered that they believed that "environmental protection is a high priority for our community," that answer may be useful in and of itself, but what you want are variables with a wider range of answers.) Then run cross-tabs (cross-tabulations). For example, a 2x3 table comparing male and female responses to the question about a sense of community. (Note that this is a nominal variable crossed with an ordinal variable; any nominal and/or ordinal variables will do; or you can demote an interval variable to an ordinal variable.) Note that you can do more than a 2-dimensional cross-tab, e.g., sex by race by "sense of community" (it just gets more complicated). [Cross-tabs are the nominal/ordinal equivalent of scatterplots for interval data.]
Example of crosstabs:
"How often do you participate in community development events in your neighborhoods?"
Absolute counts (and percent of each gender group) listed.

       | Never    | Less than once a month | Once a month or more | Total
Male   | 12 (63%) | 4 (21%)                | 3 (16%)              | 19 (100%)
Female | 10 (32%) | 9 (29%)                | 12 (39%)             | 31 (100%)
Total  | 22 (44%) | 13 (26%)               | 15 (30%)             | 50 (100%)
4. If there seems to be a pattern in a cross tab, run a test of significance, such as chi-square (for nominal data), or Kendall's tau, gamma, etc. (for ordinal data).
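As a worked illustration, here is a chi-square test on the example cross-tab of sex by participation, written in plain Python. It is a sketch rather than a substitute for a statistics package: the p-value shortcut used here is exact only for tables with 2 degrees of freedom, which happens to fit a 2x3 table.

```python
from math import exp

# Observed counts from the example cross-tab. Column order:
# never, less than once a month, once a month or more.
observed = {
    "Male": [12, 4, 3],
    "Female": [10, 9, 12],
}


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a cross-tab."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for count, col_total in zip(row, col_totals):
            expected = row_total * col_total / n  # count expected if independent
            statistic += (count - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, df


statistic, df = chi_square(observed)
# With exactly 2 degrees of freedom, the upper-tail probability is exp(-x / 2).
p_value = exp(-statistic / 2) if df == 2 else None
print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_value:.3f}")
```

For this table the statistic comes out near 4.91 with 2 degrees of freedom (p of roughly 0.086), so the apparent difference between men and women falls short of the conventional 5% level with only 50 cases. Note also that one expected count (about 4.9) is slightly below 5, which is a common rule-of-thumb minimum for the test.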
Cross-tabs in SPSS:
The cross-tabs function in SPSS can be found in:
Analyze > descriptive statistics > crosstabs
To run tests of statistical significance on your crosstabs, within the crosstabs dialogue box, click on the "statistics" button. In the statistics window, select the appropriate crosstabs statistic, such as chi-square or tau (for more information on which to use, check a statistics textbook). Do note whether your pair of variables is interval-interval, nominal-nominal, or interval-nominal.
Note: standard tests of difference of means (such as two-mean t-tests and ANOVA) can be found in:
Analyze > compare means
Cross-tabs in Excel:
The cross-tabs function in Excel is called "PivotTable" -- one generates a "PivotTable Report".
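Cross-tabs in Python:
If you are working outside SPSS or Excel, the same tally can be sketched with Python's standard library. The respondent records below are hypothetical, rebuilt from the counts in the example table so the output can be checked against it; the field names `sex` and `participation` are assumptions for illustration.

```python
from collections import Counter

ANSWERS = ["Never", "Less than once a month", "Once a month or more"]

# Hypothetical respondent records, rebuilt from the example table's counts.
cell_counts = {
    ("Male", ANSWERS[0]): 12, ("Male", ANSWERS[1]): 4, ("Male", ANSWERS[2]): 3,
    ("Female", ANSWERS[0]): 10, ("Female", ANSWERS[1]): 9, ("Female", ANSWERS[2]): 12,
}
records = [
    {"sex": sex, "participation": answer}
    for (sex, answer), count in cell_counts.items()
    for _ in range(count)
]


def crosstab(rows, row_field, col_field):
    """Count the cases in each (row value, column value) cell."""
    return Counter((r[row_field], r[col_field]) for r in rows)


def row_percents(table, row_value, columns):
    """Percent of the row's cases in each column, rounded to whole numbers."""
    row_total = sum(table[(row_value, c)] for c in columns)
    return [round(100 * table[(row_value, c)] / row_total) for c in columns]


table = crosstab(records, "sex", "participation")
for sex in ("Male", "Female"):
    counts = [table[(sex, a)] for a in ANSWERS]
    print(sex, counts, row_percents(table, sex, ANSWERS))
```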
5. Inductive or deductive: At this point, you can either use the data to test specific hypotheses, or else present a more exploratory overview of patterns in the data. If effective, develop graphs to display the most interesting patterns. Effective analyses of survey data often interweave tables, charts and narrative, telling a story and creating a vivid profile. An abstract with the key findings is useful.
6. For advanced users, try models that have a nominal-scale dependent variable, such as a logit model. It uses a similar logic to that of regression. (Example: if you have data on individuals and where each one lived in 1980 and 1990, predicting whether or not each one moved during this period.)
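To make the logit idea concrete, here is a bare-bones sketch in plain Python that fits a one-predictor logit model by gradient ascent on the log-likelihood. Everything about the data is hypothetical: 100 individuals, a made-up predictor (renter vs. owner in 1980), and a dependent variable coded 1 if the 1990 address differs from the 1980 address. In practice you would use a statistics package rather than this hand-rolled fit.

```python
from math import exp, log

# Hypothetical data as (renter, moved) pairs:
# 40 renters, of whom 30 moved; 60 owners, of whom 15 moved.
people = [(1, 1)] * 30 + [(1, 0)] * 10 + [(0, 1)] * 15 + [(0, 0)] * 45


def predict(intercept, slope, x):
    """Predicted probability of moving under the logit model."""
    return 1 / (1 + exp(-(intercept + slope * x)))


def fit_logit(data, learning_rate=1.0, steps=2000):
    """Fit P(moved) = 1 / (1 + exp(-(a + b*x))) by gradient ascent."""
    intercept, slope = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_intercept = grad_slope = 0.0
        for x, y in data:
            error = y - predict(intercept, slope, x)
            grad_intercept += error
            grad_slope += error * x
        intercept += learning_rate * grad_intercept / n
        slope += learning_rate * grad_slope / n
    return intercept, slope


intercept, slope = fit_logit(people)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"odds ratio (renter vs. owner) = {exp(slope):.1f}")
print(f"P(moved | owner)  = {predict(intercept, slope, 0):.2f}")
print(f"P(moved | renter) = {predict(intercept, slope, 1):.2f}")
```

As in regression, the slope describes the effect of the independent variable, but on the log-odds of moving rather than on the outcome itself. With a single 0/1 predictor, the fitted probabilities simply reproduce the observed shares (25% of owners and 75% of renters moved in this invented data set), which is a handy check on the fit.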
-------------------------------------