UP504 • Gathering Data, Constructing Tables and Graphslast updated: Wednesday, February 21, 2007 |
Assignment Twodue Wednesday, March 7 [new date] see also notes on downloading data |
This assignment consists of two parts. The first part is to gather data and construct a data table. The second part is to generate graphic figures, both from the data that you generate and from a data set made available to the class. Please follow the instructions carefully.
You are to work in teams of two students. (Note: in group
projects, you are to turn in a single, integrated write-up. All group members
receive the same grade.)
Skills to be learned:
You are to gather data on an urban planning or policy related theme. You may
obtain the data from a source of your choice: e.g., United Nations,
World Bank, Bureau of Labor Statistics, etc. You might also look at non-governmental
sources. (Please do not use data from the Decennial U.S. Census.) The goals of the assignment are to gain experience with locating data
sets, selecting variables and cases, loading them into spreadsheets, and properly
documenting, labeling and citing your data. The variables, unit of analysis,
time and space frame are your decision. Convert this data into an Excel spreadsheet
form (or some other appropriate spreadsheet format). (You are to turn in only
the printed table; do not turn in the raw data or an electronic version of the
data.)
For some ideas on accessing web-based data sets, see the: |
Common mistakes made in data tables:
• an example of a flawed data table (an MS Word document)
• the same table with annotations of problems
(Note: you might view the first table, identify as many problems as you can, and then check the second version to see the answers.)
* the source of your data should not already be in spreadsheet format
* the data should NOT be from the Decennial US Census
* each student should generate a unique data set
* at least 25 cases and at least 5 variables.
* properly cite the source of the data (and its original format)
* clearly label all variables (units, date, etc.); avoid awkward abbreviations
* provide a complete title to the table
* explain all unclear variables
* properly handle all missing cases
* if useful, provide an extra column(s) with converted variables (e.g., percents, dollar amounts adjusted for inflation, ranks, etc.). Be consistent with number formats; do not include excessive/insignificant digits (e.g., 7.4235%).
* sort the cases based on some logical criterion (e.g., alphabetical, geographic, or ranked by a key variable -- such as population size).
* properly justify the column alignment (e.g., right-justify numbers, line up decimal points)
* if possible, have the table on one page (this may require some reduction in font size)
NOTE: The second data does NOT need to be related in any way to the first data set we provide you (on public transportation). Simply choose a data set of interest to you.
Part 2: Graphing the Data
Generate two graphs: one from your individual data set in Part
1, and one from the data set provided (see below). (You might experiment with alternative
formats for each, and then select the best.) Be careful in your selection of
the type of graph and the variables. (See Tufte for guidelines.) Clearly label
the graph elements, with a title and sources.
For the first data set (from Part 1), create
a chart that illustrates what you think are the most interesting patterns in
the data. (Exploratory Data Analysis)
For the second data set (from the 2000 US Census), create a graph that reveals an interesting relationship between the percent of people who commute by public transportation and one or several other variables. (Structured data analysis). Note: there are MANY more variables here than you can display, so choose a few carefully. There is certainly a wide variety of ways to answer this question, and I look forward to seeing the range of results.
DOWNLOAD the data set (up504w07assign2data.xls)
to be handed in (in paper format) on
March 7 (during class) one data table (Part 1) two graphs (Part 2) |
to be uploaded onto your ifs space before class
on March 7 (instructions below) one data table (Part 1) two graphs (Part 2) |
You are to make your two graphs available for presentation during class on Wednesday March 7
See guidelines on developing web pages.
Be sure to use this standardized url format:www-personal.umich.edu/~youruniquename/up504/assign2/index.html no spaces, all lowercase (no caps) |
results:
student graphs
Once you create your file and upload it into your html directory on your ifs space, check the link on the above page to make sure that the link works. (Best: check from another computer WITHOUT being logged onto to your own IFS space.) Be sure that all your files are uploaded into your ifs space -- and not just sitting in your local harddrive. |
Note: sometimes a graph that works well in Excel does not not look as good on the web, or vice versa. (There are many reasons for this: the web is in color, but printed graphs usually in black and white; screen size vs. paper size; different resolutions of pixel display vs. printing; etc.) Therefore you may need to slightly alter your graph to make it look good on the web. Be sure that:
Table: properly labeled properly cited (source of data) complete title (year, variables, units, geography if relevant) missing cases explained units explained (e.g., dollars, in millions) at least 5 variables and 25 cases (more are fine but not needed) |
Graphs: properly labeled and cited complete title makes a point quickly at least bivariate no distortions minimal "chart junk" accurate encourages the eye to compare data reveals the data at different levels: both broad overview and details |
A spreadsheet in Excel serves at least three different functions:
1. to be a repository of your data set.
2. to create a data table for presentation and/or publication
3. to be the base for generating Excel charts.
It is often hard to have one single sheet serve all three functions well.
One strategy is to therefore have three separate sheets (all on the same
file) to serve these different functions. You might call them: "Original
Data," "presentation table," "data for charts". (For example, for
the "data for charts" you may want to rearrange the columns, calculate new variables,
change the labels, etc., but not have these manipulations mess up either your
original data set or your presentation table.
1. It is often easier to work with a chart as a separate sheet
than one that is embedded (sitting on top of your data sheet). Charts
as separate sheets are automatically full screen and it is easier to move around
from one chart to another by simply clicking on the tabs at the bottom.
You can always convert a graph as "object" to a new sheet through the "location"
feature.
2. Scatterplots are often a much more effective and direct way to show bivariate relationships that other chart types. You might even try a "bubble chart" to add a third variable. (You can also try multiple series to show more than two variables.)
3. Excel's default background color for charts is usually gray. You might switch to white for better contrast.
4. There is an inevitable trade-off between trying to show lots of data and variables on one chart and yet maintaining a simple, clean graphic where the variable relationships are clear. You may want to show multiple dimensions of the data (think of each variable as a data dimension), but a computer screen or piece of paper has only two dimensions. One strategy, if you can't fit everything on one chart, is to generate a single graphic that consists of several charts side-by-side. The reader's eye can better compare the several graphics if they are the same size, format, scale, etc. Tufte calls this useful design "small multiples."
5. One test of whether your chart is clear is to have a fellow student look at it. If they can quickly see the patterns in the data (rather than patterns in the graphic design) without your verbal explanation, then that is evidence that you are communicating well. But if the chart is so complex or unintuitive that the viewer needs a lengthy explanation, then the graph is not working. Try to simplify, change the data ordering, change the chart type, the scale, try a combination chart, convert variables (e.g., use percents rather than absolutes, such as percent Hispanic population), use different variables, etc.
6. Excel does make it easy to add a "trend line" to your chart. (With two variables, a trend line is essentially a regression line, and the R2 is simply the square of the correlation coefficient "r". R2 values closer to 1.0 indicate a stronger linear relationship.) However, this tool can be abused. Only add a trend line if it helps the reader understand the data patterns. Otherwise, leave it out. (And if you do include a trend line, you might make the line subtle, rather than a think black line, and perhaps make the R2 value small and discrete.)
7. Overall, show the data; have the view think the patterns in the data, not the graphic design; avoid distortion; encourage the eye to compare data; clearly label the graph. Give the viewer the greatest number of ideas, in the shortest time, with the least amount of ink, in the smallest space. (For Tufte, this means avoiding "chart junk".) A chart is not a replacement for a table; sometimes a table can communicate information in ways that a chart can't. So do recognize the limitations of charts (e.g., that it is often hard to show more than two variables), but creatively explore ways to reveal the data.
You might find that other software (e.g., SPSS) has better graphing capabilities. As an option, you may choose to create some or all of your assignment with another application. (We stress Excel on this assignment because it is the most widely available software, and most planners will end up in offices with Excel on their machines.)