5 Aug 2021
For printing outputs in the style that Gelman does, run the following macro
local gelman_output esttab,b(1) se wide mtitle("Coef.") scalars("rmse sigma") ///
coef(_cons "Intercept" rmse "sigma") nonum noobs nostar var(15)
Load the data and make a scatterplot
import delimited https://raw.githubusercontent.com/avehtari/ROS-Examples/master/ElectionsEconomy/data/hibbs.dat,clear delim(" ")
li in 1/5
┌─────────────────────────────────────────────────┐ │ year growth vote inc_part~e other_ca~e │ ├─────────────────────────────────────────────────┤ 1. │ 1952 2.4 44.6 Stevenson Eisenhower │ 2. │ 1956 2.89 57.76 Eisenhower Stevenson │ 3. │ 1960 .85 49.91 Nixon Kennedy │ 4. │ 1964 4.21 61.34 Johnson Goldwater │ 5. │ 1968 3.02 49.6 Humphrey Nixon │ └─────────────────────────────────────────────────┘
Graph a scatterplot
scatter vote growth, xtitle("Economic growth")
Now estimate the regression y = a + bx + error
. qui regress vote growth
Add a fitted line to the prior graph by drawing a function representing a straight line using the coefficients (constant and slope) extracted from the regression.
You can place more than one graph (of the twoway types) on a set of axes by putting two (or more) graph statements in separate paretheses as shown below
twoway (scatter vote growth, xtitle("Average recent income growth")) ///
(function y = _b[_cons]+_b[growth]*x,range(-.5 4.5) lp(solid) )
After a regression you can refer to the coefficients by looking at the names in the table you get with the following command. regress,coefleg
. regress,coefleg Source │ SS df MS Number of obs = 16 ─────────────┼────────────────────────────────── F(1, 14) = 19.32 Model │ 273.632269 1 273.632269 Prob > F = 0.0006 Residual │ 198.272733 14 14.1623381 R-squared = 0.5798 ─────────────┼────────────────────────────────── Adj R-squared = 0.5498 Total │ 471.905002 15 31.4603335 Root MSE = 3.7633 ─────────────┬──────────────────────────────────────────────────────────────── vote │ Coefficient Legend ─────────────┼──────────────────────────────────────────────────────────────── growth │ 3.060528 _b[growth] _cons │ 46.24765 _b[_cons] ─────────────┴────────────────────────────────────────────────────────────────
You can add the following lines the code for the graph above to get the formatted graph in the book (remember to add a line continuation symbol /// at the end of the prior line )
ytitle("Incumbent party's vote share'") legend(off) ///
ylab(45 "45%" 50 "50%" 55 "55%" 60 "60%") ///
xlab(0 "0%" 1 "1%" 2 "2%" 3 "3%" 4 "4%") ///
ysize(5) xsize(5) ysc(fextend) xsc(fextend) ///
subtitle(Data and linear fit) yline(50,lw(vthin)) ///
text(53 2.5 "y= 46.3 + 3.0 x", place(e))
Now print the output in the style that Gelman uses using the macro we stored above.
`gelman_output'
───────────────────────────────────────── Coef. ───────────────────────────────────────── growth 3.1 (0.7) Intercept 46.2 (1.6) ───────────────────────────────────────── sigma 3.8 ───────────────────────────────────────── Standard errors in parentheses
You could also just issue the regress
command which will replay the last results. (Note where the estimates featured in Gelman’s simplified table come from)
. regress Source │ SS df MS Number of obs = 16 ─────────────┼────────────────────────────────── F(1, 14) = 19.32 Model │ 273.632269 1 273.632269 Prob > F = 0.0006 Residual │ 198.272733 14 14.1623381 R-squared = 0.5798 ─────────────┼────────────────────────────────── Adj R-squared = 0.5498 Total │ 471.905002 15 31.4603335 Root MSE = 3.7633 ─────────────┬──────────────────────────────────────────────────────────────── vote │ Coefficient Std. err. t P>|t| [95% conf. interval] ─────────────┼──────────────────────────────────────────────────────────────── growth │ 3.060528 .6962739 4.40 0.001 1.567169 4.553887 _cons │ 46.24765 1.621932 28.51 0.000 42.76895 49.72635 ─────────────┴────────────────────────────────────────────────────────────────
Another way to add the straight line to the scatterplot is using the twoway lfit
(or lfitci
) graph procedure. This allows you to put confidence intervals around the line for either the regression line (which represents predicted conditional means observed if you were to draw another sample) , or in this case I asked for the confidence intervals for the points (option stdf
for forecast). This is the CI for where the observed points might lie if you were to draw another sample of from the same population.
twoway (lfitci vote growth, lp(solid) stdf ) ///
(scatter vote growth,m(O) xtitle("Average recent income growth"))
(page 16)
bayes: regress vote growth
Burn-in ...
Simulation ...
Model summary
------------------------------------------------------------------------------
Likelihood:
vote ~ regress(xb_vote,{sigma2})
Priors:
{vote:growth _cons} ~ normal(0,10000) (1)
{sigma2} ~ igamma(.01,.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_vote.
Bayesian linear regression MCMC iterations = 12,500
Random-walk Metropolis–Hastings sampling Burn-in = 2,500
MCMC sample size = 10,000
Number of obs = 16
Acceptance rate = .3159
Efficiency: min = .08521
avg = .0914
Log marginal-likelihood = -56.640716 max = .09876
------------------------------------------------------------------------------
| Equal-tailed
| Mean Std. dev. MCSE Median [95% cred. interval]
-------------+----------------------------------------------------------------
vote |
growth | 3.06295 .7641382 .024315 3.074891 1.564028 4.54869
_cons | 46.2363 1.805143 .060097 46.26667 42.5979 49.78299
-------------+----------------------------------------------------------------
sigma2 | 16.43696 7.220643 .247357 14.84454 7.525531 35.19754
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.
(page 25)
Import the data and then set up a binary variable flag (called exclude) so that we can leave out some of the countries in the graph for legibility
import delimited https://raw.githubusercontent.com/avehtari/ROS-Examples/master/HealthExpenditure/data/healthdata.txt,clear delim(" ")
gen exclude=inlist(country,"Netherlands", "Belgium", "Germany", ///
"Ireland", "Iceland", "Greece", "Italy", "Sweden", "UK")
Then generate the graph in the book. The tilde before exclude tells graph scatter to include observations where exclude ≠ 1
twoway scatter lifespan spending if ~exclude, mlabel(country) m(i)