24 Sep 2020
Use for outputs
local gelman_output esttab,b(1) se wide mtitle("Coef.") scalars("rmse sigma") ///
coef(_cons "Intercept" rmse "sigma") nonum noobs nostar var(15)
local gelman_output2 esttab,b(2) se wide mtitle("Coef.") scalars("rmse sigma") ///
coef(_cons "Intercept" rmse "sigma") nonum noobs nostar var(15)
Residual plot adding lines for mean=0 and +/- 1 standard deviation (using the root mean squared error from the regression)
import delimited https://raw.githubusercontent.com/avehtari/ROS-Examples/master/KidIQ/data/kidiq.csv, clear
qui regress kid_score mom_hs mom_iq
rvfplot ,yline(0) yline(-`e(rmse)' `e(rmse)', lp(dash))
qui import delimited https://raw.githubusercontent.com/avehtari/ROS-Examples/master/Introclass/data/gradesW4315.dat, delim(" \t",collapse) clear
qui regress final midterm
`gelman_output'
rvfplot ,yline(0) title(Residuals vs. predicted values) ///
xtitle(predicted value) name(fig11_7_stata_a,replace)
predict final_resid,resid
scatter final_resid final, yline(0) title(Residuals vs. observed) ///
name(fig11_7_stata_b, replace) xtitle(observed value)
graph combine fig11_7_stata_a fig11_7_stata_b
(11 vars, 52 obs) ───────────────────────────────────────── Coef. ───────────────────────────────────────── midterm 0.7 (0.2) Intercept 64.5 (17.0) ───────────────────────────────────────── sigma 14.8 ───────────────────────────────────────── Standard errors in parentheses
local a=64.5
local b=0.7
local sigma=14.8
gen final_fake=`a' + `b'*midterm + rnormal(0, `sigma')
qui regress final_fake midterm
`gelman_output'
rvfplot ,yline(0) title(Residuals vs. predicted values) xtitle(predicted value) ///
name(fig11_8_stata_a,replace)
cap drop final_resid
predict final_resid,resid
scatter final_resid final_fake, yline(0) title(Residuals vs. observed) ///
name(fig11_8_stata_b, replace) xtitle(observed value)
graph combine fig11_8_stata_a fig11_8_stata_b
───────────────────────────────────────── Coef. ───────────────────────────────────────── midterm 0.5 (0.2) Intercept 86.8 (16.8) ───────────────────────────────────────── sigma 14.6 ───────────────────────────────────────── Standard errors in parentheses
The data
qui import delimited https://raw.githubusercontent.com/avehtari/ROS-Examples/master/Newcomb/data/newcomb.txt ,clear
hist y, fc(white) ysc(off) xtitle("") discrete width(2)
Fitting (inappropriately) a normal distribution to the data by a simple empty regression of y. Given the assumption of normal distribution of the error term in regression, this forces any deviations around the mean to be normally distributed)
We construct replicated data differently than in Gelman. They use bayesian regression for all their examples. As the estimation is done by simulation they already have replicates of the coefficients and error term which they summarize for the model output. It is trivial then to generate a new outcome by plugging the data into the different sets of coefficients with the error term.
As we use maximum likelihood methods, we treat the coefficients as fixed and the data as random
So we combine our estimate of the coefficient with a draw from the error term to generate a new value.
qui regress y
cap drop y_r*
forval i=1/20 {
qui gen y_r`i'=_b[_cons] + rnormal(0,`e(rmse)')
qui hist y_r`i',name(y_r`i',replace) fc(white) ysc(off) xtitle("")
}
unab yhist: y_r*
graph combine `yhist'
qui graph export img/fig11_10_stata.png,replace
Alternate visualization on a single axis
twoway kdensity y || kdensity y_r1,lc(gs14) lpatt(solid) || ///
kdensity y_r2,lc(gs14) lpatt(solid) || kdensity y_r3,lc(gs14) lpatt(solid) || ///
kdensity y_r4,lc(gs14) lpatt(solid) || kdensity y_r5,lc(gs14) lpatt(solid) || ///
kdensity y_r6,lc(gs14) lpatt(solid) || kdensity y_r7,lc(gs14) lpatt(solid) || ///
kdensity y_r8,lc(gs14) lpatt(solid) || kdensity y_r9,lc(gs14) lpatt(solid) || ///
kdensity y_r10,lc(gs14) lpatt(solid) || kdensity y_r11,lc(gs14) lpatt(solid) || ///
kdensity y_r12,lc(gs14) lpatt(solid) || kdensity y_r13,lc(gs14) lpatt(solid) || ///
kdensity y_r14,lc(gs14) lpatt(solid) || kdensity y_r15,lc(gs14) lpatt(solid) || ///
kdensity y_r16,lc(gs14) lpatt(solid) || kdensity y_r17,lc(gs14) lpatt(solid) || ///
kdensity y_r18,lc(gs14) lpatt(solid) || kdensity y_r19,lc(gs14) lpatt(solid) || ///
kdensity y_r20,lc(gs14) lpatt(solid) , ysc(off) xtitle("") legend(off) ///
xlab(-25(25)50) ysc(fextend) xsc(fextend)
qui graph export img/fig11_11_stata.png,replace
import delimited https://raw.githubusercontent.com/avehtari/ROS-Examples/master/Unemployment/data/unemp.txt, delim(" \t",collapse) clear
gen y_lag=y[_n-1]
qui regress y y_lag
`gelman_output2'
(1 missing value generated) ───────────────────────────────────────── Coef. ───────────────────────────────────────── y_lag 0.77 (0.08) Intercept 1.35 (0.46) ───────────────────────────────────────── sigma 1.02 ───────────────────────────────────────── Standard errors in parentheses
Again we generate relicate datasets as described above.
cap drop y_r*
set seed 3992
forval i=1/15 {
qui gen y_r`i'=_b[_cons]+ ///
_b[y_lag]*y_lag + rnormal(0,`e(rmse)')
}
forval i=1/15 {
qui line y_r`i' year,sort name(y_r`i',replace) fc(white) ///
xlab(.) ylab(.) ytick(0(1)10) xtick(1950(10)2010) ytitle(" ") xtitle("") ///
subtitle("Simulation `i'")
}
unab yhist: y_r*
graph combine `yhist'
qui graph export img/fig11_14_stata.png,replace
In stata it is easier to write a program to cacluate this test statistic (quantifying the relative “lack of smoothness” of the replicated data as compared to the actual data by looking at the number of switches in direction in each)
cap prog drop switch_test
prog define switch_test
syntax varlist
tokenize `varlist'
local first `1'
while "`first'"!="" {
tempvar y_lag y_lag2 test
qui gen `y_lag'=`first'[_n-1]
qui gen `y_lag2'=`first'[_n-2]
qui gen `test'=sum(sign(`first'-`y_lag')!= ///
sign(`y_lag'-`y_lag2')) if `y_lag2'!=.
di `test'[_N] ", "_c
macro shift
local first `1'
}
end
. * Number of switches in actual data . switch_test y 26, . * Number of switches in replications . switch_test y_r1-y_r15 43, 43, 43, 39, 48, 35, 38, 43, 38, 38, 44, 32, 43, 42, 44,