General Instructions
The exam is open book, open notes, open web pages. Do not discuss this exam with anyone except the instructors (Geyer, via e-mail, or Chatterjee).
You may use the computer, a calculator, or pencil and paper to get answers, but it is expected that you will use the computer. Show all your work:
- For simple computer commands, you may just write down the command
you used and the result it gave on your exam solution.
- For complicated commands or plots, make a printout and attach
the printout to your exam solution.
No credit for numbers with no indication of where they came from!
Question 1 [25 pts.]
The data for this problem are at the URL
(and were used in homework).
With that URL given to Rweb, one variable x is loaded.
(If you are using R at home, see the footnote about reading this data into R.)
In the homework we used the sample mean as an estimator of location,
but for these data it has a slow rate of convergence, n^(1/3).
If we use a more robust estimator, say the Hodges-Lehmann estimator associated
with the Wilcoxon signed rank test, call it the sample pseudomedian,
then we get the usual n^(1/2) rate of convergence
(the pseudomedian obeys the square root law).
- Calculate the sample pseudomedian (the Hodges-Lehmann estimator associated
with the Wilcoxon signed rank test) of these data.
- Estimate the standard error of this estimate
(considered as a point estimate of the location parameter) using
Efron's nonparametric bootstrap.
Use at least 1000 bootstrap samples to calculate your estimate.
- Make a histogram of the bootstrap distribution of the sample pseudomedian,
showing the point which is the sample pseudomedian of the original data.
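One possible R sketch of these three steps (not the official solution). It assumes the exam variable x has been loaded as described above; here a placeholder x is generated so the code runs stand-alone, and the helper name hl is ours:

```r
# Placeholder data standing in for the exam's x (replace with the real data)
set.seed(42)
x <- rnorm(100)

# Hodges-Lehmann estimator (sample pseudomedian): the "estimate"
# component of the Wilcoxon signed rank test with conf.int = TRUE
hl <- function(z) wilcox.test(z, conf.int = TRUE)$estimate

theta.hat <- hl(x)   # sample pseudomedian of the original data

# Efron's nonparametric bootstrap: resample the data with replacement
nboot <- 1000
theta.star <- double(nboot)
for (i in 1:nboot)
    theta.star[i] <- hl(sample(x, replace = TRUE))

sd(theta.star)   # bootstrap standard error

hist(theta.star)
abline(v = theta.hat, lty = 2)   # mark the original-data pseudomedian
```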
Question 2 [25 pts.]
The data for this problem are at the URL
With that URL given to Rweb, two variables x and y are loaded.
(If you are using R at home, see the footnote about reading this data into R.)
We wish to fit a regression model with x as the predictor and y as the response.
These data, except for outliers, appear to fit a fourth degree polynomial
that is specified in the R formula mini-language by
y ~ poly(x, 4)
(on-line help).
Since these data have some outliers, we will use a robust
regression program, ltsreg, to do the regression
(recall that this program was used in homework).
- Fit the model specified by the R formula
y ~ poly(x, 4)
to these data. Report the regression coefficients.
- Estimate the standard errors of all five regression coefficients
(considered as point estimates of the population regression coefficients)
using Efron's nonparametric bootstrap.
Bootstrap residuals, not cases.
Use at least 250 bootstrap samples to calculate your estimate.
(You should use more than 250, but rweb.stat.umn.edu is slow,
at least running ltsreg.)
- Produce a plot, like those on the bootstrapping regression web page,
showing not only the sample regression function but also the bootstrap
regression functions.
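A residual-bootstrap sketch of this question (not the official solution). It uses ltsreg from the MASS package; placeholder x and y stand in for the exam data so the code runs stand-alone:

```r
library(MASS)   # provides ltsreg

# Placeholder data standing in for the exam's x and y
set.seed(1)
x <- seq(-2, 2, length = 50)
y <- x^2 + rnorm(50, sd = 0.3)

lout <- ltsreg(y ~ poly(x, 4))
coef(lout)   # the five sample regression coefficients

# Bootstrap residuals, not cases: resample residuals with replacement
# and add them back to the fitted values
y.hat <- fitted(lout)
r <- residuals(lout)
nboot <- 250
coef.star <- matrix(NA, nboot, 5)
for (i in 1:nboot) {
    y.star <- y.hat + sample(r, replace = TRUE)
    coef.star[i, ] <- coef(ltsreg(y.star ~ poly(x, 4)))
}
apply(coef.star, 2, sd)   # bootstrap standard errors

# Sample regression function plus the bootstrap regression functions
plot(x, y)
for (i in 1:nboot)
    lines(x, cbind(1, poly(x, 4)) %*% coef.star[i, ], col = "gray")
lines(x, y.hat, lwd = 2)
```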
Question 3 [25 pts.]
The data for this problem are at the URL
With that URL given to Rweb, one variable y is loaded.
(If you are using R at home, see the footnote about reading this data into R.)
These data are actually a simulated stationary time series. We would like to fit an AR(2) model, which means autoregressive of order 2, which has the form

X_t = beta_1 X_(t-1) + beta_2 X_(t-2) + Z_t

where the betas are autoregressive coefficients (the two parameters of interest) and the Z's are IID mean zero innovations. Then the observed data are

Y_t = mu + X_t

where the X's are as defined above. So there are four unknown parameters (the betas, mu, and the innovations variance), but we are only interested in two.
The R statement
ar.burg(y, order.max = 2, aic = FALSE)$ar
(on-line help)
produces a vector of length 2 that estimates the betas.
Note that there is no need to subtract off the sample mean from the
series like we did in the subsampling bootstrap for time series example.
That is part of what ar.burg does.
- Do a subsampling bootstrap to obtain bootstrap distributions of
the two autoregressive coefficients (betas). Use subsampling bootstrap
sample size 15, and assume a root n rate of convergence.
(No answers from this part, just code.)
- Give (subsampling) bootstrap estimates of the standard errors of
the betas.
- Give a scatterplot of β1* versus β2* showing (the subsampling bootstrap
estimate of) their joint sampling distribution. Put horizontal and vertical
lines on the plot to show the corresponding sample estimates, the beta hats
(can't make hats in HTML).
Note: the h and v arguments to the abline function (on-line help) add such
lines to plots, and the lty argument makes different line types.
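A sketch of the subsampling bootstrap for this question (not the official solution). A placeholder AR(2) series stands in for the exam's y; the subsamples are all consecutive blocks of length 15, and the root n rate means the subsample standard deviation is rescaled by sqrt(b/n):

```r
# Placeholder AR(2) series standing in for the exam's y
set.seed(2)
y <- arima.sim(model = list(ar = c(0.5, -0.3)), n = 200) + 5

beta.hat <- ar.burg(y, order.max = 2, aic = FALSE)$ar

# Subsampling bootstrap: every consecutive block of length b
n <- length(y)
b <- 15
nsub <- n - b + 1
beta.star <- matrix(NA, nsub, 2)
for (i in 1:nsub)
    beta.star[i, ] <- ar.burg(y[i:(i + b - 1)],
        order.max = 2, aic = FALSE)$ar

# Root n rate of convergence: scale the subsample sd by sqrt(b / n)
apply(beta.star, 2, sd) * sqrt(b / n)   # standard errors of the betas

# Joint bootstrap distribution with the sample estimates marked
plot(beta.star[, 1], beta.star[, 2],
    xlab = expression(beta[1]^"*"), ylab = expression(beta[2]^"*"))
abline(v = beta.hat[1], lty = 2)
abline(h = beta.hat[2], lty = 2)
```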
Question 4 [25 pts.]
The data for this problem are at the URL
With that URL given to Rweb, two variables fruit and seeds are loaded.
(If you are using R at home, see the footnote about reading this data into R.)
These are the same data that were used for question 4 on the first midterm.
Note that the URL ends t1p4.txt not t2p4.txt.
- Calculate the Pearson correlation coefficient for these data.
- Since
the data are highly non-normal, the usual assumptions that go with
the Pearson correlation coefficient are badly violated. Use the
(Efron, nonparametric) bootstrap to approximate the sampling distribution
of the Pearson correlation coefficient. Use bootstrap sample size 1000.
(No answers from this part, just code.)
- Report the bootstrap standard error of the sample Pearson
correlation coefficient.
- Make a histogram of the bootstrap distribution of the sample Pearson
correlation coefficient, showing the point which is the sample Pearson
correlation coefficient of the original data.
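A sketch of this question's bootstrap (not the official solution). Since the statistic is a correlation of pairs, the Efron bootstrap resamples cases, i.e., (fruit, seeds) pairs together; placeholder data stand in for the exam variables so the code runs stand-alone:

```r
# Placeholder data standing in for the exam's fruit and seeds
set.seed(3)
fruit <- rexp(40)
seeds <- rpois(40, lambda = 2 + 3 * fruit)

rho.hat <- cor(fruit, seeds)   # sample Pearson correlation

# Efron's nonparametric bootstrap: resample cases (pairs), keeping
# each fruit value with its own seeds value
n <- length(fruit)
nboot <- 1000
rho.star <- double(nboot)
for (i in 1:nboot) {
    k <- sample(n, replace = TRUE)
    rho.star[i] <- cor(fruit[k], seeds[k])
}
sd(rho.star)   # bootstrap standard error

hist(rho.star)
abline(v = rho.hat, lty = 2)   # mark the original-data correlation
```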
Footnote about Reading Data for Problem 1 into R
If you are doing this problem in R rather than Rweb, you will have to duplicate what Rweb does in reading the URL at the beginning. So, all together, for problem 1 you must do, for example,
X <- read.table(url("http://www.stat.umn.edu/geyer/f06/5601/mydata/two-thirds.txt"), header = TRUE)
names(X)
attach(X)
to produce the variable x needed for your analysis.
Of course, you read different data files for different problems that use external data entry. Everything else stays the same.