General Instructions
The exam is open book, open notes, open web pages. Do not discuss this exam with anyone except the instructor.
You may use the computer, a calculator, or pencil and paper to get answers, but it is expected that you will use the computer. Show all your work:
- For simple computer commands, you may just write down the command you used and the result it gave on your exam solution.
- For complicated commands or plots, make a printout and attach the printout to your exam solution.
No credit for numbers with no indication of where they came from!
Question 1 [25 pts.]
The data for this problem are at the URL
With that URL given to Rweb, two variables x and y are loaded. (If you are doing this problem in R rather than Rweb, see the footnote about reading this data into R.)
This is regression data. We assume the standard model that is nonparametric about the regression function
$$ Y_i = g(x_i) + \sigma Z_i $$
where g is an unknown smooth function (infinite-dimensional parameter), σ is an unknown constant (scalar parameter), and the $Z_i$ are IID standard normal.
Use the R function sm.regression (on-line help) to fit a regression function (g hat) to these data. Use optimal smoothing, where optimal is defined by this package's method (ordinary cross-validation).
Hand in a scatterplot with the kernel regression estimate shown. Also report the value of the bandwidth used (the one chosen by cross-validation).
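As a rough illustration of the workflow asked for here, the sketch below picks a bandwidth by cross-validation and draws the scatterplot with the fitted curve. The data are simulated stand-ins (the sine curve, noise level, and sample size are made up, not the exam data), and h.select with method = "cv" is used for the bandwidth; hcv(x, y) could be substituted where it works.

```r
library(sm)

# Made-up data standing in for the exam's x and y (an assumption).
set.seed(42)
x <- (0:80) / 20
y <- sin(2 * x) + rnorm(length(x), sd = 0.3)

# Bandwidth chosen by ordinary cross-validation.
h <- h.select(x, y, method = "cv")

# With the default display, sm.regression draws the scatterplot
# and overlays the fitted regression curve.
out <- sm.regression(x, y, h = h)

# Report the bandwidth that was used.
print(h)
```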
Question 2 [25 pts.]
This problem continues the analysis started in Question 1 and uses the same data and the same model assumptions.
Suppose in the plot that is the answer to Question 1 we want a confidence interval for the value of the population regression function g(x) at x = 2.0.
Run an (Efron) nonparametric bootstrap to estimate the sampling distribution of this estimator of g(2.0) and calculate the 95% bootstrap percentile confidence interval obtained from the bootstrap sampling distribution.
The following detailed instructions are necessary. They were not covered in class or on the class web page about sm.regression (which was here).
- Just after the library(sm) command, before doing anything else, issue the command
  sm.options(eval.points = x)
  This statement makes the sm.regression function evaluate the smooth at the given x points (which is not its default behavior). This command need be given only once. It stays in effect throughout the Rweb submission.
- It turns out that the function hcv is now obsolete, although it is still in the package and can be used in problem 1. For this problem, however, the bootstrap causes it to fail and the new function h.select (on-line help) should be used. Instead of
  h <- hcv(x, y)
  in this problem do
  h <- h.select(x, y, method = "cv")
- Be sure to recalculate the optimal bandwidth for each bootstrap sample (that way the bootstrap approximates the error in both parts of the algorithm: bandwidth calculation and smoothing). Of course, the arguments of h.select change so that it is evaluated on the bootstrap data.
- If out is the output of the sm.regression function (on-line help), then
  out$estimate
  gives the vector of predicted values at all x points and
  out$estimate[out$eval.points == 2.0]
  gives the predicted value corresponding to x = 2.0.
- Bootstrap residuals, not cases. (The required predicted values are discussed in the preceding item.)
- Use bootstrap sample size 400 (anything longer takes too long on rweb.stat.umn.edu).
- In order to not make nboot plots, the option display = "none" must be given to the function sm.regression inside the bootstrap loop (and also outside if you prefer).
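The items above can be assembled into a residual bootstrap along the following lines. This is only a sketch under stated assumptions: the data are simulated stand-ins (not the exam data), the variable names fitted, resid, theta.star, and ci are illustrative, and the grid of x values is chosen so that x = 2.0 occurs exactly.

```r
library(sm)

# Made-up data standing in for the exam's x and y (an assumption).
set.seed(42)
x <- (0:80) / 20
y <- sin(2 * x) + rnorm(length(x), sd = 0.3)

# Evaluate the smooth at the observed x points.
sm.options(eval.points = x)

# Fit to the original data: bandwidth by cross-validation, then smooth.
h <- h.select(x, y, method = "cv")
out <- sm.regression(x, y, h = h, display = "none")
fitted <- out$estimate
resid <- y - fitted
theta.hat <- fitted[out$eval.points == 2.0]

# Residual bootstrap: resample residuals, rebuild y, and redo BOTH
# the bandwidth selection and the smoothing for every bootstrap sample.
nboot <- 400
theta.star <- double(nboot)
for (i in 1:nboot) {
    y.star <- fitted + sample(resid, replace = TRUE)
    h.star <- h.select(x, y.star, method = "cv")
    out.star <- sm.regression(x, y.star, h = h.star, display = "none")
    theta.star[i] <- out.star$estimate[out.star$eval.points == 2.0]
}

# 95% bootstrap percentile interval, marked on the histogram.
ci <- quantile(theta.star, c(0.025, 0.975), names = FALSE)
hist(theta.star)
abline(v = ci, lty = 2)
print(ci)
```

Recomputing h.star inside the loop is what lets the bootstrap reflect the variability of the bandwidth choice as well as of the smoothing step.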
Hand in a histogram of your bootstrap estimate of the sampling distribution of the estimator showing the endpoints of the confidence interval on the histogram. Also report the numbers that are the endpoints of the confidence interval.
Question 3 [25 pts.]
The data for this problem are at the URL
With that URL given to Rweb, one variable x is loaded. (If you are doing this problem in R rather than Rweb, see the footnote about reading this data into R.)
These data are a stationary time series for which we want to estimate the 0.9 quantile (also called the 90th percentile) of the marginal distribution of each x value (the marginal distribution is the same for all times by stationarity). The quantile function (on-line help) estimates quantiles.
Calculate the 0.9 quantile of x.
Calculate a 95% confidence interval for the parameter estimated by this estimator (the population 0.9 quantile), as described in our careful example for bootstrapping time series. Also hand in a histogram with the relevant quantiles marked, as done in that example.
Use subsampling bootstrap sample size b = 50.
You may assume this estimator obeys the square root law (has $n^{1/2}$ rate of convergence).
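For orientation, one common form of the subsampling bootstrap interval under the square root law is sketched below. It is not necessarily identical to the course's careful example. The AR(1) series is simulated (an assumption standing in for the exam data), and the names theta.star, z.star, and crit are illustrative.

```r
# Made-up stationary series standing in for the exam's x (an assumption).
set.seed(42)
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = 500))

n <- length(x)
b <- 50                                   # subsample length
theta.hat <- quantile(x, 0.9, names = FALSE)

# Estimator evaluated on every block of b consecutive observations.
nsub <- n - b + 1
theta.star <- double(nsub)
for (i in 1:nsub) {
    theta.star[i] <- quantile(x[i:(i + b - 1)], 0.9, names = FALSE)
}

# Square root law: sqrt(b) * (theta.star - theta.hat) approximates the
# sampling distribution of sqrt(n) * (theta.hat - theta).
z.star <- sqrt(b) * (theta.star - theta.hat)
crit <- quantile(z.star, c(0.975, 0.025), names = FALSE)
ci <- theta.hat - crit / sqrt(n)

# Histogram of the rescaled subsample estimates with the critical
# values marked.
hist(z.star)
abline(v = crit, lty = 2)
print(ci)
```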
Question 4 [25 pts.]
The data for this problem are at the URL
With that URL given to Rweb, one variable x is loaded. (If you are doing this problem in R rather than Rweb, see the footnote about reading this data into R.)
This is independent and identically distributed data. It comes from a gamma distribution, which is a skewed distribution having density curve
$$ f(x) = c(\alpha, \lambda) \, x^{\alpha - 1} e^{- \lambda x}, \qquad x > 0, $$
where c(α, λ) is a constant that depends on the parameters α and λ (whose exact expression doesn't matter).
In this problem we wish to estimate the shape parameter α. It is explained in theory books that the so-called method of moments estimator of α is
alpha.hat <- mean(x)^2 / (((n - 1) / n) * var(x))
(as usual, n is the sample size length(x) and the (n - 1) / n converts the so-called sample variance into the empirical variance).
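For reference, a short derivation of why this estimator targets α, assuming λ enters the density as a rate parameter (an assumption about the parameterization; with a scale parameter the final ratio is unchanged):

```latex
\begin{align*}
E(X) &= \frac{\alpha}{\lambda} \\
\operatorname{var}(X) &= \frac{\alpha}{\lambda^2} \\
\frac{E(X)^2}{\operatorname{var}(X)}
  &= \frac{\alpha^2 / \lambda^2}{\alpha / \lambda^2} = \alpha
\end{align*}
```

Replacing E(X) by the sample mean and var(X) by the empirical variance gives alpha.hat.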
Calculate a 95% bootstrap t confidence interval, with the variance estimated by a double bootstrap, for the true unknown parameter α using the method of moments point estimator. Use bootstrap sample size 1000 and the default double bootstrap sample size (for the inner bootstrap that determines the variance estimate).
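A hand-rolled sketch of a bootstrap t interval with an inner (double) bootstrap standard error is given below. The data are simulated, and the helper names mom and sdfun are hypothetical. The inner sample size of 25 is an assumption standing in for "the default", which this sketch does not know.

```r
# Made-up gamma data standing in for the exam's x (an assumption).
set.seed(42)
x <- rgamma(50, shape = 3, rate = 2)

# Method of moments estimator of the shape parameter.
mom <- function(x) {
    n <- length(x)
    mean(x)^2 / (((n - 1) / n) * var(x))
}

# Inner bootstrap: standard error of the estimator for a given sample.
# nbootsd = 25 is an assumed value, not taken from the exam.
sdfun <- function(x, nbootsd = 25) {
    sd(replicate(nbootsd, mom(sample(x, replace = TRUE))))
}

alpha.hat <- mom(x)
sd.hat <- sdfun(x)

# Outer bootstrap: studentize each bootstrap estimate with its own
# inner-bootstrap standard error.
nboot <- 1000
t.star <- double(nboot)
for (i in 1:nboot) {
    x.star <- sample(x, replace = TRUE)
    t.star[i] <- (mom(x.star) - alpha.hat) / sdfun(x.star)
}

# Bootstrap t interval: the upper critical value gives the lower endpoint.
crit <- quantile(t.star, c(0.975, 0.025), names = FALSE)
ci <- alpha.hat - crit * sd.hat
print(ci)
```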
Footnote about Reading Data for Problem 1 into R
If you are doing this problem in R rather than Rweb, you will have to duplicate what Rweb does when it reads in a URL at the beginning. So, all together, for problem 1, for example, you must do
X <- read.table(url("http://www.stat.umn.edu/geyer/f06/5601/mydata/camel.txt"), header = TRUE)
names(X)
attach(X)
to produce the variables x and y needed for your analysis.
Of course, you read different data files for different problems that use external data entry, and the variables in those files may have names other than x and y.
Everything else stays the same.