None of the homework assignments is tentative any longer.
| No. | Due Date | Sec. | Exercises | Comments |
|---|---|---|---|---|
| 1 | Wed Jan 24 | 6.2 | 6, 7, 10 | |
| | | 6.3 | 5, 7, 12, 14 | |
| 2 | Wed Jan 31 | 6.4 | 2, 6, 7, 8 | |
| | | 6.5 | 2, 6, 9, 10 | |
| 3 | Wed Feb 7 | 6.6 | 2, 6 | |
| | | A | 1, 22 | additional problems (see below: number 1 and number 22). |
| | | 7.1 | 2, 4, 6, 8 | |
| | | 7.2 | 6, 10, 11 | |
| 4 | Wed Feb 14 | 7.3 | 4, 6, 8 | |
| | | 7.4 | 1, 2, 6 | |
| | | 7.5 | 2, 4, 6, 11 | |
| 5 | Wed Feb 28 | 7.6 | 4, 8, 10, 11 | |
| | | 7.7 | 6, 11 | |
| | | A | 2, 3, 4 | additional problems (see below). |
| 6 | Wed Mar 7 | 7.8 | 2, 4, 6, 14 | |
| | | A | 5, 6, 7, 8, 9, 10 | additional problems (see below). |
| 7 | Wed Mar 21 | 8.1 | 1, 4, 15 | |
| | | 8.5 | 2, 12, 14 | |
| | | 8.6 | 2, 3, 7 | |
| | | A | 11 | additional problems (see below). |
| 8 | Wed Mar 28 | 8.7 | 2, 4, 7 | For 7 also find the P-value of the test. See the page about F tests. |
| | | 9.1 | 4, 7, 8 | |
| 9 | Wed Apr 11 | 9.2 | 2, 6 | |
| | | 9.3 | 5 | |
| | | 9.4 | 2 | |
| | | 9.6 | 4, 9 | data are in http://www.stat.umn.edu/geyer/old03/5102/examp/ds9-7.4.txt and http://www.stat.umn.edu/geyer/old03/5102/examp/ds9-7.9.txt. The answer in the back of the book uses the large sample approximation. R doesn't. So R doesn't give the same answer unless you say ks.test(x, y, exact = FALSE). |
| | | A | 12 | additional problems (see below). |
| 10 | Wed Apr 18 | A | 13, 14 | additional problems (see below). |
| | | 10.1 | 4, 6, 7 | the data for 7 are in the file http://www.stat.umn.edu/geyer/old03/5102/examp/ds10-1-3.txt |
| | | 10.2 | 12, 16 | for 16 give the 95% prediction interval rather than M.S.E. |
| 11 | Wed Apr 25 | 10.3 | 10, 11 | the data for 10 and 11 are in the file http://www.stat.umn.edu/geyer/old03/5102/examp/ds10-9.txt |
| | | A | 15, 16, 17, 18, 19 | additional problems (see below). |
| 12 | Fri May 4 | 10.6 | 10 | the data for 10 are in the file http://www.stat.umn.edu/geyer/old03/5102/examp/ds10-18.txt |
| | | 10.7 | 14, 15 | the data for 14 and 15 are in the file http://www.stat.umn.edu/geyer/old03/5102/examp/ds10-24.txt |
| | | 10.8 | 11, 12, 13 | the data for 11, 12, and 13 are in the file http://www.stat.umn.edu/geyer/old03/5102/examp/ds10-29.txt |
| | | A | 20, 21 | additional problems (see below). |
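The comment on Section 9.6 in the table says that R's Kolmogorov-Smirnov test does not use the large sample approximation unless told to. A minimal sketch of the difference follows. The two samples are simulated stand-ins (an assumption, not the course data files), and the names exact_res and approx_res are illustrative.

```r
set.seed(42)

# Simulated stand-ins for the two samples (NOT the course data)
x <- rnorm(12)
y <- rnorm(15, mean = 0.5)

# Default: for small samples without ties R computes an exact P-value
exact_res <- ks.test(x, y)

# Large sample approximation, as used for the answer in the back of the book
approx_res <- ks.test(x, y, exact = FALSE)

# Same test statistic, different P-values
print(c(statistic = unname(exact_res$statistic)))
print(c(exact = exact_res$p.value, approximate = approx_res$p.value))
```

The test statistic is the same in both calls; only the way the P-value is computed changes.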
Additional Problems
1. Like the example of maximum likelihood done by computer except instead of the gamma shape model, we will use the Cauchy location model. The likelihood is given by equation (6.6.7) on p. 366 of DeGroot and Schervish. For data, use the URL
and for a starting point use the sample median rather than the sample mean, that is, median(x) instead of mean(x). This makes sense because the true parameter value θ is the theoretical median. The sample mean is a very bad estimate of location for the Cauchy distribution.
The median function (on-line help) calculates the sample median. The dcauchy function (on-line help) calculates the Cauchy p. d. f.
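To show the mechanics of problem 1, here is a minimal sketch of maximum likelihood for the Cauchy location model using nlm. It is not the course example itself: the data are simulated stand-ins (an assumption), and the names mlogl and theta_hat are illustrative.

```r
set.seed(1)

# Simulated stand-in data (NOT the course data); true location is 5
x <- rcauchy(30, location = 5)

# Minus the log likelihood for the Cauchy location model (scale fixed at 1)
mlogl <- function(theta, x) {
    -sum(dcauchy(x, location = theta, log = TRUE))
}

# Minimize minus the log likelihood, starting at the sample median
out <- nlm(mlogl, median(x), x = x)
theta_hat <- out$estimate

print(c(start = median(x), mle = theta_hat))
```

Starting at median(x) matters here because the Cauchy log likelihood can have several local maxima, and a bad starting point such as mean(x) can land nlm on the wrong one.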
2. Solve the quadratic equation to prove that the interval (2.18) in the handout does indeed have endpoints (2.19) in the handout.
3. Calculate the three kinds of intervals given by equations (2.20), (2.19), and (2.22) in the handout for binomial data with n = 50 and x = 4. Use 95% for the confidence coefficient.
4. Calculate the second and fourth central moments μ2 and μ4 in the notation of the handout for the so-called double exponential distribution with density
(note this distribution is symmetric about zero, so the mean is zero and all odd central moments are zero).
Compare the correct asymptotic variance of the sample variance, μ4 − μ2^2, with the incorrect asymptotic variance of the sample variance, 2 μ2^2, that we would get if we incorrectly assumed the data were normal (Section 2.10 of the handout).
5. Starting with the asymptotic distribution for Sn^2 given on p. 16 of the more on confidence intervals handout, use the delta method to give the asymptotic distribution of Sn.
6. Using the method of Section 1.2 of the more on confidence intervals handout, find an exact 95% confidence interval for the mean (not the rate) parameter of an exponential distribution from which it is assumed we have independent and identically distributed data with sample size 15 and sample mean 103.49.
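The handout is not reproduced here, so the following is only a sketch under an assumption: that the method of Section 1.2 is the usual exact pivotal quantity construction, in which 2 n times the sample mean divided by the mean parameter has a chi-square distribution with 2 n degrees of freedom. The variable names are illustrative.

```r
n <- 15          # sample size from the problem
xbar <- 103.49   # sample mean from the problem
alpha <- 0.05    # 95% confidence

# Assumed pivot: 2 * n * xbar / mu ~ chi-square(2 * n) for i.i.d. exponential data
lower <- 2 * n * xbar / qchisq(1 - alpha / 2, df = 2 * n)
upper <- 2 * n * xbar / qchisq(alpha / 2, df = 2 * n)

print(c(lower = lower, upper = upper))
```

Note that the interval is not symmetric about the sample mean, because the chi-square distribution is skewed.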
7. Using the method of Section 2.9.2 of the more on confidence intervals handout, find an asymptotic (approximate, large sample) 95% confidence interval for the mean parameter of a Poisson distribution from which it is assumed we have independent and identically distributed data with sample size 50 and sample mean 2.9.
Hint: In order to use plug-in you need a consistent estimator of the standard deviation of the Poisson distribution. What is the standard deviation and what is its relation to the mean? The sample mean consistently estimates the mean parameter. What does that suggest for a consistent estimator of standard deviation?
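One way to follow the hint, sketched in R. The assumption here is that the plug-in interval takes the usual form of point estimate plus or minus a normal critical value times an estimated standard error, with the square root of the sample mean plugged in for the Poisson standard deviation. The variable names are illustrative.

```r
n <- 50       # sample size from the problem
xbar <- 2.9   # sample mean from the problem

# For the Poisson distribution the variance equals the mean, so sqrt(xbar)
# is a consistent estimator of the standard deviation
se <- sqrt(xbar / n)

# Asymptotic 95% interval for the mean
ci <- xbar + c(-1, 1) * qnorm(0.975) * se
print(ci)
```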
8. Suppose we have an independent and identically distributed sample from a Geometric(p) distribution with sample size 30 and sample mean 7.8. Find the maximum likelihood estimate of p and a 95% confidence interval for p based on the MLE and either observed or expected Fisher information.
9. Like the example of two-parameter maximum likelihood done by computer except instead of the gamma shape-rate model, we will use the Cauchy location-scale model. The likelihood is given by
L(μ, σ) = f(x1 | μ, σ) f(x2 | μ, σ) ⋯ f(xn | μ, σ)
where
f(x | μ, σ) = 1 / (π σ [1 + ((x − μ) / σ)^2])
The R function dcauchy(x, location = mu, scale = sigma) calculates f(x | μ, σ), returning a vector of values if x is a vector.
For data, use the URL
Method of moments estimators make no sense for the Cauchy distribution because the Cauchy distribution doesn't have any moments. We have to use estimators based on quantiles instead.
For a starting point for mu use the sample median (as we did in additional problem 1). This makes sense because μ is the theoretical median. And for a starting point for the scale parameter sigma use half the sample interquartile range, that is, 0.5 * IQR(x). This makes sense because the theoretical interquartile range is 2 σ.
Report the values you obtain for
- the MLEs for μ and σ.
- the observed Fisher information matrix.
- 95% confidence intervals for μ and σ.
The median function (on-line help) calculates the sample median. The IQR function (on-line help) calculates the sample interquartile range. The dcauchy function (on-line help) calculates the Cauchy p. d. f.
You may need to omit the test for positivity of the scale parameter σ that the example code has because the R function nlm will not keep this parameter positive. It does seem to get back on track and find a positive solution, although it gives warnings.
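A minimal sketch of the two-parameter fit described in problem 9 follows. It is not the course example: the data are simulated stand-ins (an assumption), the names mlogl, mle, info, and ci are illustrative, and the intervals are the usual asymptotic ones based on the inverse of the observed Fisher information.

```r
set.seed(2)

# Simulated stand-in data (NOT the course data); true mu = 3, sigma = 2
x <- rcauchy(50, location = 3, scale = 2)

# Minus the log likelihood; theta[1] is mu, theta[2] is sigma
mlogl <- function(theta, x) {
    -sum(dcauchy(x, location = theta[1], scale = theta[2], log = TRUE))
}

# Quantile-based starting values: sample median and half the sample IQR
start <- c(median(x), 0.5 * IQR(x))

# nlm does not keep sigma positive, so warnings are possible; suppress them
out <- suppressWarnings(nlm(mlogl, start, x = x, hessian = TRUE))

mle <- out$estimate
info <- out$hessian            # observed Fisher information matrix
se <- sqrt(diag(solve(info)))  # asymptotic standard errors

ci <- cbind(lower = mle - qnorm(0.975) * se,
            upper = mle + qnorm(0.975) * se)
rownames(ci) <- c("mu", "sigma")

print(mle)
print(info)
print(ci)
```

Because nlm minimizes minus the log likelihood, the Hessian it returns is the observed Fisher information directly, with no sign change needed.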
10. Suppose the variables X1, X2, ..., Xn, Y1, Y2, ..., Yn are independent, and suppose the Xi are identically Exponential(θ) distributed and the Yi are identically Exponential(1 / θ) distributed.
- Find the maximum likelihood estimate when the sample size is n = 25 and the sample means are 3.12 for the mean of the Xi and 0.432 for the mean of the Yi. Give the MLE both as a formula (a function of mean(x) and mean(y)) and numerically.
- Calculate both observed and expected Fisher information.
- Show that even after the MLE is plugged in for the parameter, observed and expected Fisher information are different, both as formulas (functions of mean(x) and mean(y)) and numerically.
- Calculate 95% asymptotic (approximate, large sample) confidence intervals for the parameter θ, one using observed Fisher information, one using expected Fisher information.
11. Basically this is Problem 8.6.10 in DeGroot and Schervish. Use the data in their Table 8.1, which can be read into R with the statements
calcium <- c( 7, -4, 18, 17, -3, -5, 1, 10, 11, -2)
placebo <- c(-1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3)
- Perform a test of the hypotheses stated in Problem 8.6.10 using Welch's approximate test, giving the P-value.
- Perform a test of the same hypotheses using the exact t-test based on the assumption of equal variances, giving the P-value.
- Interpret these P-values.
- Calculate a 95% two-sided confidence interval for the difference of the means of the two groups.
The web page on doing t-tests in R may help.
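A sketch of the two t.test calls for problem 11, using the data given above. The hypotheses of Problem 8.6.10 are not restated on this page, so the calls below use R's two-sided default as a placeholder (an assumption); if the book's alternative is one-sided, the alternative argument would have to be set accordingly. The names welch and pooled are illustrative.

```r
calcium <- c( 7, -4, 18, 17, -3, -5, 1, 10, 11, -2)
placebo <- c(-1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3)

# Welch's approximate test (the default: variances not assumed equal)
welch <- t.test(calcium, placebo)

# Exact t-test assuming equal variances (pooled variance estimate)
pooled <- t.test(calcium, placebo, var.equal = TRUE)

print(c(welch = welch$p.value, pooled = pooled$p.value))

# 95% two-sided confidence interval for the difference of means
print(welch$conf.int)
```

The pooled test uses n1 + n2 − 2 degrees of freedom, while Welch's test uses an estimated, generally non-integer, number of degrees of freedom.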
12. For the data in the URL
calculate the following point estimators
- the sample mean
- the sample median
- the sample 10% trimmed mean
- the sample 20% trimmed mean
- the median of the Walsh averages (Hodges-Lehmann estimator associated with the Wilcoxon signed rank test)
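The five estimators in problem 12 can be computed along the following lines. This is only a sketch: the data are simulated stand-ins (an assumption, since the data URL is not shown above), and the names walsh and est are illustrative.

```r
set.seed(3)

# Simulated stand-in data (NOT the course data)
x <- rt(25, df = 3) + 10

# Walsh averages: (x[i] + x[j]) / 2 for every unordered pair, including i == j
walsh <- outer(x, x, "+") / 2
walsh <- walsh[lower.tri(walsh, diag = TRUE)]

est <- c(mean           = mean(x),
         median         = median(x),
         trim10         = mean(x, trim = 0.10),
         trim20         = mean(x, trim = 0.20),
         hodges_lehmann = median(walsh))
print(est)
```

For a sample of size n there are n (n + 1) / 2 Walsh averages, which is why only the lower triangle (with the diagonal) of the outer sum is kept.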
13. For the data in the URL
calculate confidence intervals for the center of symmetry (we assume the population distribution is symmetric about some point θ which is the unknown parameter of interest) associated with
- the sign test
- the Wilcoxon signed rank test
- the Student t test
having confidence level above 95% and as close to 95% as you can get (this is what the wilcox.test function does by default).
14. For the data in the URL
calculate P-values for an upper tailed test about the center of symmetry (we assume the population distribution is symmetric about some point θ which is the unknown parameter of interest) with null and alternative hypotheses
H1: θ > 0
for each of the following types of test
- the sign test
- the Wilcoxon signed rank test
- the Student t test
(note: the t.test and wilcox.test functions do two-tailed tests by default so you must use the optional argument alternative = "greater" to do an upper-tailed test).
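A sketch of the three upper-tailed tests in problem 14. The data are simulated stand-ins (an assumption), and the sign test is done here with binom.test on the count of positive observations, which is one common way to carry it out and not necessarily the way the course pages do it.

```r
set.seed(4)

# Simulated stand-in data (NOT the course data)
x <- rnorm(20, mean = 0.4)

# Sign test: under the null hypothesis the number of positive observations
# is Binomial(n, 1/2), where n counts the nonzero observations
n_pos <- sum(x > 0)
n_nonzero <- sum(x != 0)
p_sign <- binom.test(n_pos, n_nonzero, p = 0.5,
                     alternative = "greater")$p.value

# Wilcoxon signed rank test and Student t test, both upper-tailed
p_wilcox <- wilcox.test(x, alternative = "greater")$p.value
p_t <- t.test(x, alternative = "greater")$p.value

print(c(sign = p_sign, wilcoxon = p_wilcox, t = p_t))
```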
15. For the data in the URL
which contains two variables x and y, assume the data follow the simple linear regression model
y = β0 + β1 x + error
- Calculate the P-value for a test with null and alternative hypotheses
  H0: β1 = 0
  H1: β1 ≠ 0
- Interpret the P-value. Does the test say the value of the true population regression coefficient β1 is statistically significantly different from zero at the 0.05 level?
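For problem 15, the P-value in question is the one R prints in the coefficient table of a fitted linear model. A minimal sketch, with simulated stand-in data (an assumption) and illustrative names:

```r
set.seed(5)

# Simulated stand-in data (NOT the course data)
x <- runif(30, 0, 10)
y <- 1 + 0.5 * x + rnorm(30)

fit <- lm(y ~ x)
tab <- coef(summary(fit))   # estimates, standard errors, t values, P-values
print(tab)

# Two-tailed P-value for the test that the slope is zero
p_slope <- tab["x", "Pr(>|t|)"]
print(p_slope)
```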
16. For the data in the URL
which contains two variables x and y, assume the pairs (Xi, Yi) are independent and identically bivariate normal distributed with correlation ρ.
- Calculate the P-value for a test with null and alternative hypotheses
  H0: ρ = 0
  H1: ρ ≠ 0
- Interpret the P-value. Does the test say the value of the true correlation coefficient ρ is statistically significantly different from zero at the 0.05 level?
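One way to do the test in problem 16 is R's cor.test function, sketched below with simulated stand-in data (an assumption). The comparison with lm at the end is only there to illustrate that the test of zero correlation and the test of zero slope give the same P-value.

```r
set.seed(6)

# Simulated stand-in data (NOT the course data)
x <- rnorm(30)
y <- 0.5 * x + rnorm(30)

# Test of zero correlation (Pearson, two-sided by default)
ctest <- cor.test(x, y)
p_cor <- ctest$p.value

# The same P-value comes out of the slope test in simple linear regression
p_slope <- coef(summary(lm(y ~ x)))["x", "Pr(>|t|)"]

print(c(correlation = p_cor, slope = p_slope))
```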
17. For the data in the URL
which contains two variables x and y, assume the data follow the simple linear regression model
y = β0 + β1 x + error
- Calculate the P-value for a test with null and alternative hypotheses
  H0: β1 = 0.6
  H1: β1 ≠ 0.6
- Interpret the P-value. Does the test say the value of the true population regression coefficient β1 is statistically significantly different from 0.6 at the 0.05 level?
Note: This is exactly the same as Additional Problem 15 (word for word) except that the hypothesized value of the regression coefficient is 0.6 rather than zero.
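R's regression printout only tests whether a coefficient is zero, so for a hypothesized value such as 0.6 the t statistic has to be built by hand from the estimate and its standard error. A sketch with simulated stand-in data (an assumption) and illustrative names:

```r
set.seed(7)

# Simulated stand-in data (NOT the course data)
x <- runif(30, 0, 10)
y <- 1 + 0.5 * x + rnorm(30)

fit <- lm(y ~ x)
tab <- coef(summary(fit))

beta0 <- 0.6   # hypothesized value of the slope
tstat <- (tab["x", "Estimate"] - beta0) / tab["x", "Std. Error"]

# Two-tailed P-value from the t distribution with n - 2 degrees of freedom
pval <- 2 * pt(-abs(tstat), df = df.residual(fit))
print(c(t = tstat, p = pval))
```

An equivalent trick is to regress y − 0.6 x on x and read off the usual P-value for the slope.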
18. For the data in the URL
which contains two variables x and y, assume the data follow the quadratic regression model
y = β0 + β1 x + β2 x^2 + error
- Calculate the P-value for a test with null and alternative hypotheses
  H0: β2 = 0
  H1: β2 ≠ 0
- Interpret the P-value. Does the test say the value of the true population regression coefficient β2 is statistically significantly different from zero at the 0.05 level?
Note: This is exactly the same as Additional Problem 15 except that it is about the quadratic regression model rather than the simple linear model and the test is about β2 rather than about β1.
19. For the data in the URL
which contains two variables x and y, it is clear from the scatter plot produced by plot(x, y) that a simple linear regression will not fit the data (no statistics needed, the points are obviously nowhere near a straight line).
Because the scatter plot curves up at both ends, it is clear that a polynomial of even degree is needed for the regression function (assuming we restrict our consideration to polynomials), because a polynomial of odd degree would go up at one end and down at the other.
- Fit the following three regression models:
  - The quadratic model
    y = β0 + β1 x + β2 x^2 + error
  - The quartic (fourth degree) model
    y = β0 + β1 x + β2 x^2 + β3 x^3 + β4 x^4 + error
  - The sixth degree model
    y = β0 + β1 x + β2 x^2 + β3 x^3 + β4 x^4 + β5 x^5 + β6 x^6 + error
  Report the regression coefficients for each model.
- Perform a test in which the quadratic model is the little model and the quartic model is the big model. Report the F statistic and the P-value for the F test for model comparison. Interpret the P-value. Which model does this test tell you to use?
- Perform a test in which the fourth degree model is the little model and the sixth degree model is the big model. Report the F statistic and the P-value for the F test for model comparison. Interpret the P-value. Which model does this test tell you to use?
- Make a scatter plot of the data points, with the estimated regression function plotted for all three models on the same plot (use lty = 2, lty = 3, and so forth to distinguish the lines). Hand in the plot. Comment on the differences between the curves and the relation to the results of the F tests.
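The fitting and F test steps of problem 19 can be sketched as follows. The data are simulated stand-ins (an assumption), and poly(x, k, raw = TRUE) is just one way to write the polynomial terms; spelling them out with I(x^2), I(x^3), and so forth gives the same fits. Plotting is left out of the sketch.

```r
set.seed(8)

# Simulated stand-in data (NOT the course data), curving up at both ends
x <- seq(-2, 2, length.out = 40)
y <- 1 + x^2 + 0.5 * x^4 + rnorm(40, sd = 0.5)

# Raw (not orthogonal) polynomials, so coefficients are beta0, beta1, ...
fit2 <- lm(y ~ poly(x, 2, raw = TRUE))
fit4 <- lm(y ~ poly(x, 4, raw = TRUE))
fit6 <- lm(y ~ poly(x, 6, raw = TRUE))

print(coef(fit2))
print(coef(fit4))
print(coef(fit6))

# F tests for model comparison: little model first, big model second
cmp24 <- anova(fit2, fit4)
cmp46 <- anova(fit4, fit6)
print(cmp24)
print(cmp46)
```

The F statistic and P-value for each comparison are in the second row of the table that anova prints.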
20. Modify the example calculating the MSE of an estimator by simulation making two changes. Use the t distribution with 2.5 degrees of freedom for the distribution of the data (instead of the standard Cauchy distribution in the example) and use the 20% trimmed mean for the point estimator, which is calculated by the mean function in R using the trim optional argument (on-line help). Provide both a point estimate and a confidence interval for the actual true MSE.
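The course example is not reproduced on this page, so the following is only a rough sketch of what such a simulation might look like. The sample size n and the number of simulations nsim are assumptions (the example's values are not shown here), and all names are illustrative.

```r
set.seed(9)

nsim <- 1000   # number of simulated data sets (assumed value)
n <- 30        # sample size of each data set (assumed value)
theta <- 0     # true center of symmetry of the t distribution

# 20% trimmed mean of each simulated sample from t with 2.5 degrees of freedom
est <- replicate(nsim, mean(rt(n, df = 2.5), trim = 0.20))

# Squared errors; their average estimates the MSE
sqerr <- (est - theta)^2
mse_hat <- mean(sqerr)

# The simulations are i.i.d., so the usual large sample interval applies
se <- sd(sqerr) / sqrt(nsim)
ci <- mse_hat + c(-1, 1) * qnorm(0.975) * se

print(c(mse = mse_hat, lower = ci[1], upper = ci[2]))
```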
21. Modify the percentile bootstrap confidence interval example making two changes. Make the parameter to be estimated the interquartile range of the population and the point estimator of this parameter the interquartile range of the data, which is calculated by the IQR function in R (on-line help).
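Again the course example is not shown here, so this is only a sketch of a percentile bootstrap interval for the interquartile range. The data are simulated stand-ins and the bootstrap sample size nboot is an assumed value; all names are illustrative.

```r
set.seed(10)

# Simulated stand-in data (NOT the course data)
x <- rexp(40)

nboot <- 1000   # number of bootstrap samples (assumed value)

# Resample the data with replacement and recompute the estimator each time
iqr_star <- replicate(nboot, IQR(sample(x, replace = TRUE)))

# Percentile interval: 2.5% and 97.5% quantiles of the bootstrap estimates
ci <- quantile(iqr_star, c(0.025, 0.975))

print(c(estimate = IQR(x)))
print(ci)
```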
22. Redo problem 6.4.2 (which was done in homework assignment 2) and do 6.4.5 (which was not) except with absolute error loss instead of squared error loss. This will entail using the computer to find posterior medians like the computer examples for posterior medians. Note that the posterior distribution for 6.4.5 is given by Theorem 6.3.2 in the book.
Note on Math on the Web
Some web browsers don't display the math formulas above correctly. In this case you have two options.
- Get a non-sucky web browser that actually implements (rather than disdains) internet standards.
- Read the additional problems in PDF (Adobe Palatable Dog Food) format.