University of Minnesota, Twin Cities School of Statistics Stat 3011 Rweb Textbook (Wild and Seber)

Statistics 3011, Fall 2000, Prof. Geyer, Homework Assignments

Note: The problems assigned will all be "Exercises" or "Review Exercises", not "Quiz" questions.

No.	Due Date	Ch. or Sec.	Exercises	Comments
1	Fri Sep 15	1.1	2, 4
		1.2	1
		1.3	2
		1 (Review)	2, 3, 6, 12, 16
2	Fri Sep 22	2.3.1	1
		2.3.2	4
		2.3.3	2	histogram only, use computer and use `right=FALSE` for comparison with Figure 2.3.8 in the textbook and the histogram example from class.
		2.3.4	2
		2.4.1	4	use computer to find median
		2.4.2	1(b), 4	see data entry example from class.
		2.4.3	2
		2 (Review)	4, 6
3	Fri Sep 29	2 (Review)	9abde
		3.1.2	1, 3
		3 (Review)	3	You don't have to do anything special to avoid plotting the 24th point. Its `y` value is `NA` (no value) so it will be ignored.
		4.3	1, 3	2 was previously assigned, now dropped
		4.4.2	3
		4.4.3	2
		4.4.4	2
		4.5	1, 4
		4 (Review)	2, 3
4	Fri Oct 6	4.7.1	2
		4.7.3	1
		4 (Review)	5, 18
		5.2	1, 5b-i
		5.4.1	1
		5.4.2	2
		5.4.3	a-i	no number on question, do all parts
		5 (Review)	11
		A	1, 2	"additional problems" see below.
5	Fri Oct 13	6.2.2	2, 3	You should also do 1 in the sense of visiting the examples page but needn't hand anything in.
		6.2.3	2, 3	You should also do 1 in the sense of visiting the examples page but needn't hand anything in.
		6.2.4	1, 2
		6.4.3	1abcf, 2
		6 (Review)	1, 2, 4, 10
6	Fri Oct 27	7.2.1	1
		7.2.2	1, 2
		7.2.3	1, 3
		7.3.1	1, 3
		7.5	1, 3
		7 (Review)	4, 10, 12, 20
7	Fri Nov 3	A	3, 4, 5, 6	"additional problems" see below. Note that the notes for 3(c) and 4(c) have been corrected.
		7 (Review)	15
		8.2	2	Recall that the R command `fred <- c(0.513, 0.524, 0.529)` creates a data vector of those three numbers, and similarly for longer data vectors.
		8.3	1, 2	Do each of these problems twice, once using the methods described in Wild and Seber getting the answer in the back of the book, then again using the R function `prop.test` described in the Lecture Examples for Chapter 8.
		8 (Review)	1	Problem 4, originally assigned, is moved to next week.
8	Fri Nov 10	7.5	2	This problem and the next are paired. The two samples can be obtained in Rweb by the statements `x <- density[1:6]` and `y <- density[7:29]` when you are on the Rweb for 3011 page with the dataset "Table 7.2.1 (p. 291) cavend.txt" selected in the "Datasets from Wild and Seber" chooser.
		8.4	1	For (a) use the R function `t.test`, which does the Right Thing, not the answer in the back of the book.
		8.5	1, 2
		8.6	1, 2
		8 (Review)	4, 12
		A	7, 8	"additional problems" see below.
9	Wed Nov 22	A	9	"additional problems" see below.
		9.2	1, 3, 4
		9.3	2, 3, 7
		9 (Review)	2, 12, 18
10	Fri Dec 1	10.1.2	1
		10.3	1, 2, 3, 4, 5
		10 (Review)	4, 6abdef, 12abcefg	omit part (c) of 6 and part (d) of 12
11	Fri Dec 8	11.1	1
		11.2.1 to 11.2.3	2
		11 (Review)	2, 5, 6
		12.1.3	the exercise
		12.2	the exercise
		12.3	the exercise	Ignore the part of (c) about "Superimpose the same line on your plot in (a)"
12	Wed Dec 13	A	10, 11, 12, 13	"additional problems" see below. Also see Chi-Square Tests for 2 by 2 Tables for help with additional problem 10
		12.4.2	1, 2
		12 (Review)	5
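
Several comments in the table rely on small R idioms: building a data vector with `c`, finding a median, the `right=FALSE` argument of `hist`, and the fact that a point whose value is `NA` is left off a plot. The sketch below illustrates each of these. Apart from `fred`, the vectors (`x`, `u`, `v`) are made-up numbers chosen for illustration, not data from Wild and Seber.

```r
# Data vector, as in the comment for Section 8.2
fred <- c(0.513, 0.524, 0.529)
median(fred)                      # middle value of the three numbers

# Hypothetical data showing what right=FALSE changes in hist()
x <- c(1, 2, 2, 3, 4)
h_default <- hist(x, breaks = 0:4, plot = FALSE)                 # bins (a, b]
h_left    <- hist(x, breaks = 0:4, right = FALSE, plot = FALSE)  # bins [a, b)
h_default$counts   # 1 2 1 1
h_left$counts      # 0 1 2 2

# A point whose y value is NA is simply left off the scatter plot
u <- c(1, 2, 3, 4)
v <- c(2.1, NA, 3.9, 5.2)
plot(u, v)         # draws three points; the pair (2, NA) is skipped
```

With `right=FALSE` a value that falls exactly on a bin boundary is counted in the bin to its right, which is why the two sets of counts differ.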

Additional Problems

1. Suppose the probability of a widget being defective is 0.02. Suppose widgets come in boxes of 12. Assume widget defects are statistically independent.

(a) What is the probability that a box of widgets contains no defects?
(b) What is the probability that a box of widgets contains at least one defective widget?
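
As a sketch of the reasoning involved, the R lines below work a similar calculation with different, made-up numbers (a defect probability of 0.10 and boxes of 5, both hypothetical). Independence lets the "no defects" probability be written as a product, and "at least one" is its complement. The call to `dbinom` is only a cross-check using the binomial distribution.

```r
# Hypothetical values, NOT the ones in the problem
p_defect <- 0.10
n_box    <- 5

# Independence: all n_box widgets are non-defective
p_none <- (1 - p_defect)^n_box

# Complement rule: at least one defective
p_at_least_one <- 1 - p_none

# Cross-check: binomial probability of exactly zero defects
p_none_binom <- dbinom(0, size = n_box, prob = p_defect)

p_none
p_at_least_one
```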

2. For the probability model for the random variable X defined by the following table

x	0	1
pr(x)	1 - p	p

(a) Find E(X).
(b) Find sd(X).

3. Suppose the random variable T has Student(10) distribution (Student's t-distribution with 10 degrees of freedom).

(a) Find P(T < 1.234). Use R or Rweb. (Answer: 0.8772914).
(b) Find P(T > 1.234). Use R or Rweb.
(c) Find P(|T| > 1.234). Use R or Rweb.
Note: the vertical bars are absolute value signs. The question is the same as: Find P(T < -1.234 or 1.234 < T).
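
One way to reproduce the answer quoted for the first part is the R function `pt`, which gives the cumulative distribution function of Student's t-distribution. This is a minimal sketch; the variable name `p_lower` is just illustrative.

```r
# P(T < 1.234) for T with 10 degrees of freedom
p_lower <- pt(1.234, df = 10)
p_lower   # 0.8772914, the answer quoted above
```

`pt` also accepts `lower.tail = FALSE` for upper-tail areas, and the t-distribution is symmetric about zero, which is useful for the remaining parts.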

4. Suppose the random variable T has Student(7) distribution.

(a) Find the t such that P(T > t) = 0.05. Use R or Rweb or Appendix A6 in Wild and Seber. (Answer: 1.894579).
(b) Find the t such that P(T < t) = 0.05. Use R or Rweb or Appendix A6 in Wild and Seber or part (a).
(c) Find the t such that P(|T| > t) = 0.05. Use R or Rweb or Appendix A6 in Wild and Seber.
Note: the vertical bars are absolute value signs. The question is the same as: Find the t such that P(T < -t or t < T) = 0.05.
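
The quoted answer to the first part can be checked with the R quantile function `qt`, which inverts `pt`. A minimal sketch (the name `t_upper` is illustrative):

```r
# The t with upper-tail area 0.05 under Student's t with 7 degrees of freedom
t_upper <- qt(0.05, df = 7, lower.tail = FALSE)
t_upper                              # 1.894579, the answer quoted above

# Sanity check: the upper-tail area at that point is 0.05 again
pt(t_upper, df = 7, lower.tail = FALSE)
```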

5. Widgets produced at Acme Widget Works are specified to have 7.00 mm frammis diameter. A random sample of 5 widgets is taken from the production line and their diameters are accurately measured. The sample mean is 6.9123 mm and the sample standard deviation is 0.0884 mm. Assume that the distribution of frammis diameters is normal, and give an interval, based on Student's t-distribution, that has 95% coverage probability for the true mean frammis diameter of widgets being produced. Answer: (6.80, 7.02).
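
The stated answer can be reproduced from the summary statistics with the usual t interval, sample mean plus or minus a t critical value times the standard error. The sketch below does the arithmetic in R; the variable names are illustrative.

```r
n    <- 5         # sample size
xbar <- 6.9123    # sample mean (mm)
s    <- 0.0884    # sample standard deviation (mm)

# Critical value for 95% coverage with n - 1 degrees of freedom
t_crit <- qt(0.975, df = n - 1)

# Interval: xbar +/- t_crit * s / sqrt(n)
ci <- xbar + c(-1, 1) * t_crit * s / sqrt(n)
round(ci, 2)      # 6.80 7.02, matching the stated answer
```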

6. Jones and Smith are running for Mayor of the town of Outer Boondock. Two polls taken one month apart by the local paper, both with sample sizes of 500, had the results shown below.

candidate	first poll	second poll
Jones	37.2%	42.6%
Smith	45.4%	42.8%
Undecided	17.4%	14.6%

It appears from the polls that Jones is gaining. But appearances may be deceiving.

(a) Calculate a 2 standard error interval for the difference of the true proportions of the population that would have indicated a preference for Jones on the dates of the polls if the whole population had been asked. Assume that the polls did take a random sample of the population.
(b) Interpret your interval. Does it indicate that Jones is really gaining? Or is no real change a possibility?

7. Redo part (a) of Additional Problem 6 using the R function prop.test rather than hand calculation.
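
For orientation, here is a sketch of how a hand calculation and `prop.test` line up for two independent samples. The counts are made up (150 and 170 successes out of 400 each, not the poll numbers above). Note that `prop.test` applies a continuity correction by default and uses a 95% confidence level, so its interval will not be exactly the "2 standard error" interval; with `correct = FALSE` it uses the same standard error as the hand calculation.

```r
# Hypothetical counts, NOT the data from Additional Problem 6
x <- c(150, 170)      # number of "successes" in each sample
n <- c(400, 400)      # sample sizes

# Hand calculation: difference of sample proportions +/- 2 standard errors
p_hat   <- x / n
se      <- sqrt(sum(p_hat * (1 - p_hat) / n))
by_hand <- (p_hat[1] - p_hat[2]) + c(-2, 2) * se

# prop.test with its defaults (continuity correction, 95% level)
fit <- prop.test(x, n)
fit$conf.int

# Without the continuity correction the interval is
# difference +/- qnorm(0.975) * se, i.e. about 1.96 standard errors
fit_nc <- prop.test(x, n, correct = FALSE)
fit_nc$conf.int
by_hand
```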

8. Using the data for the first poll in Additional Problem 6, calculate an approximate confidence interval for the difference of proportions of voters favoring Jones and favoring Smith. (Hint: Which of Wild and Seber's three cases is this?)

9. In two polls taken a month apart, each poll sampling 600 likely voters, the preferences expressed for the candidates were

	First Poll	Second Poll
Shrub	45%	50%
Pierce	35%	36%
Bottom	12%	8%
Undecided	8%	6%

Both polls gave their margin of error as 4%.

In the second poll the following results were reported for suburban college educated women (67 were in the sample, about 1 / 9 of the sample).

	Second Poll
Shrub	62%
Pierce	24%
Bottom	9%
Undecided	5%

The large difference between these results and the results for the whole sample caused much woofing among the pundits.

In the questions below, you do not have to be precise (though you can if you want). The simple "mental adjustments" recommended by Wild and Seber in Section 8.5.3 for these situations are good enough.

(a) What is the margin of error of the 50% reported for Shrub in the second poll?
(b) What is the margin of error of the 5% increase in the support for Shrub from the first to the second poll?
(c) What is the margin of error of the 14% difference in the support for Shrub and for Pierce in the second poll?
(d) What is the margin of error of the 62% reported for Shrub in this subgroup?
(e) What is the margin of error of the 38% difference in the support for Shrub and for Pierce in this subgroup?

10. For the two polls in Additional Problem 9 the table below gives the actual counts (how many actual people correspond to each cell of the table), which you need for this problem.

	First Poll	Second Poll
Shrub	270	302
Pierce	210	215
Bottom	72	48
Undecided	48	35

The sample size for both polls was 600.

(a) Perform a test of whether there is any difference in the true population proportions of people favoring Shrub at the times of the two polls. Obtain a P-value and interpret the P-value, saying what it implies about support for Shrub. Clearly state whether you did a one-tailed or two-tailed test and why.
(b) Perform a test of whether there is any difference in the true population proportions of any of the categories at the times of the two polls. For this you will need the matrix read into Rweb. The following box does this for you. You only need to supply the correct analysis.

   polls <- matrix(c(270, 302, 210, 215, 72, 48, 48, 35), byrow=TRUE, ncol=2)
   dimnames(polls) <- list(c("Shrub", "Pierce", "Bottom", "Undecided"), c("First", "Second"))
   polls

Obtain a P-value and interpret the P-value, saying what it implies about support for the various candidates.
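
As a sketch of the general technique for a table of counts with more than two rows, the R lines below run a chi-square test on a small made-up table (three hypothetical categories "A", "B", "C", not the poll data). The assumption here is that a chi-square test of homogeneity is the intended kind of analysis, which the pointer to the chi-square help page in the assignment table suggests but does not spell out.

```r
# Hypothetical counts, NOT the poll data above
tab <- matrix(c(40, 55,
                35, 30,
                25, 15), byrow = TRUE, ncol = 2)
dimnames(tab) <- list(c("A", "B", "C"), c("First", "Second"))

out <- chisq.test(tab)
out$statistic   # chi-square test statistic
out$parameter   # degrees of freedom: (rows - 1) * (columns - 1)
out$p.value     # P-value of the test
```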

11. This is just the exercise for Section 12.4.3 in Wild and Seber (p. 535). The only point of making it an "additional problem" is that you can use the Rweb form below to do it. This loads the data with the outlier removed (from the file gauge.txt which is the same as Wild and Seber's except for the deletion of case 35), so you don't have to dink around with the outlier removal.

12. This is just exercise 3 for Section 12.4.4 in Wild and Seber but done my way rather than their way. The data in question are in the first dataset in the book (heart.txt) and the variables in question are SYSVOL, which is the predictor, and DIAVOL, which is the response.

(a) Make a scatter plot, add the least squares regression line, and print the regression summary. You don't need to do anything with this other than hand it in. Comparing with this stuff you already know may help a bit with the other parts of this exercise.
(b) Make a QQ plot of the residuals (as described in the QQ Plots section of the Chapter 12 examples page). Do you see anything in this plot that is problematic (that may violate the assumptions for linear regression)?
(c) Make a plot of the residuals versus the fitted values (as described in the Residuals Versus Fitted Values section of the Chapter 12 examples page). Do you see anything in this plot that is problematic (that may violate the assumptions for linear regression)?
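
The three kinds of output asked for can be sketched in R as follows. This uses simulated data (the vectors `x` and `y` are made up) rather than heart.txt, and it is one plausible way to produce the plots, not necessarily the exact commands on the Chapter 12 examples page.

```r
# Simulated data standing in for a predictor and a response
set.seed(42)
x <- runif(50, min = 0, max = 10)
y <- 3 + 2 * x + rnorm(50)

fit <- lm(y ~ x)

# Scatter plot with least squares line, plus the regression summary
plot(x, y)
abline(fit)
print(summary(fit))

# QQ plot of the residuals
qqnorm(residuals(fit))
qqline(residuals(fit))

# Residuals versus fitted values
plot(fitted(fit), residuals(fit))
abline(h = 0)
```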

13. For the coyote data described on p. 56 in Wild and Seber (in the file coyote.txt) recall that the R commands

   males <- length[gender == "male"]
   females <- length[gender == "female"]

put the lengths of the male and female coyotes in different R variables. Suppose we want to test whether there is any "statistically significant" difference in body length between the sexes.

(a) Describe the null and alternative hypotheses of the test.
(b) Does this mean you are doing a one-tailed or a two-tailed test? Explain how you chose which type of test to do.
(c) Perform the test using the computer. Report the test statistic and the P-value of the test.
(d) Interpret the P-value of the test in two different ways.
- In the simple-minded form addressed to worshipers of the number 0.05.
- In a more nuanced form that actually explains the scientific meaning of this hypothesis test.
In each case relate the hypothesis test to coyote length, not just "null" and "alternative".
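
For reference, a two-sample comparison with `t.test` looks like the sketch below. The two vectors are made-up measurements (`group_a` and `group_b` are hypothetical names, not the coyote data). By default `t.test` performs a two-sided Welch test, which does not assume equal variances in the two groups.

```r
# Hypothetical measurements, NOT the coyote data
group_a <- c(92.0, 97.5, 88.0, 101.0, 95.5, 90.0)
group_b <- c(89.5, 93.0, 86.0, 91.5, 94.0, 87.5)

out <- t.test(group_a, group_b)
out$statistic   # test statistic
out$parameter   # (approximate) degrees of freedom
out$p.value     # P-value of the two-sided test
```

A one-tailed version can be requested with the `alternative` argument (`"less"` or `"greater"`).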