For sample problems you can use to practice for the in-class tests or the Final Exam,
including last year's Final and an answer key for the Sample Problems for Test 2, click here.
An answer key for the Sample Final Exam Problems is now also included.
Instructor: Eric Slud, Math. Dept. Rm. 2314, X 5-5469, evs@math.umd.edu
Office hours: Monday 4, Th 2, or by appointment.
Prerequisite: Math 140-141 & Stat 400.
Text: Probability & Statistics for Engineering
and the Sciences with Minitab 14, 7th ed. (2008),
by J. L. Devore, Duxbury Press.
Coverage: In the first 2 weeks, we will review Stat 400
ideas and techniques. Afterwards, we will cover Chapters 7-10, 12, and 14
of the Devore text and parts of Chapters 11 and 13, plus some extra handouts
on statistical computing and simulation. For more detailed topics, see the
Chapter tables of contents and the official course
syllabus, together with the handouts below.
Grading: The grade in the course will be based 20% on
homeworks (about 8, graded) from the book
and including some
supplementary problems of mine, 10% on data-project homeworks, 40% on 2
in-class
tests, and 30% on a comprehensive final.
Computing: You will need to learn to work with some
statistical computing platform to do simple statistical
calculations
on moderate to large datasets in the course, and to do data
simulations. Calculator or spreadsheet
will not be enough. You may use
Minitab or Matlab or R or a standard statistical
package like SAS or Stata.
However, I will be providing
information and help (and web-posted scripts) only with
R. To find
information about which computer labs on campus
have which of these kinds of software loaded, click here.
You can find information on getting started with R on the
CD that comes with the book, or by visiting the R web-site, from
which you can freely download R software (very similar to Splus)
including miscellaneous packages and datasets. For an
introductory tutorial in R, click here. For a quick
start, see the Rbasics handout,
and then consider reading more about syntax in a
book, like (the early chapters of) W. Venables and Brian
Ripley, "Modern Applied Statistics With S" (Springer, currently 4th ed.).
As indicated in the "R_Manual" section
of the Devore text's accompanying CD, you can get a special
R package
containing all of the book's datasets, from a network of web sites
called CRAN that
contain R add-on packages. You do this by the command
> install.packages("Devore7")
To load the files within an R session you type:
> library(Devore7)
Homework Assignments
HW1, due Fri., Feb. 4.
Reading: Read and review Chapters 5 and 6
in Devore. Also read the Handout (1) below on Simulation.
#1 Suppose that the independent random variables X_i
for i=1,...,100 have density f(x) = 2x for 0 < x < 1.
(a) Find the approximate probabilities P(45+10j < S < 55+10j)
for j=1,2,3,4, where S = X_1+...+X_100.
(b) Find the expectation and variance of the number of indices i
for which X_i > 0.6.
#2 Read the Simulation of Random Variables Handout and do Problem Sim.3 on page 4 of that handout.
#3 Suppose that U_1,...,U_40 are independent
Uniform[0,θ] random variables, observed as data.
(a). Show that the scaled average S_1 =
(U_1+...+U_40)/20 is an unbiased estimator of θ.
(b). Show that for some constant c, c*S_2 is
an unbiased estimator of θ, where S_2 = max(U_1,...,U_40).
Hint: Check that P(S_2 < x) = (x/θ)^40 for 0 < x < θ.
#4 Find the standard errors of the two estimators
S_1 and c*S_2 appearing respectively in
parts (a) and (b) of Problem #3.
#5 Suppose that Y_1,...,Y_1000
are independent identically distributed observations with density
f(y) = 1/3 for 0 < y < 1 and f(y) = 2/3 for 1 < y < 2,
and for k=1,2,3,4 let N_k =
(# of indices i in 1..1000 with (k-1)/2 < Y_i < k/2).
Find the means and variances of each of the relative
frequencies N_k/1000, for k=1,2,3,4.
HW2, due Wed., Feb. 18.
Complete your review of Chapter 6
(ML Estimators), and read Sections 7.1 through 7.3 in
Ch.7 of Devore.
Then solve and hand in the following problems:
#1, 2 Problems 20 and 28 in Sec. 6.2, p.251.
#3 (Do #11 on p.263 for practice and look at its solution in
the Solutions manual. Then do and hand in the following problem.)
Suppose that you learn a new method of generating 90% two-sided
confidence intervals (L(X), U(X)) for the unknown mean μ
for samples X_1, ..., X_n of data in which the individual values X_i are
approximately normally distributed, where the sample size n
is between 35 and 50. Suppose also that you have a method
of simulating independent samples X(r) = (X_{1,r}, ..., X_{42,r})
for r=1,...,2000, on each of which you can calculate the
confidence interval I_r = (L(X(r)), U(X(r))).
(For these simulated intervals, you will know the mean parameter μ_0.)
(a) What is the approximate number of these confidence intervals
I_1, ..., I_2000 that you expect to contain the true mean μ_0?
(b) What kind of random variable is the number N of samples
r=1,...,2000 for which μ_0 falls outside I_r?
(c) What is the approximate probability that N in (b) is between
185 and 220, inclusive of endpoints?
(d) Approximately how likely is it that all of the first 20 of
these intervals I_r, r=1,...,20, contain μ_0?
#4, 5 Problems 8 and 10 in Sec. 7.1, p.262.
#6, 7 Problems 18 and 20 in Sec.7.2, p.269.
#8 You can find by clicking here a
dataset consisting of the logarithms of the average annual rainfall in
inches from 70 US and
Puerto Rico cities (data from the 1975
Statistical Abstracts of the United States). (a).
Compute a few scaled relative frequency
histograms of
these data (with different numbers L of class intervals), and hand in
the one that you think best shows the shape of
the underlying
density. Overlay on the same histogram plot
(by hand if necessary) a graph of the normal density curve with the
same mean and variance as the sample mean and variance of your data.
Use this plot and histogram to comment briefly
on whether
you think the assumption of normal distribution for these
data is tenable.
(b). Give a 95% two-sided confidence
interval for the mean of these data, using an assumed-known value 0.25
for the
variance and an assumption of normality for the individual
data points.
(c). Give a 95% two-sided confidence
interval for the mean of these data, assuming normality, if the
variance is unknown.
(d). Re-do parts (b) and (c), giving approximate
large-sample confidence intervals dispensing with the assumption of
normality for the data.
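The interval calculations in parts (b)-(d) can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the vector x below is simulated stand-in data (not the rainfall dataset), and the helper names ci_known_var and ci_t are hypothetical, not taken from any posted course script.

```r
# Sketch: confidence intervals for a mean, on hypothetical stand-in data.
set.seed(401)                              # arbitrary seed, reproducibility only
x <- rnorm(70, mean = 3.5, sd = 0.5)       # NOT the rainfall data; illustration only

# Interval with an assumed-known variance sigma2 (normal quantile)
ci_known_var <- function(x, sigma2, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  mean(x) + c(-1, 1) * z * sqrt(sigma2 / length(x))
}

# Interval with unknown variance, assuming normal data (t quantile, n-1 df)
ci_t <- function(x, level = 0.95) {
  n  <- length(x)
  tq <- qt(1 - (1 - level) / 2, df = n - 1)
  mean(x) + c(-1, 1) * tq * sd(x) / sqrt(n)
}

print(ci_known_var(x, sigma2 = 0.25))
print(ci_t(x))

# Scaled relative frequency histogram with a normal density overlay
if (interactive()) {
  m <- mean(x); s <- sd(x)
  hist(x, breaks = 10, freq = FALSE, main = "Histogram with normal overlay")
  curve(dnorm(x, mean = m, sd = s), add = TRUE, lty = 2)
}
```

Replacing the t quantile in ci_t by a normal quantile gives the usual large-sample interval that does not rely on normality of the data.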
HW3, due Wednesday, March 2. Read the rest of Chapter 7,
and the first 2 sections of Ch.8 of Devore.
Solve and hand in the following eight problems:
#1 Use R or other statistical software to simulate
100 samples of size 40 of Gamma(1.3,2.6) data-values X_i
(i.e., random variables with density
f(x) = (2.6^1.3 / Γ(1.3)) x^0.3 e^(-2.6x) for x > 0,
for which the mean is μ = 1.3/2.6 = 0.5).
(a). For each sample (i.e. each row of a rectangular 100 x 40 array),
calculate Xbar, and use it to define a large-sample 95% CI for μ.
(b). Plot in some form (or print out) the confidence intervals
calculated on your 100 samples of size 40,
indicating whether each CI contains the true value 0.5.
(Each CI is a function only of Xbar for that sample.)
(c). How many times did your CI fail to capture the true value
μ_0 = 0.5? What is the expected number of times (out of 100) for this
to happen? Should you have been surprised if this happened as few as 1 time?
If it happened 9 or more times?
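One possible way to organize this simulation in R is sketched below. It is an assumption of this sketch that the interval uses the sample standard deviation of each row; an interval depending on Xbar alone would instead plug in a function of Xbar for the standard deviation. The variable names are illustrative only.

```r
# Sketch: 100 simulated Gamma(shape 1.3, rate 2.6) samples of size 40,
# each with a large-sample 95% interval for the mean.
set.seed(401)                         # arbitrary seed, reproducibility only
n_samp <- 100
n      <- 40
mu0    <- 1.3 / 2.6                   # true mean, 0.5

# One sample per row of a 100 x 40 array
X <- matrix(rgamma(n_samp * n, shape = 1.3, rate = 2.6),
            nrow = n_samp, ncol = n)

xbar <- rowMeans(X)
s    <- apply(X, 1, sd)               # assumption: sample sd used for the width
half <- qnorm(0.975) * s / sqrt(n)

lower  <- xbar - half
upper  <- xbar + half
covers <- (lower <= mu0) & (mu0 <= upper)
n_miss <- sum(!covers)

cat("Intervals missing the true mean:", n_miss, "out of", n_samp, "\n")

# Display: one horizontal segment per interval, misses drawn dashed
if (interactive()) {
  plot(range(lower, upper), c(1, n_samp), type = "n",
       xlab = "interval", ylab = "sample number")
  segments(lower, seq_len(n_samp), upper, seq_len(n_samp),
           lty = ifelse(covers, 1, 2))
  abline(v = mu0)
}
```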
#2 Do #27, p.270 for practice and then hand in the following:
Give some numerical computations (in R or other computing
platform) showing what the 95% confidence intervals (7.11), (7.10), and the
one given in problem 27 would be (for some specific
examples of values X_1+...+X_n = k), and what their actual coverage
probabilities would be according to exact Binomial(n,p) probability
distributions, for the values
(n=78, p=.57), (n=47, p=.53), (n=46, p=.16).
See the Rscript/Coverage.RLog script for the necessary R coding.
ALSO: for each of these (n,p) parameter combinations, give at least one
nearby value of n (for the same p) for which the ordering of
"best performance" among the three intervals is altered.
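The idea of an exact coverage probability can be illustrated with a short R sketch. It is not the Coverage.RLog script: the function name wald_coverage is hypothetical, the interval coded is the familiar phat ± z*sqrt(phat(1-phat)/n) form (which numbered formula in the text this corresponds to is not asserted here), and the values n=50, p=0.3 are chosen only for illustration.

```r
# Sketch: exact coverage of the interval phat +/- z*sqrt(phat*(1-phat)/n)
# under a Binomial(n, p) distribution for the count k.
wald_coverage <- function(n, p, level = 0.95) {
  z    <- qnorm(1 - (1 - level) / 2)
  k    <- 0:n                          # every possible observed count
  phat <- k / n
  half <- z * sqrt(phat * (1 - phat) / n)
  covered <- (phat - half <= p) & (p <= phat + half)
  # Add up the exact binomial probabilities of the counts whose interval covers p
  sum(dbinom(k, size = n, prob = p)[covered])
}

print(wald_coverage(50, 0.3))          # illustrative (n, p), not from the assignment

# Coverage as n varies for fixed p: it moves irregularly rather than monotonically
print(sapply(45:55, wald_coverage, p = 0.3))
```

The same pattern (enumerate k, decide which intervals cover p, sum the binomial probabilities) applies to any other interval formula once its endpoints are coded.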
#3--#4 Do #22, #26, p.269.
#5 Do #38, p.278.
#6 Do #44, p.280.
#7 Do #52, p.281.
#8 Do #12, p.294.
HW4, due Wednesday, March 16. Read the rest of Chapter 8,
and sections 9.1, 9.2 and 9.4 of Ch.9 of Devore. Solve and hand in
the following eight problems:
Do #10, pp.293-294, #20, p.304, #32, p.306, #42, p.311, #52 and #54, pp.317-318.
Do #2, p.334, three ways: using a large-sample Z-approximation
as covered in Sec.9.1; using a pooled
t-test as in Sec. 9.2; and with the
Satterthwaite-Welch approximation as on pp.336-337.
Do #28, p.342.
HW5, due Monday, April 11. Read Section 9.5. Read
Chapter 14 through Section 14.2. Then read
Section 4.6 plus the
handout on Empirical
Distribution Functions.
#1: Problem on power and p-value: suppose that you
see data values X_1,...,X_31 which can be assumed
to be iid normally distributed, with Xbar = 24.0
and S = 8.0. Suppose that these data were collected to test the
hypothesis H_0: μ ≤ 22.7 versus H_A: μ > 22.7.
(a). Give the p-value for the test in which you treat this as a
large-sample test (or equivalently, where you take σ_0 = 8.0 as known).
(b). Find the power of the size .05 test versus μ_1 = 25,
again treating the test as a large-sample test.
(c). Re-do part (a), this time treating the test as a small-n one-sample
t test. This part of the problem requires you to use a calculator or
PC to calculate the p-value using a program for the
t-distribution cumulative distribution function in place of a
table.
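In R, the distribution-function programs referred to above are pnorm and pt. The sketch below shows the pattern on made-up numbers (n=25, Xbar=10.8, S=2.0, null value 10), which are not those of the problem; the function names one_sided_pvalues and power_z are hypothetical.

```r
# Sketch: upper-tailed one-sample p-values (z and t versions) and
# large-sample power, for H0: mu <= mu0 versus HA: mu > mu0.
one_sided_pvalues <- function(xbar, s, n, mu0) {
  stat <- (xbar - mu0) / (s / sqrt(n))
  c(z = pnorm(stat, lower.tail = FALSE),            # sigma treated as known
    t = pt(stat, df = n - 1, lower.tail = FALSE))   # small-n t test
}

# Power of the size-alpha z test at the alternative mu1
power_z <- function(mu1, mu0, sigma, n, alpha = 0.05) {
  crit <- qnorm(1 - alpha)                          # rejection cutoff for Z
  pnorm(crit - (mu1 - mu0) / (sigma / sqrt(n)), lower.tail = FALSE)
}

# Illustrative values only
pv <- one_sided_pvalues(xbar = 10.8, s = 2.0, n = 25, mu0 = 10)
print(pv)
print(power_z(mu1 = 11, mu0 = 10, sigma = 2.0, n = 25))
```

The t-based p-value is the larger of the two, since the t distribution has heavier tails than the normal.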
#2-3: Ch.9, #62, 64, pp.363-364.
#4: Ch.9, #68. Do a preliminary test for equality of
variances before you decide which two-sample
t interval to use for the
mean difference.
#5-#6: Ch.14, #6, 9, p.575.
#7: Ch. 9, #72, p.365.
#8: Ch.4, #94, p.179. Use R to create
probability plots using qqnorm or
qqplot.
R scripts will be provided.
HW6, due Wednesday, April 20. In Devore, finish
reading Sec. 14.2, and read Ch. 10,
Sections 1 and 2. Solve and
hand in the following six problems:
Ch.14, #8, p.575.
Ch. 4, #92(a), pp.178-179.
Ch.10, # 2, 8. pp.378-379; and # 12, 16, pp.384-385.
HW7, due Wednesday, May 4. Read Sections 12.1 to 12.3,
and solve and hand in
the following problems:
Ch.10, p. 385: #18, 20.
Ch.12, p.453: #6; #12, p.465; #20, p.466;
#34, 36, p.476.
GENERAL GUIDELINES ON HOMEWORK.
1. Academic Dishonesty. You may ask questions of each other
and of me to get hints on how to solve the
various assigned homework
problems. However, you may not share computations and written
work: you
must each do that work and write it up individually.
Homework papers which have identically copied
segments will be
regarded as a violation of the campus honor code.
2. Late Homework and Test Make-Ups. The course policy on
late homeworks is that they will be accepted
but graded down, by 10
percent if past due by no more than one class session and by
25 percent if later than
that. These penalties will be
waived only for medical excuses or valid University-recognized
holidays.
Regarding test make-ups, we will adhere to campus policy.
Sample Problems for Tests and Exams
(1) To practice for Test 1, a series of 10 relevant
applied/computational problems drawn (selected and edited) from
the Devore "Testbank" (on the CD-ROM coming with the book's 7th
edition) can be found here.
(You may have
to zoom in with your browser or MS Word reader to read some of the
technical formula
elements in this document.)
(2) A sheet of additional problems relevant to Test 1 can be
found here. These are modified from similar
problems that I have given in the past, which would call for a little
more theoretical interpretation than
the mechanical `applied' problems
coming from the TestBank problem-sheet in (1).
(3) Practice problems for In-Class Test 2 (Wed., Dec. 1) can be found
here, along with an answer key.
For a list of topics and
problem types, click here.
(4). To see last Fall's Stat 401 Final Exam, from another
instructor, click here. In this Exam, the MLE in
Problem 1 is not a topic that we spent much time
on, but we did cover it and you should be able to do
it. The other problems are well in the mainstream of what we covered
this semester.
An answer key is included here.
(5). Try the sample-exam from
1995 which I have adapted to conform more closely to what we studied
this term. An answer key is included here.
(Scroll down a bit in the document to find these Sample Exam answers.)
Handouts (some from Stat 400, and some from John Millson):
(1) 10/20/03 There are two handouts here, respectively on
Transformation of Random Variables
and on Random Number Generation and Simulation. These topics are very
important for the rest of the course, as they allow us to generate
and interpret `artificial data' to illustrate the meaning of our
Probability Limit Theorems (Law of Large Numbers, Central Limit
Theorem) and later statistical results (Consistent Statistical
estimators, Confidence Intervals). In addition, Simulation gives
us an `experimental' avenue to calculate, via artificial data,
probabilities which may be too difficult to figure analytically.
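As a small illustration of these two topics together, the following R sketch simulates from the density f(x) = 2x on (0,1) by transforming uniform random numbers. The helper name r_density_2x is hypothetical, and the handouts themselves may organize the calculation differently.

```r
# Sketch: inverse-CDF simulation for the density f(x) = 2x on (0, 1).
# The CDF is F(x) = x^2, so F^{-1}(u) = sqrt(u) and sqrt(U) has density 2x.
set.seed(401)                                # arbitrary seed, reproducibility only
r_density_2x <- function(n) sqrt(runif(n))   # hypothetical helper name

x <- r_density_2x(100000)

# Compare simulated values with theory:
#   E(X) = 2/3,  Var(X) = 1/2 - (2/3)^2 = 1/18,  P(X > 0.5) = 1 - 0.5^2 = 0.75
sim_mean <- mean(x)
sim_var  <- var(x)
sim_prob <- mean(x > 0.5)                    # relative frequency estimates a probability

print(c(mean = sim_mean, var = sim_var, prob = sim_prob))
```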
(2) As of 8/23/10 See John Millson's Stat 401 page for handouts on diverse topics related to the course.
(3) 10/22/03 The handout on Normal Approximation to Binomial Distribution
contains a word-problem worked example, as well as some numerical
examples of the quality of the normal approximation to the
Binomial. This example is continued in a statistical
setting (confidence interval for estimate of a population
proportion in a political opinion poll) in handout (8) below,
dated 11/19/03.
A graph comparing the distribution function values of Binom(100,.3) with its
approximating normal distribution N(30,21) can be found here.
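A comparison of this kind can be reproduced with a few lines of R. The sketch below uses a continuity correction (evaluating the normal cdf at k + 0.5), which is an assumption of the sketch; the linked graph may have been drawn differently.

```r
# Sketch: Binomial(100, 0.3) cdf versus its normal approximation N(30, 21).
k <- 15:45
exact     <- pbinom(k, size = 100, prob = 0.3)
normal_cc <- pnorm(k + 0.5, mean = 30, sd = sqrt(21))  # continuity-corrected

# Largest discrepancy between the two distribution functions on this range
max_err <- max(abs(exact - normal_cc))
print(max_err)

if (interactive()) {
  plot(k, exact, type = "s", xlab = "k", ylab = "P(X <= k)")
  lines(k, normal_cc, lty = 2)
}
```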
(4)
9/29/03 This handout concerns numerical calculations for the Binomial
approximation to
Hypergeometric random variables, and the Poisson
approximation to the Binomial. In addition,
some simulated-data results are given
to show that the expectations and probability mass functions
behave as they should
according to the relative-frequency interpretation of probabilities.
(5) 10/27/03 Example of Simulation for Calculating Probability and Expectation.
(6) 11/3/03 Picture showing the behavior
of sample averages S_n/n as a function of n from 1,...,2000
on each of four sets of simulated data, from different types
of random variables. Within each picture, the
sample averages S_n/n are based on progressively larger segments of the
same 2000 data-values, and the point is to see that these
averages settle down to the place where the Law of Large Numbers
guarantees they should for large enough n, namely the theoretical
expectation of the individual r.v.'s.
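A picture of this type is easy to produce in R. The sketch below uses Exponential data with rate 2 (expectation 1/2) purely as an example; the distributions used in the actual picture are not specified here.

```r
# Sketch: running sample averages S_n/n for n = 1, ..., 2000.
set.seed(401)                           # arbitrary seed, reproducibility only
n <- 2000
x <- rexp(n, rate = 2)                  # illustrative choice; expectation is 1/2

# cumsum gives S_1, S_2, ..., S_n; dividing by 1:n gives the running averages
running_avg <- cumsum(x) / seq_len(n)

print(running_avg[c(10, 100, 1000, 2000)])

if (interactive()) {
  plot(seq_len(n), running_avg, type = "l", xlab = "n", ylab = "S_n / n")
  abline(h = 0.5, lty = 2)              # theoretical expectation
}
```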
(7) 11/12/03 Pictures showing behavior of
scaled relative frequency histograms compared with densities.
The document shows plots of histograms in large simulated samples
overlaid with the theoretical densities they are supposed to
represent, and of empirical distribution functions overlaid with the
theoretical cdf's the data in large simulated datasets are
supposed to represent. The latter are available in two settings:
(i) The overlaid empirical and theoretical cdf's for 1000 simulated values of
Z_1+Z_2 (sum of two independent standard
normal deviates) can be found here.
(ii) The overlaid empirical
and theoretical cdf's for 1000 simulated values of U_1+...+U_100
(sum of 100 independent Uniform[0,1]
deviates) can be found here.
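Overlays like those in (i) and (ii) can be sketched in R with the ecdf function. For (ii), the comparison curve below is the normal approximation N(50, 100/12), which is an assumption of this sketch; the linked plot may use a different reference curve.

```r
# Sketch: empirical versus theoretical cdf's for two simulated sums.
set.seed(401)                                   # arbitrary seed, reproducibility only

# (i) Z_1 + Z_2 is exactly N(0, 2)
z_sum  <- rnorm(1000) + rnorm(1000)
Fn_z   <- ecdf(z_sum)
grid_z <- seq(-5, 5, by = 0.01)
gap_z  <- max(abs(Fn_z(grid_z) - pnorm(grid_z, mean = 0, sd = sqrt(2))))

# (ii) U_1 + ... + U_100 has mean 50 and variance 100/12
u_sum  <- rowSums(matrix(runif(1000 * 100), nrow = 1000))
Fn_u   <- ecdf(u_sum)
grid_u <- seq(35, 65, by = 0.05)
gap_u  <- max(abs(Fn_u(grid_u) - pnorm(grid_u, mean = 50, sd = sqrt(100 / 12))))

# Largest vertical gaps between empirical and reference cdf's on each grid
print(c(gap_z = gap_z, gap_u = gap_u))

if (interactive()) {
  plot(Fn_z, main = "Empirical cdf of Z_1 + Z_2")
  curve(pnorm(x, mean = 0, sd = sqrt(2)), add = TRUE, lty = 2)
}
```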
(8) 11/19/03 The
word-problem on political opinion polling begun in handout (3) above,
dated 10/22/03, is continued here from the vantage
point of statistics, particularly
confidence
intervals for estimates of a population proportion in a political opinion
poll.