Instructor: Eric V. Slud, Statistics Program
Sample Problems for Fall 2007 Exam
Take-Home Makeup/Extra-Credit for In-Class Test
Getting Started in R
Course topics
Course handouts
Homework Assignments
Other links
Course text: Wayne Fuller (2009), Sampling Statistics, Wiley.
Recommended: Lohr, S. L. (1999). Sampling: Design and Analysis. Pacific Grove, CA: Duxbury.
There is also a second (2009) edition.
Prerequisite: A semester of statistics at the level of STAT 401 or 420.
Course Description:
Sampling refers to the statistical techniques
used in political polls, marketing surveys, federal data gathering
and many areas of social science and public health.
This course provides an introduction to methods of sampling and
analyzing data from finite populations from both a theoretical and
applied perspective. It is intended for Statistics and Mathematics
students interested in applications and for students in the Applied
Statistics track of the Survey Methodology program, as well as
students in disciplines such as business, life science or social
science who need sampling in their research.
The Fuller text emphasizes both mathematical theory
and real data applications, especially those with a regression
flavor. The recommended Lohr text is easier reading, with many
simpler applications. The course material requires that you
understand basic statistical concepts such as point estimation,
confidence limits, and the central limit theorem. More advanced
theoretical topics in Fuller's book will be covered, gently,
emphasizing statements (and in some cases, alternative versions) and
interpretations of theorems rather than proofs.
STAT 440 is part of the required material for the
MATH/STAT/AMSC
MA and PhD Written Examinations in Applied Statistics.
Coverage in Fuller's book: Chapter 1 (with lighter coverage of theory in Sec. 1.3),
Chapters 2-3 (all), and selections from Chapter 4 (Secs. 4.3-4.4) and
Chapter 5 (Sec. 5.1).
Coverage in Lohr's book: Chapters 1--8 plus topics from Chapters 9 and 11.
References:
Cochran, W. G. (1977). Sampling Techniques (3rd ed.). New York: J. Wiley.
Särndal, C.-E., Swensson, B., and Wretman, J. (1992). Model Assisted Survey
Sampling. New York: Springer.
Course Requirements and Grading:
There will be an in-class midterm and a final exam on Thursday, Dec. 16 from 4--6 p.m.
There will be frequent homework assignments, 7--8 in all, including both theoretical and
applied problems. Grades will be based on the midterm (25%), homework (40%), and
the in-class final exam (35%).
Course Policies:
(i) As part of the applied homework assignments, students will be expected to do arithmetic
calculations on the computer, which will sometimes involve a small amount of programming.
Students may choose the language or platform, which may range from Spreadsheets to SAS to
R or Splus. However, all computational illustrations in the course and all computer help
offered in an office-hour setting will be restricted to R.
For the systematic Introduction to R and R reference manual distributed with the R software,
either download from the R website or simply invoke the command
> help.start()
from within R. For slightly less extensive introductory tutorials in R, see the CUNY or Illinois State tutorials.
(ii) Late homework will be accepted, but the grade will always be reduced.
(iii) All homeworks for students taking the course on campus should be handed in as
hard-copy on or before the due date.
Homework Assignments.
Homework solutions, including numerical answers, some discussion, and R
scripts, can be found here.
HW1 due Mon. Sep. 13: Fuller
(pp. 76-81) Exercises Sec 1.6: #2, 4, 9, 12, 13,
14.
HW2 due Wed. Oct. 6: Lohr first edition, Exercises: #1, 6, 12 in Sec. 2.10 (pp. 50 ff.) and
#5, 6 in Sec. 3.6 (pp. 88 ff.) and 4.9.5, pp. 120-1. Also: do the
problem assigned in class, verifying that the first general
unbiased estimator for the variance of the Horvitz-Thompson estimator in SRS
n out of N sampling agrees precisely with the Sen-Yates-Grundy
estimator.
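As a numerical sanity check on the in-class HW2 problem (not a substitute for the algebraic verification), the sketch below evaluates the Horvitz-Thompson form and the Sen-Yates-Grundy form of the variance estimator for one sample under SRS of n out of N. It is written in Python purely for illustration, since any platform is allowed for homework. The sample values, N and n are made up, and the function names are hypothetical.

```python
from itertools import permutations
from statistics import variance

def ht_variance_estimate(y, pi, pi_pair):
    """Horvitz-Thompson form, with common first- and second-order inclusion probabilities."""
    diag = sum((1 - pi) / pi**2 * yi**2 for yi in y)
    # permutations(y, 2) runs over all ordered pairs of distinct sample positions (i != j)
    cross = sum((pi_pair - pi * pi) / (pi_pair * pi * pi) * yi * yj
                for yi, yj in permutations(y, 2))
    return diag + cross

def syg_variance_estimate(y, pi, pi_pair):
    """Sen-Yates-Grundy form for a fixed-size design."""
    return 0.5 * sum((pi * pi - pi_pair) / pi_pair * (yi / pi - yj / pi) ** 2
                     for yi, yj in permutations(y, 2))

# Hypothetical population size, sample size and sample values (assumptions).
N, n = 20, 5
y_sample = [3.1, 4.7, 2.2, 5.9, 4.0]

pi = n / N                               # SRS single inclusion probability
pi_pair = n * (n - 1) / (N * (N - 1))    # SRS joint inclusion probability

v_ht = ht_variance_estimate(y_sample, pi, pi_pair)
v_syg = syg_variance_estimate(y_sample, pi, pi_pair)
v_srs = N**2 * (1 - n / N) * variance(y_sample) / n   # textbook SRS formula for a total

print(v_ht, v_syg, v_srs)
```

Under SRS the three printed values should coincide up to rounding error; under unequal-probability designs the first two forms generally differ.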
HW3 due Mon. Dec. 6: Lohr first edition, Exercises: #4, 9, 12, 15 in Sec. 5.9 (pp. 170 ff.)
and Fuller Chapter 2, problems #7, 10, pp. 168-170.
HW4 due Mon. Nov. 24: Lohr first edition, do the following 5 Exercises.
Chapter 7, #9 and 16, pp. 251-2; and Chapter 8, #2, 8;
plus one additional problem assigned in class:
(I) Suppose that n units from a frame population U are sampled according to
some probability design $\pi(s)$ with single inclusion probabilities $\pi_i$
which can be assumed uniformly bounded between 0.1 and 0.3 (the
exact bounds do not matter much), and that individuals respond
independently of each other and the sample choice, all with the same
(unknown) probability $P(r_i = 1) = a$. Suppose that the adjustment factor for weights is either
$\left( \sum_{i \in S} r_i/\pi_i \right) / N$
or
$\sum_{i \in S} r_i/\pi_i \,\big/\, \sum_{i \in S} 1/\pi_i$.
Which of these choices has smaller variance? Justify your answer.
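A small simulation can be used to check an answer to problem (I), although it does not replace the requested justification. The sketch below makes several assumptions that are not part of the problem statement: Poisson sampling (so the sample size is random rather than fixed at n), inclusion probabilities drawn uniformly from [0.1, 0.3], a frame of 500 units, and a response probability a = 0.7. The function and variable names are hypothetical.

```python
import random
from statistics import variance

def simulate(N=500, a=0.7, reps=2000, seed=1):
    """Monte Carlo variances of the two adjustment factors under Poisson sampling."""
    rng = random.Random(seed)
    # Assumed inclusion probabilities, held fixed across replications.
    pis = [rng.uniform(0.1, 0.3) for _ in range(N)]
    total_adj, ratio_adj = [], []
    for _ in range(reps):
        # Poisson sampling: unit i enters the sample independently with probability pi_i.
        sample = [p for p in pis if rng.random() < p]
        # Weighted count of respondents: each sampled unit responds with probability a.
        resp_wt = sum(1 / p for p in sample if rng.random() < a)
        all_wt = sum(1 / p for p in sample)
        total_adj.append(resp_wt / N)        # first factor: divide by known N
        ratio_adj.append(resp_wt / all_wt)   # second factor: divide by estimated N
    return variance(total_adj), variance(ratio_adj)

var_total_adj, var_ratio_adj = simulate()
print(var_total_adj, var_ratio_adj)
```

Both factors estimate the response probability a; comparing the two printed variances shows which one is more stable under these assumed settings. A design in which the sum of 1/π_i over the sample is constant (for example SRS) makes the two factors identical, so the comparison is only informative when that sum varies from sample to sample.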
For the remaining reading in the course, look next
at the Lohr sections on Variance Estimation.
A second problem that I said would be included in this HW will
instead become part of the
Take-Home Exam, to be handed in Thursday
December 16, 4pm.
(II) Variance of survey estimator in a hierarchical design.
A complex survey is designed as follows, based on a hierarchically
structured frame population. The population is arranged geographically
in 50 counties, with known populations $N_c$, c = 1, ..., 50, of which 8 are selected
randomly with replacement and with probability
proportional to size. Within the selected counties, two different
plans are followed: for counties 1, ..., 25, a stratified simple random
sample of 50 individuals from each of two strata (Men and Women) is
taken, while for counties 26, ..., 50 a simple random sample of 100
individuals is taken.
The attribute $y_i$ in this problem is personal income (measured in units of $10,000).
Given the following background and summary data (attached in linked text file) from the survey,
(a) find the survey estimator of the total income in the population, combining
the with-replacement PPS formulas at the county level with the SRS
and stratified-sample Horvitz-Thompson estimators within-county, and
(b) find an unbiased estimate for the variance of your estimator.
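The structure of the calculation in (a) and (b) can be sketched with the standard with-replacement PPS (Hansen-Hurwitz) formulas: each county draw k yields an unbiased within-county total estimate, which is divided by the draw probability $p_k = N_k / \sum_c N_c$, and the 8 resulting values are averaged; their sample variance divided by 8 serves as the variance estimate. The Python sketch below uses entirely made-up numbers, not the data in the linked file, and the names `county_total`, `pps_wr_estimate`, `N_TOTAL` and `draws` are hypothetical.

```python
from statistics import mean, variance

def county_total(strata):
    """Within-county estimated total: sum over strata of N_h times the sample mean.

    An SRS county is represented as a single stratum covering the whole county.
    """
    return sum(N_h * ybar_h for N_h, ybar_h in strata)

def pps_wr_estimate(draws, n_total):
    """Hansen-Hurwitz estimate of the population total and its variance estimate."""
    z = []
    for strata in draws:
        N_c = sum(N_h for N_h, _ in strata)          # county population
        z.append(county_total(strata) * n_total / N_c)  # total estimate divided by p_c = N_c / n_total
    m = len(z)
    return mean(z), variance(z) / m                  # variance() uses the m - 1 divisor

# Hypothetical total population over all 50 counties (assumption).
N_TOTAL = 2_000_000

# Hypothetical data for the 8 draws: lists of (stratum population, sample mean income
# in units of $10,000). Two-stratum entries stand for counties 1-25 (Men, Women);
# single-stratum entries stand for counties 26-50 (SRS of the whole county).
draws = [
    [(29000, 3.1), (31000, 2.6)],
    [(12000, 3.4), (13000, 2.9)],
    [(45000, 2.8)],
    [(21000, 3.0), (20000, 2.7)],
    [(38000, 3.3)],
    [(52000, 2.5)],
    [(17000, 3.6), (18000, 3.0)],
    [(26000, 2.9)],
]

total_hat, var_hat = pps_wr_estimate(draws, N_TOTAL)
print(total_hat, var_hat)
```

If a county is drawn more than once, it appears once per draw, with an independent within-county sample each time; the variance formula relies on the draws being independent and on each within-county estimate being unbiased.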