Spring 2023

**Instructor:** Professor Eric Slud,
Statistics Program, Math Dept.,
Rm 2314, x5-5469, slud@umd.edu

**Office hours:** M 1-2, W 10-11 (initially), or email me to make an appointment (can be on Zoom).

**Overview:** This is the second term of a full-year course sequence introducing mathematical statistics at a theoretical graduate level, using tools of advanced calculus and basic analysis. The fall term emphasized conceptual definitions of the standard framework based on families of probability models for observed-data structures, along with the parameter space indexing the class of assumed models. We explained several senses in which functions of observed-data random variables can give a good idea of which of those probability models governed a particular dataset. The fall term emphasized finite-sample properties, while the spring term will emphasize large-sample limit theory. Aspects of the theoretical results will be illustrated with statistical simulation demonstrations.

**Prerequisite:** Stat 700 or {Stat 420 and Math 410}, or equivalent.

**Required Course Text:** P. Bickel & K. Doksum, *Mathematical Statistics, vol.I,
2nd ed.*, Pearson Prentice Hall, 2007.

**Recommended Texts:** *(i)* George Casella and Roger Berger
*Statistical Inference*, 2nd ed., Duxbury, 2002.

*(ii)* V. Rohatgi and A.K. Saleh, *An Introduction to Probability and Statistics*,
2nd ed., Wiley, 2001.

*(iii)* Jun Shao, *Mathematical Statistics*, 2nd ed., Springer, 2003.

*(iv)* P. Billingsley, *Probability and Measure*, 2nd (1986) or later edition, Wiley.

**Course Coverage:** STAT 700 and 701 divide roughly into definitions and properties of finite-sample statistics in the Fall (STAT 700) and large-sample limit theory in the Spring (STAT 701). The division is not quite complete, because many topics (Point Estimation, Confidence Intervals, identifiability) were motivated in terms of the Law of Large Numbers. The Fall coverage in the Bickel & Doksum book is roughly Chapters 1-4, along with related reading in Casella & Berger or other sources for special topics.

This term we begin with asymptotic topics related to comparing the large-sample variances of method-of-moments and maximum likelihood estimators (Bickel & Doksum Chapter 5, Section 3) in the setting of canonical exponential families, to consolidate the fall material on exponential families and give the associated large-sample theorems. Then we interrupt the development of Chapter 5 to return to Sections 4.4 and 4.5 for a thorough introduction to confidence intervals, using some large-sample theory (the Central Limit Theorem and Law of Large Numbers) to clarify the problem of defining formulas for confidence-interval endpoints. We will similarly return to Neyman-Pearson tests in the large-sample setting to clarify the (approximate) definition of rejection-region cutoffs, and incorporate large-sample asymptotics through the notions of asymptotic size and power, asymptotically pivotal quantity, and asymptotic confidence level. (These topics are covered well in statistical terms, although less rigorously, in Casella & Berger.)
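Since the class demonstrations use R, the following is only a minimal Python sketch of the idea of an asymptotically pivotal quantity: a Wald-type large-sample confidence interval for a Poisson mean, whose asymptotic confidence level can be checked by simulation (function names and parameter values are illustrative).

```python
import math
import random

def poisson_draw(rng, lam):
    # Knuth's method: count uniform draws until their product falls below exp(-lam)
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def poisson_mean_ci(xs, z=1.96):
    """Wald-type interval xbar +/- z*sqrt(xbar/n): by the CLT and Slutsky's theorem,
    sqrt(n)*(xbar - lam)/sqrt(xbar) is asymptotically pivotal (approximately N(0,1))."""
    n = len(xs)
    xbar = sum(xs) / n
    half = z * math.sqrt(xbar / n)
    return xbar - half, xbar + half

def empirical_coverage(lam=3.0, n=200, reps=2000, seed=1):
    """Fraction of simulated intervals covering the true lam."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [poisson_draw(rng, lam) for _ in range(n)]
        lo, hi = poisson_mean_ci(xs)
        hits += lo <= lam <= hi
    return hits / reps
```

The asymptotic confidence level here is 95%, so for moderate n the empirical coverage should land near 0.95.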

Then we revisit the topic of (multidimensional-parameter) Fisher Information, along with numerical maximization of likelihoods via Newton-Raphson, Fisher scoring, and the EM algorithm. We complete this part of the course by proving that MLEs under regularity conditions are consistent and asymptotically normal, with related facts about the behavior of the likelihood in the neighborhood of the MLE. We follow roughly the development of Chapter 5 of Bickel and Doksum; this material is covered non-rigorously in the univariate case in Casella and Berger's Section 10.1. Chapter 6 of Bickel & Doksum covers the asymptotic theory of estimating-equation estimation, unifying linear model theory and maximum likelihood theory in multi-parameter settings. We prove theorems on the asymptotic optimality of MLEs among the large class of "regular" parameter estimators, and complete our discussion of large-sample theory by proving the Wilks theorem on the asymptotic chi-square distribution of Likelihood Ratio Tests, along with the large-sample equivalence between Wald tests, score tests, and likelihood ratio tests.
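As a deliberately toy illustration of the Newton-Raphson maximization mentioned above (sketched in Python rather than the R used in class): for an Exponential(λ) sample the MLE has the closed form λ̂ = 1/x̄, so the iteration can be checked against a known answer. In this model the observed and expected information coincide, so Newton-Raphson and Fisher scoring are the same iteration.

```python
def newton_raphson_mle(xs, lam0=1.0, tol=1e-10, max_iter=100):
    """Newton-Raphson for the Exponential(lam) MLE.
    score:   l'(lam)  = n/lam - sum(xs)
    hessian: l''(lam) = -n/lam**2  (equals minus the expected information here,
             so this iteration is also Fisher scoring for this model)"""
    n, s = len(xs), sum(xs)
    lam = lam0
    for _ in range(max_iter):
        score = n / lam - s
        hess = -n / lam ** 2
        step = score / hess
        lam -= step           # Newton update: lam - l'(lam)/l''(lam)
        if abs(step) < tol:
            break
    return lam
```

For the sample [0.5, 1.0, 1.5, 2.0] the iteration converges to 1/x̄ = 0.8 in a handful of steps.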
Lecture Topics by Date.

**Grading:** There will be graded homework sets roughly every 1.5--2 weeks (6 or 7 altogether); one in-class test, tentatively on Fri., March 17; and an in-class Final Exam on Wednesday, May 17. The course grade will be based 45% on homeworks, 20% on the in-class test, and 35% on the Exam.

**Homework will generally not be accepted late, and must
be handed in as an uploaded pdf or png file in ELMS.**

**Honor Code**

The University of Maryland, College Park has a nationally recognized
Code of Academic Integrity, administered by the Student Honor Council.
This Code sets standards for academic integrity at Maryland for all
undergraduate and graduate students. As a student you are responsible
for upholding these standards for this course. It is very important for
you to be aware of the consequences of cheating, fabrication,
facilitation, and plagiarism. For more information on the Code of
Academic Integrity or the Student Honor Council, please visit
http://www.shc.umd.edu.

To further exhibit your commitment to academic integrity, remember to
sign the Honor Pledge on all examinations and assignments:

"I pledge on
my honor that I have not given or received any unauthorized assistance
on this examination (assignment)."

Also: messages and updates (such as corrections to errors in stated homework problems or changes in due dates) will generally be posted here, and sometimes also sent by email through CourseMail.

Additional information:

- Important Dates below;
- for auxiliary reading, several useful handouts described and linked below;
- problem set solutions posted throughout the term.

**(I) Union-Intersection Tests** covered in Casella and Berger are discussed in a journal article in connection with applications to so-called Bioequivalence trials.

**(II)** Summary of calculations in R comparing three methods for creating (one-sided) confidence intervals for binomial proportions in moderate-sized samples.
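Handout (II) itself is in R; as a rough Python sketch of the same kind of comparison, here are two of the standard one-sided upper confidence bounds for a binomial proportion, the Wald and the Wilson (score) bounds, with empirical coverage estimated by simulation (parameter values are illustrative, and the handout's third method is not reproduced here).

```python
import math
import random

def wald_upper(x, n, z=1.645):
    # one-sided 95% Wald upper bound: p_hat + z*sqrt(p_hat*(1-p_hat)/n)
    p = x / n
    return p + z * math.sqrt(p * (1 - p) / n)

def wilson_upper(x, n, z=1.645):
    # one-sided 95% Wilson (score) upper bound: solve the score inequality for p
    p = x / n
    denom = 1 + z * z / n
    center = p + z * z / (2 * n)
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (center + half) / denom

def upper_bound_coverage(upper, p=0.1, n=40, reps=4000, seed=7):
    """Fraction of simulated binomial samples whose upper bound exceeds the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))
        hits += p <= upper(x, n)
    return hits / reps
```

For these illustrative values (p = 0.1, n = 40) the Wald bound undercovers its nominal 95% level while the score bound overcovers, which is the kind of moderate-sample behavior the handout examines.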

**(III)** Handout containing the single-page Appendix from the Andersen-Gill article (Ann. Statist. 1982) showing how a uniform law of large numbers for log-likelihoods follows from a pointwise strong law.

**(IV)** Handout on the 2x2 table asymptotics covered in a 2009 class, concerning different sampling designs and asymptotic distribution theory for the log odds ratio.

**(V)** Handout on Wald, Score, and LR statistics covered in class April 10 and 13, 2009.

**(VI)** Handout on the Chi-square multinomial goodness-of-fit test.
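The Pearson statistic underlying handout (VI) is simple enough to state in a few lines of code; a minimal Python sketch (function name illustrative). With k cells and fully specified cell probabilities, the statistic is approximately chi-square with k-1 degrees of freedom under the null hypothesis.

```python
def pearson_chisq(observed, probs):
    """Pearson chi-square statistic sum (O - E)^2 / E for multinomial counts
    against fully specified hypothesized cell probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
```

For example, `pearson_chisq([30, 30, 40], [1/3, 1/3, 1/3])` evaluates to 2.0, to be compared with the chi-square(2) critical value 5.99 at level 0.05.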

**(VII)** Handout on the proof of the Wilks Theorem and the equivalence of the corresponding chi-square statistic with the Wald & Rao-Score statistics, completing the proof steps covered in class.

**(IX)** A directory of sample problems from old in-class finals (with somewhat different coverage) can be found here. A list of course topics in scope for the exam can be found here.

**(X)** A directory, RScripts, containing R scripts, workspace(s), and pdf pictures for class demonstrations of R code and output illustrating large-sample theory and estimation algorithms. The first set of code (4/24/23) illustrates the large-sample behavior of MLEs for Gamma-distributed data, along with the behavior of the chi-square test of fit.
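The 4/24/23 demonstration is in R; the same flavor of large-sample behavior can be sketched in Python. For Gamma data with the shape parameter treated as known, the rate MLE is shape/x̄, and √n(λ̂ − λ) should be approximately N(0, λ²/shape); all names and parameter values below are illustrative.

```python
import math
import random
import statistics

def gamma_rate_mle(xs, shape):
    # with the shape known, the MLE of the rate lam is shape / xbar
    return shape * len(xs) / sum(xs)

def standardized_mles(shape=2.0, lam=1.5, n=400, reps=1000, seed=3):
    """Simulate sqrt(n)*(lam_hat - lam); by MLE asymptotics these draws
    should look approximately N(0, lam**2 / shape)."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        xs = [rng.gammavariate(shape, 1.0 / lam) for _ in range(n)]  # scale = 1/rate
        out.append(math.sqrt(n) * (gamma_rate_mle(xs, shape) - lam))
    return out
```

Here the asymptotic standard deviation is λ/√shape ≈ 1.06, which the sample standard deviation of the simulated values should approximate.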

**(XI)** Handout on the EM Algorithm from STAT 705.

**(XII) Background on Markov Chain Monte Carlo:** First see the Introduction and application of MCMC within an EM estimation problem in random-intercept logistic regression. For additional pdf files of "Mini-Course" Lectures, including computer-generated figures, see Lec.1 on the Metropolis-Hastings Algorithm and Lec.2 on the Gibbs Sampler, with Figures in the Mini-Course Figure Folders.
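The random-walk Metropolis-Hastings algorithm of Lec.1 fits in a few lines; the sketch below (Python, illustrative rather than taken from the mini-course) targets a standard normal density so the sampler's output can be checked against known moments.

```python
import math
import random
import statistics

def metropolis_normal(n_samples=30000, burn_in=5000, step=2.0, seed=11):
    """Random-walk Metropolis chain targeting N(0,1).
    Propose x' = x + Uniform(-step, step); accept with probability
    min(1, pi(x')/pi(x)), computed on the log scale for stability."""
    rng = random.Random(seed)
    log_pi = lambda v: -0.5 * v * v  # log of the N(0,1) density, up to a constant
    x, chain = 0.0, []
    for _ in range(n_samples):
        proposal = x + rng.uniform(-step, step)
        if math.log(rng.random()) < log_pi(proposal) - log_pi(x):
            x = proposal
        chain.append(x)
    return chain[burn_in:]
```

After discarding the burn-in, the retained (correlated) draws should have mean near 0 and standard deviation near 1.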

**Homework:** Assignments, including any changes and hints, will be posted here continually. The most current form of each assignment will also be posted on ELMS. Homework solutions will be posted here.

**HW 1, due Thursday 2/9/23 11:59pm (6 Problems)**

Read Sections 4.4 and 4.5 of Bickel and Doksum, and do problems 4.4.14 and 4.5.3.

Read Section 5.3 of Bickel and Doksum, and do problems 5.3.8(a)-(c), 5.3.15(b).
In addition, based on exponential family facts (Sections 1.6 and 2.3) and class notes, do the following:

**(A)** Suppose that the discrete, nonnegative-integer-valued i.i.d. random variables
W_{1},...,W_{n} have natural exponential family form with probability mass function
p_{W}(k) proportional to k^{α} exp(-k) I[k ≥ 1] , and define

C(a, b) = ∑_{k ≥ 1} k^{a} (log(k))^{b} exp(-k).

Find a general formula, in terms of C(a,b) for various a,b, for the Asymptotic Efficiency
of the Method of Moments Estimator of α .

**(B)** Consider i.i.d. data (X_{i},Y_{i}) for i=1,...,n, where
(X_{i},Y_{i}) has bivariate normal distribution with EX = EY = 0, Var(X) = Var(Y) = σ^{2}, and E(XY) = ρ σ^{2}. What is the Asymptotic Variance Matrix of the Generalized Method of Moments Estimator of θ = (ρ, σ^{2}) based on equating ∑_{1 ≤ i ≤ n} (X_{i}+Y_{i})^{2} and ∑_{1 ≤ i ≤ n} (X_{i} - Y_{i})^{2} to their expectations? Use this to find the asymptotic relative efficiency of the estimator
of ρ σ^{2} derived from this estimator, as compared with the Cramer-Rao
lower bound for all unbiased estimators of ρ σ^{2} .

**HW 2, due Sunday 2/26/23 11:59pm (7 Problems)**

Read Section 5.2 of Bickel and Doksum, Example 5.3.4 on Variance Stabilizing Transformations, and Section 5.4 through 5.4.3, and do problems 5.2.4, 5.3.10, 5.3.33, 5.4.1(a)-(d). (Problems 5.3.33 and 5.4.1 each count as 1.5 problems.) In addition, do the following 2 problems:

**(A)** Using the fact that X/(X+Y) is Beta(α,β) distributed for X ~ Gamma(α,λ), Y ~ Gamma(β,λ), give formulas expressing the F_{m,n} distribution and t_{k} distribution or density explicitly in terms of Beta distributions or densities.

**(B)** Suppose that Z_{i} ~ N(0,1) for i=1,...,n. Using orthogonal transformations similar to those used in class to establish the χ^{2}_{n-1} distribution for S^{2}, show that ∑_{1 ≤ i ≤ n} (Z_{i}+μ/√n)^{2} is equal in distribution to (Z_{1}+μ)^{2} + ∑_{2 ≤ i ≤ n} (Z_{i})^{2}. This distribution is called the **noncentral chi-square** with noncentrality parameter μ.
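This is no substitute for the orthogonal-transformation argument the problem asks for, but the claimed equality in distribution can be sanity-checked by Monte Carlo: both sides should have mean n + μ². A hedged Python sketch with illustrative parameter values:

```python
import math
import random
import statistics

def lhs_draw(rng, n, mu):
    # one draw of sum_{i=1}^{n} (Z_i + mu/sqrt(n))^2
    return sum((rng.gauss(0.0, 1.0) + mu / math.sqrt(n)) ** 2 for _ in range(n))

def rhs_draw(rng, n, mu):
    # one draw of (Z_1 + mu)^2 + sum_{i=2}^{n} Z_i^2
    return (rng.gauss(0.0, 1.0) + mu) ** 2 + sum(
        rng.gauss(0.0, 1.0) ** 2 for _ in range(n - 1))

def mc_mean(draw, n=10, mu=2.0, reps=4000, seed=5):
    rng = random.Random(seed)
    return statistics.mean(draw(rng, n, mu) for _ in range(reps))
```

With n = 10 and μ = 2, both Monte Carlo means should be close to n + μ² = 14.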

**HW 3, due Monday 3/13/23 11:59pm (6 Problems)**

Read Sections 5.4 and 5.5, enriched by Chapter 10 of Casella and Berger, and do Bickel and Doksum
problems 5.4.2, 5.4.3, 5.4.10, 5.4.14. In addition, do the following 2 problems:

**(A).** (a) Suppose you analyze a sample X_{1},...,X_{n} of real-valued random variables via Maximum Likelihood assuming that they are N(μ, 1) when they are really double-exponential with density g(x,μ) = (1/2) exp(-|x-μ|) for all real x. What is the asymptotic relative efficiency of your estimator of μ? (b) Same question if the sample is really N(μ, 1) distributed but you estimate μ via MLE under the assumption that the sample has density g(x,μ).

**(B).** Suppose that V_{1},...,V_{60} are *iid* Exponential(λ) random variables. **(a).** Find an exact two-sided 90% confidence interval for λ with equal tail coverage such that the endpoints respectively are the cutoffs for one-sided UMP test cutoffs with significance level 0.05. **(b).** Find the exact two-sided equal-tailed 90% credible interval based on the data and the prior π(λ) ~ Gamma(2,0.1). **(c).** Find the large-sample approximate 2-sided (frequentist) equal-tailed confidence interval for λ.

*If I simulate 1000 such samples of Exponential data (all with the same λ), which of these intervals do you expect to have the most and least accurate relative frequency of covering the true λ?*

**HW 4, due Tuesday 4/11/23 11:59pm (6.5 Problems)**

Read Sections 6.1 through 6.1.2 and 6.2 through 6.2.2 in Bickel & Doksum. **Do the
following 6 problems**:

Bickel & Doksum #5.5.2 (counts as 1.5), 6.1.1, 6.1.4, 6.1.14, 6.2.9, 6.2.10.

**HW 5, due Tuesday 4/25/23 11:59pm (6 problems)**

Read Sections 6.3 and 6.4 in Bickel & Doksum. **Do the
following 6 problems**:

Bickel & Doksum #6.3.1, 6.3.5, 6.3.8, 6.4.5, 6.4.6, plus the following additional problem:

**(A.)** Suppose that (X_{i}, Y_{i}) for 1 ≤ i ≤ n are *iid* pairs of random variables distributed according to X_{i} ~ Expon(λ) and, given X_{i}, Y_{i} ~ Poisson(exp(a+bX_{i})).

**(a)** Determine whether the MLE of (a,b) is *adaptive* to whether λ is known or not.

**(b)** Let a^{^}, b^{^} denote the joint MLEs for (a,b). These estimators are not available in closed form, but the restricted MLE a^{*} when b=1 is. Give the likelihood equations for a^{^}, b^{^} and the closed-form expression for a^{*}.

**HW 6, due Wednesday 5/10/23 11:59pm (6 problems)**

**Read Section 2.4.4 and the Handout (XI) on the EM Algorithm, as well as the R Script Rscript-4-24.RLog in the RScripts web-page directory (Handout (X)).**

**(1).** (*counts as 2 problems*) Suppose that you observe lifetimes X_{1},...,X_{n} from a Gamma(a,b) density and are interested in testing the null hypothesis **H_{0}:** a = 2 versus
**(2).** Suppose you observe independent data X_{1},...,X_{n} in two batches: X_{i} ~ f(x,θ) for 1 ≤ i ≤ m and X_{i} ~ g(x,θ) for m+1 ≤ i ≤ n. Assume that both densities f, g with respect to the **same** parameter θ ∈ **R**^{p} satisfy all the usual regularity conditions in Chapter 6 for Maximum Likelihood theory. Suppose also that m/n → λ ∈ (0,1) as n → ∞ and that consistent maximum likelihood estimators θ^{(1)} and θ^{(2)} for θ exist, respectively based on the data samples X_{1},...,X_{m} and X_{m+1},...,X_{n}. Then state and prove a Central Limit Theorem for the maximum likelihood estimator θ^{^} based on the combined sample. **Note.** This Theorem applies to the MLE found by the EM algorithm in Example 2.4.4 of Section 2.4.4 in Bickel and Doksum.

**(3).** Do Problem **#2.4.18** in Bickel & Doksum.

**(4).** A dataset of 80 observations is generated by 2 lines of R code in the R Script HW6-DataGen.RLog in the RScripts directory on the STAT 701 web-page. Perform a chi-square goodness-of-fit test for these data at significance level 0.05 to a Weibull density (i.e., a 2-parameter density of the form f(x,α,λ) = λ α x^{α-1} exp(-λ x^{α}) for all x>0, where the parameters α, λ must both be positive), based on the 5 intervals defined by x-axis cut-points 0, 0.35, 0.5, 0.625, 0.8, ∞.

**(5).** Derive the steps for the EM algorithm to find the Maximum Likelihood estimators of the parameters p, λ based on a sample Y_{1},...,Y_{n} from the mixture density f(y,p,λ) = p λ exp(-λ y) + (1-p) 2 λ exp(-2λ y) for all y>0, where p ∈ (0,1), λ > 0. View Y_{i} as the observed data from the complete data X_{i} = (Y_{i}, ε_{i}), where ε_{i} ~ Binom(1,p) and
Y_{i} ~ Expon((2-ε_{i})λ) given ε_{i}.