Statistics 701 Mathematical Statistics II

STUDENTS: Please submit your online evaluation of the course, the text,
and the instructor by signing in by May 14 to the URL for the
Course Experiences web-page at https://www.CourseExp.umd.edu.

Spring 2025 MWF 9-9:50am, PHY2106

Instructor: Professor Eric Slud, Statistics Program, Math Dept., Rm 2314, x5-5469, slud@umd.edu

Office hours: M 1-2, W 10-11 (initially), or email me to make an appointment (can be on Zoom).

Lecture Handouts Statistical Computing Handouts Homework

Overview: This is the second term of a full-year course sequence introducing mathematical statistics at a theoretical graduate level, using tools of advanced calculus and basic analysis. The material in the fall term emphasized conceptual definitions of the standard framework based on families of probability models for observed-data structures, along with the parameter space indexing the class of assumed models. We explained several senses in which functions of observed-data random variables can give a good idea of which of those probability models governed a particular dataset. In the fall term we emphasized finite-sample properties, while the spring term will emphasize large-sample limit theory. Aspects of the theoretical results will be llustrated using demonstrations with statistical simulation.
Large sample theory for Maximum Likelihood Estimation and Estimating Equations will be discussed in detail; in connection with hypothesis testing, we will prove the large-sample equivalence of Wald tests, Rao Score tests, and Likelihood Ratio tests; confidence intervals -- not covered in the fall semester, will be introduced through the "test-based confidence region" duality with likelihood ratio tests and other hypothesis tests. Notions of Locally Most Powerful tests and large-sample relative efficiency of tests will be discussed, with application to the determination of sample size required to achieve specified power. Other topics that will be covered as time permits include: misspecified models, inference with missing data, and introduction to semiparametric models.

Prerequisite: Stat 700 or {Stat 420 and Math 410}, or equivalent.

Required Course Text: P. Bickel & K. Doksum, Mathematical Statistics, vol.I, 2nd ed., Pearson Prentice Hall, 2007.

Recommended Texts: (i) George Casella and Roger Berger Statistical Inference, 2nd ed., Duxbury, 2002.
(ii) V. Rohatgi and A.K. Saleh, An Introduction to Probability and Statistics, 2nd ed., Wiley, 2001.
(iii) Jun Shao, Mathematical Statistics, 2nd ed., Springer, 2003.
(iv) P. Billingsley, Probability and Measure, 2nd (1986) or later edition, Wiley.

Course Coverage: STAT 700 and 701 divide roughly so that definitions and properties for finite-sample statistics are covered in the Fall (STAT 700), and large-sample limit theory in the Spring (STAT 701). The division is not quite complete, because finite-sample confidence intervals and likelihood ratio tests in Chapter 4 are introduced in the first weeks of Stat 701. We continue Stat 701 by consolidating the topics covered in the Fall, from Chapters 1-4, from the viewpoint of behavior of statistical procedures when i.i.d. data samples are large. This will involve discussion of consistency and efficiency of estimators from exponential families, large-sample definitions and behavior of hypothesis tests and confidence intervals, and some decision theory topics where probability limit theory plays a role. We will study more deeply some relationships between likelihood estimators and other classes of "estimating equation" estimators, and will discuss the computational solution of the likelihood and estimating equations and the large-sample properties of the the resulting estimators. The heart of the spring term material is in Chapters 5 and 6 of Bickel and Doksum. We will cover in detail the EM Algorithm and introduce Bayesian theory and MCMC computation.
Readings in Casella and Berger and other sources will be occasional and topic-based.

This term, we begin by discussing asymptotic topics related to the comparison of large-sample variances of method of moments and ML estimators (Bickel & Doksum Chapter 5, Sec.3) in the setting of canonical exponential families, to consolidate material on exponential families in the fall and give associated large-sample theorems. Then we will interrupt our development of Chapter 5 material to return to Chapter 4 sections 4.4 and 4.5 to give a thorough introduction to confidence intervals using some large-sample theory (Central Limit Theorem and Law of Large Numbers) to clarify the problem of defining formulas fo confidence interval endpoints. We will similarly return to Neyman-Pearson tests in the large-sample setting to clarify the (approximate) definition of rejection region cutoffs, and also incorporate large-sample asymptotics through the notions of asymptotic size and power, asymptotically pivotal quantity, and asymptotic confidence level. (These topics are covered well in statistical terms, although less rigorously, in Casella & Berger.)

Then we re-visit the topic of (multidimensional-parameter) Fisher Information along with numerical maximization of likelihoods, via Newton-Raphson, Fisher scoring, and EM algorithm. We complete this part of the course by proving that MLE's under regularity conditions are consistent and asymptotically normal, with related facts about he behavior of the Likelihood in the neighborhood of the MLE. We follow roughly the development of Chapter 5 of the book of Bickel and Doksum; you can see this material covered non-rigorously in the univariate case in Casella and Berger's Section 10.1. Chapter 6 of Bickel & Doksum covers the asymptotic theory of estimating-equation estimation, unifying linear model theory and maximum likelihood theory in multi-parameter settings. We prove theorems on the asymptotic optimality of MLEs among the large class of "regular" parameter estimators, and complete our discussion of large-sample theory by proving the Wilks theorem for asymptotic chi-square distribition of Likelihood Ratio Tests along with the large-sample equivalence between Wald tests, score tests, and likelihood ratio tests.

Grading: There will be graded homework sets roughly every 1.5--2 weeks (6 or 7 altogether); one in-class test, tenatively on Fri., March 17; and an in-class Final Exam on Wednesday, May 17. The course grade will be based 45% on homeworks, 20% on the in-class test, and 35% on the Exam.
Homework will generally not be accepted late, and must be handed in as an upload pdf or png file in ELMS.

HONOR CODE

The University of Maryland, College Park has a nationally recognized Code of Academic Integrity, administered by the Student Honor Council. This Code sets standards for academic integrity at Maryland for all undergraduate and graduate students. As a student you are responsible for upholding these standards for this course. It is very important for you to be aware of the consequences of cheating, fabrication, facilitation, and plagiarism. For more information on the Code of Academic Integrity or the Student Honor Council, please visit http://www.shc.umd.edu.

To further exhibit your commitment to academic integrity, remember to sign the Honor Pledge on all examinations and assignments:
"I pledge on my honor that I have not given or received any unauthorized assistance on this examination (assignment)."

This course web-page serves as the Spring 2025 Course Syllabus for Stat 701.
Also: messages and updates (such as corrections to errors in stated homework problems or changes in due-dates) will generally be posted here, and sometimes also through emails in CourseMail.
Additional information: Important Dates below;
for auxiliary reading, several useful handouts that are described and linked below.

HANDOUTS & OTHER LINKS

Many relevant handouts can already be found on the Stat 700 web-page. Others will be added here
throughout the Spring 2025 semester.

(O). Have a look at the discussion paper of David Donoho on Fifty Years of Data Science, especially if you are
interested in Machine Learning and Data Science. Where do you think this course fits into his scheme of things ?

(I). Union-Intersection Tests covered in Casella and Berger are discussed in a journal article in
connection with applications to so-called Bioequivalence trials.

(II). Summary of calculations in R comparing three methods for creating (one-sided)
confidence intervals for binomial proportions in moderate sized samples. The assessment of coverage
probabilities for CIs for binomial proportions is done in R code in file "BinCvrgScript.tex" and R
workspace "BinCvrg.RData" in Rscripts and interesting pictures of Coverage Prob's for n=77
and Coverage Prob's for p=0.12.

(III). Handout containing single page Appendix from Anderson-Gill article (Ann. Statist. 1982)
showing how uniform law of large numbers for log-likelihoods follows from a pointwise strong law.

(IV). Handout on the 2x2 table asymptotics covered in a 2009 class concerning different sampling
designs and asymptotic distribution theory for the log odds ratio.

(V). Handout on Wald, Score and LR statistics covered in class April 10 and 13, 2009.

(VI). Handout on Chi-square multinomial goodness of fit test.

(VII) Handout on Proof of Wilks Thm and equivalence of corresponding chi-square statistic with
Wald & Rao-Score statistics which will complete the proof steps covered in class.

(IX). A DIRECTORY OF SAMPLE PROBLEMS FOR OLD IN-CLASS TESTS & FINALS (with somewhat different coverage) CAN BE FOUND HERE. A list of course topics in scope for the exam can be found here.

(X). A directory RScripts containing R scripts and workspace(s) and pdf pictures for class demonstrations of
R code and outputs illustrating large sample theory and estimation algorithms. The first set of code 4/24/23
illustrates the large-sample behavior of MLE's for Gamma-distributed data, along with the behavior of
the chi-square test of fit.

(XI). Handout on EM Algorithm from STAT 705.

(XII) Background on Markov Chain Monte Carlo: First see Introduction and application of MCMC
within an EM estimation problem in random-intercept logistic regression. For additional pdf files of
"Mini-Course" Lectures, including computer-generated figures, see Lec.1 on Metropolis-Hastings Algorithm,
and Lec.2 on the Gibbs Sampler, with Figures that can be found in Mini-Course Figure Folders.

(XIII). Zoom lecture (also on ELMS, with recording) from Feb. 12, 2025 on topic of Likelihood Ratio Test and basic MLE consistency.

(XIV). Because I have always found the proof of Theorem 5.2.2 in Bickel and Doksum impenetrable, here is a cleaned-up version of the proof of that Theorem that I sketched in class on Feb.19, 2025.

(XV). Here is a handout, cleaning up the notes from our class of May 5, with categorical-model likelihood theory establishing the relation between parameters fitted by loglinear models and those fitted by Poisson regression.

(XVI). Here is a link to slides on Basic Bootstrap Theory from Lecture slides in a Spring 2021 course I gave on Bootstrap Methods.

Homework: Assignments, including any changes and hints, will continually be posted here. The most current form of the assignment will be posted also on ELMS. Selected homework solutions will be posted to the ELMS course pages.
Homework assignments for Spring 2025 are still under construction.

HW 1, due Saturday 2/8/25 11:59pm (7 Problems)

Read Sections 4.4, 4.5 and 4.9 of Bickel and Doksum, and do problems 4.4.2, 4.4.7, 4.4.10, 4.5.5, 4.9.4
Read Section 5.3 of Bickel and Doksum, and do problem 5.3.10.
In addition, based on exponential family facts (Sections 1.6 and 2.3) or other knowledge about distributions and moment generating functions, do and hand in the following additional problem (A), linked here.

HW 2, due Sunday 2/23/25 11:59pm (7 Problems)

Read Sections 5.2 of Bickel and Doksum, Section 5.3 material on Edgeworth Expansions and Monte Carlo Simulation, plus Section 5.3.3 and Sections 5.4 through 5.4.3, and do problems 4.9.13, 5.2.4, 5.2.5, 5.3.13, 5.3.16, 5.3.28. Also do the extra problem (B), again linked to the Extra Problems Assigned.

HW 3, due Monday 3/10, 11:59pm (6 problems counting as 7): Reading for this HW set is: Bickel and Doksum sections 5.3.3 through 5.5, Section 2.1.2 and Section 2.4 (on Numerical Likelihood optimization including Newton-Raphson and EM algorithms plus Fisher scoring defined in problem 6.5.1). Do the following 6 problems, to be handed in by Bickel and Doksum # 5.4.4, 5.4.5, 2.3.1, 2.4.10, plus the two extra problems (C) and (D) given here:

(C). (i). Suppose you analyze a sample X₁,...,X_n of real-valued random variables via Maximum Likelihood assuming that they are N(μ, 1) when they are really double-exponential with density g(x,μ) = (1/2) exp(-|x-μ|) for all real x. What is the asymptotic relative efficiency of your estimator of μ ?

(ii) Same question if the sample is N(μ, 1) distributed but you estimate μ via MLE assuming that the sample has density g(x,μ).

In this problem you need to establish asymptotic normality for the sample median and find its asymptotic variance. The argument by which you can do this is provided in a Handout, and you must fill in details of Step 3 in that handout to complete the problem.

(D). [Counts as 2 problems] Suppose that f₀ is a known probability density on the real line, and that a location-scale family is given (for all real μ, and all positive σ) by

f(x, μ, σ) = (1/σ) f₀((x-μ)/σ) , all real x

(i) Find a formula for the 2x2 per-observation Fisher Information Matrix for this kind of data X ~ f(x, μ, σ) , which should involve only μ, σ, and some numerical constants which involve integrals defined from f₀ and its derivatives.

(ii) Specialize your result in (i) to the logistic location-scale density

f(x,μ, σ) = (1/σ) exp((x-μ)/σ)/{1 + exp((x-μ)/σ)}², all real x

Find an explicit formula in terms of θ = (μ, σ) for the 2x2 Fisher information matrix, involving constants that you find by numerical integration.

(iii) Assume that you see "data" X_i, i=1,...,20, given by values 0.03 + 0.3*i in the logistic model of part (ii). Find the MLE for (μ, σ) both by Newton-Raphson and Fisher Scoring, which DIFFER in this problem, using the initial guess μ₀ = 0.3 and σ₀ = 0.14. In each of the Newton-Raphson and Fisher Scoring solutions, give the entire iteration history needed to obtain the MLE to 4-decimal-place accuracy. NOTE: in this example, one of these methods is much more stable than the other to bad starting points. Which is the stable one ? Try (μ₀, σ₀) = (.3, .16) or (.2,.2).

HW 4, due Wednesday 4/9, 11:59pm (6 Problems counting as 7): Reading for this HW set is: Sections 5.4 and 5.5, enriched by Chapter 10 of Casella and Berger, plus Section 2.4.4 on EM algorithm, and do Bickel and Doksum problem 5.4.10, plus 2.4.1, 2.4.4, 2.4.5.

Also read Sections 6.2 through 6.2.2 in Bickel & Doksum, and do the problems: 6.2.10, 6.2.11.

HW 5, due Wednesday 4/23/25 11:59pm (6 problems)

Read Sections 6.3 and 6.4 in Bickel & Doksum. Do the following 6 problems in Bickel and Doksum:
#6.3.4 (but restrict to δ > 0 in part (a)), 6.3.5 (a)-(c) only, 6.3.6, 6.4.4, 6.4.13 plus one additional problem:

(E.) Suppose that (X_i, Y_i) for 1 ≤ i ≤ n are iid pairs of random variables distributed according to X_i ~ Expon(λ), and given X_i, Y_i ~ Expon(a (X_i)^b).
(a) Show that estimation of (a,b) is adaptive to whether λ is known or not.
(b) Let a^{^}, b^{^} denote the joint MLEs for (a,b). These estimates are not closed form, but the restricted MLE a^* when b=1 is obtainable in closed form. Give the likelihood equations for a^{^}, b^{^} and the closed-form expression for a^*.
(c) In terms of the estimators in (b), give the Wald, Rao-Score, and Likelihood-Ratio Tests of asymptotic significance level α for the hypothesis H: b=1 versus the general alternative, where (a,λ) are unknown (nuisance) parameters.

HW 6 due Monday 5/12/25 5:00pm (7 problems)

(1). (counts as 2 problems) Suppose that you observe lifetimes X₁,...,X_n from a Gamma(a,b) density and are interested in testing the null hypothesis H₀: a = 2 versus H₁: a > 2 . Here both a and b are unknown positive statistical parameters.
(a). Give the test statistic and asymptotic cutoff for the score hypothesis test that is "locally optimal" against alternatives a that are very close to (but greater than) 1.5.
(b) Is your hypothesis test the one you would use for testing H₀ versus the alternative H₂: a = 2.5 ? Why or why not ? [Note that H₂ is not a "point alternative" hypothesis because b is still unknown.]
(c) When n=60, approximate both the power of your test in (a) and also the one you would use in (b) versus H₂. The answer depends on α and b.

(2). Suppose you observe independent data X₁,...,X_n in two batches: X_i ~ f(x,θ) for 1 ≤ i ≤ m and X_i ~ g(x,θ) for m+1 ≤ i ≤ n. Assume that both densities f, g with respect to the same parameter θ ∈ R^p satisfy all the usual regularity conditions in Chapter 6 for Maximum Likelihood theory. Suppose also that m/n → λ ∈ (0,1) as n → ∞ and that consistent maximum likelihood estimators θ⁽¹⁾ and θ⁽²⁾ for θ exist, respectively based on the data samples X₁,...,X_m and X_m+1,...,X_n. Then state and prove a Central Limit Theorem for the maximum likelihood estimator θ^{^} based on the combined sample.

(3). A dataset of 100 observations are generated by the lines of R code in the R Script HW6-DataGenSp25.txt in the RScripts directory in the STAT 701 Web-page. Perform a Chi-square goodness of fit test for these data at significance level 0.05 to a 2-parameter Normal density, based on the 5 quintile intervals defined by x-axis cut-points -∞, 1.57, 3.79, 5.30, 6.57, ∞. Re-do your goodness of fit test based on the same cut-points using only the first 50 points in the dataset, and compare the results. Do the results make sense ? Explain.

(4). A dataset of 200 observations, in the form of a 5 x 4 contingency table of multinomial counts, are generated by the lines of R code in the R Script HW6-DataGenSp25.txt in the RScripts directory in the STAT 701 Web-page. The multinomial parametric model to use in analyzing these data are that the cell probability for row i and column j is p_ij = exp( (i-3)*b₁ + (j-2)*b₂ + (i-3)*(j-2)*b₃). Note: you will need to use the form of the likelihood for this model in numerical calculation of maximum likelihood estimates. Please ask me if you would like help with an R script for that calculation.
(a). Within this parametric model, with unknown parameter, θ = (b₁,b₂,b₃), test the null hypothesis H₀: b₃=0 against the general alternative of non-zero b₃, at significance level 0.05.
(b). Using the same data, test the goodness of fit of the two-parameter model H₀ versus the general completely unrestricted multinomial model, at significance level 0.05.
(c). Using the same data, test the goodness of fit of the two-parameter model H₀ versus the alternative of row-column independence, at significance level 0.05.

(5). Consider the problem of estimating θ ∈ (-1,1) in the density f(x,θ) = I_{[|x| ≤ 1]} (1+xθ)/2 ,
based on a large iid sample. The MLE is not available in closed form, but as we saw on the in-class test, the GMM estimator is very easy to calculate and is nearly fully efficient when θ is close to 0. Now consider θ values not necessarily close to 0 and define θ^* as the minimizer of
(n^-1 ∑_{1 ≤ i ≤ n} X_i - g₁(θ))² + (n^-1 ∑_{1 ≤ i ≤ n} I_{[X_i > 0]} - g₂(θ))²
where g₁(θ) = E_θ(X₁) and g₂(θ) = P_θ(X₁ > 0). Express this estimator as the solution of an estimating equation and find its relative efficiency (ARE) with respect to the MLE. Tabulate or plot the ARE for θ values 0.2, 0.4, 0.6, 0.8.

(6). Consider the problem of testing the hypothesis H₀: θ₁ = 1/3 based on a data sample
Y_i ~ Gamma(θ₁, θ₂), i=1,...,n, based on the method of moments estimator (θ₁^*, θ₂^*).
(a). Show that the method of moments estimator in this problem is the same as the minimum contrast estimator with ρ(y,θ) = (y-θ₁/θ₂)² + (y²-θ₁(1+θ₁)/θ₂²)².
(b). Find the rejection region for the large-sample test based on the minimum contrast estimator θ₁^*, at
significance level 0.05.

Important Dates

First Class: January 27, 2025

Mid-Term Exam: Fri., March 14

Spring Break: March 16-23

Last Day of Classes: Mon., May 12

Review Session for Exam: Wed., May 14, 9-10:30am (same classroom)

Final Examination: Mon., May 19, 1:30-3:30 pm (same classroom)

Statistics 701 Mathematical Statistics II

STUDENTS: Please submit your online evaluation of the course, the text,
and the instructor by signing in by May 14 to the URL for the
Course Experiences web-page at https://www.CourseExp.umd.edu.

HANDOUTS & OTHER LINKS

Important Dates

Return to my home page.

© Eric V Slud, May 8, 2025.

Statistics 701 Mathematical Statistics II

STUDENTS: Please submit your online evaluation of the course, the text, and the instructor by signing in by May 14 to the URL for the Course Experiences web-page at https://www.CourseExp.umd.edu.

HANDOUTS & OTHER LINKS

Important Dates

Return to my home page.

© Eric V Slud, May 8, 2025.

STUDENTS: Please submit your online evaluation of the course, the text,
and the instructor by signing in by May 14 to the URL for the
Course Experiences web-page at https://www.CourseExp.umd.edu.