STAT 770: ANALYSIS OF CATEGORICAL DATA FALL 2022
Instructor: Paul J. Smith, Statistics Program
Office hours: MWF 11:30-12:30, MF 2:30-3:30, or by appointment.
Telephone: (301) 405-5104
E-mail: pjs@umd.edu
Schedule: Fall 2022, MWF 1, MTH 0104
Textbook: Agresti, A. (2013). Categorical Data Analysis (3rd ed.). Hoboken, NJ: John Wiley.
Prerequisites: STAT 420.

Course Description: Categorical data analysis covers a variety of problems, which can be divided into two main areas: predicting a categorical response variable from predictors, which may be categorical or continuous, and inference on the joint probability structure of a vector of several categorical variables. This course will put more emphasis on predicting a categorical response, but both main problem areas will be addressed. The course will also briefly cover modern machine learning techniques for categorical data. Data analysis, graphics and interpretation are essential components of the course, and students will analyze real-world data sets, principally using the R statistical computing package (although students may choose to use other packages instead).

Course topics:
• Brief review of statistical methods for categorical data, analyzing contingency tables, logistic regression.
• Generalized linear models.
• Binary regression, logistic and other links, model selection and diagnostics.
• Multinomial response data, nominal vs. ordered responses, loglinear models for contingency tables.
• Clustered categorical data and random effects.
• Model-free methods, such as those used in machine learning.
• Large sample theory for categorical data analysis, particularly likelihood-based theory, will be presented as needed.

Computing: Data analysis exercises will require computation. The package R will be employed throughout the course, but those who are familiar with SAS may use that package instead.

Examinations and Grading
• Midterm: Friday, October 14.
• Final: Thursday, December 15, 1:30-3:30 p.m.
• Homework: Frequent problem sets will be assigned. These will be a mix of theoretical problems and applied problems involving analysis of real data sets on the computer. Homework assignments will be posted on ELMS.
• Grading: The midterm and final will each count for approximately 20% of the grade, and the homework will count for approximately 60%. These percentages are tentative.

References
• Agresti, A. (2015). Foundations of Linear and Generalized Linear Models. Hoboken, NJ: John Wiley.
• Faraway, J. J. (2005). Linear Models with R (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC. (A preliminary version of the first edition of this book is available on the web.)
• Faraway, J. J. (2016). Extending the Linear Model with R (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC.
• McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). New York: Chapman and Hall.
• Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S (4th ed.). New York: Springer.