LECTURE KEYWORD SUMMARY, MULTIVARIATE STATISTICS, STAT 750, Spring 2022
=======================================================================
Lec.1, 1/24/22
Keywords to begin with:
Data structure
** table n x (p+d), n=#obs, p=#Outcomes Y_i, d=#predictors X_j
Data Display
** summaries of Outcomes and Predictors by variable,
scatterplots of Y's vs X's
Data Transformation ** linear transformation, projection, centering and rescaling,
subsetting by group, conversion to ranks, other nonlinear recoding
Subsetting X's -- Variable Selection / Model Selection
Simultaneous subsetting of X's and Y's so that groups of X's are suitable for predicting
subsets of Y's (Examples: recommender systems or genomics)
Statistics ** sampling distribution (theoretical)
** exact calculation of density under model
** versus Monte Carlo empirical distribution
(use multivariate t or Wishart as examples)
** reference distribution under null hypothesis
Univariate models ** single Y modeled conditionally given multiple X
Multivariate models ** multiple outcomes Y modeled, maybe conditionally given X
================
Lec.2, 1/26/22
Data Display ** correlation (Pearson or Spearman?) pairwise within Y's , within X's
Data Transformation ** conversion to ranks, other nonlinear recoding
Classification/ ** groups g pre-defined via Y's, mapping to be defined as f(X)
Discrimination primarily "supervised" with true labels, sometimes "semi-supervised"
================
Lec.3, 1/28/22
Matrix algebra (see Appendix with that title in Mardia, Kent and Bibby)
Definitions of column-space, row-space, rank, nonnegative-definite
Master result: Singular Value Decomposition, contains
Spectral representation of Symmetric Nonnegative-definite (covariance) matrices
Corollaries: Projection Matrices via SVD, symmetric square-root of covariance matrix
verification of formulas for trace and det
respectively as sum and product of eigenvalues
Expression for joint density f(x) as limiting probability per unit volume for
small boxes decreasing to the point x
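A minimal numeric sketch (illustrative matrix, not from the lectures) of the corollaries above: spectral representation of a symmetric nonnegative-definite matrix, the symmetric square root, and trace/det as sum/product of eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
S = A.T @ A                      # 3x3 symmetric nonnegative-definite (covariance-like)

# spectral representation S = V diag(lam) V'
lam, V = np.linalg.eigh(S)

# trace and det are the sum and product of the eigenvalues
assert np.isclose(np.trace(S), lam.sum())
assert np.isclose(np.linalg.det(S), lam.prod())

# symmetric square root of the covariance matrix via the spectral representation
S_half = V @ np.diag(np.sqrt(lam)) @ V.T
assert np.allclose(S_half @ S_half, S)
```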
================
Lec.4, 1/31/22
Review Jacobian change of variable formula for probability densities of smooth
and smoothly invertible function Y=g(X) of random vector X with density f(x)
Spherical symmetry (rotational invariance) for random vector
Examples of spherically symmetric joint densities
Fact: for rotationally symmetric random p-vector X, R=length(X) and X/R are indep
random variables, with X/R uniformly distributed on the surface of p-dim sphere
(See pdf handout 2. on this topic).
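A Monte Carlo sketch of the Fact above for the simplest spherically symmetric case, X ~ N_p(0, I) (the sample size, dimension, and seed are illustrative choices): R = ||X|| and U = X/R should be uncorrelated, with U uniform on the unit sphere.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 200_000
X = rng.standard_normal((n, p))        # spherically symmetric
R = np.linalg.norm(X, axis=1)
U = X / R[:, None]

# U uniform on the sphere: E[U] = 0 and E[U U'] = I/p
assert np.allclose(U.mean(axis=0), 0, atol=0.01)
assert np.allclose(U.T @ U / n, np.eye(p) / p, atol=0.01)

# independence checked here only through zero correlation of R with each U_j
corr = np.corrcoef(np.column_stack([R[:, None], U]).T)[0, 1:]
assert np.all(np.abs(corr) < 0.02)
```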
================
Lec.5, 2/2/22
Conclusion of rotational-symmetry topic; hints on Exercises
Equivalent definitions of multivariate normal: via density, via ch.f., and as
affine transformation of a vector with iid N(0,1) entries.
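A sketch of the third definition (mu, Sigma, and the sample size are illustrative): X = mu + A Z with Z having iid N(0,1) entries gives X ~ N_p(mu, A A'), with the Cholesky factor as one choice of A.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
A = np.linalg.cholesky(Sigma)        # one choice of A with A A' = Sigma

Z = rng.standard_normal((100_000, 2))   # iid N(0,1) entries
X = mu + Z @ A.T                         # affine transformation

# sample mean and covariance recover mu and Sigma
assert np.allclose(X.mean(axis=0), mu, atol=0.05)
assert np.allclose(np.cov(X.T), Sigma, atol=0.05)
```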
=================
Lec.6, 2/4/22
Run-through of properties of multivariate normal: mean, variance, independence equivalent
to uncorrelatedness; generalized inverse of covariance matrix in singular case; density
of multivariate normal on affine subspace in singular-covariance case; maximum-probability
(or minimum-volume for fixed probability) sets as ellipsoids.
================
Lec.7, 2/7/22
Conditional density of Y given X when these random vectors are jointly multivariate normal
Multivariate CLT as justification for multivariate normal
Mixtures of multivariate normal densities
Maximum likelihood estimation from iid multivariate normal samples
Sufficient statistics and likelihood ratio tests for the mean in multivariate-normal setting
===============
Lec.8, 2/9/22
Conditional densities for one multivariate normal subvector given another
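For reference, the standard partitioned form of this result:

```latex
\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N_p\!\left(
\begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix},
\begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}\right)
\;\Longrightarrow\;
X_1 \mid X_2 = x_2 \;\sim\; N\!\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2),\;
\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)
```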
===============
Lec.9, 2/11/22
Xbar and S as MLE's
Formulation of multivariate normal parameter space and hypotheses
===============
Lec.10, 2/14/22
Likelihood ratio test (LRT) and Wilks' Theorem
LRT for null hypothesis of specified multivariate normal mean (one-sample case)
with unrestricted unknown covariance matrix
Wishart distribution, Mahalanobis distance
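A sketch of the one-sample T^2 statistic for H0: mu = mu0, using the classical scaling ((n-p)/(p(n-1))) T^2 ~ F_{p,n-p} under H0. The data are simulated under H0 purely for illustration (n, p, seed are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
mu0 = np.zeros(p)
X = rng.standard_normal((n, p))           # N(mu0, I) under H0

xbar = X.mean(axis=0)
S = np.cov(X.T)                            # unbiased sample covariance
d = xbar - mu0
T2 = n * d @ np.linalg.solve(S, d)         # Hotelling T^2 = n d' S^{-1} d (squared Mahalanobis distance, scaled by n)
F_stat = (n - p) / (p * (n - 1)) * T2      # compare to F_{p, n-p} quantiles

assert T2 >= 0
```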
===============
Lec.11, 2/16/22
Hotelling T^2 distribution
Independence of Xbar and S based on multivariate normal data matrix
Independence of weighted combinations of rows of n x p multivariate normal data matrix
based on n-dimensional orthonormal vectors of weights
===============
Lec.12, 2/18/22
Two-sample tests of same versus different means in sample populations with unknown
unrestricted variance matrix assumed to be the same across samples
R Script and demonstration of one- and two-sample tests and simulation of p-values
Further distributions arising in Multivariate Normal hypothesis tests (end of Ch.3 MKB)
----------------
Lec.13, 2/21/22
Accuracy of Monte Carlo calculations of distributional percentage points and p-values
Proof that the Hotelling T^2(p,m) distribution is the same as (mp/(m-p+1)) * F_{p,m-p+1}
----------------
Lec.14, 2/23/22
Catalogue of hypothesis tests we obtain for multivariate normal means and variances
using Likelihood Ratio Test, and also using Union Intersection Test idea
Template for obtaining new hypothesis tests based on differently constrained parameters
Relationship between UIT's and simultaneous confidence intervals.
------------------
Lec.15, 2/25/22
Two-sample LRT for equality of covariance matrices
More on UITs and simultaneous CIs: derivations in cases
------------------
Lec.16, 2/28/22
Introduction of Multivariate Regression Model,
Motivation by comparison with univariate regression models and
derivation of MLEs for coefficient matrix B and outcome covariance matrix Sigma
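A sketch of the resulting MLEs (dimensions and data are illustrative): for Y (n x p) and design X (n x q), Bhat = (X'X)^{-1} X'Y and Sigmahat = (1/n) Uhat' Uhat with Uhat = Y - X Bhat.

```python
import numpy as np

rng = np.random.default_rng(4)
n, q, p = 200, 3, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, q - 1))])
B_true = rng.standard_normal((q, p))
Y = X @ B_true + rng.standard_normal((n, p))

Bhat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # = (X'X)^{-1} X'Y
Uhat = Y - X @ Bhat                            # residual matrix
Sigmahat = Uhat.T @ Uhat / n                   # MLE divides by n, not n-q

# normal equations: residuals orthogonal to the design columns
assert np.allclose(X.T @ Uhat, 0, atol=1e-6)
```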
------------------
Lec.17, 3/2/22
Demonstration that B-hat and residual-matrix U-hat are independent in multivariate-normal
regression model, and verification of Wishart distribution for Sigma-hat
Computational demonstration of model fitting and hypothesis tests for correlation
between outcome variables in multivariate regression, and of relation between
conditional distribution of residuals (one column given others) and comprehensive univariate
regression model for one column $Y^{(j)}$ in terms of X and of other outcome columns $Y^{(-j)}$
------------------
Lec.18, 3/4/22
Completion of Ch.6 MKB: covered Sec 6.3 through 6.3.1
LRT hypothesis test for C1 B M1 = D in multivariate regression
plus: Multiple Correlation, Partial Correlation
------------------
Lec.19, 3/7/22
MANOVA as regression, LRT with Wilks' Lambda
def'n of Pillai's Trace as alternative
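A sketch of the two statistics from the hypothesis (H) and error (E) sums-of-squares-and-products matrices: Wilks' Lambda = det(E)/det(E+H), Pillai's trace = tr(H (H+E)^{-1}). The H and E below are just illustrative nonnegative-definite matrices, not a worked MANOVA.

```python
import numpy as np

rng = np.random.default_rng(5)
p = 3
A = rng.standard_normal((10, p)); H = A.T @ A    # "hypothesis" SSP matrix
B = rng.standard_normal((20, p)); E = B.T @ B    # "error" SSP matrix

wilks = np.linalg.det(E) / np.linalg.det(E + H)
pillai = np.trace(H @ np.linalg.inv(H + E))

# both statistics are bounded: 0 < Lambda < 1, 0 < Pillai < p
assert 0 < wilks < 1
assert 0 < pillai < p
```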
-------------------
Lec.20, 3/9/22
MANOVA table demonstration in R
discussion of Wilks Lambda and
relationship to product of independent Beta's (Thm 3.7.3)
and approximation in cases k=2 or 3 by F's
--------------------
Lec.21, 3/11/22
Brief discussion of sample test review problems
Introduction to Ideal Principal Components (i.e., the principal-
component eigenspaces of the true variance matrix Sigma)
---------------------
Lec.22, 3/14/22
Discussion of HW problem (II): exact T^2(p-1,n-1) distribution
using the alternate representation of H0: mu proportional to mu_0
as R mu = 0, where R ((p-1) x p) has rows forming
an orthonormal basis for the orthocomplement of {mu_0} (cf. MKB, pp.132-133)
Extended discussion/hints on problems of Sample Test
---------------------
Lec.23, 3/16/22
Further discussion on sample test & review for in-class test
Further introduction to PCA: sample principal components,
general properties, and Principal Component regression
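A "from scratch" sketch of sample PCA (simulated data, seed and dimensions arbitrary): eigendecompose the sample covariance; scores are the centered data times the eigenvectors, their variances are the eigenvalues, and total variance is preserved.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((300, 4)) @ rng.standard_normal((4, 4))

Xc = X - X.mean(axis=0)                 # center the data
S = np.cov(Xc.T)                        # sample covariance
lam, V = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]           # eigenvalues in decreasing order
lam, V = lam[order], V[:, order]

scores = Xc @ V                         # sample principal-component scores

# component variances equal the eigenvalues; total variance is preserved
assert np.allclose(np.var(scores, axis=0, ddof=1), lam)
assert np.isclose(lam.sum(), np.trace(S))
```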
------------------TEST ON 3/18/22
Lec.24, 3/28/22
Discussion of test solutions and further definitions concerning principal components.
------------------
Lec.25, 3/30/22
Illustration of PC software and R calculations "from scratch" on Boston Housing data
in R Script PrinCompBHous.RLog.
------------------
Lec.26, 4/1/22
Large sample theory for estimates of PCs. PC regression to reduce dimensionality of
an outcome dataset. Use of PCs of a variable-set as predictive variables for a different outcome.
------------------
Lec.27, 4/4/22
Introduction of Factor Analysis model. Nonidentifiability due to orthogonal rotations of loadings.
Side conditions (several different versions) to restore identifiability.
Orthogonal-column loadings as one possible side condition for identifiable loadings matrix.
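A numeric sketch of the rotation nonidentifiability (Lambda, Psi, and Q below are arbitrary illustrative choices): for any orthogonal Q, the loadings Lambda and Lambda Q imply the same covariance Lambda Lambda' + Psi.

```python
import numpy as np

rng = np.random.default_rng(7)
p, k = 5, 2
Lam = rng.standard_normal((p, k))          # loadings
Psi = np.diag(rng.uniform(0.5, 1.5, p))    # diagonal uniquenesses

Q, _ = np.linalg.qr(rng.standard_normal((k, k)))   # random orthogonal k x k
Lam2 = Lam @ Q                                     # rotated loadings

# identical implied covariance matrices
assert np.allclose(Lam @ Lam.T + Psi, Lam2 @ Lam2.T + Psi)
```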
------------------
Lec.28, 4/6/22
Illustration of Factor Analysis R-functions in the 5-company stock-returns example
(#9.4 in Johnson & Wichern) including interpretation of loadings in FactorExmp.RLog script.
------------------
Lec.29, 4/8/22
Principal Factor Method (3 versions: using correlation matrix R in place of S)
(i) direct use of PCs with top-k eigenvectors of S as loadings, then Psi as diag (matrix residual)
(ii) estimate communalities via max correlations (of j'th variable on others), then Psi, then Lambda via spectral decomposition of R - Psi.
(iii) same plan as (ii) but communalities estimated via multiple corr of j'th variable on others.
Contrasted these approximate "principal factor methods" with MLE Factor Model estimates,
used in formal goodness of fit test for model.
------------------
Lec.30, 4/11/22
Introduction of EM Algorithm & Woodbury Identity
for Factor Analysis MLE Calculation via EM
following C. Bishop book's Chapter 12, esp. Sec.12.4
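A numeric check of the Woodbury identity in the form used for the factor-model covariance Sigma = Psi + Lambda Lambda' with Psi diagonal (p, k, and the matrices are illustrative): the inverse needs only a k x k solve rather than a p x p one.

```python
import numpy as np

rng = np.random.default_rng(8)
p, k = 6, 2
L = rng.standard_normal((p, k))
psi = rng.uniform(0.5, 2.0, p)
Psi_inv = np.diag(1.0 / psi)

Sigma = np.diag(psi) + L @ L.T
# Woodbury: Sigma^{-1} = Psi^{-1} - Psi^{-1} L (I_k + L' Psi^{-1} L)^{-1} L' Psi^{-1}
M = np.eye(k) + L.T @ Psi_inv @ L                         # only k x k
Sigma_inv = Psi_inv - Psi_inv @ L @ np.linalg.solve(M, L.T @ Psi_inv)

assert np.allclose(Sigma_inv, np.linalg.inv(Sigma))
```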
------------------
Lec.31, 4/13/22
Completion of EM Algorithm implementation for Factor Analysis
(Rubin & Thayer 1982)
Computational Illustration on 103x5 stock-returns dataset of
LRT Goodness of Fit test for "Probabilistic PCA"
which is the factor model with Psi = sigma^2 * I_{pxp}
------------------
Lec.32, 4/15/22
Canonical Correlation, motivation and linear-algebra solution
including goodness-of-fit test (under normality) for independence of X, Y
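A sketch of the linear-algebra solution (data simulated from a shared latent variable, purely for illustration): the canonical correlations are the singular values of K = Sxx^{-1/2} Sxy Syy^{-1/2}.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
z = rng.standard_normal(n)                                  # shared latent variable
X = np.column_stack([z + rng.standard_normal(n), rng.standard_normal(n)])
Y = np.column_stack([z + rng.standard_normal(n), rng.standard_normal(n)])

def inv_sqrt(S):
    """Symmetric inverse square root via the spectral decomposition."""
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(lam ** -0.5) @ V.T

S = np.cov(np.column_stack([X, Y]).T)
Sxx, Sxy, Syy = S[:2, :2], S[:2, 2:], S[2:, 2:]
K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
can_corr = np.linalg.svd(K, compute_uv=False)   # decreasing order

assert np.all(can_corr >= 0) and np.all(can_corr <= 1)
```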
------------------
Lec.33, 4/18/22
Introduction/overview of clustering from all 3 books
1. Model-based
a. Mixture and label-identifier models
b. Density Estimation
2. Criterion/ Algorithm-based
3. Hierarchical Agglomerative/Divisive
4. Other (particularly, Spectral Clustering)
Clustering ** grouping or rule-based subsetting, with the general objective
(subsetting Y's) that Y observations within group are more alike (homogeneous)
than observations across groups,
primarily "unsupervised" without labels, sometimes "semi-supervised"
------------
Lec.34, 4/20 Clustering, continued
Software (library cluster for hierarchical, kmeans, mclust for mixture models)
Dendrogram data representations
Illustration of clustering and "confusion" matrices for (sample from) iris data
where the true species-based clusters are known.
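A tiny sketch of Lloyd's k-means plus a "confusion" matrix of cluster labels against known true groups, on simulated two-group data standing in for the iris example (sample sizes, separation, and seed are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(11)
X = np.vstack([rng.standard_normal((40, 2)),
               rng.standard_normal((40, 2)) + 6.0])    # two separated groups
truth = np.repeat([0, 1], 40)

centers = X[rng.choice(len(X), 2, replace=False)]      # random initial centers
for _ in range(25):                                    # Lloyd's iterations
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(axis=1)
    centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(2)])

# confusion matrix: rows = true group, columns = cluster label
conf = np.array([[np.sum((truth == i) & (labels == j)) for j in range(2)]
                 for i in range(2)])
assert conf.sum() == len(X)
```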
-------------
Lec.35, 4/22 More on clustering
Further discussion of the IrisCluster.RLog script showing the software implementation
and interpretation of clusters from methods kmeans, agnes, diana, mclust.
The discussion is enriched by the model-based clustering analyses
(with parametric mixture-of-normal models).
General Question: how to assess clustering reliability or quality. Introduce idea of
clustering data-sample replicates ("bootstrapping clusters") to assess reliability.
-----------
Lec.36, 4/25 Bootstrapping -- in general and in Clustering
Nonparametric vs parametric bootstrapping.
Intermediate case of bootstrapping from a "parametric density" defined from a
kernel-density estimator defined from observed data.
Some illustration using the R Script BootMultivar.RLog.
-----------
Lec.37, 4/27 More discussion of bootstrapping specifically related to clustering, using
confusion matrices and metrics like Sensitivity and Positive Predictive Value.
Further R Script illustration using iris data, cf. BootClus.RLog.
Lec.38, 4/29 Illustration of the bootstrapping of clustering with the R Script
Lec.39, 5/2 Kernel methods -- intro of kernels, basic theory
Lec.40, 5/4 Kernel clustering methods -- radial basis function (Gaussian) kernel
and variants
Lec.41, 5/6 More on Kernel clustering, including bootstrapping of the
kernel-based clustering, using script involving iris data.
Lec.42, 5/9 Kernel PCA -- with script illustration, KernelMethods.RLog.
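A minimal kernel-PCA sketch with the radial-basis-function (Gaussian) kernel (bandwidth gamma and the toy data are illustrative choices): form K, double-center it, and eigendecompose; the top eigenvectors give the kernel principal-component scores.

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.standard_normal((50, 2))
gamma = 0.5

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)    # pairwise squared distances
K = np.exp(-gamma * sq)                                # RBF kernel matrix

n = len(X)
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J                                         # double centering

lam, V = np.linalg.eigh(Kc)
lam, V = lam[::-1], V[:, ::-1]                         # decreasing order
scores = V[:, :2] * np.sqrt(np.clip(lam[:2], 0, None)) # top-2 kernel PC scores

# centering zeroes out the row and column sums of Kc
assert np.allclose(Kc.sum(axis=0), 0, atol=1e-8)
```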
---------------
In the end we left out the Sparse PCA topics, and several students did them for
final projects:
Sparse PCA, Simultaneous PCA, regularization of PCA in high dimensions