


March 20, 1998, Spring 1998 (STAT 798C)
March 31, 2001, Spring 2001 (STAT 798L)
October 20, Fall 2010, STAT 705
October  7, Fall 2011, STAT 705
October 15, Fall 2012, STAT 705
October 6,8 Fall 2014, STAT705

EM/STAT798C/glm.LAshumway, S-plus
stat/STAT705/glm.LAshumway, R (AIC, ts.plot different)

0. GLM Analysis of Los Angeles Mortality Data 1970-1979
A. Introduction to lm() and glm().
B. Comparison between Poisson, Negative Binomial and Quasi-Likelihood
   For QUASI go to 7777777777777777777777 at the end.
----------------------------------------------
Nice way to plot two ts
M1 <- glm(y ~ y1 + y2, family = quasi(link = "log", variance = "mu")) 
ts.plot(ts(M1$y),ts(M1$fitted.values),xlab="Weeks",ylab="Counts",type="l",col=c(1,2))
legend('topright', col=c(1,2), c('original','QLM'),pch=c('-','-'))
-----------------------------------------------------------------



          GLM Analysis of Los Angeles Mortality Data 1970-1979
       =========================================================

LA Pollution-Mortality Study: 
1970-1979, 508 observations,  6-day spacing. Weekly FILTERED data.
The data were lowpass filtered, filtering out frequencies above 0.1
cycles per day.

        
Mortality:          (1) Mrt: Total Mortality
                    (2) Rsp: Respiratory Mortality
                    (3) Crd: Cardiovascular Mortality

Weather:            (4) Tmp: Temperature
                    (5) Hum: Relative Humidity

Pollution:          (6) Crb: Carbon Monoxide
                    (7) Slf: Sulfur Dioxideglm.LAshumway 
                    (8) Nit: Nitrogen Dioxide
                    (9) Hdr: Hydrocarbons
                   (10) Ozn: Ozone
                   (11) Par: Particulates


Shumway, R.H., A.S. Azari and Y. Pawitan (1988). Modeling mortality
fluctuations in Los Angeles as functions of pollution and weather
effects.  Environmental Research, 45, 224-241. (RA421.E5)


1. Reading in the data
-----------------------

Data form as received from Shumway Sat, 11 Feb 1995 (508 by 11):

 183.63  11.90  97.85  72.38  29.20  11.51   3.37   9.64  45.79  6.69  72.72
 191.05  10.75 104.64  67.19  67.51   8.92   2.59  10.05  43.90  6.83  49.60
 180.09   9.33  94.36  62.94  61.42   9.48   3.29   7.80  32.18  4.98  55.68
 184.67   9.54  98.05  72.49  58.99  10.28   3.04  13.39  40.43  9.25  55.16
 173.60   8.27  95.85  74.25  34.80  10.57   3.39  11.90  48.53  9.15  66.02
..................


A <- matrix(scan("LAshumway"), byrow=T,ncol=11)
Read 5588 items

> A[1:5,]
       [,1]  [,2]   [,3]  [,4]  [,5]  [,6] [,7]  [,8]  [,9] [,10] [,11]
[1,] 183.63 11.90  97.85 72.38 29.20 11.51 3.37  9.64 45.79  6.69 72.72
[2,] 191.05 10.75 104.64 67.19 67.51  8.92 2.59 10.05 43.90  6.83 49.60
[3,] 180.09  9.33  94.36 62.94 61.42  9.48 3.29  7.80 32.18  4.98 55.68
[4,] 184.67  9.54  98.05 72.49 58.99 10.28 3.04 13.39 40.43  9.25 55.16
[5,] 173.60  8.27  95.85 74.25 34.80 10.57 3.39 11.90 48.53  9.15 66.02

##Better: Give col names using dimnames.
  
> A <- matrix(scan("LAshumway"), byrow=T,ncol=11,
dimnames=list(NULL,c('Mrt','Rsp','Crd','Tmp','Hum','Crb',
'Slf','Nit','Hdr','Ozn','Par')))  ###OK!!!

> A[1:5,]
        Mrt   Rsp    Crd   Tmp   Hum   Crb  Slf   Nit   Hdr  Ozn   Par 
[1,] 183.63 11.90  97.85 72.38 29.20 11.51 3.37  9.64 45.79 6.69 72.72
[2,] 191.05 10.75 104.64 67.19 67.51  8.92 2.59 10.05 43.90 6.83 49.60
[3,] 180.09  9.33  94.36 62.94 61.42  9.48 3.29  7.80 32.18 4.98 55.68
[4,] 184.67  9.54  98.05 72.49 58.99 10.28 3.04 13.39 40.43 9.25 55.16
[5,] 173.60  8.27  95.85 74.25 34.80 10.57 3.39 11.90 48.53 9.15 66.02

Or better:
> head(A)
        Mrt   Rsp    Crd   Tmp   Hum   Crb  Slf   Nit   Hdr  Ozn   Par
[1,] 183.63 11.90  97.85 72.38 29.20 11.51 3.37  9.64 45.79 6.69 72.72
[2,] 191.05 10.75 104.64 67.19 67.51  8.92 2.59 10.05 43.90 6.83 49.60
[3,] 180.09  9.33  94.36 62.94 61.42  9.48 3.29  7.80 32.18 4.98 55.68
[4,] 184.67  9.54  98.05 72.49 58.99 10.28 3.04 13.39 40.43 9.25 55.16
[5,] 173.60  8.27  95.85 74.25 34.80 10.57 3.39 11.90 48.53 9.15 66.02
[6,] 183.73  7.55  95.98 67.88 66.78  7.99 2.57 10.11 48.61 8.80 44.01



> is.matrix(A)
[1] TRUE
> is.data.frame(A)
[1] FALSE

> A[1:5,'Mrt']
[1] 183.63 191.05 180.09 184.67 173.60 ###OK! 

> A$Mrt[1:5] 
 Error in A$Mrt: $ operator is invalid for atomic vectors  ##Not helpful.
        
> LA <- data.frame(A)    ###Change into data frame
> is.data.frame(LA)
[1] TRUE
> dim(LA)
[1] 508  11

> LA[1:5,]  ##Observe the different row names.
     Mrt   Rsp    Crd   Tmp   Hum   Crb  Slf   Nit   Hdr  Ozn   Par     
1 183.63 11.90  97.85 72.38 29.20 11.51 3.37  9.64 45.79 6.69 72.72
2 191.05 10.75 104.64 67.19 67.51  8.92 2.59 10.05 43.90 6.83 49.60
3 180.09  9.33  94.36 62.94 61.42  9.48 3.29  7.80 32.18 4.98 55.68
4 184.67  9.54  98.05 72.49 58.99 10.28 3.04 13.39 40.43 9.25 55.16
5 173.60  8.27  95.85 74.25 34.80 10.57 3.39 11.90 48.53 9.15 66.02
> 

NOTE: Good to attach(LA). Then can use directly ts.plot(Mrt) etc.

> LA[1:5,'Mrt']
[1] 183.63 191.05 180.09 184.67 173.60 ###OK! 

> LA$Mrt[1:5]
[1] 183.63 191.05 180.09 184.67 173.60 ###OK! 

> mean(LA$Mrt)
[1] 169.0482

> var(LA$Mrt) ##Var is larger: 201.1539/169.0482 = 1.18992
[1] 201.1539    Poisson reg is reasonable as the ratio is not far from 1. 


> summary(LA$Mrt)
  Min. 1st Qu. Median Mean 3rd Qu.  Max. 
 142.1   159.6  166.7  169   176.4 231.7

2. Looking at the data
-----------------------
##Relationships between the variables
> pairs(LA)   ##FIG 1

##The corresponding spectral estimates
par(mfrow=c(3,3), oma=c(0,0,5,0))

ts.plot(LA$Mrt)        ###Same as tsplot(A[,'Mrt'])
mtext("Mrt=Total Mortality", line=0,cex=1.2)
PR.Mrt <- spectrum(LA$Mrt) ##Raw Periodogram
SP.Mrt <-  spectrum(LA$Mrt, spans=c(3,5,7)) ##Smoothed Periodogram

ts.plot(LA$Crb)
mtext("Crb=Carbon Monoxide", line=0,cex=1.2)
PR.Crb <- spectrum(LA$Crb) 
SP.Crb <- spectrum(LA$Crb, spans=c(3,5,7)) 

ts.plot(LA$Par)
mtext("Par=Particulates", line=0,cex=1.2)
PR.Par <- spectrum(LA$Par)
SP.Par <- spectrum(LA$Par, spans=c(3,5,7))        ##FIG 2

mtext("LA Mortality and Covariate Time Series", outer=T, cex=1.5)


ts.plot(LA$Mrt)        ###Same as tsplot(A[,'Mrt'])
mtext("Mrt=Total Mortality", line=0,cex=1.2)
PR.Mrt <- spectrum(LA$Mrt) ##Raw Periodogram
SP.Mrt <-  spectrum(LA$Mrt, spans=c(3,5,7)) ##Smoothed Periodogram

ts.plot(LA$Tmp)
mtext("Par=Particulates", line=0,cex=1.2)
PR.Par <- spectrum(LA$Tmp)
SP.Par <- spectrum(LA$Tmp, spans=c(3,5,7))    

ts.plot(LA$Crb)
mtext("Crb=Carbon Monoxide", line=0,cex=1.2)
PR.Crb <- spectrum(LA$Crb) 
SP.Crb <- spectrum(LA$Crb, spans=c(3,5,7)) 

mtext("LA Mortality and Covariate Time Series", outer=T, cex=1.5)

###Find the index of the maximum periodogram ordinate

> names(SP.Mrt)
 [1] "freq"      "spec"      "coh"       "phase"     "spans"     "filter"   
 [7] "df"        "bandwidth" "n.used"    "series"    "method"    "taper"    
[13] "pad"       "detrend"   "demean"   
> length(SP.Mrt$freq)
[1] 256
> length(SP.Mrt$spec)
[1] 256

> min(SP.Mrt$freq)
[1] 0.001953125   ###Almost 0.
> max(SP.Mrt$freq)
[1] 0.5


> (1:length(SP.Mrt$spec))[SP.Mrt$spec==max(SP.Mrt$spec)]
[1] 10
> SP.Mrt$freq[10]
[1] 0.01953125

> (1:length(SP.Crb$spec))[SP.Crb$spec==max(SP.Crb$spec)]
[1] 10
> SP.Crb$freq[10]
[1] 0.01953125

> (1:length(SP.Par$spec))[SP.Par$spec==max(SP.Par$spec)]
[1] 10
> SP.Par$freq[10]
[1] 0.01953125

#Period: 
> 1/0.01953125
[1] 51.2            ###Period measured in the time interval used.
                       That is 51.2 such intervals.
> (1/0.01953125)*7
[1] 358.4 days      ###<--- About one year.

 
3. Poisson Regression: Use canonical link throughout
   Non-integer data!!! Get 50 warnings.
------------------------------------------------------
> class(LA)
[1] "data.frame" ##Need this


Response  : Mrt   
Covariates: Rsp Crd Tmp Hum Crb Slf Nit Hdr Ozn Par 
 
PoisReg.All <- glm(formula = Mrt ~ Rsp+Crd+Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, 
                            family=poisson, data=LA)
OR:

PoisReg.All <- glm(Mrt ~ Rsp+Crd+Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, 
                            family=poisson, data=LA)

There were 50 or more warnings (use warnings() to see the first 50)

Warning messages:
1: In dpois(y, mu, log = TRUE) : non-integer x = 183.630000
2: In dpois(y, mu, log = TRUE) : non-integer x = 191.050000
3: In dpois(y, mu, log = TRUE) : non-integer x = 180.090000
4: In dpois(y, mu, log = TRUE) : non-integer x = 184.670000
5: In dpois(y, mu, log = TRUE) : non-integer x = 173.600000
............................................................
............................................................
47: In dpois(y, mu, log = TRUE) : non-integer x = 203.750000
48: In dpois(y, mu, log = TRUE) : non-integer x = 202.200000
49: In dpois(y, mu, log = TRUE) : non-integer x = 188.990000
50: In dpois(y, mu, log = TRUE) : non-integer x = 190.970000

> names(PoisReg.All)
 [1] "coefficients"      "residuals"         "fitted.values"
 [4] "effects"           "R"                 "rank"
 [7] "qr"                "family"            "linear.predictors"
[10] "deviance"          "aic"               "null.deviance"
[13] "iter"              "weights"           "prior.weights"
[16] "df.residual"       "df.null"           "y"
[19] "converged"         "boundary"          "model"
[22] "call"              "formula"           "terms"
[25] "data"              "offset"            "control"
[28] "method"            "contrasts"         "xlevels"
 
> PoisReg.All

Call:  glm(formula = Mrt ~ Rsp + Crd + Tmp + Hum + Crb + Slf + Nit + 
           Hdr + Ozn + Par, family = poisson, data = LA)

Coefficients:
(Intercept)          Rsp          Crd          Tmp          Hum          Crb
  4.485e+00    6.005e-03    6.380e-03   -1.204e-04   -7.808e-05    4.958e-04
        Slf          Nit          Hdr          Ozn          Par
 -7.446e-03   -4.668e-04    7.655e-04    2.230e-03    1.080e-04

Degrees of Freedom: 507 Total (i.e. Null);  497 Residual
Null Deviance:      588.6
Residual Deviance: 57.85        AIC: Inf



> summary(PoisReg.All)

Call:
glm(formula = Mrt ~ Rsp + Crd + Tmp + Hum + Crb + Slf + Nit +
    Hdr + Ozn + Par, family = poisson, data = LA)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-1.1297829  -0.2238007   0.0005743   0.2097785   1.3165738

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.485e+00  1.006e-01  44.581  < 2e-16 ***
Rsp          6.005e-03  1.496e-03   4.014 5.97e-05 ***
Crd          6.380e-03  5.984e-04  10.661  < 2e-16 ***
Tmp         -1.204e-04  9.268e-04  -0.130    0.897
Hum         -7.808e-05  3.805e-04  -0.205    0.837
Crb          4.958e-04  2.538e-03   0.195    0.845
Slf         -7.446e-03  5.646e-03  -1.319    0.187
Nit         -4.668e-04  2.274e-03  -0.205    0.837
Hdr          7.655e-04  6.159e-04   1.243    0.214
Ozn          2.230e-03  1.840e-03   1.212    0.226
Par          1.080e-04  5.172e-04   0.209    0.835
---
Signif. codes:  0 '***. 0.001 '**' 0.01 '*' 0.05 '.'  0.1 ''   

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 588.609  on 507  degrees of freedom
Residual deviance:  57.845  on 497  degrees of freedom
AIC: Inf

Number of Fisher Scoring iterations: 3


##NOTE: "residuals" refer here to the "working" residuals defined as
        (y-hat(mu))/(d mu/d eta) where the derivative is evaluated at
        hat(mu). In our case of Poisson regression (d mu/d eta)=mu,
        since mu=exp(eta). So the working residuals are: 

                   (y-hat(mu))/hat(mu).
      
(y-hat(mu))/hat(mu)= (y-fitted value)/(fitted value)=
(PoisReg.All$y-PoisReg.All$fitted.values)/PoisReg.All$fitted.values

##Check:

The 7th,8th,...,11th working residuals are:

> ((PoisReg.All$y-PoisReg.All$fitted.values)/PoisReg.All$fitted.values)[7:11]
         7          8          9          10         11 
0.01862911 0.04731465 0.06041701 -0.01871553 0.02238912

Now from "residuals"  ###Get the same.
> PoisReg.All$residual[7:11]
         7          8          9          10         11 
0.01862911 0.04731465 0.06041701 -0.01871553 0.02238912

Now from resid(): ###Gives residuals when applied to glm objects.
                  ###Get the same.
> resid(PoisReg.All, type="working")[7:11] 
         7          8          9          10         11 
0.01862911 0.04731465 0.06041701 -0.01871553 0.02238912

##In the function resid(), "type" should be one of: 
  "deviance", "pearson", "working", "response"


In what follows we use "residuals"; that is the working residuals.
At the very end we also use the response residuals for comparing models.
-------------------------------------------------------
NOTE: The "Residual Deviance" is the scaled deviance.
-------------------------------------------------------
NOTE: To get n! use gamma(n+1).    
      To get log(n!) use lgamma(n+1)
-------------------------------------

##From Splus get the same plus corr matrix:
> summary(PoisReg.All) 

Call: glm(formula = Mrt ~ Rsp + Crd + Tmp + Hum + Crb + Slf + Nit +
                          Hdr  + Ozn + Par, family = poisson, data = LA)

Deviance Residuals:
       Min         1Q       Median        3Q      Max 
 -1.129783 -0.2238007 0.0005742856 0.2097785 1.316574

Coefficients:
                    Value   Std. Error     t value 
(Intercept)  4.484626e+00 0.1005944532  44.5812479
        Rsp  6.004882e-03 0.0014960379   4.0138569
        Crd  6.379706e-03 0.0005984307  10.6607267
        Tmp -1.203571e-04 0.0009268404  -0.1298575
        Hum -7.808055e-05 0.0003804829  -0.2052143
        Crb  4.957806e-04 0.0025383581   0.1953155
        Slf -7.446004e-03 0.0056458880  -1.3188366
        Nit -4.667693e-04 0.0022744029  -0.2052272
        Hdr  7.655378e-04 0.0006158848   1.2429887
        Ozn  2.230044e-03 0.0018401519   1.2118802
        Par  1.079798e-04 0.0005171768   0.2087871

(Dispersion Parameter for Poisson family taken to be 1 )

    Null Deviance: 588.6089 on 507 degrees of freedom

Residual Deviance: 57.84515 on 497 degrees of freedom

Number of Fisher Scoring Iterations: 3 

Correlation of Coefficients:
    (Intercept)        Rsp        Crd        Tmp        Hum        Crb 
Rsp  0.2242698                                                        
Crd -0.6689086  -0.5688460                                            
Tmp -0.8504103  -0.0618902  0.3291518                                 
Hum -0.6870933  -0.1005962  0.2441310  0.5135408                      
Crb  0.2000477   0.0873989 -0.2207805 -0.1166508  0.0517857           
Slf  0.1693699   0.2118884 -0.3702751 -0.0208349 -0.0818833 -0.1816456
Nit  0.1669253  -0.0829073  0.1707735 -0.2018831 -0.1595381 -0.1084359
Hdr -0.1772556  -0.0367965  0.0142721  0.0579830  0.0997768 -0.3700612
Ozn  0.4344092   0.0442298 -0.0850821 -0.6873258 -0.2334923  0.3819084
Par -0.2427340  -0.0413370 -0.0005778  0.2080051  0.1844139 -0.4639271
           Slf        Nit        Hdr        Ozn 
Rsp                                            
Crd                                            
Tmp           
Hum                                            
Crb                                            
Slf                                            
Nit -0.2983407                                 
Hdr -0.0686041 -0.4477533                      
Ozn -0.2718690 -0.1780738  0.0398134           
Par  0.1935141 -0.3772318 -0.0183395 -0.1039967

##Plot Mortality TS (Mrt) vs Fitted TS (Estimated mu=E(Mrt)) ##FIG 3


par(mfrow=c(2,1), oma=c(0,0,5,0))
par(cex=1.1)

acf(PoisReg.All$y-PoisReg.All$fitted.values)
cpgram(PoisReg.All$y-PoisReg.All$fitted.values)


ts.plot(cbind(PoisReg.All$y,PoisReg.All$fitted.values))
legend(200,245,c("________ Data","................ Fit"),bty="n")
mtext("Observed vs Predicted", cex=1.5)

ts.plot(PoisReg.All$residuals) ##(Working) residual plot somewhat sinusoidal.
mtext("Residuals", cex=1.5)

mtext("Poisson Regression of LA Mortality Count on", outer=T,line=2, cex=1.5)
mtext("Rsp, Crd, Tmp, Hum, Crb, Slf, Nit, Hdr, Ozn, Par", 
outer=T, line=0, cex=1.5)


##Reduce the number of covariates

> PoisReg.RCSHO <- glm(formula = Mrt ~ Rsp + Crd + Slf + Hdr + Ozn , 
        family = poisson, data = LA)

> summary(PoisReg.RCSHO)

Call: glm(formula = Mrt ~ Rsp + Crd + Slf + Hdr + Ozn, family = poisson, data = LA)
Deviance Residuals:
       Min         1Q      Median        3Q      Max 
 -1.136971 -0.2163543 0.005178808 0.2149715 1.332251

Coefficients:
                    Value   Std. Error     t value 
(Intercept)  4.4647362803 0.0415809850  107.374471
        Rsp  0.0059250366 0.0014752054    4.016415
        Crd  0.0064882213 0.0005299576   12.242907
        Slf -0.0077351230 0.0051911900   -1.490048
        Hdr  0.0008691312 0.0003447418    2.521108
        Ozn  0.0018460830 0.0010778144    1.712802


(Dispersion Parameter for Poisson family taken to be 1 )

    Null Deviance: 588.6089 on 507 degrees of freedom

Residual Deviance: 58.09185 on 502 degrees of freedom

Number of Fisher Scoring Iterations: 3 

Correlation of Coefficients:
    (Intercept)        Rsp        Crd        Slf        Hdr 
Rsp  0.3319260                                             
Crd -0.9050099  -0.5779281                                 
Slf  0.3962013   0.2078572 -0.3875707                      
Hdr -0.1627729  -0.0874557 -0.0607864 -0.5896395           
Ozn -0.6346736  -0.0657916  0.4914853 -0.5915154  0.2202745


###Deviance Test: 
PoisReg.All:   Deviance = 57.84515, No. parameters = 11 (include intercept)
PoisReg.RCSHO: Deviance = 58.09185, No. parameters = 6

> PoisReg.RCSHO$deviance-PoisReg.All$deviance
[1] 0.2467093=58.09185-57.84515 Too small.

> LogReg.GPA$deviance-LogReg$deviance
[1] 2.128012  ###Not large enough.

##p-value; df=5=difference in the number of parameters of the two models.
> 1-pchisq(PoisReg.RCSHO$deviance-PoisReg.All$deviance, 11-6)  
[1] 0.998527 ###Accept "No Diffrenece" between the two models.


> names(PoisReg.RCSHO)
 [1] "coefficients"      "residuals"         "fitted.values"
 [4] "effects"           "R"                 "rank"
 [7] "qr"                "family"            "linear.predictors"
[10] "deviance"          "aic"               "null.deviance"
[13] "iter"              "weights"           "prior.weights"
[16] "df.residual"       "df.null"           "y"
[19] "converged"         "boundary"          "model"
[22] "call"              "formula"           "terms"
[25] "data"              "offset"            "control"
[28] "method"            "contrasts"         "xlevels"


##Plot of data vs new fit.

par(mfrow=c(2,1), oma=c(0,0,5,0))
par(cex=1.1)


ts.plot(cbind(PoisReg.RCSHO$y,PoisReg.RCSHO$fitted.values))
legend(200,245,c("________ Data","................ Fit"),bty="n")


ts.plot(PoisReg.RCSHO$residuals) ##Residual plot somewhat sinusoidal.
mtext("Residuals", cex=1.2)

mtext("Poisson Regression of LA Mortality Count on", outer=T,line=2, cex=1.2)
mtext("Rsp, Crd, Slf, Hdr, Ozn", outer=T, line=0, cex=1.2)   ##FIG 4

##Residual behavior

acf(PoisReg.RCSHO$residuals)        ##FIG5 Not good!
cpgram(PoisReg.RCSHO$residuals)

====================================
"FINAL" MODEL: PoisReg.TCy1y2

##Use Tmp, any pollutant, and autoregress ###############
  as suggested by Shumway et al. (1988). Get uncorrelated residuals!

> mean(LA$Mrt)
[1] 169.0482

##Adding Mrt(t-1)=y(t-1), Mrt(t-2)=y(t-2) ##Autoregress on y(t-1),y(t-2)!!!
y1 <- c(169,LA$Mrt)[-509]
y2 <- c(169,169,LA$Mrt)[-c(509,510)]

LA.y1y2 <- cbind(LA,y1,y2)

> is.data.frame(LA.y1y2)
[1] TRUE


##Naming the new variables
names(LA.y1y2) <- c('Mrt','Rsp','Crd','Tmp','Hum','Crb',
'Slf','Nit','Hdr','Ozn','Par','y1','y2')


> LA.y1y2[1:4,]
     Mrt   Rsp    Crd   Tmp   Hum   Crb  Slf   Nit   Hdr  Ozn   Par     y1 
1 183.63 11.90  97.85 72.38 29.20 11.51 3.37  9.64 45.79 6.69 72.72 169.00
2 191.05 10.75 104.64 67.19 67.51  8.92 2.59 10.05 43.90 6.83 49.60 183.63
3 180.09  9.33  94.36 62.94 61.42  9.48 3.29  7.80 32.18 4.98 55.68 191.05
4 184.67  9.54  98.05 72.49 58.99 10.28 3.04 13.39 40.43 9.25 55.16 180.09
      y2 
1 169.00
2 169.00
3 183.63
4 191.05


==============================================
Use of I():

In function 'data.frame'.  Protecting an object by enclosing
          it in 'I()' in a call to 'data.frame' inhibits the conversion
          of character vectors to factors and the dropping of names,
          and ensures that matrices are inserted as single columns.
          'I' can also be used to protect objects which are to be added
          to a data frame, or converted to a data frame _via_
          'as.data.frame'.

In function 'formula'.  There it is used to inhibit the
          interpretation of operators such as '"+"', '"-"', '"*"' and
          '"^"' as formula operators, so they are used as arithmetical
          operators.  This is interpreted as a symbol by
          'terms.formula'.
=======================================================



##Use: log(Crb)!!!

PoisReg.TCy1y2 <- glm(formula = Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2 ,  
family = poisson, data = LA.y1y2)


> summary(PoisReg.TCy1y2)

Call: glm(formula = Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2, family
= poisson, data = LA.y1y2)
Deviance Residuals:
       Min         1Q      Median        3Q      Max 
 -1.923433 -0.3502484 -0.04177222 0.3573013 1.893937

Coefficients:
                    Value   Std. Error    t value 
(Intercept)  5.1751770503 2.092880e-01  24.727533
        Tmp -0.0212812318 5.074996e-03  -4.193350
   I(Tmp^2)  0.0001409826 3.362773e-05   4.192451
   log(Crb)  0.0524105160 8.576396e-03   6.111018
         y1  0.0017315880 3.707799e-04   4.670124
         y2  0.0020606382 3.594891e-04   5.732131

(Dispersion Parameter for Poisson family taken to be 1 )

    Null Deviance: 588.6089 on 507 degrees of freedom

Residual Deviance: 166.2256 on 502 degrees of freedom (508-6=502)

Number of Fisher Scoring Iterations: 3 

Correlation of Coefficients:
         (Intercept)        Tmp   I(Tmp^2)   log(Crb)         y1 
     Tmp -0.9594272                                             
I(Tmp^2)  0.9391324  -0.9962832                                 
log(Crb)  0.0464542   0.0036636 -0.0164882                      
      y1 -0.2175850   0.0958640 -0.0715993 -0.2170437           
      y2 -0.1895248   0.0884345 -0.0745675 -0.1863434 -0.5955002


#SE's From The Sample Information Matrix G_N From PL Using the Model
PoisReg.TCy1y2:  Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2

b0 <- PoisReg.TCy1y2$coefficients[1]
b1 <- PoisReg.TCy1y2$coefficients[2]
b2 <- PoisReg.TCy1y2$coefficients[3]
b3 <- PoisReg.TCy1y2$coefficients[4]
b4 <- PoisReg.TCy1y2$coefficients[5]
b5 <- PoisReg.TCy1y2$coefficients[6]

Mrt <- LA$Mrt
Tmp <- LA$Tmp
Crb <- LA$Crb


N <- length(LA$Mrt)
> N
[1] 508

Use our cumulative covariance matrix GN=G_{N}!!!

GN <- matrix(rep(0,36),ncol=6)
for(i in 3:N){
bZ <- b0+b1*Tmp[i]+b2*Tmp[i]^2 +b3*log(Crb[i])+b4*Mrt[i-1]+b5*Mrt[i-2]
Z <- c(1,Tmp[i],Tmp[i]^2,log(Crb[i]),Mrt[i-1],Mrt[i-2])
ZZ <- Z%*%t(Z)
GN <- GN + ZZ*exp(bZ)
}

InvGN <- solve(GN)
> InvGN   ###Covariance Matrix of PL Estimates
              [,1]          [,2]          [,3]          [,4]          [,5] 
[1,]  4.383214e-02 -1.020016e-03  6.616406e-06  8.068968e-05 -1.690211e-05
[2,] -1.020016e-03  2.578653e-05 -1.702455e-07  2.453168e-07  1.808388e-07
[3,]  6.616406e-06 -1.702455e-07  1.132379e-09 -5.361427e-09 -8.969003e-10
[4,]  8.068968e-05  2.453168e-07 -5.361427e-09  7.379477e-05 -6.898445e-07
[5,] -1.690211e-05  1.808388e-07 -8.969003e-10 -6.898445e-07  1.379105e-07
[6,] -1.419898e-05  1.595131e-07 -8.876829e-10 -5.788406e-07 -7.975442e-08
              [,6] 
[1,] -1.419898e-05
[2,]  1.595131e-07
[3,] -8.876829e-10
[4,] -5.788406e-07
[5,] -7.975442e-08
[6,]  1.296324e-07


SE of hat(b0,b1,b2,b3,b4,b5) from PL
> sqrt(c(InvGN[1,1],InvGN[2,2],InvGN[3,3],InvGN[4,4],InvGN[5,5],InvGN[6,6]))
[1] 2.093613e-01 5.078044e-03 3.365085e-05 8.590388e-03 3.713630e-04
[6] 3.600450e-04

From Splus: SAME!!!
    2.092880e-01 5.074996e-03 3.362773e-05 8.576396e-03 3.707799e-04 
    3.594891e-04 


##Get AIC below from step().


##Some predictions:

Recall in Poisson regression the link is 

                   log(mu)=eta  <----Link function

Therefore the predictor in Poisson regression is the estimated 

                 mu=exp(eta)  <----Inverse link

where eta is the linear predictor. We have:

log{Hat(mu)}= Hat(eta) = b0+b1*Tmp+b2*I(Tmp^2)+b3*log(Crb)+b4*y1+b5*y2

Thus, the predictor is:

Hat(E(Mrt))=Hat(mu)=exp(b0+b1*Tmp+b2*I(Tmp^2)+b3*log(Crb)+b4*y1+b5*y2) (***)

> LA.y1y2$Mrt[201:210] ##Observed Mrt values
[1] 177.09 183.34 184.83 173.11 164.21 164.73 170.84 178.25 170.86 179.93 

> for(i in 201:210){ 
predictor <- exp(5.1751770503 - 
 0.0212812318*LA.y1y2$Tmp[i]+0.0001409826*LA.y1y2$Tmp[i]^2
+0.0524105160*log(LA.y1y2$Crb[i])+0.0017315880*LA.y1y2$y1[i]
+0.0020606382*LA.y1y2$y2[i])
print(predictor)} 

[1] 176.1245
[1] 175.0899
[1] 185.3946
[1] 184.4768
[1] 180.7214           ##Corresponding predicted values
[1] 174.5726
[1] 171.2021
[1] 169.9865
[1] 176.1829
[1] 172.444

###Check: OK!!! Get the same prediction.

> PoisReg.TCy1y2$y[201:210] ###Observed
 [1] 177.09 183.34 184.83 173.11 164.21 164.73 170.84 178.25 170.86 179.93
> 

> PoisReg.TCy1y2$fitted.values[201:210]  ###Predicted
      201      202      203      204      205      206      207      208 
 176.1245 175.0899 185.3946 184.4768 180.7214 174.5726 171.2021 169.9865
      209     210 
 176.1829 172.444


##Plot of data vs new fit.   ##FIG 5

tsplot(PoisReg.TCy1y2$y,PoisReg.TCy1y2$fitted.values)
legend(350,228,c("________ Data","................ Fit"),bty="n")


tsplot(PoisReg.TCy1y2$residuals) ##(Working) residuals look good.
mtext("Residuals", cex=1.2)

mtext("Poisson Regression of LA Mortality Count on", outer=T,line=2, cex=1.2)
mtext("Tmp, Tmp^2, log(Crb), Crd, Mrt(t-1), Mrt(t-2)", 
outer=T, line=0, cex=1.2)

#Partial Plot: ##Fig 5'
> tsplot(PoisReg.TCy1y2$y[101:200],PoisReg.TCy1y2$fitted.values[101:200])
mtext("t in 101:200", cex=1.5)

> tsplot(PoisReg.TCy1y2$y[201:300],PoisReg.TCy1y2$fitted.values[201:300])
mtext("t in 201:300", cex=1.5)

mtext("Poisson Regression of LA Mortality Count on", outer=T,line=2, cex=1.2)
mtext("Tmp, Tmp^2, log(Crb), Crd, Mrt(t-1), Mrt(t-2)", 
outer=T, line=0, cex=1.2)

###Comparison of (working) residuals  ##FIG 6

par(mfrow=c(2,1), oma=c(0,0,5,0))

acf(PoisReg.All$residuals) ###Working residuals
mtext("Residual ACF: Mrt ~ Rsp + Crd + Tmp + Hum + Crb + Slf + Nit + Hdr + Ozn + Par", 
line=1, cex=1.2)

acf(PoisReg.TCy1y2$residuals)
mtext("Residual ACF: Mrt ~ Tmp + I(Tmp^2) + log(Crb) + Mrt(t-1) + Mrt(t-2)", 
line=1, cex=1.2)

mtext("ACF of Working Residuals From Two Poisson Regressions",
outer=T, line=2, cex=1.5) 

###Get Pearson Residuals

PearsonResd <-  resid(PoisReg.TCy1y2, type="pearson") 
> sum(PearsonResd^2)/(508-6)
[1] 0.3314884 <--- phi

##Residual behavior of "Final Model": Excellent! 

acf(PoisReg.TCy1y2$residuals)    ###Working residuals    
cpgram(PoisReg.TCy1y2$residuals) ###Working residuals
    
=============================

##3 concluding plots
par(mfrow=c(3,1), oma=c(0,0,5,0))     

##For R
ts.plot(ts(PoisReg.TCy1y2$y),ts(PoisReg.TCy1y2$fitted.values))

##For Splus
tsplot(PoisReg.TCy1y2$y,PoisReg.TCy1y2$fitted.values)
legend(350,228,c("________ Data","................ Fit"),bty="n")
mtext("Weekly Mortality vs Poisson Fit", cex=1.2)


tsplot(PoisReg.TCy1y2$residuals) ##(Working) residuals look good.
mtext("Residuals", cex=1.2)

acf(PoisReg.TCy1y2$residuals)
mtext("Residual ACF", line=0, cex=1)

mtext("Poisson Regression of LA Mortality Count on", outer=T,line=2, cex=1.2)
mtext("Tmp, Tmp^2, log(Crb), Crd, Mrt(t-1), Mrt(t-2)", 
outer=T, line=0, cex=1.2)     ##FIG 7



(30.5-300*0.079)/sqrt(300*0.079*0.921)
======================


###Poisson AR: No other covariates

log(mu)= b0+b1*Mrt(t-1)+b2*Mrt(t-2)
       = b0+b1*y(t-1)+b2*y(t-2)

> P.AR2 <- glm(formula = Mrt ~ y1 + y2, poisson, data = LA.y1y2)
> summary(P.AR2)


Call: glm(formula = Mrt ~ y1 + y2, family = poisson, data = LA.y1y2)
Deviance Residuals:
       Min         1Q      Median        3Q      Max 
 -2.150422 -0.4200039 -0.04843802 0.4570537 2.039434

Coefficients:
                  Value   Std. Error    t value 
(Intercept) 4.311907719 0.0424379602 101.604971
         y1 0.002273389 0.0003476951   6.538457
         y2 0.002554930 0.0003477661   7.346691

(Dispersion Parameter for Poisson family taken to be 1 )

    Null Deviance: 588.6089 on 507 degrees of freedom

Residual Deviance: 222.2333 on 505 degrees of freedom

Number of Fisher Scoring Iterations: 3 

Correlation of Coefficients:
   (Intercept)         y1 
y1 -0.3564393            
y2 -0.3571710  -0.7437231

##Comparison between 
glm(formula = Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2 ,  
family = poisson, data = LA.y1y2)
and
glm(formula = Mrt ~ y1 + y2, family = poisson, data = LA.y1y2)

NOTE: Sharp increase of deviance

PoisReg.TCy1y2 gives 166.2256 on 502 <--Include Tmp,Tmp^2,Crb,y(t-1),y(t-2)
         P.AR2 gives 222.2333 on 505 <--Include only          y(t-1),y(t-2)

Note: df=N-p=508-(5 coeff. + 1 intercept)=502 for PoisReg.TCy1y2
      df=N-p=508-(2 coeff. + 1 intercept)=505 for P.AR2

MSE.P.AR2 <- sum(P.AR2$y-P.AR2$fitted.values)^2/508 == 3.868998e-18 !!!

MSE.PoisReg.TCy1y2 <- sum(PoisReg.TCy1y2$y-PoisReg.TCy1y2$fitted.values)^2/508
                      ==3.785096e-19 ##<---Smaller!!!

> P.AR2$residuals%*%P.AR2$residuals  ##SS Working Residuals
         [,1] 
[1,] 1.294978
> PoisReg.TCy1y2$residuals%*%PoisReg.TCy1y2$residuals ##SS Working Residuals
          [,1] 
[1,] 0.9717937  ##Smaller!!!


par(mfrow=c(4,1), oma=c(0,0,5,0))     

tsplot(P.AR2$y,P.AR2$fitted.values)
tsplot(PoisReg.TCy1y2$y,PoisReg.TCy1y2$fitted.values)

tsplot(P.AR2$y[301:500],P.AR2$fitted.values[301:500])
tsplot(PoisReg.TCy1y2$y[301:500],PoisReg.TCy1y2$fitted.values[301:500])



mtext("Poisson Regression of LA Mortality Count on", outer=T,line=2, cex=1.2)
mtext("Mrt(t-1), Mrt(t-2) vs Tmp, Tmp^2, log(Crb), Crd, Mrt(t-1), Mrt(t-2)", 
outer=T, line=0, cex=1.2)   ###FIG 8



tsplot(P.AR2$residuals) ##(Working) residuals look good. 
tsplot(PoisReg.TCy1y2$residuals) ##(Working) residuals look good.


acf(P.AR2$residuals) 
acf(PoisReg.TCy1y2$residuals) ##Almost the same!!! Resemble WN!!!

mtext("Poisson Regression of LA Mortality Count on", outer=T,line=2, cex=1.2)
mtext("Mrt(t-1), Mrt(t-2) vs Tmp, Tmp^2, log(Crb), Crd, Mrt(t-1), Mrt(t-2)", 
outer=T, line=0, cex=1.2)   ##FIG 9



##Compare PoisReg.TCy1y2 and P.AR2

PoisReg.TCy1y2 <- glm(formula = Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2 ,  
family = poisson, data = LA.y1y2)
Residual Deviance: 166.2256 on 502 degrees of freedom

P.AR2 <- glm(formula = Mrt ~ y1 + y2, poisson, data = LA.y1y2)
Residual Deviance: 222.2333 on 505 degrees of freedom

> P.AR2$deviance
[1] 222.233
> P.AR2$df
[1] 505

> PoisReg.TCy1y2$deviance
[1] 166.2256
> PoisReg.TCy1y2$df
[1] 502


Test significance of Tmp, Tmp^2),log(Crb):
p-value; df=3=difference in the number of parameters of the two models.
> 1-pchisq(P.AR2$deviance-PoisReg.TCy1y2$deviance, 505-502)  
[1] 4.184986e-12 ###"Reject": Tmp, Tmp^2),log(Crb) are significant!!!


----------------------------------

###Now do stepwise model selection using step():

DESCRIPTION:
       Performs stepwise model selection.

       This function is generic (see Methods);  method  functions
       can  be  written  to  handle  specific  classes  of  data.
       Classes which  already  have  methods  for  this  function
       include:
       gam, glm.

Example: 

glm.object <- glm(Kyphosis ~ Age + Start + Number,
                         family = binomial, data = kyphosis)
step(glm.object)


Use everything except Crd and Rsp:

LM <- glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + 
Hum + Slf + Nit + Hdr + Ozn + Par , family = poisson, data = LA.y1y2)


ResidLM <- resid(LM,type="pearson")
ResidLM%*%ResidLM/508 = 0.3117516


step(glm(formula = Mrt ~ y1 + y2 + Tmp  + Crb + 
Hum + Ozn  , family = poisson, data = LA.y1y2))

Get AIC from repeated use of step() applied to the respective model.
The results below are from S-Plus!!! R gives AIC=Inf!!!

AIC= 182.0731 
Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + Hum + Slf + Nit + Hdr + Ozn + Par 

AIC= 180.159 
Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + Hum +       Nit + Hdr + Ozn + Par 

AIC= 178.4193 
Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + Hum +       Nit + Hdr + Ozn 

AIC= 176.8529 
Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + Hum +       Nit + Hdr 

Step:  AIC= 175.0992 
Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + Hum +             Hdr 



----------------------------------------
AIC= 173.8994 
Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + Hum  ##Step() stopped here
Residual Deviance: 159.8994 on 501 degrees of freedom

glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + Hum,
family = poisson, data = LA.y1y2)



----------
#######"Final" Model PoisReg.TCy1y2: Decided not to include humidity
#######Test for humidity by: D0-D1 ~ ChiSq(p-q)

step(glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb),
family = poisson, data = LA.y1y2))


AIC= 178.2256  ##This is from S-Plus. R gives AIC=Inf.
Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) 
Residual (scaled)  Deviance: 166.2256 on 502 degrees of freedom

> 1-pchisq(166.2256-159.8994, 502-501) 
[1] 0.0118967 -----> Hum Significant.


--------------------


step(glm(formula = Mrt ~ y1 + y2, family = poisson, data = LA.y1y2))

AIC= 228.2333 
Mrt ~ y1 + y2 
Residual Deviance: 222.2333 on 505 degrees of freedom

> 1-pchisq(222.2333-159.8994, 505-501) 
[1] 9.370282e-13 ----> (Tmp + I(Tmp^2) + log(Crb) + Hum) Significant.

---------------------------





Just for comparison with a linear model (Gaussian canonical link; better to
use lm than glm; see Venables and Ripley section on the default
Gaussian link):

Note: Put y1, y2, first; before they were last.

LM <- lm(formula = Mrt ~  y1 + y2 + Tmp + I(Tmp^2) + log(Crb),
      data = LA.y1y2)

> summary(LM)

Call: lm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb), data = LA.y1y2)
Residuals:
    Min     1Q  Median    3Q   Max 
 -24.63 -4.684 -0.4163 4.723 28.68

Coefficients:
                Value Std. Error   t value  Pr(>|t|) 
(Intercept)  173.9072   21.4519     8.1069    0.0000
         y1    0.3116    0.0371     8.4016    0.0000
         y2    0.3635    0.0360    10.1039    0.0000
        Tmp   -3.6616    0.5183    -7.0643    0.0000
   I(Tmp^2)    0.0243    0.0034     7.0864    0.0000
   log(Crb)    8.6327    0.8519    10.1334    0.0000

Residual standard error: 7.622 on 502 degrees of freedom
Multiple R-Squared: 0.714 
F-statistic: 250.7 on 5 and 502 degrees of freedom, the p-value is 0 


Correlation of Coefficients:
         (Intercept)      y1      y2     Tmp I(Tmp^2) 
      y1 -0.2259                                     
      y2 -0.1966     -0.5740                         
     Tmp -0.9599      0.1021  0.0927                 
I(Tmp^2)  0.9402     -0.0781 -0.0786 -0.9964         
log(Crb)  0.0429     -0.2161 -0.1945  0.0093 -0.0217 


> step(LM)
Start:  AIC= 29864.27 ###Much larger than AIC in Poisson regression.
 Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) 

Single term deletions

Model:
Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb)

scale:  58.10169 

         Df Sum of Sq      RSS       Cp 
  <none>              29167.05 29864.27
      y1  1  4101.226 33268.27 33849.29
      y2  1  5931.579 35098.63 35679.64
     Tmp  1  2899.490 32066.54 32647.56
I(Tmp^2)  1  2917.728 32084.78 32665.79
log(Crb)  1  5966.218 35133.27 35714.28
Call:
lm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb), data = LA.y1y2)

Coefficients:
 (Intercept)        y1        y2       Tmp   I(Tmp^2) log(Crb) 
    173.9072 0.3115606 0.3634993 -3.661645 0.02430015 8.632702

Degrees of freedom: 508 total; 502 residual
Residual standard error: 7.622446 


> ResidLM <- resid(LM,type="pearson")
> ResidLM%*%ResidLM/508
         [,1] 
[1,] 57.41545


Now compare with the Final model using Poisson reg.

Final <- glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb),
family = poisson, data = LA.y1y2)


> summary(Final)

Call: glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb), 
family = poisson, data
         = LA.y1y2)
Deviance Residuals:
       Min         1Q      Median        3Q      Max 
 -1.923433 -0.3502484 -0.04177222 0.3573013 1.893937

Coefficients:
                    Value   Std. Error    t value 
(Intercept)  5.1751770503 2.092880e-01  24.727533
         y1  0.0017315880 3.707799e-04   4.670124
         y2  0.0020606382 3.594891e-04   5.732131
        Tmp -0.0212812318 5.074996e-03  -4.193350
   I(Tmp^2)  0.0001409826 3.362773e-05   4.192451
   log(Crb)  0.0524105160 8.576396e-03   6.111018

(Dispersion Parameter for Poisson family taken to be 1 )

    Null Deviance: 588.6089 on 507 degrees of freedom

Residual Deviance: 166.2256 on 502 degrees of freedom

Number of Fisher Scoring Iterations: 3 

Correlation of Coefficients:
         (Intercept)         y1         y2        Tmp   I(Tmp^2) 
      y1 -0.2175850                                             
      y2 -0.1895248  -0.5955002                                 
     Tmp -0.9594272   0.0958640  0.0884345                      
I(Tmp^2)  0.9391324  -0.0715993 -0.0745675 -0.9962832           
log(Crb)  0.0464542  -0.2170437 -0.1863434  0.0036636 -0.0164882


ResidFinal <- resid(Final,type="pearson")
> ResidFinal%*%ResidFinal/508
          [,1] 
[1,] 0.3275731    #Much smaller than in LM!!!

----------------------------------------------


y1 <- c(169,LA$Mrt)[-509]
y2 <- c(169,169,LA$Mrt)[-c(509,510)]

LA.y1y2 <- cbind(LA,y1,y2)

Adding Tmp[t-1], Tmp[t-2]:

T1 <- c(mean(LA.y1y2$Tmp),LA.y1y2$Tmp)[-509]
T2 <- c(mean(LA.y1y2$Tmp),mean(LA.y1y2$Tmp),LA.y1y2$Tmp)[-c(509,510)]

> LA1 <- cbind(LA.y1y2,T1,T2)
> is.data.frame(LA1)
[1] T


names(LA1) <- c('Mrt','Rsp','Crd','Tmp','Hum','Crb',
'Slf','Nit','Hdr','Ozn','Par','y1','y2','T1','T2')


> cor(LA1)
           Mrt          Rsp        Crd         Tmp         Hum         Crb 
Mrt  0.9999999  0.726959705  0.9236739 -0.36018065 -0.24601072  0.55707395
Rsp  0.7269597  1.000000000  0.6137877 -0.33055755 -0.07576124  0.29582089
Crd  0.9236739  0.613787711  1.0000000 -0.43863967 -0.23572391  0.55744761
Tmp -0.3601806 -0.330557555 -0.4386397  1.00000000 -0.29328853 -0.09785584
Hum -0.2460107 -0.075761244 -0.2357239 -0.29328853  1.00000000 -0.43785578
Crb  0.5570740  0.295820892  0.5574476 -0.09785584 -0.43785578  1.00000012
Slf  0.2308592  0.004090783  0.2569989  0.40437400 -0.27213290  0.51300466
Nit  0.2200107  0.061664522  0.1674097  0.42985296 -0.37408376  0.68795371
Hdr  0.4407115  0.215747550  0.4011386  0.11413717 -0.42981786  0.85104489
Ozn -0.3518491 -0.324815482 -0.4260242  0.85137832 -0.04935255 -0.25106162
Par  0.4741024  0.264985323  0.4438713 -0.01723096 -0.43278363  0.86611748
 y1  0.7273812  0.670334339  0.7191584 -0.48296806 -0.02550465  0.41032946
 y2  0.7403291  0.638668060  0.7191422 -0.43919477 -0.06520956  0.39624998
 T1 -0.5124716 -0.390850842 -0.5447309  0.60201085  0.14257042 -0.36769858
 T2 -0.4361836 -0.380785674 -0.4911761  0.61820769  0.03660665 -0.23522018
             Slf         Nit          Hdr          Ozn         Par          y1 
Mrt  0.230859205  0.22001068  0.440711468 -0.351849109  0.47410244  0.72738117
Rsp  0.004090783  0.06166452  0.215747550 -0.324815482  0.26498532  0.67033434
Crd  0.256998926  0.16740966  0.401138604 -0.426024169  0.44387129  0.71915835
Tmp  0.404374003  0.42985296  0.114137165  0.851378322 -0.01723096 -0.48296806
Hum -0.272132903 -0.37408376 -0.429817855 -0.049352545 -0.43278363 -0.02550465
Crb  0.513004661  0.68795371  0.851044893 -0.251061618  0.86611748  0.41032946
Slf  1.000000000  0.73002124  0.612865746  0.405081540  0.46793401  0.12870277
Nit  0.730021238  1.00000000  0.822394788  0.346277297  0.73722708  0.07150596
Hdr  0.612865746  0.82239479  1.000000119 -0.007746873  0.80750394  0.26645952
Ozn  0.405081540  0.34627730 -0.007746873  1.000000000 -0.12284482 -0.42385262
Par  0.467934012  0.73722708  0.807503939 -0.122844823  0.99999994  0.33054382
 y1  0.128702775  0.07150596  0.266459525 -0.423852623  0.33054382  1.00000012
 y2  0.130368695  0.06948008  0.270989090 -0.401137859  0.28707892  0.72741520
 T1  0.100897610  0.03276984 -0.229703426  0.581990957 -0.25979626 -0.36011869
 T2  0.116964988  0.08248030 -0.129468620  0.560343921 -0.15373182 -0.51245707
             y2          T1          T2 
Mrt  0.74032909 -0.51247156 -0.43618360
Rsp  0.63866806 -0.39085084 -0.38078567
Crd  0.71914220 -0.54473090 -0.49117613
Tmp -0.43919477  0.60201085  0.61820769
Hum -0.06520956  0.14257042  0.03660665
Crb  0.39624998 -0.36769858 -0.23522018
Slf  0.13036869  0.10089761  0.11696499
Nit  0.06948008  0.03276984  0.08248030
Hdr  0.27098909 -0.22970343 -0.12946862
Ozn -0.40113786  0.58199096  0.56034392
Par  0.28707892 -0.25979626 -0.15373182
 y1  0.72741520 -0.36011869 -0.51245707
 y2  1.00000012 -0.48308662 -0.36013213
 T1 -0.48308662  1.00000000  0.60203445
 T2 -0.36013213  0.60203445  1.00000000
> 



cor(cbind(log(LA1$Mrt),LA1$Tmp,LA1$T1,LA1$T2,LA1$y1,LA1$y2,log(LA1$Crb),
LA1$Hum))

cor(log(LA1$Mrt),LA1$Tmp)=     -0.35902
cor(log(LA1$Mrt),LA1$T1)=      -0.5164694
cor(log(LA1$Mrt),LA1$T2)=      -0.4395802
cor(log(LA1$Mrt),LA1$y1)=       0.7170978 
cor(log(LA1$Mrt),LA1$y2)=       0.7361614
cor(log(LA1$Mrt),LA1$Crb)=      0.5615486
cor(log(LA1$Mrt),log(LA1$Crb))= 0.582759
cor(log(LA1$Mrt),LA1$Hum)=     -0.2548507

y1y2T1T2TmpLogCHum <- glm(formula = Mrt ~ y1 + y2 + T1 + T2  + Tmp +
log(Crb) + Hum, family = poisson, data = LA1)

Start:  AIC= 183.9295 
 Mrt ~ y1 + y2 + T1 + T2 + Tmp + log(Crb) + Hum 


Step:  AIC= 182.1486 
 Mrt ~ y1 + y2 + T1 + Tmp + log(Crb) + Hum 

Step:  AIC= 180.8557 
 Mrt ~ y1 + y2 + T1 + log(Crb) + Hum 

-------------------
y1y2T1LogCHum <- glm(formula = Mrt ~ y1 + y2 + T1 +
log(Crb) + Hum, family = poisson, data = LA1)

Start:  AIC= 180.8557 
 Mrt ~ y1 + y2 + T1 + log(Crb) + Hum 

Residual Deviance: 168.8557 on 502 degrees of freedom

> y1y2T1LogCHum$residuals%*%y1y2T1LogCHum$residuals
          [,1] 
[1,] 0.9856711

--------------------


Comparison of 4 models of Poisson Regression with canonical link
==================================================================
USE THIS MODEL

Model 4: Mrt(t) ~ Mrt(t-1) + Mrt(t-2) + Tmp(t-1) + log(Crb(t)) 
-------

y1y2T1LogC <- glm(formula = Mrt ~ y1 + y2 + T1 + log(Crb), 
family = poisson, data = LA1)


Start:  AIC= 184.5522 
 Mrt ~ y1 + y2 + T1 + log(Crb) 

Residual Deviance: 174.5522 on 503 degrees of freedom


> summary(y1y2T1LogC)

Call: glm(formula = Mrt ~ y1 + y2 + T1 + log(Crb), family = poisson, 
data = LA1)
Deviance Residuals:
       Min         1Q      Median       3Q      Max 
 -1.918812 -0.3562656 -0.03168986 0.360087 2.100885

Coefficients:
                   Value   Std. Error    t value 
(Intercept)  4.505127648 0.0694395719  64.878390
         y1  0.001888282 0.0003544952   5.326679
         y2  0.001837161 0.0003717740   4.941607
         T1 -0.001334397 0.0004426010  -3.014898
   log(Crb)  0.046829757 0.0086943161   5.386250

(Dispersion Parameter for Poisson family taken to be 1 )

    Null Deviance: 588.6089 on 507 degrees of freedom

Residual Deviance: 174.5522 on 503 degrees of freedom



> y1y2T1LogC$residuals%*%y1y2T1LogC$residuals
         [,1] 
[1,] 1.020108


Is humidity significant ?
174.5522 on 503 Model 4
168.8557 on 502 Model 4 + Hum
> 1-pchisq(174.5522 - 168.8557, 503-502)
[1] 0.01699878 ##p-value

Humidity is probably significant, but its coeficient is very small,
so I am including it.


tsplot(y1y2T1LogC$y,tsplot(y1y2T1LogC$fitted.values))
tsplot(y1y2T1LogC$residuals)

> mean(y1y2T1LogC$residuals)
[1] 7.929388e-06

acf(y1y2T1LogC$residuals)

Resid4 <- y1y2T1LogC$residuals  ##Working residuals.
> Resid4%*%Resid4
         [,1] 
[1,] 1.020108 

Another way:
WR  <- sum(residuals(y1y2T1LogC, type="w")^{2})
> WR
[1] 1.020108



Response residuals = y-hat(y) 
RespResid.y1y2T1LogC <- resid(y1y2T1LogC, type="response") ##Response resid.

> RespResid.y1y2T1LogC%*%RespResid.y1y2T1LogC/508  ###MSE
         [,1] 
[1,] 59.38317

Another way:
MSE <- sum(residuals(y1y2T1LogC, type="r")^{2})/508
> MSE
[1] 59.38317

Another way:
mean(residuals(y1y2T1LogC, type="r")^{2})

Use G_N to get SE's for Model 4
--------------------------------

Mrt ~ y1 + y2 + T1 + log(Crb)

b0 <- y1y2T1LogC$coefficients[1] ##Intercept
b1 <- y1y2T1LogC$coefficients[2]
b2 <- y1y2T1LogC$coefficients[3]
b3 <- y1y2T1LogC$coefficients[4]
b4 <- y1y2T1LogC$coefficients[5]


Mrt <- LA1$Mrt 
Tmp <- LA1$Tmp
Crb <- LA1$Crb


N <- length(LA1$Mrt)
> N
[1] 508

Use our cumulative covariance matrix GN=G_{N}!!!

GN <- matrix(rep(0,25),ncol=5)
for(i in 3:N){
bZ <- b0 + b1*Mrt[i-1]+b2*Mrt[i-2] + b3*Tmp[i-1] + b4*log(Crb[i])
Z <- c(1,Mrt[i-1],Mrt[i-2],Tmp[i-1],log(Crb[i]))
ZZ <- Z%*%t(Z)
GN <- GN + ZZ*exp(bZ)
}

> InvGN <- solve(GN) #Approx. Cov Matrix of MPLE hat{beta}.
> InvGN
              [,1]          [,2]          [,3]          [,4]          [,5] 
[1,]  4.821996e-03 -5.137528e-06 -1.186505e-05 -2.419959e-05 -6.718087e-05
[2,] -5.137528e-06  1.261890e-07 -8.756092e-08 -3.640349e-09 -5.800100e-07
[3,] -1.186505e-05 -8.756092e-08  1.385955e-07  5.080947e-08 -2.846573e-07
[4,] -2.419959e-05 -3.640349e-09  5.080947e-08  1.959349e-07  8.601128e-07
[5,] -6.718087e-05 -5.800100e-07 -2.846573e-07  8.601128e-07  7.581801e-05

SE of hat(b0,b1,b2,b3,b4,b) from PL
> sqrt(c(InvGN[1,1],InvGN[2,2],InvGN[3,3],InvGN[4,4],InvGN[5,5]))
[1] 0.0694405902 0.0003552309 0.0003722841 0.0004426454 0.0087073538

From S-Plus: Essentially the same!!!
0.0694395719,0.0003544952,0.0003717740,0.0004426010,0.0086943161


tsplot(y1y2T1LogC$residuals)        ###Same as tsplot(A[,'Mrt'])
mtext("y1y2T1LogC$residuals", line=0,cex=1.2)
PR.Mrt <- spectrum(y1y2T1LogC$residuals) ##Raw Periodogram
SP.Mrt <-  spectrum(y1y2T1LogC$residuals, spans=c(3,5,7))#Smoothed Period.

spectrum(rnorm(508), spans=c(3,5,7))#For comparison

par(mfrow=c(3,2), oma=c(0,0,5,0))
Mort <- LA1$Mrt
Temp <- LA1$Tmp
CO <- LA1$Crb
tsplot(Mort)
mtext("Mortality", line=0,cex=1.2)
acf(Mort)
tsplot(Temp)
mtext("Temperature", line=0,cex=1.2)
acf(Temp)
tsplot(log(CO))
mtext("log(CO)", line=0,cex=1.2)
acf(log(CO))

mtext("Mortality, Temperature, log(CO)", 
outer=T,line=1, cex=1.5)


Check fitted values of Model 4:
--------------------------------
Mrt ~ y1 + y2 + T1 + log(Crb)

b0 <- y1y2T1LogC$coefficients[1] ##Intercept
b1 <- y1y2T1LogC$coefficients[2]
b2 <- y1y2T1LogC$coefficients[3]
b3 <- y1y2T1LogC$coefficients[4]
b4 <- y1y2T1LogC$coefficients[5]

Mrt <- LA1$Mrt 
Tmp <- LA1$Tmp
Crb <- LA1$Crb

> N <- length(Mrt)
> N
[1] 508


mu <- rep(0,N)  ##Really hat(mu)
for(i in 3:N){
mu[i] <- exp(b0 + b1*Mrt[i-1]+b2*Mrt[i-2] + b3*Tmp[i-1] + b4*log(Crb[i]))
}

> y1y2T1LogC$fitted.values[101:105]
      101     102      103      104      105 
 198.2466 203.913 196.3034 194.3792 193.7002
> mu[101:105]
[1] 198.2466 203.9130 196.3034 194.3792 193.7002 ##SAME.

> y1y2T1LogC$fitted.values[504:508]
      504      505      506      507      508 
 153.4763 158.2064 162.6193 163.1669 168.3671
> mu[504:508]
[1] 153.4763 158.2064 162.6193 163.1669 168.3671 ##SAME.


Check acf,cpgram: Inside the WN bounds
-----------------------------------------
Best model
y1y2T1LogC <- glm(formula = Mrt ~ y1 + y2 + T1 + log(Crb), 
family = poisson, data = LA1)


 
acf(y1y2T1LogC$residuals)    ###Working residuals    
cpgram(y1y2T1LogC$residuals) ###Working residuals

ts.plot(y1y2T1LogC$residuals) 
qqnorm(y1y2T1LogC$residuals)
hist(y1y2T1LogC$residuals)


Compare to 
PoisReg.All <- glm(Mrt ~ Rsp+Crd+Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, 
                            family=poisson, data=LA)

acf(PoisReg.All$residuals)    ###Working residuals    
cpgram(PoisReg.All$residuals) ###Working residuals

ts.plot(PoisReg.All$residuals) 
qqnorm(PoisReg.All$residuals)
hist(PoisReg.All$residuals)


Prediction interval for mu_t:
-------------------------------
Since Mrt is assumed Poisson, and sinec the mean of Mrt is high, 
use the normal approx: y = +/- mu + 1.96*sqrt(mu)

U <- rep(0,N)
for(i in 3:N){
U [i] <- mu[i] + 1.96*sqrt(mu[i])
}

L <- rep(0,N)
for(i in 3:N){
L [i] <- mu[i] - 1.96*sqrt(mu[i])
}


par(oma=c(0,0,5,0))
tsplot(Mrt[3:508],L[3:508],U[3:508])


mtext("LA Mortality and Prediction Bounds", 
outer=T,line=1, cex=1.5)



Logistic Regression: Using the covariates of Model 4.
--------------------
> x <- (LA1$Mrt > 180)
LR <- glm(formula = x ~ y1 + y2 + T1 + log(Crb), 
family = binomial, data = LA1)
> mean(x)
[1] 0.1889764
> length(x)
[1] 508


> summary(LR)

Call: glm(formula = x ~ y1 + y2 + T1 + log(Crb), family = binomial, data = LA1)
Deviance Residuals:
       Min         1Q     Median          3Q     Max 
 -2.398971 -0.3946087 -0.1765495 -0.06889809 2.96192

Coefficients:
                   Value Std. Error   t value 
(Intercept) -28.78559510 4.25454348 -6.765848
         y1   0.07231370 0.01801253  4.014634
         y2   0.07748155 0.01873651  4.135325
         T1  -0.03691172 0.02149358 -1.717337
   log(Crb)   1.82124808 0.43960315  4.142937

(Dispersion Parameter for Binomial family taken to be 1 )

    Null Deviance: 492.4911 on 507 degrees of freedom

Residual Deviance: 249.1258 on 503 degrees of freedom

Number of Fisher Scoring Iterations: 6 



> LR$y[50:70]
 [1] 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
> LR$fitted.values[50:70]
        50        51        52        53        54        55        56      57 
 0.9926582 0.8860716 0.9495808 0.8664859 0.9023523 0.8528324 0.7852525 0.84669
       58        59        60        61         62        63        64 
 0.690341 0.8888058 0.4264606 0.2323411 0.07574699 0.0504389 0.1027491
         65         66         67         68         69        70 
 0.07480436 0.04605685 0.02240811 0.03680879 0.09486271 0.1855351
> LR$fitted.values[50:70]*1
        50        51        52        53        54        55        56      57 
 0.9926582 0.8860716 0.9495808 0.8664859 0.9023523 0.8528324 0.7852525 0.84669
       58        59        60        61         62        63        64 
 0.690341 0.8888058 0.4264606 0.2323411 0.07574699 0.0504389 0.1027491
         65         66         67         68         69        70 
 0.07480436 0.04605685 0.02240811 0.03680879 0.09486271 0.1855351

(LR$y-LR$fitted.values)%*%(LR$y-LR$fitted.values)/508=0.07416702

tsplot(LR$y[95:200],LR$fitted.values[95:200])
------------------------------------------
Model 5:
----------
y1y2T1T2LogC <- glm(formula = Mrt ~ y1 + y2 + T1 + T2 +log(Crb),
family = poisson, data = LA1)

Residual Deviance: 174.5268 on 502 degrees of freedom

Start:  AIC= 186.5268 
 Mrt ~ y1 + y2 + T1 + T2 + log(Crb) 

Is T2 significant ?
174.5268 on 502 Model 5 (y1 + y2 + T1 + T2 +log(Crb))
174.5522 on 503 Model 4 (y1 + y2 + T1 + log(Crb))

> 1-pchisq(174.5522 - 174.5268 , 503-502)
[1] 0.8733744 ##p-value
So, T2 NOT significant!!!

> Resid5 <- y1y2T1T2LogC$residuals
> Resid5%*%Resid5
        [,1] 
[1,] 1.01988

> RespResid.y1y2T1T2LogC <- resid(y1y2T1T2LogC, type="response")
> RespResid.y1y2T1T2LogC%*%RespResid.y1y2T1T2LogC/508
         [,1] 
[1,] 59.38478
-------------------------------------------
Model 6: Mrt ~ y1 + y2 + T1 + Tmp + log(Crb)

y1y2T1TmpLogC <- glm(formula = Mrt ~ y1 + y2 + T1 +Tmp+log(Crb),
family = poisson, data = LA1)

Residual Deviance: 171.4078 on 502 degrees of freedom
Start:  AIC= 183.4078 
 Mrt ~ y1 + y2 + T1 + Tmp + log(Crb) 

Is Tmp significant ?
Model 5: Mrt ~ y1 + y2 + T1 + Tmp + log(Crb), 171.4078 on 502 df
Model 4: Mrt ~ y1 + y2 + T1       + log(Crb), 174.5522 on 503 df
> 1-pchisq(174.5522 - 171.4078  , 503-502)
[1] 0.07618802 #p-value

Tmp not quite significant

> y1y2T1TmpLogC$residuals%*%y1y2T1TmpLogC$residuals
          [,1] 
[1,] 0.9992826

> resid(y1y2T1TmpLogC,type="response")
                  %*%resid(y1y2T1TmpLogC,type="response")/508
         [,1] 
[1,] 58.44298



----------------------------------

Model 3: Mrt(t) ~ Mrt(t-1) + Mrt(t-2) + Tmp(t-1) 
-------

y1y2T1 <- glm(formula = Mrt ~ y1 + y2 + T1, family = poisson, 
data = LA1)

Residual Deviance: 203.5214 on 504 degrees of freedom

> y1y2T1$residuals%*%y1y2T1$residuals
         [,1] 
[1,] 1.190659

Start:  AIC= 211.5214 
 Mrt ~ y1 + y2 + T1 

> mean(y1y2T1$residuals)
[1] 2.819284e-06

Is log(Crb) significant ?
Model 3: 203.5214 on 504
Model 4: 174.5522 on 503
> 1-pchisq(203.5214 - 174.5522,504-503)
[1] 7.353829e-08

log(Carbon) is very significant!!!

RespResid.y1y2T1 <- resid(y1y2T1, type="response")
> RespResid.y1y2T1%*%RespResid.y1y2T1/508
         [,1] 
[1,] 69.04306

Resid3 <- y1y2T1$residuals
> Resid3%*%Resid3
         [,1] 
[1,] 1.190659


--------------------------------

Model 2: Mrt(t) ~ Mrt(t-1) + Mrt(t-2) 
-------

y1y2 <- glm(formula = Mrt ~ y1 + y2 , family = poisson, data = LA1)

Start:  AIC= 228.2333 
 Mrt ~ y1 + y2 

> y1y2$residuals%*%y1y2$residuals
         [,1] 
[1,] 1.294978

> mean(y1y2$residuals)
[1] -5.101422e-06


Response Residuals= y-hat(y)
RespResid.y1y2 <- resid(y1y2, type="response")
> RespResid.y1y2%*%RespResid.y1y2/508
         [,1] 
[1,] 75.52384



Residual Deviance: 222.2333 on 505 degrees of freedom


174.5522 on 503 Model 4: Mrt ~ y1 + y2 + T1 + log(Crb)
222.2333 on 505 Model 2: Mrt ~ y1 + y2

Tmp(t-1)=T1,log(Crb)=log(Carbon) significant ?
>  1-pchisq(222.2333 - 174.5522,505-503)
[1] 4.427725e-11 ###Yes. T1, log(Crb) are significant!!!

acf(y1y2$residuals)

Resid2 <- y1y2$residuals
> Resid2%*%Resid2
         [,1] 
[1,] 1.294978


---------------------------------------------
Model 1: Mrt(t) ~ Mrt(t-1) 

y1 <- glm(formula = Mrt ~ y1  , family = poisson, data = LA1)

Residual Deviance: 276.071 on 506 degrees of freedom


> y1$residuals%*%y1$residuals
         [,1] 
[1,] 1.620043


RespResid.y1 <- resid(y1, type="response")
> RespResid.y1%*%RespResid.y1/508
         [,1] 
[1,] 93.01784

AIC= 280.071 
 Mrt ~ y1 

Start:  AIC= 280.071 
 Mrt ~ y1 



----------------------------------------------

Model 0: Mrt(t) ~ Tmp(t) + Hum(t) + Crb(t) + Slf(t) + Nit(t) + Hdr(t)
                   + Ozn(t) + Par(t)


WeathPol <- glm(formula = Mrt ~ Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, 
                            family=poisson, data=LA)

Residual Deviance: 315.6854 on 499 degrees of freedom

Start:  AIC= 333.6854 
 Mrt ~ Tmp + Hum + Crb + Slf + Nit + Hdr + Ozn + Par 

acf(WeathPol$residuals)
Resid0 <- WeathPol$residuals
> Resid0%*%Resid0 ##SS of working residuals
         [,1] 
[1,] 1.857614


> WeathPol$residuals%*%WeathPol$residuals
         [,1] 
[1,] 1.857614


RespResid.WeathPol <- resid(WeathPol, type="response")
> RespResid.WeathPol%*%RespResid.WeathPol/508
        [,1] 
[1,] 108.9936


Summary: Poisson Regression
-----------------------------

Note: When the scale parameter is phi=1, then S-Plus gives UP TP AN
      ADDITIVE CONSTANT (Venables and Ripley p. 187)

       AIC = Deviance + 2*(Number of estimated parameters)

      For example: In Model 1 we estimate 8+1=9 parameters, and
      deviance=315.6854. Thus AIC=315.6854+2*9=333.6854


         NumPar SS.WorkResid  SS.RespRes/508   Deviance   df    AIC
        -------------------------------------------------------------
Model 0    9    1.857614       108.9936        315.6854  499  333.6854 
Model 1    2    1.620043        93.0178        276.0710  506  280.071
Model 2    3    1.294978        75.5238        222.2333  505  228.2333
Model 3    4    1.190659        69.0431        203.5214  504  211.5214
Model 4    5    1.020108        59.3832        174.5522  503  184.5522
Model 5    6    1.019880        59.3848        174.5268  502  186.5268
Model 6    6    0.999283        58.4430        171.4078  502  183.4078
Try(below) 6    0.988           57.87          169.6683  502  181.67

Model    Pearson={\cal {\chi}}^{2}        BIC
----------------------------------------------
0              320.17                    371.76
1              276.11                    288.53
2              222.32                    240.92
3              203.78                    228.44
4              174.91                    205.71  
5              174.89                    211.91
6              171.72                    208.79
Try(below)     169.95                    207.05

From BIC, Model 4 is the winner.


Model 0:  Mrt ~ Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par
Model 1:  Mrt ~ y1
Model 2:  Mrt ~ y1 + y2
Model 3:  Mrt ~ y1 + y2 + T1
Model 4:  Mrt ~ y1 + y2 + T1 +      log(Crb)
Model 5:  Mrt ~ y1 + y2 + T1 + T2 + log(Crb) 
Model 6:  Mrt ~ y1 + y2 + T1 + Tmp+ log(Crb)


DIAGNOSTICS FUNCTION
For a glm object x:

out.f   <- function(x)
{
p       <- length(x$coef)
wr      <- sum(residuals(x, type="w")^{2})
mse     <- mean(residuals(x, type="r")^{2})
x2      <- sum(residuals(x, type="p")^{2})
d       <- x$deviance
df      <- x$df
aic     <- d+2*length(x$coeff)
bic     <- d+length(x$coef)*log(length(x$resid))
c(p, wr, mse, x2, d, df, aic,bic)
}







Plots
-------

par(mfrow=c(2,2), oma=c(0,0,5,0))

tsplot(y1y2T1LogC$y,y1y2T1LogC$fitted.values)
legend(350,228,c("____ Data","........ Fit"),bty="n")

mtext("Poisson Regression of Filtered LA Mortality Mrt(t) on", 
outer=T,line=2, cex=1.2)
mtext("Mrt(t-1), Mrt(t-2), Tmp(t-1), log(Crb(t))", 
outer=T, line=0, cex=1.2)



tsplot(Resid0,main="Model 0 Residuals") ##(Working) residuals for Model 1.
tsplot(Resid4,main="Model 4 Residuals") ##(Working) residuals for Model 4.
acf(Resid0)
acf(Resid4)

mtext("Comparison of Residuals", outer=T,line=2, cex=1.5)

-----------------------------------
Add Tmp^2 as suggested by Shumway et al.

Try <- glm(formula = Mrt ~ y1 + y2 + T1 + I(Tmp^2 )+ log(Crb), 
family = poisson, data = LA1)



> summary(Try)

Call: glm(formula = Mrt ~ y1 + y2 + T1 + I(Tmp^2) + log(Crb), 
          family = poisson, 
data
         = LA1)
Deviance Residuals:
       Min         1Q      Median        3Q      Max 
 -1.821623 -0.3464865 -0.01389327 0.3605499 2.116257

Coefficients:
                    Value   Std. Error    t value 
(Intercept)  4.477866e+00 0.0705140873  63.503144
         y1  2.172803e-03 0.0003769724   5.763824
         y2  1.816273e-03 0.0003718341   4.884632
         T1 -1.987967e-03 0.0005323697  -3.734185
   I(Tmp^2)  7.757581e-06 0.0000035087   2.210956
   log(Crb)  4.062347e-02 0.0091413189   4.443940

(Dispersion Parameter for Poisson family taken to be 1 )

    Null Deviance: 588.6089 on 507 degrees of freedom

Residual Deviance: 169.6683 on 502 degrees of freedom


> Try$residuals%*%Try$residuals
          [,1] 
[1,] 0.9884745

Residual Deviance: 169.6683 on 502 degrees of freedom


> RespResid.Try <- resid(Try, type="response")
> RespResid.Try%*%RespResid.Try/508
         [,1] 
[1,] 57.86894

> Try$residuals%*%Try$residuals
          [,1] 
[1,] 0.9884745

Start:  AIC= 181.6683 
 Mrt ~ y1 + y2 + T1 + I(Tmp^2) + log(Crb) 


AIC=169.6683 + 2*6=181.6683
BIC=169.6683 + 6*log(508)=207.0512 (But in Model 4: BIC=205.71)
resid(Try, type="p")%*%resid(Try, type="p")=169.9458
resid(Try, type="pearson")%*%resid(Try, type="pearson")=169.9458 ##SAME
--------------------------------------------

Try1 <- glm(formula = Mrt ~ y1 + y2 + T1 + I(Tmp^2 )+ log(Crb), 
family = poisson, data = LA1)

Residual Deviance: 169.6683 on 502 degrees of freedom
Start:  AIC= 181.6683 
 Mrt ~ y1 + y2 + T1 + I(Tmp^2) + log(Crb) 
---------------------------------------------------
Try2 <- glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2 )+ log(Crb), 
family = poisson, data = LA1) ##"FINAL" from above

Residual Deviance: 166.2256 on 502 degrees of freedom
Start:  AIC= 178.2256 
 Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) 

> r <- resid(Try2, type="response")

> r%*%r/508
         [,1] 
[1,] 56.42665


> Try2$residuals%*%Try2$residuals
          [,1] 
[1,] 0.9717937

------------------------------------------------

A. Introduction to lm() and glm()
==================================

glm(formula, family = gaussian, data, weights, subset,
         na.action, start = NULL, etastart, mustart,
         offset, control = glm.control(...), model = TRUE,
         method = "glm.fit", x = FALSE, y = TRUE, contrasts = NULL,
         ...)

formula: an object of class '"formula"' (or one that can be 
         coerced to that class): a symbolic description of the 
          model to be fitted. 

Ex:                
Reg1 <- glm(lot1 ~ log(u), data=clotting, family=Gamma)
summary(Reg1)

The "formula" gives the model: lot1 ~ log(u)


 

Note: +,-,*,^,: etc. have special meaning in the "formula"
in lm, glm. To interpret an expression literally, we need 
to use the identity function I().

Thus, to avoid confusion, the function 'I()' can be used to 
bracket those portions of a model formula where the operators 
are used in their arithmetic sense.  

For example, in the formula 'y ~ a + I(b+c)', the term 'b+c' 
is to be interpreted as the sum of 'b' and 'c'. Thus y is
regressed on two covariates. This is different than 
'y ~ a + b + c' where there are three covariates.


Another example, suppose we want to include the product of a and b 
in a formula specification. If use a*b then this mean a or b or 
a*b!!! To include ONLY a*b use I(a*b) to protect the 
expression a*b:  That is use lm(y~I(a*b))!!!
Formally, the '*' operator denotes factor crossing.
 

The '^' operator indicates crossing to the specified degree.  
For example (a+b+c)^2 is identical to (a+b+c)*(a+b+c) which 
in turn expands  to a formula containing the main effects for 
'a', 'b' and 'c' together with their second-order interactions.
 
To remove the intercept term use 'y ~ x - 1'.


The '-' operator removes the specified terms, so that 
'(a+b+c)^2 - a:b' is identical to 'a + b + c + b:c + a:c'.  
It can also used to remove the intercept term: 
'y ~ x - 1' is a line through the origin.  A model with no 
intercept can be also specified as 'y ~ x + 0' or 'y ~ 0 + x'.


---------------------

Some formula examples:
Y ~ X1,  Y is modeled by X1.
Y ~ X1 + X2, Y is modeled by X1 and X2 as in multiple regression.
Y ~ X1 * X2, Y is modeled by X1, X2 and X1*X2.
Y ~ (X1 + X2)^2, Two-way interactions. Note usual powers.
Y ~ X1+ I(X2^2), Y is modeled by X1 and X2^2.
Y ~ X1 | X2, Y is modeled by X1 conditioned on X2.

-----------------------


Simple example first
======================

y <- c(183.63,191.05,180.09,184.67,173.60)
x1 <- c(11.90,10.75,9.33,9.54,8.27)
x2 <- c( 97.85,104.64,94.36,98.05, 95.85)

> lm(y ~ x1+x2)

Call:
lm(formula = y ~ x1 + x2)

Coefficients:
(Intercept)           x1           x2
     60.511        1.651        1.077


A <- data.frame(cbind(y,x1,x2))
> is.data.frame(A)
[1] FALSE

> A
       y    x1     x2
1 183.63 11.90  97.85
2 191.05 10.75 104.64
3 180.09  9.33  94.36
4 184.67  9.54  98.05
5 173.60  8.27  95.85



colnames(A) <- c('yy','xx1','xx2')

> A
      yy   xx1    xx2
1 183.63 11.90  97.85
2 191.05 10.75 104.64
3 180.09  9.33  94.36
4 184.67  9.54  98.05
5 173.60  8.27  95.85

> is.data.frame(A)
[1] TRUE

> yy ##Not good by itself
Error: object 'yy' not found

> A$yy ##OK now
[1] 183.63 191.05 180.09 184.67 173.60


##Within lm(),glm() we can refer to the col names provided the 
  data frame is specified:

> lm(formula = yy ~ xx1 + xx2, data=A)

Call:
lm(formula = yy ~ xx1 + xx2, data = A)

Coefficients:
(Intercept)          xx1          xx2
     60.511        1.651        1.077


##Now glm() with family "gaussian": get the same thing.

> glm(formula = yy ~ xx1 + xx2, family = gaussian, data = A)


Call:  glm(formula = yy ~ xx1 + xx2, family = gaussian, data = A)

Coefficients:
(Intercept)          xx1          xx2
     60.511        1.651        1.077

Degrees of Freedom: 4 Total (i.e. Null);  2 Residual
Null Deviance:      164
Residual Deviance: 32.44        AIC: 31.54

##If use family=poisson" for non-integer response, R gives warnings
  and AIC=Inf:

> glm(formula = yy ~ xx1 + xx2, family = poisson, data = A)

Call:  glm(formula = yy ~ xx1 + xx2, family = poisson, data = A)

Coefficients:
(Intercept)          xx1          xx2
   4.545518     0.009147     0.005811

Degrees of Freedom: 4 Total (i.e. Null);  2 Residual
Null Deviance:      0.9
Residual Deviance: 0.1824       AIC: Inf
Warning messages:                                
1: In dpois(y, mu, log = TRUE) : non-integer x = 183.630000 ## realy yy
2: In dpois(y, mu, log = TRUE) : non-integer x = 191.050000
3: In dpois(y, mu, log = TRUE) : non-integer x = 180.090000
4: In dpois(y, mu, log = TRUE) : non-integer x = 184.670000
5: In dpois(y, mu, log = TRUE) : non-integer x = 173.600000


##To avoide the warnings use: "family = quasipoisson". 
  We get same estimates. AIC=NA

> glm(formula = yy ~ xx1 + xx2, family = quasipoisson, data = A)

Call:  glm(formula = yy ~ xx1 + xx2, family = quasipoisson, data = A)

Coefficients:
(Intercept)          xx1          xx2
   4.545518     0.009147     0.005811

Degrees of Freedom: 4 Total (i.e. Null);  2 Residual
Null Deviance:      0.9
Residual Deviance: 0.1824       AIC: NA

----------------------------------------

Now our environmental models 
-----------------------------

Thus, the following 2 models are identical since "formula"
treats log(Crb) and I(log(Crb)) alike.

A: 
--
glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb),
family = poisson, data = LA.y1y2)

B:
--
 glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + I(log(Crb)),
family = poisson, data = LA.y1y2)


A:
--
> glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb),
+ family = poisson, data = LA.y1y2)

Call:  glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb), 
           family = poisson,      data = LA.y1y2)

Coefficients:
(Intercept)           y1           y2          Tmp     I(Tmp^2)     log(Crb)
  5.1751771    0.0017316    0.0020606   -0.0212812    0.0001410    0.0524105

Degrees of Freedom: 507 Total (i.e. Null);  502 Residual
Null Deviance:      588.6
Residual Deviance: 166.2        AIC: Inf
There were 50 or more warnings (use warnings() to see the first 50)

B:
--
> glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + I(log(Crb)),
+ family = poisson, data = LA.y1y2)

Call:  glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + I(log(Crb)), 
        family = poisson,      data = LA.y1y2)

Coefficients:
(Intercept)           y1           y2          Tmp     I(Tmp^2)  I(log(Crb))
  5.1751771    0.0017316    0.0020606   -0.0212812    0.0001410    0.0524105

Degrees of Freedom: 507 Total (i.e. Null);  502 Residual
Null Deviance:      588.6
Residual Deviance: 166.2        AIC: Inf
There were 50 or more warnings (use warnings() to see the first 50)


But 

C:
---
glm(formula = Mrt ~ y1 + y2 + Tmp + Tmp^2 + log(Crb),
family = poisson, data = LA.y1y2)

Is different:

> glm(formula = Mrt ~ y1 + y2 + Tmp + Tmp^2 + log(Crb),
+ family = poisson, data = LA.y1y2)

Call:  glm(formula = Mrt ~ y1 + y2 + Tmp + Tmp^2 + log(Crb), 
family = poisson,      data = LA.y1y2)

Coefficients:
(Intercept)           y1           y2          Tmp     log(Crb)
  4.352e+00    1.842e-03    2.172e-03   -8.707e-05    5.293e-02

Degrees of Freedom: 507 Total (i.e. Null);  503 Residual
Null Deviance:      588.6
Residual Deviance: 183.6        AIC: Inf
There were 50 or more warnings (use warnings() to see the first 50)

R simply dropped Tmp^2!!! To see this, go to D where there is no Tmp^2.

D:
---
glm(formula = Mrt ~ y1 + y2 + Tmp + log(Crb),
family = poisson, data = LA.y1y2)

Call:  glm(formula = Mrt ~ y1 + y2 + Tmp + log(Crb), 
family = poisson,      data = LA.y1y2)

Coefficients:
(Intercept)           y1           y2          Tmp     log(Crb)
  4.352e+00    1.842e-03    2.172e-03   -8.707e-05    5.293e-02

Degrees of Freedom: 507 Total (i.e. Null);  503 Residual
Null Deviance:      588.6
Residual Deviance: 183.6        AIC: Inf
There were 50 or more warnings (use warnings() to see the first 50)



==================================================================

B. Comparison between Poisson and Negative Binomial

See: Venables and Ripley pp. 206, 207
=========================================================

1. Poisson
-------------
PoisReg.All <- glm(Mrt ~ Rsp+Crd+Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, 
                            family=poisson, data=LA)

par(mfrow=c(2,2))
acf(PoisReg.All$residuals)        ##FIG5 Not good!
cpgram(PoisReg.All$residuals)



> PoisReg.All

Call:  glm(formula = Mrt ~ Rsp + Crd + Tmp + Hum + Crb + Slf + Nit +      Hdr + Ozn + Par, family = poisson, data = LA)

Coefficients:
(Intercept)          Rsp          Crd          Tmp          Hum          Crb
  4.485e+00    6.005e-03    6.380e-03   -1.204e-04   -7.808e-05    4.958e-04
        Slf          Nit          Hdr          Ozn          Par
 -7.446e-03   -4.668e-04    7.655e-04    2.230e-03    1.080e-04

Degrees of Freedom: 507 Total (i.e. Null);  497 Residual
Null Deviance:      588.6
Residual Deviance: 57.85        AIC: Inf

-----------------------------------------------------
2. NB
----------
##Need library MASS:
library(MASS)

Specifies the information required to fit a Negative Binomial
     generalized linear model, with known 'theta' parameter, using
     'glm()'.

Usage:
negative.binomial(theta = stop("'theta' must be specified"), link = "log")


a. Now use glm() Do not specify link, and take theta=2
-------------------------------------------------------------
NB.All <- glm(Mrt ~ Rsp+Crd+Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, 
                            family=negative.binomial(2),data=LA)

> NB.All$family

Family: Negative Binomial(2)
Link function: log

> NB.All

Call:  glm(formula = Mrt ~ Rsp + Crd + Tmp + Hum + Crb + Slf + Nit +      Hdr + Ozn + Par, family = negative.binomial(2), data = LA)

Coefficients:
(Intercept)          Rsp          Crd          Tmp          Hum          Crb
  4.477e+00    6.189e-03    6.413e-03   -6.990e-05   -7.571e-05    3.864e-04
        Slf          Nit          Hdr          Ozn          Par
 -7.551e-03   -4.854e-04    7.599e-04    2.184e-03    1.351e-04

Degrees of Freedom: 507 Total (i.e. Null);  497 Residual
Null Deviance:      6.743 ##Much smaller
Residual Deviance: 0.6798       AIC: 5861 ###Get AIC!!!

> NB.All$family ###Note!!

Family: Negative Binomial(2)
Link function: log


par(mfrow=c(2,2))
acf(NB.All$residuals)        ##FIG5 Not good!
cpgram(NB.All$residuals)

b. Now use glm.nb. R estimates theta
   The coeef estimates are close to previous case using glm()
----------------------------------------------------------------
NB.All <- glm.nb(Mrt ~ Rsp+Crd+Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, data=LA)

glm.nb(formula, data, weights, subset, na.action,
       start = NULL, etastart, mustart,
       control = glm.control(...), method = "glm.fit",
       model = TRUE, x = FALSE, y = TRUE, contrasts = NULL, ...,
       init.theta, link = log)

> NB.All  

Call:  glm.nb(formula = Mrt ~ Rsp + Crd + Tmp + Hum + Crb + Slf + Nit +      Hdr + Ozn + Par, data = LA, init.theta = 26049012.6728116,      link = log)

Coefficients:
(Intercept)          Rsp          Crd          Tmp          Hum          Crb
  4.485e+00    6.005e-03    6.380e-03   -1.204e-04   -7.808e-05    4.958e-04
        Slf          Nit          Hdr          Ozn          Par
 -7.446e-03   -4.668e-04    7.655e-04    2.230e-03    1.080e-04

Degrees of Freedom: 507 Total (i.e. Null);  497 Residual
Null Deviance:      588.6
Residual Deviance: 57.84        AIC: 3620

> NB.All$theta  ###NOTE!!!!
[1] 26049013


If specify link=log get the same thing.

NB.All <- glm.nb(Mrt ~ Rsp+Crd+Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, 
link = log, data=LA)


> attributes(NB.All)
$names
 [1] "coefficients"      "residuals"         "fitted.values"
 [4] "effects"           "R"                 "rank"
 [7] "qr"                "family"            "linear.predictors"
[10] "deviance"          "aic"               "null.deviance"
[13] "iter"              "weights"           "prior.weights"
[16] "df.residual"       "df.null"           "y"
[19] "converged"         "boundary"          "model"
[22] "call"              "formula"           "terms"
[25] "data"              "offset"            "control"
[28] "method"            "contrasts"         "xlevels"

$class
[1] "glm" "lm"

> NB.All$family

Family: Negative Binomial(2)
Link function: log

==========================
Comparison: Best model, Poisson,NB,Quasi

##Need 
library(MASS)

###Poisson
PoisReg.TCy1y2 <-glm(formula = Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2, family = poisson, 
    data = LA.y1y2)

Coefficients:
(Intercept)          Tmp     I(Tmp^2)     log(Crb)           y1           y2  
   5.175177    -0.021281     0.000141     0.052411     0.001732     0.002061  

Degrees of Freedom: 507 Total (i.e. Null);  502 Residual
Null Deviance:	    588.6 
Residual Deviance: 166.2 	AIC: Inf 


Nice way to plot two ts 
ts.plot(ts(PoisReg.TCy1y2$y),
ts(PoisReg.TCy1$fitted.values),xlab="Weeks",ylab="Counts",type="l",col=c(1,2))
legend('topright', col=c(1,2), c('original','QLM'),pch=c('-','-'))

   


###NB
NB_Reg.TCy1y2 <- glm.nb(formula = Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2, link = log,
data = LA.y1y2)

Coefficients:
(Intercept)          Tmp     I(Tmp^2)     log(Crb)           y1           y2  
   5.175177    -0.021281     0.000141     0.052411     0.001732     0.002061  

Degrees of Freedom: 507 Total (i.e. Null);  502 Residual
Null Deviance:	    588.6 
Residual Deviance: 166.2 	AIC: 3719 


###NB2
NB2_Reg.TCy1y2 <- glm( Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2, 
family = negative.binomial(2), data = LA.y1y2)

Coefficients:
(Intercept)          Tmp     I(Tmp^2)     log(Crb)           y1           y2  
  5.1787370   -0.0212338    0.0001405    0.0519774    0.0016709    0.0021005  

Degrees of Freedom: 507 Total (i.e. Null);  502 Residual
Null Deviance:	    6.743 
Residual Deviance: 1.919 	AIC: 5852 

### QUASI: 777777777777777777777777777
attach(LA.y1y2)

Here we assume a working variance of the form phi*V(mu), where V(mu) is a variance
function. R allows V(mu) of form "constant", "mu(1-mu)", "mu", "mu^2" and "mu^3".
Here we use V(u)=mu.

Qausi_Reg.TCy1y2 <- glm(Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2, data= LA.y1y2,
   family = quasi(link = "log", variance = "mu")) 

Coefficients:
(Intercept)          Tmp     I(Tmp^2)     log(Crb)           y1           y2  
   5.175177    -0.021281     0.000141     0.052411     0.001732     0.002061  

Degrees of Freedom: 507 Total (i.e. Null);  502 Residual
Null Deviance:	    588.6 
Residual Deviance: 166.2 	AIC: NA 


acf(Qausi_Reg.TCy1y2$residuals)
cpgram(Qausi_Reg.TCy1y2$residuals)

Fit <- 
exp(5.175177-0.021281*Tmp+0.000141*Tmp^2 + 0.052411*log(Crb) + 0.001732*y1 + 0.002061*y2)

Mrt <- ts(Mrt)
Fit <- ts(Fit)

ts.plot(ts(Mrt),ts(Fit),xlab="Weeks",ylab="Counts",type="l",col=c(1,2))
legend('topright', col=c(1,2), c('original','QLM'),pch=c('-','-'))

#### Comprae with  bad model:
M1 <- glm(Mrt ~ log(Crb), family = quasi(link = "log", variance = "mu")) 
ts.plot(ts(M1$y),ts(M1$fitted.values),xlab="Weeks",ylab="Counts",type="l",col=c(1,2))
legend('topright', col=c(1,2), c('original','QLM'),pch=c('-','-'))

acf(M1$residuals)
cpgram(M1$residuals)







