LOG concerning input of Data on Consumption & Price of
Beef and Pork, and correlation analyses in SAS.
============================================== 10/8/08
The data, including an explanation of the variables, can be
found at:
http://lib.stat.cmu.edu/DASL/Datafiles/agecondat.html
(When you download, you will find that the numeric data
fields are separated by tabs not blanks, and you will
have to replace them by blanks using word-processor
commands in order to read into SAS.)
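## The tab-to-blank replacement can also be scripted instead of done in a
word processor. The following is a minimal sketch in Python (not part of
the original SAS session); the function names and the sample record are
hypothetical. SAS can also read tab-delimited fields directly with the
infile option dlm='09'x, which avoids the conversion altogether.

```python
# Sketch: replace tab separators by single blanks so that SAS
# list input can read the fields. All names here are hypothetical.

def tabs_to_blanks(text):
    """Return text with every tab character replaced by one blank."""
    return text.replace("\t", " ")

def convert_file(src_path, dst_path):
    """Copy src_path to dst_path, replacing tabs by blanks line by line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(tabs_to_blanks(line))

# Example on one made-up record (not a line of the real data file):
sample = "1\t2.5\t3.5"
print(tabs_to_blanks(sample))   # 1 2.5 3.5
```

To convert a whole file one would call convert_file with the downloaded
file as the first argument and the file named in the infile statement
as the second.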
libname home ".";
data home.meat;
infile "ASCdata/Meat2.dat";
input yr pbe cbe ppo cbo pfo dinc cfo rdinc rfp ;
run;
proc corr; run;
/* Gives all pairwise correlations plus some simple
   statistics like means and variances. */
* ## Now we do some more targeted analyses. ;
options linesize = 70 nodate nocenter;
proc corr data=home.meat ;
var pbe cbe;
with ppo cbo pfo dinc cfo rdinc rfp;
run;
Pearson Correlation Coefficients, N = 17
      Prob > |r| under H0: Rho=0

                  pbe           cbe
   ppo        0.32677       0.08341
               0.2005        0.7503
   cbo       -0.34615      -0.29150
               0.1735        0.2563
   pfo       -0.40552       0.13072
               0.1063        0.6170
   dinc      -0.65276       0.32024
               0.0045        0.2102
   cfo       -0.77336       0.50682
               0.0003        0.0379
   rdinc     -0.64062       0.46816
               0.0056        0.0581
   rfp       -0.34021       0.29144
               0.1815        0.2564
* This suggests that the variables most strongly correlated
  with the price of beef (pbe) are the three listed below
  (all with p < 0.01). For the consumption amount of beef
  (cbe), only CFO is significant at the 0.05 level, with
  RDINC borderline (p = 0.058):
     6. DINC  = Disposable income per capita index (1947-1949 = 100)
     7. CFO   = Food consumption per capita index (1947-1949 = 100)
     8. RDINC = Index of real disposable income per capita (1947-1949 = 100)
proc corr data=home.meat;   /* CORR=-0.75244, p-val= 0.0005 */
   var pbe cbe;
run;
proc corr data=home.meat;   /* CORR=-0.65961, p-val= 0.0054 */
   var pbe cbe;
   partial cfo;
run;
proc corr data=home.meat;   /* CORR=-0.69838, p-val= 0.0038 */
   var pbe cbe;
   partial cfo dinc;
run;
proc corr data=home.meat;   /* CORR=-0.87163, p-val= 0.0001 */
   var pbe cbe;
   partial cfo dinc rdinc;
run;
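### This succession of partial correlations says that the
    negative relationship between PBE and CBE persists after
    "removing" the effect of these other important variables.
    It weakens somewhat when CFO alone is removed (-0.752 to
    -0.660), but once CFO, DINC and RDINC are all removed it
    is stronger than the raw correlation (-0.872) !!!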
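
## The first of these partial correlations can be checked by hand from
the ordinary correlations, using the standard first-order formula
r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2)*(1 - r_yz^2)).
The sketch below is Python rather than SAS and is not part of the
original session; the function name partial_corr is an illustrative
choice. The three input correlations are copied from the PROC CORR
output above.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z removed."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Ordinary correlations taken from the PROC CORR output in this log.
r_pbe_cbe = -0.75244   # pbe with cbe
r_pbe_cfo = -0.77336   # pbe with cfo
r_cbe_cfo = 0.50682    # cbe with cfo

r = partial_corr(r_pbe_cbe, r_pbe_cfo, r_cbe_cfo)
print(round(r, 4))   # -0.6596
```

This agrees with the value -0.65961 reported by the PARTIAL cfo step,
up to the rounding of the three inputs.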
### The linear algebra meaning of these results can be
seen as follows:
data meatmat;
   set home.meat (keep=pbe cbe dinc cfo rdinc);
run;
/*
   meatmat = 17 x 5 matrix with columns pbe, cbe, cfo, dinc, rdinc
   (column order as shown by the labels in the output below)
*/
> cor(pbe, meatmat[,2:5])
            cbe        cfo      dinc      rdinc
[1,] -0.7524356 -0.7733555 -0.652759 -0.6406248
> cor(cbe, meatmat[,c(1,3:5)])
            pbe       cfo      dinc    rdinc
[1,] -0.7524356 0.5068163 0.3202374 0.468159
meatmat2 = 17 x 4 matrix with columns pbe, cbe, dinc, rdinc
   obtained by replacing each column of meatmat by its
   projection orthogonal to the meatmat cfo column (and to
   the constant column, i.e. the residual from regressing
   that column on cfo with an intercept)
## Now the partial correlation of pbe with the other variables
after "removing the effect of cfo" is
> cor(meatmat2[,1], meatmat2[,2:4])
           cbe       dinc     rdinc
[1,] -0.659605 -0.3016153 0.1461794

COMPARE with:

proc corr data = meatmat;
   var pbe; with cbe dinc rdinc; partial cfo;
run;

                   pbe
   cbe        -0.65961
                0.0054
   dinc       -0.30162
                0.2563
   rdinc       0.14618
                0.5891
meatmat3 = 17 x 3 matrix with columns pbe, cbe, rdinc
obtained by replacing each column of meatmat2 by its
projection orthogonal to the meatmat2 dinc column
> cor(meatmat3[,1], meatmat3[,2:3])
           cbe    rdinc
[1,] -0.698382 0.515713

COMPARE WITH:

proc corr data=meatmat;
   var pbe; with cbe rdinc; partial cfo dinc ;
run;

                   pbe
   cbe        -0.69838
                0.0038
   rdinc       0.51571
                0.0491
meatmat4 = 17 x 2 matrix with columns pbe, cbe
obtained by replacing each column of meatmat3 by its
projection orthogonal to the meatmat3 rdinc column
> cor(meatmat4)
           pbe        cbe
pbe  1.0000000 -0.8716332
cbe -0.8716332  1.0000000

COMPARE WITH:

proc corr data=meatmat;
   var pbe; with cbe; partial cfo dinc rdinc;
run;

                   pbe
   cbe        -0.87163
                <.0001
## All of this shows the meaning of "partial correlations" as
   the result of removing successive projections.
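
## The projection steps themselves can be written out in a few lines.
The sketch below is Python (not part of the original session) and uses
small made-up columns, NOT the meat data; all function and variable
names are hypothetical. It centers each column, removes the projection
on a first variable z1, then removes the projection on the residualized
second variable z2, and checks that the correlation of the residuals
equals the first-order partial correlation formula.

```python
import math

def center(v):
    """Subtract the mean, i.e. project orthogonal to the constant column."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(v, z):
    """Residual of centered v after removing its projection on centered z."""
    b = dot(v, z) / dot(z, z)
    return [a - b * c for a, c in zip(v, z)]

def corr(u, v):
    u, v = center(u), center(v)
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

# Made-up illustrative columns (hypothetical, not the meat data).
x  = [2.0, 4.0, 5.0, 7.0, 8.0, 11.0]
y  = [9.0, 7.0, 8.0, 4.0, 5.0, 1.0]
z1 = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
z2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]

# Step 1: center everything, then remove z1 from x, y and z2.
cx, cy, c1, c2 = center(x), center(y), center(z1), center(z2)
x1, y1, z2_1 = project_out(cx, c1), project_out(cy, c1), project_out(c2, c1)

# Step 2: remove the residualized z2 from the residualized x and y.
x2, y2 = project_out(x1, z2_1), project_out(y1, z2_1)

# The correlation of the step-1 residuals is the partial correlation given z1.
rxy, rxz, ryz = corr(x, y), corr(x, z1), corr(y, z1)
formula = (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))
print(abs(corr(x1, y1) - formula) < 1e-9)                  # True

# After step 2 the residuals are orthogonal to both original variables.
print(abs(dot(x2, c1)) < 1e-9 and abs(dot(x2, c2)) < 1e-9)  # True
```

Residualizing z2 on z1 before using it in step 2 is what makes the
two-step result orthogonal to both z1 and z2, and corr(x2, y2) is then
the partial correlation of x and y with both variables removed.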
### Here is how to do the successive "removing of projections"
### directly using SAS.
proc reg data=home.meat;
   model pbe cbe dinc rdinc = cfo ;
   output out= meatfil1
          r = pbe1 cbe1 dinc1 rdinc1;
run;
options nocenter linesize=70 nodate;
proc corr data = meatfil1;
var pbe1 cbe1 dinc1 rdinc1; run;
Pearson Correlation Coefficients, N = 17
      Prob > |r| under H0: Rho=0

                  pbe1       cbe1      dinc1     rdinc1
   pbe1        1.00000   -0.65961   -0.30162    0.14618
   Residual                0.0040     0.2394     0.5756
   cbe1       -0.65961    1.00000   -0.02028    0.04962
   Residual     0.0040                0.9384     0.8500
   dinc1      -0.30162   -0.02028    1.00000    0.69207
   Residual     0.2394     0.9384                0.0021
   rdinc1      0.14618    0.04962    0.69207    1.00000
   Residual     0.5756     0.8500     0.0021
### In particular, note that the correlation here between
pbe1 and cbe1 exactly matches the "partial correlation of
pbe and cbe with cfo removed" previously calculated.
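
## The correlations agree, but the p-values do not: 0.0054 from the
PARTIAL statement versus 0.0040 for the residuals. The usual
explanation is degrees of freedom. The test for a partial correlation
with k variables removed uses n-2-k degrees of freedom, while PROC CORR
applied to the residuals treats them as raw data and uses n-2. The
sketch below is Python (not part of the original session; the function
name t_stat is hypothetical) and computes the two t statistics for the
same r.

```python
import math

def t_stat(r, df):
    """t statistic for testing rho = 0 from correlation r with df degrees of freedom."""
    return r * math.sqrt(df / (1 - r**2))

n = 17          # number of observations in the meat data
r = -0.65961    # correlation of pbe and cbe with cfo removed

t_partial = t_stat(r, n - 2 - 1)   # PARTIAL cfo: one variable removed, df = 14
t_resid   = t_stat(r, n - 2)       # residuals treated as raw data, df = 15

print(round(t_partial, 2), round(t_resid, 2))   # -3.28 -3.4
```

The larger |t| on more degrees of freedom is what gives the smaller
p-value in the residual-based output, so the PARTIAL statement is the
one to trust for the significance level.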
## To go one step further in partial correlations, we do:
proc reg data=meatfil1;
model pbe1 cbe1 = dinc1 ;
output out= meatfil2
r = pbe2 cbe2;
proc corr data=meatfil2;
var pbe2 cbe2; run;
/* THINK OF HOW LABORIOUS IT WOULD BE TO DO THESE CALCULATIONS
"BY HAND" IN SAS WITHOUT PROC REG.
CAN YOU THINK HOW YOU WOULD DO THE CALCULATIONS USING DATA-STEP
AND PROC MEANS ? */
## and we get

Pearson Correlation Coefficients, N = 17
      Prob > |r| under H0: Rho=0

                  pbe2       cbe2
   pbe2        1.00000   -0.69838    ## matching partial corr of pbe, cbe
   Residual                0.0018    ## after removing cfo, dinc