LOG concerning input of data on consumption and price of beef
and pork, and correlation analyses in SAS.
==============================================
                                                    10/8/08

The data, including explanation of variables, can be found at:

    http://lib.stat.cmu.edu/DASL/Datafiles/agecondat.html

(When you download, you will find that the numeric data fields are
separated by tabs, not blanks. You will have to replace the tabs by
blanks using word-processor commands in order to read the file
into SAS.)

libname home ".";

data home.meat;
   infile "ASCdata/Meat2.dat";
   input yr pbe cbe ppo cbo pfo dinc cfo rdinc rfp;
run;

proc corr; run;
   /* Gives all pairwise correlations plus some simple
      statistics like means and variances. */

* ## Now we do some more targeted analyses. ;

options linesize = 70 nodate nocenter;

proc corr data=home.meat;
   var pbe cbe;
   with ppo cbo pfo dinc cfo rdinc rfp;
run;

Pearson Correlation Coefficients, N = 17
Prob > |r| under H0: Rho=0

                  pbe           cbe

ppo           0.32677       0.08341
               0.2005        0.7503

cbo          -0.34615      -0.29150
               0.1735        0.2563

pfo          -0.40552       0.13072
               0.1063        0.6170

dinc         -0.65276       0.32024
               0.0045        0.2102

cfo          -0.77336       0.50682
               0.0003        0.0379

rdinc        -0.64062       0.46816
               0.0056        0.0581

rfp          -0.34021       0.29144
               0.1815        0.2564

* This suggests that the "strong" predictive variables for either
  the price (pbe) or the consumption (cbe) of beef are:

    6. DINC  = Disposable income per capita index (1947-1949 = 100)
    7. CFO   = Food consumption per capita index (1947-1949 = 100)
    8. RDINC = Index of real disposable income per capita
               (1947-1949 = 100)

proc corr; var pbe cbe;         /* CORR=-0.75244, p-val= 0.0005 */
proc corr;                      /* CORR=-0.65961, p-val= 0.0054 */
   var pbe cbe; partial cfo;
proc corr;                      /* CORR=-0.69838, p-val= 0.0038 */
   var pbe cbe; partial cfo dinc;
proc corr;                      /* CORR=-0.87163, p-val <.0001  */
   var pbe cbe; partial cfo dinc rdinc;
run;

### This succession of partial correlations says that "removing"
### the effect of all three of these other important variables
### leaves the negative relationship between PBE and CBE stronger
### than the raw correlation (-0.87 versus -0.75), even though
### removing cfo alone at first weakens it (to -0.66) !!!

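### As a check on the first step, a first-order partial correlation
### can also be computed directly from the three ordinary
### correlations, using the standard formula
###    r12.3 = (r12 - r13*r23) / sqrt((1 - r13**2)*(1 - r23**2)).
### The DATA step below is only a sketch: the names r_pc, r_pf, r_cf
### and r_partial are made up for illustration, and the inputs are
### the rounded correlations printed above, so the result should
### agree with the PROC CORR value of about -0.6596 only up to
### rounding.

```sas
data _null_;
   /* inputs copied (rounded) from the PROC CORR output above */
   r_pc = -0.75244;   /* corr(pbe, cbe) */
   r_pf = -0.77336;   /* corr(pbe, cfo) */
   r_cf =  0.50682;   /* corr(cbe, cfo) */

   /* first-order partial correlation of pbe and cbe given cfo */
   r_partial = (r_pc - r_pf*r_cf)
               / sqrt((1 - r_pf**2)*(1 - r_cf**2));

   put r_partial= 8.5;   /* written to the SAS log */
run;
```

### The same formula can be applied recursively for higher-order
### partial correlations, but the inputs must then themselves be
### partial correlations of one order lower.
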
### The linear algebra meaning of these results can be seen
### as follows:

data meatmat;
   set home.meat (keep=pbe cbe dinc cfo rdinc);
run;

/* meatmat = 17 x 5 matrix of the variables pbe, cbe, cfo, dinc,
   rdinc (listed in the column order used in the matrix
   calculations below) */

> cor(pbe, meatmat[,2:5])
            cbe        cfo      dinc      rdinc
[1,] -0.7524356 -0.7733555 -0.652759 -0.6406248

> cor(cbe, meatmat[,c(1,3:5)])
            pbe       cfo      dinc    rdinc
[1,] -0.7524356 0.5068163 0.3202374 0.468159

meatmat2 = 17 x 4 matrix with columns pbe, cbe, dinc, rdinc
           obtained by replacing each column of meatmat by its
           projection orthogonal to the meatmat cfo column
           (with every column first centered at its mean)

## Now the partial correlation of pbe with the other variables
## after "removing the effect of cfo" is

> cor(meatmat2[,1], meatmat2[,2:4])
           cbe       dinc     rdinc
[1,] -0.659605 -0.3016153 0.1461794

COMPARE WITH:

proc corr data = meatmat;
   var pbe;
   with cbe dinc rdinc;
   partial cfo;
run;

                  pbe

cbe          -0.65961
               0.0054

dinc         -0.30162
               0.2563

rdinc         0.14618
               0.5891

meatmat3 = 17 x 3 matrix with columns pbe, cbe, rdinc
           obtained by replacing each column of meatmat2 by its
           projection orthogonal to the meatmat2 dinc column

> cor(meatmat3[,1], meatmat3[,2:3])
           cbe    rdinc
[1,] -0.698382 0.515713

COMPARE WITH:

proc corr data=meatmat;
   var pbe;
   with cbe rdinc;
   partial cfo dinc;
run;

                  pbe

cbe          -0.69838
               0.0038

rdinc         0.51571
               0.0491

meatmat4 = 17 x 2 matrix with columns pbe, cbe
           obtained by replacing each column of meatmat3 by its
           projection orthogonal to the meatmat3 rdinc column

> cor(meatmat4)
           pbe        cbe
pbe  1.0000000 -0.8716332
cbe -0.8716332  1.0000000

COMPARE WITH:

proc corr data=meatmat;
   var pbe;
   with cbe;
   partial cfo dinc rdinc;
run;

                  pbe

cbe          -0.87163
               <.0001

## All of this shows the meaning of "partial correlations" as
## the result of removing successive projections.

### Here is how to do the successive "removing of projections"
### directly using SAS.

proc reg data=home.meat;
   model pbe cbe dinc rdinc = cfo;
   output out=meatfil1 r = pbe1 cbe1 dinc1 rdinc1;
run;

options nocenter linesize=70 nodate;

proc corr data = meatfil1;
   var pbe1 cbe1 dinc1 rdinc1;
run;

Pearson Correlation Coefficients, N = 17
Prob > |r| under H0: Rho=0

                pbe1        cbe1       dinc1      rdinc1

pbe1         1.00000    -0.65961    -0.30162     0.14618
Residual                  0.0040      0.2394      0.5756

cbe1        -0.65961     1.00000    -0.02028     0.04962
Residual      0.0040                  0.9384      0.8500

dinc1       -0.30162    -0.02028     1.00000     0.69207
Residual      0.2394      0.9384                  0.0021

rdinc1       0.14618     0.04962     0.69207     1.00000
Residual      0.5756      0.8500      0.0021

### In particular, note that the correlation here between pbe1
### and cbe1 exactly matches the "partial correlation of pbe and
### cbe with cfo removed" previously calculated. (The p-values
### differ, 0.0040 here versus 0.0054 before, because PROC CORR
### applied to residuals does not know that a degree of freedom
### was used up in removing cfo.)

## To go one step further in partial correlations, we do:

proc reg data=meatfil1;
   model pbe1 cbe1 = dinc1;
   output out=meatfil2 r = pbe2 cbe2;
run;

proc corr data=meatfil2;
   var pbe2 cbe2;
run;

/* THINK OF HOW LABORIOUS IT WOULD BE TO DO THESE CALCULATIONS
   "BY HAND" IN SAS WITHOUT PROC REG. CAN YOU THINK HOW YOU
   WOULD DO THE CALCULATIONS USING DATA-STEP AND PROC MEANS ? */

## and we get

Pearson Correlation Coefficients, N = 17
Prob > |r| under H0: Rho=0

                pbe2        cbe2

pbe2         1.00000    -0.69838   ## matching partial corr of pbe, cbe
Residual                  0.0018   ## after removing cfo, dinc
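
## The same device should carry the calculation one step further,
## to the partial correlation with cfo, dinc and rdinc all removed.
## The sketch below assumes the data set meatfil1 created above;
## the names meatfil2b, meatfil3, rdinc2, pbe3 and cbe3 are made up
## here for illustration. Note that rdinc1 has to be carried along
## in the second regression, so that rdinc is itself stripped of
## dinc before it is used as a regressor. If the projection
## argument is right, the final correlation should reproduce the
## value -0.87163 obtained earlier with "partial cfo dinc rdinc".

```sas
/* step 2 again, now also taking residuals of rdinc1 on dinc1 */
proc reg data=meatfil1 noprint;
   model pbe1 cbe1 rdinc1 = dinc1;
   output out=meatfil2b r = pbe2 cbe2 rdinc2;
run;
quit;

/* step 3: remove what is left of rdinc from pbe2 and cbe2 */
proc reg data=meatfil2b noprint;
   model pbe2 cbe2 = rdinc2;
   output out=meatfil3 r = pbe3 cbe3;
run;
quit;

/* correlation of the final residuals */
proc corr data=meatfil3;
   var pbe3 cbe3;
run;
```

## As in the earlier steps, only the correlation itself is expected
## to match: the p-value printed here is computed as if no degrees
## of freedom had been used in the three regressions, so it will
## not agree with the one from the PARTIAL statement.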