A family of statistical analyses used to study and explore a data set of quantitative variables.
Canonical Correlation Analysis (CCA)
Principal Component Analysis (PCA)
Linear Discriminant Analysis (LDA)
Redundancy Analysis (RDA)
One of the goals behind PCA is to graphically represent the essential information contained in a (quantitative) data table.
A useful way to discover (hidden) patterns in the data by compressing it.
Not performed directly on the data but on either the covariance or correlation matrix of the data.
PCA is applied to data in a rectangular format.
\[ X_{n,p} = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,p} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \cdots & a_{n,p} \end{bmatrix} \]
\(n\) objects in the rows (observations)
\(p\) quantitative variables in the columns (variables)
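In terms of the matrix \(X_{n,p}\) above, the covariance and correlation matrices on which PCA operates can be written as follows. These are the usual textbook definitions rather than anything specific to these notes, and the symbols \(\bar{a}_j\), \(s_{jk}\), \(r_{jk}\), \(S\), \(v\) and \(\lambda\) are notation assumed here for illustration.
\[ \bar{a}_j = \frac{1}{n}\sum_{i=1}^{n} a_{i,j}, \qquad s_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} (a_{i,j}-\bar{a}_j)(a_{i,k}-\bar{a}_k), \qquad r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\,s_{kk}}} \]
With \(S = (s_{jk})\) the \(p \times p\) covariance matrix, the principal components are the eigenvectors \(v\) solving
\[ S\,v = \lambda\,v \]
ordered by decreasing eigenvalue \(\lambda\). Replacing \(S\) with the correlation matrix \((r_{jk})\) amounts to standardizing every variable first.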
Model | mpg | cyl | disp | hp | drat | wt | qsec |
---|---|---|---|---|---|---|---|
Mazda RX4 | 21.00 | 6.00 | 160.00 | 110.00 | 3.90 | 2.62 | 16.46 |
Mazda RX4 Wag | 21.00 | 6.00 | 160.00 | 110.00 | 3.90 | 2.88 | 17.02 |
Datsun 710 | 22.80 | 4.00 | 108.00 | 93.00 | 3.85 | 2.32 | 18.61 |
Hornet 4 Drive | 21.40 | 6.00 | 258.00 | 110.00 | 3.08 | 3.21 | 19.44 |
Hornet Sportabout | 18.70 | 8.00 | 360.00 | 175.00 | 3.15 | 3.44 | 17.02 |
Valiant | 18.10 | 6.00 | 225.00 | 105.00 | 2.76 | 3.46 | 20.22 |
Duster 360 | 14.30 | 8.00 | 360.00 | 245.00 | 3.21 | 3.57 | 15.84 |
Merc 240D | 24.40 | 4.00 | 146.70 | 62.00 | 3.69 | 3.19 | 20.00 |
Merc 230 | 22.80 | 4.00 | 140.80 | 95.00 | 3.92 | 3.15 | 22.90 |
Merc 280 | 19.20 | 6.00 | 167.60 | 123.00 | 3.92 | 3.44 | 18.30 |
One option for visualizing this dataset is to look at the pairwise correlations between all variables.
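As a rough illustration, the pairwise correlations of the ten rows shown in the table can be computed with the Python standard library alone. The sketch below also approximates the first principal component of that correlation matrix by power iteration, a simple stand-in chosen here for a full eigendecomposition. The helper names (`pearson`, `correlation_matrix`, `leading_component`) are hypothetical and not part of the original material.

```python
import math

# The ten rows shown in the table (columns: mpg, cyl, disp, hp, drat, wt, qsec).
COLUMNS = ["mpg", "cyl", "disp", "hp", "drat", "wt", "qsec"]
ROWS = [
    [21.0, 6.0, 160.0, 110.0, 3.90, 2.62, 16.46],
    [21.0, 6.0, 160.0, 110.0, 3.90, 2.88, 17.02],
    [22.8, 4.0, 108.0, 93.0, 3.85, 2.32, 18.61],
    [21.4, 6.0, 258.0, 110.0, 3.08, 3.21, 19.44],
    [18.7, 8.0, 360.0, 175.0, 3.15, 3.44, 17.02],
    [18.1, 6.0, 225.0, 105.0, 2.76, 3.46, 20.22],
    [14.3, 8.0, 360.0, 245.0, 3.21, 3.57, 15.84],
    [24.4, 4.0, 146.7, 62.0, 3.69, 3.19, 20.00],
    [22.8, 4.0, 140.8, 95.0, 3.92, 3.15, 22.90],
    [19.2, 6.0, 167.6, 123.0, 3.92, 3.44, 18.30],
]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(rows):
    """Return the p x p matrix of pairwise correlations between columns."""
    columns = list(zip(*rows))  # transpose: one tuple per variable
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


def leading_component(matrix, iterations=500):
    """Power iteration: approximate the largest eigenvalue and its eigenvector."""
    p = len(matrix)
    vector = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting vector
    for _ in range(iterations):
        product = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
        norm = math.sqrt(sum(c * c for c in product))
        vector = [c / norm for c in product]
    product = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
    eigenvalue = sum(a * b for a, b in zip(product, vector))  # Rayleigh quotient
    return eigenvalue, vector


R = correlation_matrix(ROWS)
eigenvalue, loadings = leading_component(R)

# Print the correlation matrix, one labelled row per variable.
for name, row in zip(COLUMNS, R):
    print(f"{name:>5}", " ".join(f"{value:6.2f}" for value in row))

# The eigenvalues of a correlation matrix sum to p, so eigenvalue / p is the
# share of total variance carried by the first component.
print(f"share of variance on first component: {eigenvalue / len(COLUMNS):.2f}")
```

In practice a numerical library would usually replace these helpers (for instance `numpy.corrcoef` with `rowvar=False` for the correlation matrix). With only ten observations the resulting coefficients should be read as illustrative.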