Introduction to basic linear analysis in R

Student’s t-test

The t-test determines whether there is a significant difference between the means of two groups. It assumes that the dependent variable is normally distributed within each group.

H0: \(\mu_1 = \mu_2\)

H1: \(\mu_1 \neq \mu_2\) (the means are different)
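For reference, the classical Student's statistic for two independent samples with a pooled variance can be written as below. The symbols \(\bar{x}_i\), \(s_i^2\) and \(n_i\) (sample mean, sample variance and sample size of group \(i\)) are notation introduced here for illustration, not taken from the original slides.

\[
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}
\]

Under H0, and if the assumptions hold, \(t\) follows a Student's t distribution with \(n_1 + n_2 - 2\) degrees of freedom.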

t-test

A working example.

##    count spray
## 1     10     A
## 2      7     A
## 3     20     A
## 4     14     A
## 5     14     A
## 6     12     A
## 7     10     A
## 8     23     A
## 9     17     A
## 10    20     A
## 11    14     A
## 12    13     A
## 13    11     B
## 14    17     B
## 15    21     B
## 16    11     B
## 17    16     B
## 18    14     B
## 19    17     B
## 20    17     B
## 21    19     B
## 22    21     B
## 23     7     B
## 24    13     B
## 25     0     C
## 26     1     C
## 27     7     C
## 28     2     C
## 29     3     C
## 30     1     C
## 31     2     C
## 32     1     C
## 33     3     C
## 34     0     C
## 35     1     C
## 36     4     C
## 37     3     D
## 38     5     D
## 39    12     D
## 40     6     D
## 41     4     D
## 42     3     D
## 43     5     D
## 44     5     D
## 45     5     D
## 46     5     D
## 47     2     D
## 48     4     D
## 49     3     E
## 50     5     E
## 51     3     E
## 52     5     E
## 53     3     E
## 54     6     E
## 55     1     E
## 56     1     E
## 57     3     E
## 58     2     E
## 59     6     E
## 60     4     E
## 61    11     F
## 62     9     F
## 63    15     F
## 64    22     F
## 65    15     F
## 66    16     F
## 67    13     F
## 68    10     F
## 69    26     F
## 70    26     F
## 71    24     F
## 72    13     F
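The table above appears to match the InsectSprays data set that ships with R. As a minimal sketch, the test could be run on two of the sprays as follows. The object name two_sprays and the choice of sprays C and F are assumptions made for illustration; the slides do not show how their own InsectSprays2 object was built.

```r
# InsectSprays is part of the datasets package that ships with R.
data(InsectSprays)

# Hypothetical choice: keep only two sprays, since a t-test compares two groups.
two_sprays <- droplevels(subset(InsectSprays, spray %in% c("C", "F")))

# var.equal = TRUE requests the classical Student's t-test;
# R's default (var.equal = FALSE) is Welch's t-test.
res <- t.test(count ~ spray, data = two_sprays, var.equal = TRUE)
print(res)
```

The result is only trustworthy if the assumptions of the test hold, so it should be read with that in mind.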

t-test (assumptions)

  1. Data normality
  2. Homogeneous variances
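The second assumption can be checked numerically. The sketch below compares the variances of two sprays with var.test(), an F test whose null hypothesis is that the two variances are equal. The subset two_sprays (sprays C and F) is a hypothetical choice made for illustration.

```r
data(InsectSprays)

# Hypothetical two-group subset used for illustration
two_sprays <- droplevels(subset(InsectSprays, spray %in% c("C", "F")))

# Sample variance of the counts within each spray
group_var <- tapply(two_sprays$count, two_sprays$spray, var)
print(group_var)

# F test of the null hypothesis that the two variances are equal
vt <- var.test(count ~ spray, data = two_sprays)
print(vt)
```

Note that this F test is itself sensitive to departures from normality, so it is best read together with the normality checks.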

Normality

Visually, our data does not appear to be normally distributed.
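A visual check of this kind could be sketched with a histogram and a normal Q-Q plot. Using spray C from the InsectSprays data set is an assumption; the slides do not say which group or which plot was originally shown.

```r
data(InsectSprays)

# Counts for one spray (hypothetical choice of group)
spray_c <- InsectSprays$count[InsectSprays$spray == "C"]

# Two panels side by side: histogram and normal Q-Q plot
op <- par(mfrow = c(1, 2))
hist(spray_c, main = "Spray C", xlab = "Insect count")
qqnorm(spray_c)
qqline(spray_c)  # points far from this line suggest non-normality
par(op)          # restore the previous graphics settings
```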

Normality

We can use the shapiro.test() function to test whether our data is normally distributed. Normality is the null hypothesis of this test.
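A call along the following lines would produce output like that shown below. How InsectSprays2 was created is not shown in the slides; here it is assumed to be a plain copy of the built-in InsectSprays data set.

```r
data(InsectSprays)

# Assumption: InsectSprays2 holds the same counts as the built-in data set
InsectSprays2 <- InsectSprays

# Shapiro-Wilk test on the counts of spray C
sw_c <- shapiro.test(InsectSprays2$count[InsectSprays2$spray == "C"])
print(sw_c)

# The p-value can also be extracted and compared with the 0.05 threshold
sw_c$p.value < 0.05
```

Replacing "C" with another spray level, such as "F", runs the same test on that group.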

## 
##  Shapiro-Wilk normality test
## 
## data:  InsectSprays2$count[InsectSprays2$spray == "C"]
## W = 0.85907, p-value = 0.04759

Since the p-value (0.04759) is below 0.05, we can reject the null hypothesis that our data is normally distributed.

Normality

## 
##  Shapiro-Wilk normality test
## 
## data:  InsectSprays2$count[InsectSprays2$spray == "F"]
## W = 0.88475, p-value = 0.1009

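Despite the non-normal look of this data, the test does not reject normality (p = 0.1009 > 0.05). This does not prove that the data is normal, only that we lack evidence of a departure from normality.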

Normality

Since not all of our data is normally distributed, what can we do?

  • Use a mathematical transformation to normalize the data, then run the statistical test on the transformed data.

Here, we work with data that is distributed asymmetrically, with many low values and a few high values. This type of distribution typically corresponds to a log-normal distribution, that is, the log-transformed values follow a normal distribution.
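As a minimal sketch, the transformation could look like this. Because some counts are zero and log(0) is -Inf, the example uses log1p(), which computes log(x + 1). That offset is an assumption made here; the slides do not specify which variant of the log transformation they use.

```r
data(InsectSprays)

# Counts for spray C, which contain zeros
spray_c <- InsectSprays$count[InsectSprays$spray == "C"]

# log1p(x) = log(x + 1); the +1 offset (an assumption) keeps zeros finite
log_c <- log1p(spray_c)

# Re-run the normality test on the transformed values
sw_log <- shapiro.test(log_c)
print(sw_log)
```

If the transformed values pass the normality check, the t-test would then be run on the transformed counts rather than the raw ones.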

Normality