## Student’s t-test

The t-test is a statistical test used to determine whether the means of two groups differ significantly. It assumes that the dependent variable is normally distributed within each group.

H0: $$\mu_1 = \mu_2$$

H1: $$\mu_1 \neq \mu_2$$ (the means are different)
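
For reference, the classical pooled-variance form of the test statistic can be written as follows. The symbols $$\bar{x}_i$$, $$s_i^2$$ and $$n_i$$ (sample mean, sample variance and sample size of group $$i$$) are notation introduced here for illustration and do not appear elsewhere in these slides.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}$$

Under H0, $$t$$ follows a Student's t distribution with $$n_1 + n_2 - 2$$ degrees of freedom.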

## t-test

A working example, using insect counts for six sprays (A to F) with 12 observations each.

##    count spray
## 1     10     A
## 2      7     A
## 3     20     A
## 4     14     A
## 5     14     A
## 6     12     A
## 7     10     A
## 8     23     A
## 9     17     A
## 10    20     A
## 11    14     A
## 12    13     A
## 13    11     B
## 14    17     B
## 15    21     B
## 16    11     B
## 17    16     B
## 18    14     B
## 19    17     B
## 20    17     B
## 21    19     B
## 22    21     B
## 23     7     B
## 24    13     B
## 25     0     C
## 26     1     C
## 27     7     C
## 28     2     C
## 29     3     C
## 30     1     C
## 31     2     C
## 32     1     C
## 33     3     C
## 34     0     C
## 35     1     C
## 36     4     C
## 37     3     D
## 38     5     D
## 39    12     D
## 40     6     D
## 41     4     D
## 42     3     D
## 43     5     D
## 44     5     D
## 45     5     D
## 46     5     D
## 47     2     D
## 48     4     D
## 49     3     E
## 50     5     E
## 51     3     E
## 52     5     E
## 53     3     E
## 54     6     E
## 55     1     E
## 56     1     E
## 57     3     E
## 58     2     E
## 59     6     E
## 60     4     E
## 61    11     F
## 62     9     F
## 63    15     F
## 64    22     F
## 65    15     F
## 66    16     F
## 67    13     F
## 68    10     F
## 69    26     F
## 70    26     F
## 71    24     F
## 72    13     F
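
A minimal sketch of how a two-group comparison could be run on these data. It assumes that `InsectSprays2` is a copy of R's built-in `InsectSprays` data set (the values above match it). Sprays C and F are chosen only for illustration. The assumptions behind the test still need to be checked before trusting the result.

```r
# Assumption: InsectSprays2 is a copy of R's built-in InsectSprays data set.
InsectSprays2 <- InsectSprays

# Keep only the two groups to compare (C and F, chosen for illustration).
two_sprays <- droplevels(subset(InsectSprays2, spray %in% c("C", "F")))

# Student's t-test: var.equal = TRUE requests the pooled-variance version.
# The default (var.equal = FALSE) would run Welch's t-test instead.
res <- t.test(count ~ spray, data = two_sprays, var.equal = TRUE)
print(res)
```

With 12 observations per group, the pooled test has 12 + 12 - 2 = 22 degrees of freedom.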

## t-test (assumptions)

1. Data normality
2. Homogeneous variances
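
The second assumption could be checked with an F test for equality of variances, for example. This is a sketch under the assumption that `InsectSprays2` is a copy of the built-in `InsectSprays` data set. The choice of sprays C and F, and of `var.test()` rather than another test, is illustrative.

```r
# Assumption: InsectSprays2 is a copy of R's built-in InsectSprays data set.
InsectSprays2 <- InsectSprays

# Keep two groups (C and F, chosen for illustration).
two_sprays <- droplevels(subset(InsectSprays2, spray %in% c("C", "F")))

# F test of the null hypothesis that the two group variances are equal.
# Note that this test is itself sensitive to departures from normality.
vt <- var.test(count ~ spray, data = two_sprays)
print(vt)
```

A small p-value would suggest that the variances differ, in which case Welch's t-test (the default of `t.test()`) is a common alternative.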

## Normality

Visually, our data do not appear to be normally distributed.
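
One way to inspect the distributions visually is with histograms and normal Q-Q plots. The sketch below assumes that `InsectSprays2` is a copy of the built-in `InsectSprays` data set and looks at sprays C and F. The plot layout and labels are illustrative choices.

```r
# Assumption: InsectSprays2 is a copy of R's built-in InsectSprays data set.
InsectSprays2 <- InsectSprays

counts_c <- InsectSprays2$count[InsectSprays2$spray == "C"]
counts_f <- InsectSprays2$count[InsectSprays2$spray == "F"]

# 2 x 2 grid: histograms on the top row, normal Q-Q plots on the bottom row.
op <- par(mfrow = c(2, 2))
hist(counts_c, main = "Spray C", xlab = "Insect count")
hist(counts_f, main = "Spray F", xlab = "Insect count")
qqnorm(counts_c, main = "Q-Q plot, spray C"); qqline(counts_c)
qqnorm(counts_f, main = "Q-Q plot, spray F"); qqline(counts_f)
par(op)  # restore the previous graphical settings
```

Points that stray far from the reference line in a Q-Q plot indicate a departure from normality.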

## Normality

We can use the shapiro.test() function to test whether our data are normally distributed. The null hypothesis of the Shapiro-Wilk test is that the data come from a normal distribution.

##
##  Shapiro-Wilk normality test
##
## data:  InsectSprays2$count[InsectSprays2$spray == "C"]
## W = 0.85907, p-value = 0.04759

Since p = 0.048 is below 0.05, we can reject the null hypothesis that the spray C counts are normally distributed.
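
The output above can be reproduced along these lines, assuming `InsectSprays2` is a copy of the built-in `InsectSprays` data set. The variable name `sw_c` is introduced here for illustration.

```r
# Assumption: InsectSprays2 is a copy of R's built-in InsectSprays data set.
InsectSprays2 <- InsectSprays

# Shapiro-Wilk test on the counts for spray C.
sw_c <- shapiro.test(InsectSprays2$count[InsectSprays2$spray == "C"])
print(sw_c)

# The p-value can be compared to the significance level directly.
alpha <- 0.05
reject_normality <- sw_c$p.value < alpha
print(reject_normality)
```

The same call with `spray == "F"` gives the test for spray F.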

## Normality

##
##  Shapiro-Wilk normality test
##
## data:  InsectSprays2$count[InsectSprays2$spray == "F"]
## W = 0.88475, p-value = 0.1009

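Despite the non-normal look of these data, the test does not reject the null hypothesis of normality for spray F (p = 0.10). This does not prove that the data are normal; it only means the test found no significant departure from normality.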

## Normality

Since our data are not all normally distributed, what can we do?

• Use a mathematical transformation to normalize the data, then run the statistical test on the transformed data.

Here, the data are asymmetrically distributed (right-skewed): low values dominate, with a few high values. This shape is typical of a log-normal distribution, in which the log-transformed values follow a normal distribution.
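
A sketch of such a transformation for spray C, assuming `InsectSprays2` is a copy of the built-in `InsectSprays` data set. Because the spray C counts include zeros and `log(0)` is `-Inf`, this example uses `log(x + 1)`. That offset is an assumption made here, not something stated in these slides.

```r
# Assumption: InsectSprays2 is a copy of R's built-in InsectSprays data set.
InsectSprays2 <- InsectSprays

counts_c <- InsectSprays2$count[InsectSprays2$spray == "C"]

# Add 1 before taking the log so that zero counts stay finite
# (assumed offset; other choices are possible).
log_counts_c <- log(counts_c + 1)

# Re-run the normality test on the transformed values.
sw_log_c <- shapiro.test(log_counts_c)
print(sw_log_c)
```

If the transformed values pass the normality check, the t-test can then be run on the transformed counts of both groups, applying the same transformation to each.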