Research assistant at Takuvik (Laval University, Québec, Canada)
Data Science is an interdisciplinary field about processes and systems to extract knowledge or insights from large volumes of data in various forms, either structured or unstructured (Wikipedia).
As scientists, you need to learn and practice essential tools for capturing, manipulating and sharing data.
I wish I knew more about code workflow and data organization when I started my PhD!
Provide tools (R) and knowledge (statistics) for ecologists.
Best practices for data manipulation and data analysis.
You might wonder why it is important to learn programming in science. There are at least two good reasons:
Reproducibility (ability to recreate what you did).
Automation (run existing analysis on new data).
It is often said that 80% of data analysis is spent on cleaning and preparing data. And it’s not just a first step, but it must be repeated many times over the course of analysis as new problems come to light or new data is collected (Hadley Wickham).
During this course we will focus on R.
It is free software!
It is cross-platform (Windows, Mac and Linux).
Has exceptional graphics capabilities (ideal for preparing scientific manuscripts).
Easy to develop your own functions (automation).
Keeps a record of how analyses were done (reproducibility).
Packages and active development (10 000+ packages available on CRAN).
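To make the automation and reproducibility points concrete, here is a minimal sketch in base R. The function name summarise_column and the two small data frames are made up for illustration; they do not come from the course material.

```r
# Hypothetical helper (not part of any package): mean and standard
# deviation of one numeric column, ignoring missing values.
summarise_column <- function(data, column) {
  values <- data[[column]]
  c(mean = mean(values, na.rm = TRUE), sd = sd(values, na.rm = TRUE))
}

# Made-up measurements standing in for a first and a later field season.
season_1 <- data.frame(length_mm = c(10, 12, 14))
season_2 <- data.frame(length_mm = c(11, 13, 15, NA))

# The same code runs unchanged on the new data (automation), and the
# script itself records exactly what was computed (reproducibility).
summarise_column(season_1, "length_mm")
summarise_column(season_2, "length_mm")
```

Once the analysis lives in a function, rerunning it on next year's data is one extra line instead of another round of manual steps.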
This course will be divided into two main parts:
An introduction to R (data manipulation, graphics, etc.)
Statistical analysis using R
ggplot2
The best way to learn programming and statistics is by practicing. After each concept, we will do exercises together.
Download R https://cran.r-project.org/
I strongly recommend installing RStudio as your integrated development environment (IDE).
Download RStudio https://www.rstudio.com/products/RStudio/