Numerical ecology

Research assistant at Takuvik (Laval University, Québec, Canada)

  • Remote sensing
  • Modeling & statistics
  • Data sciences

Data science

Data Science is an interdisciplinary field about processes and systems to extract knowledge or insights from large volumes of data in various forms, either structured or unstructured (Wikipedia).

venn

Source: http://berkeleysciencereview.com/article/first-rule-data-science

Data science in ecology

  • Data relevant to environmental issues are produced very rapidly at high spatial and temporal scales.
    • Remote sensing, animal tagging/tracking
    • Automated in-situ sensors (salinity, Chla, …)
    • No, we are not doing big data science
  • As scientists, you need to learn and practice essential tools for capturing, manipulating and sharing data.

  • I wish I knew more about code workflow and data organization when I started my PhD!

What is this class about

  • Provide tools (R) and knowledge (statistics) for ecologists.

  • Focus on understanding what you are doing, not necessarily on the mathematical details.
    • \(x(n)y(n) \Leftrightarrow \frac{1}{{2\pi }}\int\limits_{ - \pi }^\pi {X(e^{j\theta } )Y(e^{j(\omega - \theta )} )d\theta }\)
  • Best practices for data manipulation and data analysis.

Why programming?

You might wonder why it is important to learn programming in science. There are at least two good reasons:

  1. Reproducibility (ability to recreate what you did).

  2. Automation (run existing analysis on new data).
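As a minimal sketch of both points, the analysis below is written once as a function and then rerun unchanged on a second dataset. The function name `analyse`, the column names, and all values are hypothetical and invented for illustration only.

```r
# Hypothetical analysis step: mean chlorophyll-a per station.
# Writing it as a function records exactly what was done (reproducibility)
# and lets us rerun it on any new dataset (automation).
analyse <- function(df) {
  # Drop rows with missing measurements
  clean <- df[!is.na(df$chla), ]
  # Average chla within each station
  aggregate(chla ~ station, data = clean, FUN = mean)
}

# Invented example data (not real measurements)
old_data <- data.frame(
  station = c("A", "A", "B", "B"),
  chla    = c(1, 3, NA, 4)
)
new_data <- data.frame(
  station = c("A", "B", "B"),
  chla    = c(2, 6, 8)
)

# The same code runs on both datasets
res_old <- analyse(old_data)
res_new <- analyse(new_data)

print(res_old)
print(res_new)
```

When new data arrive, only the input changes; the analysis itself does not have to be redone by hand.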

Why is it important?

It is often said that 80% of data analysis is spent on cleaning and preparing the data. And it is not just a first step: it must be repeated many times over the course of an analysis as new problems come to light or new data are collected (Hadley Wickham).

Why use R?

During this course we will focus on R.

  • It is free software!

  • It is cross-platform (Windows, Mac and Linux).

  • Has exceptional graphics capabilities (ideal for preparing scientific manuscripts).

  • Easy to develop your own functions (automation).

  • Lets you keep a record of how analyses were done (reproducibility).

  • Packages and active development (10 000+ packages available on CRAN).
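To illustrate how little code a custom function needs, here is a small sketch that computes the standard error of the mean using base R only. The function name `standard_error` and the salinity values are hypothetical, chosen for illustration.

```r
# Hypothetical helper: standard error of the mean.
# na.rm = TRUE drops missing values before computing.
standard_error <- function(x, na.rm = TRUE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  sd(x) / sqrt(length(x))
}

# Invented salinity readings with one missing value
salinity <- c(30, 32, 34, NA)

se <- standard_error(salinity)
print(se)
```

Once defined, the function can be reused on any numeric vector, which is what makes automation straightforward.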

Outline

This course will be divided into two main parts:

  1. An introduction to R (data manipulation, graphics, etc.)

  2. Statistical analysis using R

Class schedule

  • Introduction to R
  • Data manipulation (working with real data)
  • Introduction to ggplot2
  • Statistics
    • t.test, ANOVA
    • Linear regressions
    • Data transformation
    • Variance partitioning
    • Canonical analysis
      • Principal component analysis (PCA)
      • Redundancy analysis (RDA)
    • Generalized linear models (GLM)
    • Introduction to spatial analysis

What I expect

Your participation.

The best way to learn programming and statistics is by practicing. After each concept, we will do exercises together.

Class notes

http://www.pmassicotte.com/stats-denmark-2019/

Installing R and RStudio

Download R https://cran.r-project.org/

I strongly recommend installing RStudio as your integrated development environment (IDE).

Download RStudio https://www.rstudio.com/products/RStudio/

First steps with RStudio