
The purpose of this course is to show how statistics may be efficiently used in practice.

The course presents both statistical theory and the practical analysis of real data sets. The R statistical software and several R packages are used to implement the methods presented in the course and to analyze real data.

Topics covered in the current version of the course are:

• hypothesis testing (single and multiple comparisons)
• regression models (linear and nonlinear models)
• mixed effects models (linear and nonlinear models)
• mixture models
• detection of change points
• image restoration

We are aware that important aspects of statistics are not addressed, both in terms of models and methods. We plan to fill some of these gaps shortly.

Although R is used extensively in this course, this is not an R programming course. On the one hand, our objective is not to propose the most efficient implementation of an algorithm, but rather to provide code that is easy to understand, reuse, and extend.

On the other hand, the R functions used to illustrate a method are not used as “black boxes”. We show in detail how the results of a given function are obtained. The course may therefore be read at two different levels: a reader may be interested only in the statistical technique (and hence the R function) to use for a given problem (see the first part of the course on polynomial regression), or may want to go into the details and understand how the results are computed (see the second part of the course on polynomial regression).

This course was first given at Ecole Polytechnique (France) in 2017.

Marc Lavielle
Inria Saclay (Xpop) & Ecole Polytechnique (CMAP)
March, 2017