Is a sample i.i.d., or is a collection of random variables i.i.d.?

Basic terminology question. I hear “let the sample be i.i.d.” and “let these random variables be i.i.d.” used interchangeably. Even Wikipedia uses both:

A collection of random variables is independent and identically distributed if.

It is commonly assumed that observations in a sample are effectively i.i.d.

Are there different nuances or are they equivalent? Also, are the points in a sample regarded as repeated measurements of one random variable, or one measurement each of multiple i.i.d. random variables?

asked Jul 3, 2020 at 7:18

1 Answer


From Wikipedia: two random variables (RVs) $X$ and $Y$ are independent and identically distributed (i.i.d.) if they have the same cumulative distribution function (CDF) at every point of the domain $I$, and if their joint CDF factorizes into the product of the marginal CDFs. (Remark: this generalizes to any number of RVs.) That is: $$\begin{aligned}&F_X(x)=F_Y(x)\,&\forall x\in I\\&F_{X,Y}(x,y)=F_X(x)\cdot F_Y(y)\,&\forall x,y\in I\end{aligned}$$

(Note that this also implies that their pdfs, when they exist, are the same almost everywhere, i.e. on the whole domain except for sets of measure zero. This is a technical condition, so don't worry about it.)

Realizations of an RV, i.e. roughly speaking its observed outcomes, are usually referred to as samples. Saying that the samples are i.i.d. simply means that the underlying RVs, whose realizations you observe in the sample, are i.i.d.
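If it helps to see the two conditions numerically, here is a minimal sketch using only Python's standard library. The standard normal distribution, the seed, the sample size and the evaluation points `x0`, `y0` are all arbitrary choices of mine for illustration, and the helper names `empirical_cdf` and `empirical_joint_cdf` are hypothetical. It compares empirical CDFs built from realizations, so the gaps are only approximately zero.

```python
import random


def empirical_cdf(values, x):
    """Fraction of observed values that are <= x."""
    return sum(v <= x for v in values) / len(values)


def empirical_joint_cdf(xs, ys, x, y):
    """Fraction of pairs (xs[i], ys[i]) with xs[i] <= x and ys[i] <= y."""
    return sum(a <= x and b <= y for a, b in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed (assumption), only for reproducibility
n = 50_000

# Realizations of two RVs X and Y: same distribution, drawn independently.
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [rng.gauss(0.0, 1.0) for _ in range(n)]

x0, y0 = 0.5, -0.3  # arbitrary evaluation points

# Identically distributed: both marginal empirical CDFs agree at the same point.
marginal_gap = abs(empirical_cdf(xs, x0) - empirical_cdf(ys, x0))

# Independent: the joint empirical CDF is close to the product of the marginals.
product = empirical_cdf(xs, x0) * empirical_cdf(ys, y0)
factorization_gap = abs(empirical_joint_cdf(xs, ys, x0, y0) - product)

# Counterexample: Y = X is identically distributed but not independent,
# so the joint CDF no longer factorizes.
product_dep = empirical_cdf(xs, x0) * empirical_cdf(xs, y0)
dependent_gap = abs(empirical_joint_cdf(xs, xs, x0, y0) - product_dep)

print(f"marginal gap:              {marginal_gap:.4f}")
print(f"factorization gap (indep): {factorization_gap:.4f}")
print(f"factorization gap (Y = X): {dependent_gap:.4f}")
```

The first two gaps should shrink towards zero as `n` grows, while the last one stays clearly away from zero: sharing a CDF is not enough on its own, the joint CDF has to factorize as well.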

So replying to your questions:

The two phrasings are essentially the same thing.
You can regard the points in a sample as repeated, independent measurements of one random variable, since i.i.d. RVs all share the same CDF.