# Chapter 4

## Expected Value

### Definition

Let $X$ be a random variable with probability distribution $f(x)$. The mean, or expected value, of $X$ is defined as follows.

For a discrete distribution:

$$E[X] = \sum\limits_x xf(x)$$

For a continuous distribution:

$$E[X] = \int\limits_{-\infty}^{\infty} xf(x)dx$$

Given the data $\{1, 2, 3, 3, 5\}$, the mean is:

$${1+2+3+3+5 \over 5} = 2.8$$

The same result follows from the definition. If $X$ is a value drawn at random from this data, each of the five entries is equally likely, so $X$ has the distribution:

$$f(x) = \begin{cases}
{1\over5} & x=1 \\
{1\over5} & x=2 \\
{2\over5} & x=3 \\
{1\over5} & x=5
\end{cases}$$

$$E[X] = \sum\limits_x xf(x) = {1\over5}(1) + {1\over5}(2) + {2\over5}(3) + {1\over5}(5) = 2.8$$

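
A minimal Python sketch of the discrete formula, assuming the data are stored in a list. It builds $f(x)$ from the data and uses exact fractions so the result matches the hand calculation. The helper name `expected_value` is illustrative and not part of the original notes.

```python
from collections import Counter
from fractions import Fraction


def expected_value(pmf):
    """Return E[X] = sum over x of x * f(x) for a discrete pmf given as {x: f(x)}."""
    return sum(x * p for x, p in pmf.items())


data = [1, 2, 3, 3, 5]
counts = Counter(data)
# Empirical distribution: f(x) = (number of times x occurs) / (number of data points)
pmf = {x: Fraction(c, len(data)) for x, c in counts.items()}

print(pmf)                          # {1: Fraction(1, 5), 2: Fraction(1, 5), 3: Fraction(2, 5), 5: Fraction(1, 5)}
print(expected_value(pmf))          # 14/5
print(float(expected_value(pmf)))   # 2.8
```
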
### Example

The probability distribution of a discrete random variable $X$ is:

$$f(x) = {3 \choose x}\left({1 \over 4}\right)^x\left({3\over4}\right)^{3-x}, \quad x \in \{0, 1, 2, 3\}$$

Find $E[X]$. Evaluating $f(x)$ at each point:

$$f(x) = \begin{cases}
{27\over64} \approx 0.422 & x=0 \\
{27\over64} \approx 0.422 & x=1 \\
{9\over64} \approx 0.141 & x=2 \\
{1\over64} \approx 0.016 & x=3
\end{cases}$$

$$E[X] = \sum\limits_x x {3 \choose x}\left({1\over4}\right)^x \left({3\over4}\right)^{3-x}$$

$$E[X] = 0\left({27\over64}\right) + 1\left({27\over64}\right) + 2\left({9\over64}\right) + 3\left({1\over64}\right) = {48\over64} = 0.75$$

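
As a check, the sketch below evaluates the same pmf with exact fractions (Python's `fractions` module and `math.comb`) instead of rounded decimals. The function name `f` simply mirrors the notation above.

```python
from fractions import Fraction
from math import comb


def f(x):
    """pmf from the example: C(3, x) (1/4)^x (3/4)^(3 - x) for x in {0, 1, 2, 3}."""
    return comb(3, x) * Fraction(1, 4) ** x * Fraction(3, 4) ** (3 - x)


support = range(4)
for x in support:
    print(x, f(x))   # one line per x: 0 27/64, 1 27/64, 2 9/64, 3 1/64

total = sum(f(x) for x in support)      # a valid pmf sums to 1
mean = sum(x * f(x) for x in support)   # E[X] = sum of x f(x)
print(total, mean)                      # 1 3/4
```
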
### Example

Let $X$ be the random variable that denotes the life, in hours, of a certain electronic device. The PDF is:

$$f(x) = \begin{cases}
{20000\over x^3} & x > 100 \\
0 & \text{elsewhere}
\end{cases}$$

Find the expected life of this type of device:

$$E[X] = \int\limits_{-\infty}^{\infty} xf(x)dx = \int\limits_{100}^{\infty}x{20000 \over x^3}dx = \int\limits_{100}^{\infty}{20000 \over x^2}dx = \left[-{20000 \over x}\right]_{100}^{\infty} = 200 \text{ [hrs]}$$

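
One way to check this numerically, assuming SciPy is available (the notes use `scipy.stats` later on), is to integrate with `scipy.integrate.quad`, which accepts an infinite upper limit. The function name `pdf` is illustrative.

```python
import numpy as np
from scipy.integrate import quad


def pdf(x):
    """Device-life density from the example: 20000 / x^3 for x > 100 (zero elsewhere)."""
    return 20000.0 / x**3


# The density is zero below 100, so both integrals start at x = 100.
area, _ = quad(pdf, 100, np.inf)                     # total probability, should be 1
mean, _ = quad(lambda x: x * pdf(x), 100, np.inf)    # E[X]
print(area, mean)                                    # approximately 1.0 and 200.0
```
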
**Note:** For a continuous random variable, the expected value of $X^2$ is found the same way:

$$E[X^2] = \int\limits_{-\infty}^{\infty}x^2f(x)dx$$

### Properties of Expectations

For constants $a$ and $b$ and random variables $X$ and $Y$:

$$E[b] = b$$

$$E[aX] = aE[X]$$

$$E[aX + b] = aE[X] + b$$

$$E[X + Y] = E[X] + E[Y]$$

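
A small exact check of the linear properties, reusing the distribution from the $\{1, 2, 3, 3, 5\}$ example. The constants $a = 4$, $b = 3$ and the helper `expect`, which computes $E[g(X)] = \sum_x g(x)f(x)$, are illustrative choices and not from the original notes.

```python
from fractions import Fraction

# Distribution of X from the {1, 2, 3, 3, 5} example.
pmf_x = {1: Fraction(1, 5), 2: Fraction(1, 5), 3: Fraction(2, 5), 5: Fraction(1, 5)}


def expect(pmf, g=lambda x: x):
    """E[g(X)] = sum over x of g(x) f(x) for a discrete pmf {x: f(x)}."""
    return sum(g(x) * p for x, p in pmf.items())


a, b = 4, 3  # arbitrary constants chosen for illustration
lhs = expect(pmf_x, lambda x: a * x + b)   # E[aX + b] computed directly
rhs = a * expect(pmf_x) + b                # aE[X] + b from the property
print(lhs, rhs)                            # 71/5 71/5
```
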
### Example

Given:

$$f(x) = \begin{cases}
{x^2\over3} & -1 < x < 2 \\
0 & \text{elsewhere}
\end{cases}$$

Find the expected value of $Y = 4X + 3$:

$$E[Y] = E[4X + 3] = 4E[X] + 3$$

$$E[X] = \int\limits_{-1}^{2} x{x^2 \over 3}dx = \int\limits_{-1}^{2} {x^3 \over 3}dx = {1\over12}x^4 \Big|_{-1}^{2} = {16 - 1 \over 12} = {5\over4}$$

$$E[Y] = 4\left({5\over4}\right) + 3 = 8$$

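
A numerical check with `scipy.integrate.quad` (assumed available), computing $E[Y]$ both through the property and by integrating $(4x + 3)f(x)$ directly. The name `pdf` is illustrative.

```python
from scipy.integrate import quad


def pdf(x):
    """Density from the example: x^2 / 3 on -1 < x < 2 (zero elsewhere)."""
    return x**2 / 3


e_x, _ = quad(lambda x: x * pdf(x), -1, 2)            # E[X]
e_y, _ = quad(lambda x: (4 * x + 3) * pdf(x), -1, 2)  # E[4X + 3] integrated directly
print(e_x, 4 * e_x + 3, e_y)                          # about 1.25, then 8.0 computed two ways
```
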
### Variance of a Random Variable

The expected value (mean) is of special importance because it describes where the probability distribution is centered. However, the mean alone does not describe how spread out the distribution is; for that we use the variance.

### Definition

Let $X$ be a random variable with probability distribution $f(x)$ and mean $\mu$. The variance of $X$ is given by:

$$\text{Var}[X] = E[(X-\mu)^2]$$

This is the expected squared distance from the mean. Expanding the square and applying the properties of expectations (with $\mu = E[X]$ a constant):

$$E[(X-\mu)^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - \mu^2$$

so the variance simplifies to:

$$\text{Var}[X] = E[X^2] - E[X]^2$$

**Note:** In general,

$$E[X^2] \ne E[X]^2$$

The standard deviation, $\sigma$, is given by:

$$\sigma = \sqrt{\text{Var}[X]}$$

**Note:** The variance (and hence the standard deviation) measures the uncertainty, or spread, of the distribution.

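
A sketch comparing the definition $E[(X-\mu)^2]$ with the shortcut $E[X^2] - E[X]^2$ on the $\{1, 2, 3, 3, 5\}$ distribution, using exact fractions so both come out identical rather than merely close. Variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Distribution of X from the {1, 2, 3, 3, 5} example.
pmf = {1: Fraction(1, 5), 2: Fraction(1, 5), 3: Fraction(2, 5), 5: Fraction(1, 5)}

mu = sum(x * p for x, p in pmf.items())                    # E[X] = 14/5
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())   # E[(X - mu)^2]
e_x2 = sum(x**2 * p for x, p in pmf.items())               # E[X^2] = 48/5
var_short = e_x2 - mu**2                                   # E[X^2] - E[X]^2

print(var_def, var_short)     # 44/25 44/25
print(e_x2, mu**2)            # 48/5 196/25, so E[X^2] != E[X]^2
print(sqrt(float(var_def)))   # standard deviation, about 1.327
```
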
### Example

The weekly demand for a drinking-water product, in thousands of liters, from a local chain of efficiency stores is a continuous random variable, $X$, having the probability density:

$$f(x) = \begin{cases}
2(x-1) & 1 < x < 2 \\
0 & \text{elsewhere}
\end{cases}$$

Find the expected value:

$$E[X] = \int\limits_1^2 x (2(x-1)) dx = 2\int\limits_1^2 (x^2 - x)dx$$

$$E[X] = 2\left[{1\over3}x^3 - {1\over2}x^2\right]_1^2 = 2\left({2\over3} + {1\over6}\right) = {5\over3}$$

Find the variance:

$$\text{Var}[X] = E[X^2] - E[X]^2$$

$$E[X^2] = \int\limits_1^2 2x^2(x-1)dx = 2\int\limits_1^2 (x^3 - x^2)dx = 2\left[{1\over4}x^4 - {1\over3}x^3\right]_1^2 = 2\left({4\over3} + {1\over12}\right) = {17\over6}$$

$$\text{Var}[X] = {17\over6} - \left({5\over3}\right)^2 = {51 - 50 \over 18} = {1\over18}$$

Find the standard deviation:

$$\sigma = \sqrt{\text{Var}[X]} = {1\over3\sqrt{2}} = {\sqrt{2}\over6}$$

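
The same three quantities can be checked numerically. This sketch assumes SciPy's `quad` is available and compares against the exact values $5/3$, $17/6$, and $1/18$. The name `pdf` is illustrative.

```python
from math import sqrt

from scipy.integrate import quad


def pdf(x):
    """Weekly-demand density from the example: 2(x - 1) on 1 < x < 2 (zero elsewhere)."""
    return 2 * (x - 1)


e_x, _ = quad(lambda x: x * pdf(x), 1, 2)       # E[X], exact value 5/3
e_x2, _ = quad(lambda x: x**2 * pdf(x), 1, 2)   # E[X^2], exact value 17/6
var = e_x2 - e_x**2                             # Var[X], exact value 1/18
print(e_x, e_x2, var, sqrt(var))                # about 1.6667, 2.8333, 0.05556, 0.2357
```
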
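### Example

The mean and variance are useful when comparing two or more distributions:

| | Plan 1 | Plan 2 |
|-|-|-|
| Avg score improvement | $+17$ | $+15$ |
| Standard deviation | $\pm8$ | $\pm2$ |

Plan 1 has the larger average improvement, but Plan 2's much smaller standard deviation means its results vary much less around their mean.
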
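### Theorem

If $X$ has variance $\text{Var}[X]$, and $a$ and $b$ are constants, then:

$$\text{Var}[aX + b] = a^2\text{Var}[X]$$

Adding a constant shifts the distribution without changing its spread, so $b$ does not appear on the right-hand side.
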
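
One way to see why the theorem holds, writing $\mu = E[X]$ and using $E[aX + b] = a\mu + b$ from the properties of expectations:

$$\text{Var}[aX + b] = E\left[\left(aX + b - (a\mu + b)\right)^2\right] = E\left[a^2(X-\mu)^2\right] = a^2E\left[(X-\mu)^2\right] = a^2\text{Var}[X]$$
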
### Example

The length of time, in minutes, for an airplane to obtain clearance at a certain airport is a random variable, $Y = 3X - 2$, where $X$ has the density:

$$f(x) = \begin{cases}
{1\over4} e^{-x/4} & x > 0 \\
0 & \text{elsewhere}
\end{cases}$$

This is an exponential distribution with $\beta = 4$ (see the exponential distribution below), so:

$$E[X] = 4$$

$$\text{Var}[X] = 16$$

Find $E[Y]$, $\text{Var}[Y]$, and the standard deviation of $Y$:

$$E[Y] = E[3X-2] = 3E[X] - 2 = 10$$

$$\text{Var}[Y] = 3^2\text{Var}[X] = 144$$

$$\sigma_Y = \sqrt{\text{Var}[Y]} = 12$$

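
A quick Monte Carlo sanity check, assuming NumPy is available. The seed and the sample size of one million are arbitrary, so the printed estimates are only close to, not exactly equal to, $10$, $144$, and $12$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed so the run is reproducible

# NumPy parameterizes the exponential by its scale, which is beta = 4 here.
x = rng.exponential(scale=4.0, size=1_000_000)
y = 3 * x - 2                         # clearance time Y = 3X - 2

print(y.mean(), y.var(), y.std())     # close to 10, 144, and 12 respectively
```
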
## The Exponential Distribution

The continuous random variable, $X$, has an exponential distribution with parameter $\beta$ if its density function is given by:

$$f(x) = \begin{cases}
{1\over\beta}e^{-x/\beta} & x > 0 \\
0 & \text{elsewhere}
\end{cases}$$

where $\beta > 0$.

The mean is $E[X] = \beta$. To show this, start from the definition:

$$E[X] = \int\limits_0^{\infty} x{1\over\beta}e^{-x/\beta} dx$$

Aside: the gamma function is defined by

$$\Gamma(\alpha) = \int\limits_0^\infty x^{\alpha - 1}e^{-x}dx, \quad \alpha > 0$$

and for a positive integer $n$, $\Gamma(n) = (n - 1)!$.

Substituting $u = x/\beta$ (so $dx = \beta\,du$):

$$E[X] = \beta \int\limits_0^\infty \left({x\over\beta}\right)^{(2-1)} e^{-x/\beta} \left({dx\over\beta}\right) = \beta\int\limits_0^\infty u^{2-1}e^{-u}du = \beta\Gamma(2)$$

$$E[X] = \beta(2-1)! = \beta$$

For the variance:

$$\text{Var}[X] = E[X^2] - E[X]^2$$

$$E[X^2] = \int\limits_0^\infty x^2{1\over\beta}e^{-x/\beta}dx = \beta^2 \int\limits_0^\infty \left({x\over\beta}\right)^{(3-1)} e^{-x/\beta} \left({dx\over\beta}\right)$$

$$E[X^2] = \beta^2\Gamma(3) = 2\beta^2$$

$$\text{Var}[X] = 2\beta^2 - \beta^2 = \beta^2$$

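
A numerical sketch of these results for the illustrative value $\beta = 5$, assuming SciPy is available. It checks $\Gamma(n) = (n-1)!$ with `math.gamma`, integrates for $E[X]$ and $E[X^2]$, and compares with `scipy.stats.expon`, whose `scale` parameter plays the role of $\beta$.

```python
from math import exp, factorial, gamma

from scipy.integrate import quad
from scipy.stats import expon

# Gamma function at positive integers: Gamma(n) = (n - 1)!
for n in range(1, 6):
    print(n, gamma(n), factorial(n - 1))

beta = 5.0  # example parameter (any beta > 0 would do)


def pdf(x):
    """Exponential density (1/beta) e^(-x/beta) for x > 0."""
    return exp(-x / beta) / beta


e_x, _ = quad(lambda x: x * pdf(x), 0, float("inf"))       # E[X]
e_x2, _ = quad(lambda x: x**2 * pdf(x), 0, float("inf"))   # E[X^2]
print(e_x, e_x2 - e_x**2)   # approximately beta = 5 and beta^2 = 25

# scipy.stats.expon matches this parameterization when scale = beta.
dist = expon(scale=beta)
print(dist.mean(), dist.var())   # beta and beta^2
```
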
#### Application

Reliability analysis: the time to failure of a certain electronic component can be modeled by an exponential distribution.

### Example

Let $T$ be the random variable which measures the time to failure, in years, of a certain electronic component. Suppose $T$ has an exponential distribution with $\beta = 5$:

$$f(t) = \begin{cases}
{1\over5}e^{-t/5} & t > 0 \\
0 & \text{elsewhere}
\end{cases}$$

If 6 of these components are in use, what is the probability that exactly 3 components are still functioning at the end of 8 years?

First, find the probability that an individual component is still functioning after 8 years:

$$P(T > 8) = \int\limits_8^\infty {1\over5}e^{-t/5}dt = e^{-8/5} \approx 0.2$$

Assuming the components fail independently, each of the 6 is still functioning with probability about $0.2$, so the probability that exactly 3 are functioning is:

$${6 \choose 3}(0.2)^3(0.8)^3 = 0.08192$$

```python
>>> from math import comb
>>> comb(6,3) * 0.2**3 * 0.8**3
0.08192000000000003
```

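
The calculation above rounds $P(T > 8) = e^{-8/5}$ to $0.2$. The sketch below, assuming SciPy is available, repeats it with the unrounded value: `expon.sf` is the survival function $1 - F$, and `binom.pmf` gives the binomial probability. With the unrounded probability the answer is slightly larger, about $0.084$.

```python
from math import comb, exp

from scipy.stats import binom, expon

beta, t, n, k = 5.0, 8.0, 6, 3

# Survival probability P(T > 8) = e^(-8/5), without rounding to 0.2.
p_exact = exp(-t / beta)
p_scipy = expon.sf(t, scale=beta)   # survival function 1 - CDF

# Binomial probability of exactly k survivors out of n independent components.
prob_formula = comb(n, k) * p_exact**k * (1 - p_exact) ** (n - k)
prob_scipy = binom.pmf(k, n, p_exact)

print(p_exact, p_scipy)           # both about 0.2019
print(prob_formula, prob_scipy)   # both about 0.0837
```
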
## The Normal Distribution

The most important continuous probability distribution in the field of statistics is the normal distribution. It is characterized by two parameters: the mean, $\mu$, and the variance, $\sigma^2$. Its bell-shaped curve is symmetric about $\mu$, so:

$$\text{mean} = \text{median} = \text{mode}$$

The density is:

$$f(x \mid \mu,\sigma^2) = {1 \over \sqrt{2\pi}\,\sigma} e^{-{(x-\mu)^2 \over 2\sigma^2}}, \quad -\infty < x < \infty$$

$$E[X] = \mu$$

$$\text{Var}[X] = \sigma^2$$

For a normal curve, the probability that $X$ falls between $x_1$ and $x_2$ is the area under the curve:

$$P(x_1 < X < x_2) = \int\limits_{x_1}^{x_2} f(x)dx$$

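
A sketch checking the density formula and the area interpretation, assuming SciPy is available. The parameters $\mu = 30$, $\sigma = 6$ and the interval $(24, 36)$ are arbitrary illustrative choices, and `normal_pdf` is a hypothetical helper written directly from the formula.

```python
from math import exp, pi, sqrt

from scipy.integrate import quad
from scipy.stats import norm


def normal_pdf(x, mu, sigma):
    """Normal density (1 / (sqrt(2 pi) sigma)) exp(-(x - mu)^2 / (2 sigma^2))."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)


mu, sigma = 30.0, 6.0   # example parameters
x1, x2 = 24.0, 36.0     # example interval, one standard deviation either side of mu

# Hand-written density versus SciPy's implementation at one point.
print(normal_pdf(33.0, mu, sigma), norm.pdf(33.0, loc=mu, scale=sigma))

# P(x1 < X < x2) by integrating the density, versus a difference of CDF values.
area, _ = quad(normal_pdf, x1, x2, args=(mu, sigma))
print(area, norm.cdf(x2, loc=mu, scale=sigma) - norm.cdf(x1, loc=mu, scale=sigma))  # both about 0.6827
```
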
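### Definition

The distribution of a normal random variable with mean 0 and variance 1 is called a standard normal distribution.

If $X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$, then the transformation

$$Z = {X - \mu \over \sigma}$$

gives a standard normal random variable $Z$. Probabilities for $X$ can therefore be read from standard normal tables: $P(X < x) = P\left(Z < {x - \mu \over \sigma}\right)$.
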
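
A short check that standardizing does not change probabilities, assuming SciPy is available. The values $\mu = 30$, $\sigma = 6$, and the cutoff $x = 40$ are illustrative.

```python
from scipy.stats import norm

mu, sigma = 30.0, 6.0   # example parameters
x = 40.0                # example cutoff

z = (x - mu) / sigma
direct = norm.cdf(x, loc=mu, scale=sigma)   # P(X < x) with X ~ N(mu, sigma^2)
standardized = norm.cdf(z)                  # P(Z < z) with Z ~ N(0, 1)
print(z, direct, standardized)              # the two probabilities agree
```
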
### Example

Given a normal distribution with mean $\mu = 30$ and standard deviation $\sigma = 6$, find the normal curve area to the right of $x = 17$.

Transform to standard normal:

$$Z = {17 - 30 \over 6} = -{13\over6} \approx -2.17$$

That is, $x = 17$ on a normal distribution with $\mu = 30$ and $\sigma = 6$ is equivalent to $Z \approx -2.17$ on a normal distribution with $\mu = 0$ and $\sigma = 1$.

$$P(X > 17) = P(Z > -2.17)$$

$$P(Z > -2.17) = 1 - P(Z \le -2.17) = 1 - 0.0150 = 0.9850$$

```python
>>> from scipy.stats import norm
>>> z = (17 - 30) / 6
>>> round(float(norm.sf(z)), 3)   # P(Z > z), the area to the right of z
0.985
```

### Example

The finished inside diameter of a piston ring is normally distributed with mean $\mu = 10$ [cm] and standard deviation $\sigma = 0.03$ [cm].

What is the probability that a piston ring will have an inside diameter between 9.97 [cm] and 10.03 [cm]?

$$Z_1 = {9.97 - 10 \over 0.03} = -1$$

$$Z_2 = {10.03 - 10 \over 0.03} = 1$$

$$P(9.97 < X < 10.03) = P(-1 < Z < 1) \approx 0.68$$

```python
>>> from scipy.stats import norm
>>> norm.cdf(1) - norm.cdf(-1)
0.6826894921370859
```