# Chapter 4

## Expected Value

### Definition

Let $X$ be a random variable with probability distribution $f(x)$. The mean, or expected value, of $X$ is:

For a discrete distribution:

$$E[X] = \sum\limits_x xf(x)$$

For a continuous distribution:

$$E[X] = \int\limits_{-\infty}^{\infty} xf(x)dx$$

Given the data set $\{1, 2, 3, 3, 5\}$, the mean is:

$${1+2+3+3+5 \over 5} = 2.8$$

Equivalently, assign each distinct value its relative frequency as its probability:

$$f(x) = \begin{cases} {1\over5} & x=1 \\ {1\over5} & x=2 \\ {2\over5} & x=3 \\ {1\over5} & x=5 \end{cases}$$

$$\sum\limits_x xf(x) = {1\over5}(1) + {1\over5}(2) + {2\over5}(3) + {1\over5}(5) = 2.8$$

### Example

The probability distribution of a discrete random variable $X$ is:

$$f(x) = {3 \choose x}\left({1 \over 4}\right)^x\left({3\over4}\right)^{3-x}, \quad x \in \{0, 1, 2, 3\}$$

Find $E[X]$:

$$f(x) = \begin{cases} {27\over64} \approx 0.422 & x=0 \\ {27\over64} \approx 0.422 & x=1 \\ {9\over64} \approx 0.141 & x=2 \\ {1\over64} & x=3 \end{cases}$$

$$E[X] = \sum\limits_x x {3 \choose x}\left({1\over4}\right)^x \left({3\over4}\right)^{3-x}$$

$$E[X] = 0\left({27\over64}\right) + 1\left({27\over64}\right) + 2\left({9\over64}\right) + 3\left({1\over64}\right) = {48\over64} = 0.75$$

### Example

Let $X$ be the random variable that denotes the life, in hours, of a certain electronic device.
The PDF is:

$$f(x) = \begin{cases} {20000\over x^3} & x > 100 \\ 0 & \text{elsewhere} \end{cases}$$

Find the expected life of this type of device:

$$E[X] = \int\limits_{-\infty}^{\infty} xf(x)dx = \int\limits_{100}^{\infty}x{20000 \over x^3}dx = \left[-{20000\over x}\right]_{100}^{\infty} = 200 \text{[hrs]}$$

**Note:** The same idea gives the expected value of a function of $X$; for example:

$$E[X^2] = \int\limits_{-\infty}^{\infty}x^2f(x)dx$$

### Properties of Expectations

$$E[b] = b$$

Where $b$ is a constant.

$$E[aX] = aE[X]$$

Where $a$ is a constant.

$$E[aX + b] = aE[X] + b$$

$$E[X + Y] = E[X] + E[Y]$$

Where $X$ and $Y$ are random variables.

### Example

Given:

$$f(x) = \begin{cases} {x^2\over3} & -1 < x < 2 \\ 0 & \text{elsewhere} \end{cases}$$

Find the expected value of $Y = 4X + 3$:

$$E[Y] = E[4X + 3] = 4E[X] + 3$$

$$E[X] = \int\limits_{-1}^{2} {x^3 \over 3}dx = {1\over12}x^4 \Big|_{-1}^{2} = {16 - 1 \over 12} = {5\over4}$$

$$E[Y] = 4\left({5\over4}\right) + 3 = 8$$

### Variance of a Random Variable

The expected value (mean) is of special importance because it describes where the probability distribution is centered. However, the mean alone does not describe how widely the values are scattered; we also need to characterize the spread of the distribution, which is measured by the variance.

### Definition

Let $X$ be a random variable with probability distribution $f(x)$ and mean $\mu$. The variance of $X$ is given by:

$$\text{Var}[X] = E[(X-\mu)^2]$$

That is, the variance is the expected (average) squared distance from the mean. Expanding the square and applying the properties of expectations:

$$E[(X-\mu)^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2$$

Since $\mu = E[X]$, this simplifies to:

$$\text{Var}[X] = E[X^2] - E[X]^2$$

**Note:** In general, $E[X^2] \ne E[X]^2$.

The standard deviation, $\sigma$, is given by:

$$\sigma = \sqrt{\text{Var}[X]}$$

**Note:** The variance is a measure of uncertainty (spread): the larger the variance, the farther values of $X$ tend to fall from the mean.
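To check the discrete hand calculations in this chapter, here is a small Python sketch (not part of the original notes) that takes a distribution as a dictionary `{x: f(x)}` and returns $E[X]$, $E[X^2]$ and $\text{Var}[X]$. It uses `fractions.Fraction` so the arithmetic is exact, and it confirms that the shortcut $E[X^2] - E[X]^2$ agrees with the definition $E[(X-\mu)^2]$. The function name `moments` is only illustrative.

```python
from collections import Counter
from fractions import Fraction
from math import comb

def moments(pmf):
    """Return (E[X], E[X^2], Var[X]) for a discrete pmf given as {x: f(x)}."""
    assert sum(pmf.values()) == 1, "probabilities must sum to 1"
    mean = sum(x * prob for x, prob in pmf.items())
    second = sum(x**2 * prob for x, prob in pmf.items())
    var = second - mean**2                     # shortcut: E[X^2] - E[X]^2
    # Same number from the definition E[(X - mu)^2]
    assert var == sum((x - mean)**2 * prob for x, prob in pmf.items())
    return mean, second, var

# Data set {1, 2, 3, 3, 5}: each value gets its relative frequency
data = [1, 2, 3, 3, 5]
pmf_data = {x: Fraction(c, len(data)) for x, c in Counter(data).items()}

# Example: f(x) = C(3, x) (1/4)^x (3/4)^(3 - x), x = 0, 1, 2, 3
p = Fraction(1, 4)
pmf_binom = {x: comb(3, x) * p**x * (1 - p)**(3 - x) for x in range(4)}

print(moments(pmf_data))
print(moments(pmf_binom))
```

For the data set $\{1, 2, 3, 3, 5\}$ this gives $E[X] = 14/5 = 2.8$, $E[X^2] = 48/5$ and $\text{Var}[X] = 44/25$; for the binomial example it gives $E[X] = 3/4$, the 0.75 found by hand, with $\text{Var}[X] = 9/16$.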
### Example

The weekly demand for a drinking-water product, in thousands of liters, from a local chain of efficiency stores is a continuous random variable, $X$, having the probability density:

$$f(x) = \begin{cases} 2(x-1) & 1 < x < 2 \\ 0 & \text{elsewhere} \end{cases}$$

Find the expected value:

$$E[X] = \int\limits_1^2 x (2(x-1)) dx = 2\int\limits_1^2 (x^2 - x)dx$$

$$E[X] = 2\left[{1\over3}x^3 - {1\over2}x^2\right]_1^2 = {5\over3}$$

Find the variance:

$$\text{Var}[X] = E[X^2] - E[X]^2$$

$$E[X^2] = \int\limits_1^2 2x^2(x-1)dx = 2\int\limits_1^2 (x^3 - x^2)dx = 2\left[{1\over4}x^4 - {1\over3}x^3\right]_1^2$$

$$E[X^2] = {17\over6}$$

$$\text{Var}[X] = {17\over6} - \left({5\over3}\right)^2 = {1\over18}$$

Find the standard deviation:

$$\sigma = \sqrt{\text{Var}[X]} = {1\over3\sqrt{2}} = {\sqrt{2}\over6} \approx 0.236$$

### Example

The mean and variance are useful when comparing two or more distributions.

| | Plan 1 | Plan 2 |
|-|-|-|
| Avg Score Improvement | $+17$ | $+15$ |
| Standard Deviation | $8$ | $2$ |

Plan 1 has the larger average improvement, but Plan 2 has a much smaller standard deviation, so its results are more consistent.

### Theorem

If $X$ has variance $\text{Var}[X]$, then for constants $a$ and $b$, $\text{Var}[aX + b] = a^2\text{Var}[X]$. Adding $b$ shifts the distribution without changing its spread, while multiplying by $a$ scales every distance from the mean by $|a|$.

### Example

The length of time, in minutes, for an airplane to obtain clearance at a certain airport is a random variable $Y = 3X - 2$, where $X$ has the density:

$$f(x) = \begin{cases} {1\over4} e^{-x/4} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$

This is an exponential density with $\beta = 4$ (see the next section), so:

$$E[X] = 4$$

$$\text{Var}[X] = 16$$

Find $E[Y]$ and the standard deviation of $Y$:

$$E[Y] = E[3X-2] = 3E[X] - 2 = 10$$

$$\text{Var}[Y] = 3^2\text{Var}[X] = 144$$

$$\sigma_Y = \sqrt{\text{Var}[Y]} = 12$$

## The Exponential Distribution

The continuous random variable $X$ has an exponential distribution with parameter $\beta$ if its density function is given by:

$$f(x) = \begin{cases} {1\over\beta}e^{-x/\beta} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$

Where $\beta > 0$.
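The clearance-time example took $E[X] = 4$ and $\text{Var}[X] = 16$ as given, then applied $E[aX+b] = aE[X]+b$ and $\text{Var}[aX+b] = a^2\text{Var}[X]$. As a rough numerical sanity check (not a proof), the Python sketch below simulates $X$ and $Y = 3X - 2$. One point worth flagging: Python's `random.expovariate` is parameterized by the rate $\lambda = 1/\beta$, not by the mean $\beta$ used in the exponential density, so the code passes `1 / beta`. The helper names `sample_exponential` and `mean_var` are hypothetical, chosen for this illustration.

```python
import random
from statistics import fmean

def sample_exponential(beta, n, rng):
    """Draw n values from f(x) = (1/beta) e^(-x/beta), x > 0.

    random.expovariate expects the rate 1/beta, not the mean beta.
    """
    return [rng.expovariate(1 / beta) for _ in range(n)]

def mean_var(values):
    """Sample mean and (divide-by-n) sample variance of a list of numbers."""
    m = fmean(values)
    return m, fmean((v - m) ** 2 for v in values)

rng = random.Random(2024)              # fixed seed so reruns match
xs = sample_exponential(4, 200_000, rng)
ys = [3 * x - 2 for x in xs]           # clearance time Y = 3X - 2

mx, vx = mean_var(xs)
my, vy = mean_var(ys)
print(f"X: mean {mx:.2f} (theory 4),  var {vx:.1f} (theory 16)")
print(f"Y: mean {my:.2f} (theory 10), var {vy:.1f} (theory 144)")
```

With this many draws the printed values should land close to the theoretical ones, though the exact digits depend on the seed. Because $Y$ is built from the same draws, its sample mean and variance equal $3\bar{x} - 2$ and $9$ times the sample variance of $X$ up to rounding, which is the sample-level version of the two rules.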
The mean of the exponential distribution is:

$$E[X] = \beta$$

To see this, write the mean as an integral:

$$E[X] = \int\limits_0^{\infty} x{1\over\beta}e^{-x/\beta} dx$$

Aside: the gamma function is

$$\Gamma(Z) = \int\limits_0^\infty x^{Z - 1}e^{-x}dx$$

Where $\Gamma(Z) = (Z - 1)!$ when $Z$ is a positive integer.

Substituting $u = x/\beta$ (so $dx = \beta\,du$) turns the integral into a gamma function:

$$E[X] = \beta \int\limits_0^\infty \left({x\over\beta}\right)^{(2-1)} e^{-x/\beta} \left({dx\over\beta}\right) = \beta\Gamma(2)$$

$$E[X] = \beta(2-1)! = \beta$$

The same substitution gives the variance:

$$\text{Var}[X] = E[X^2] - E[X]^2$$

$$E[X^2] = \int\limits_0^\infty x^2{1\over\beta}e^{-x/\beta}dx = \beta^2 \int\limits_0^\infty \left({x\over\beta}\right)^{(3-1)} e^{-x/\beta} \left({dx\over\beta}\right)$$

$$E[X^2] = \beta^2\Gamma(3) = 2\beta^2$$

$$\text{Var}[X] = 2\beta^2 - \beta^2 = \beta^2$$

#### Application

Reliability analysis: the time to failure of a certain electronic component can be modeled by an exponential distribution.

### Example

Let $T$ be the random variable that measures the time to failure, in years, of a certain electronic component. Suppose $T$ has an exponential distribution with $\beta = 5$:

$$f(t) = \begin{cases} {1\over5}e^{-t/5} & t > 0 \\ 0 & \text{elsewhere} \end{cases}$$

If 6 of these components are in use, what is the probability that exactly 3 components are still functioning at the end of 8 years?

First, find the probability that an individual component is still functioning after 8 years:

$$P(T > 8) = \int\limits_8^\infty {1\over5}e^{-t/5}dt = e^{-8/5} \approx 0.2019 \approx 0.2$$

Assuming the components fail independently, the number still functioning is binomial with $n = 6$ and $p \approx 0.2$:

$${6 \choose 3}(0.2)^3(0.8)^3 = 0.08192$$

```python
>>> from math import comb
>>> # P(exactly 3 of 6 still functioning), with p = P(T > 8) rounded to 0.2
>>> comb(6, 3) * 0.2**3 * 0.8**3
0.08192000000000003
```

## The Normal Distribution

The most important continuous probability distribution in the field of statistics is the normal distribution. It is characterized by 2 parameters: the mean, $\mu$, and the variance, $\sigma^2$.
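Python's standard library has a normal-distribution object, `statistics.NormalDist`, which is likewise specified by two numbers. Note that its constructor takes the standard deviation $\sigma$, not the variance $\sigma^2$; the variance is a derived property. This sketch assumes Python 3.9 or later, and the values $\mu = 30$, $\sigma = 6$ are only sample inputs.

```python
from statistics import NormalDist

mu, sigma = 30, 6                # sample values; NormalDist takes sigma, not sigma**2
X = NormalDist(mu, sigma)

print(X.mean, X.median, X.mode)  # center of the curve: all equal mu
print(X.stdev, X.variance)       # sigma and sigma**2
print(X.cdf(mu))                 # half of the area lies below mu
```

This prints `30.0 30.0 30.0`, then `6.0 36.0`, then `0.5`: once $\mu$ and $\sigma$ are fixed, the location and spread of the whole curve are fixed.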
Because the normal curve is bell-shaped and symmetric about $\mu$:

$$\text{mean} = \text{median} = \text{mode}$$

The density is:

$$f(x \mid \mu,\sigma^2) = {1 \over \sigma\sqrt{2\pi}} e^{-{(x-\mu)^2 \over 2\sigma^2}}, \quad -\infty < x < \infty$$

$$E[X] = \mu$$

$$\text{Var}[X] = \sigma^2$$

For a normal curve:

$$P(x_1 < X < x_2) = \int\limits_{x_1}^{x_2} f(x)dx$$

### Definition

The distribution of a normal random variable with mean 0 and variance 1 is called a standard normal distribution. Any normal random variable $X$ with mean $\mu$ and standard deviation $\sigma$ can be transformed into a standard normal variable $Z$:

$$Z = {X - \mu \over \sigma}$$

### Example

Given a normal distribution with mean $\mu = 30$ and standard deviation $\sigma = 6$, find the normal curve area to the right of $x = 17$.

Transform to standard normal:

$$Z = {17 - 30 \over 6} = -{13\over6} \approx -2.17$$

That is, $x = 17$ on a normal distribution with $\mu = 30$ and $\sigma = 6$ is equivalent to $z \approx -2.17$ on a normal distribution with $\mu = 0$ and $\sigma = 1$.

$$P(X > 17) = P(Z > -2.17)$$

$$P(Z > -2.17) = 1 - P(Z \le -2.17) \approx 0.985$$

```python
>>> from scipy.stats import norm
>>> z = (17 - 30) / 6            # exact value -13/6, about -2.167
>>> round(float(norm.sf(z)), 4)  # sf(z) = 1 - cdf(z) = P(Z > z)
0.9849
```

### Example

The finished inside diameter of a piston ring is normally distributed with mean $\mu = 10$[cm] and standard deviation $\sigma = 0.03$[cm]. What is the probability that a piston ring will have an inside diameter between 9.97[cm] and 10.03[cm]?

$$Z_1 = {9.97 - 10 \over 0.03} = -1$$

$$Z_2 = {10.03 - 10 \over 0.03} = 1$$

$$P(9.97 < X < 10.03) = P(-1 < Z < 1) \approx 0.68$$

```python
>>> from scipy.stats import norm
>>> norm.cdf(1) - norm.cdf(-1)
0.6826894921370859
```
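The two normal-distribution examples can also be reproduced without SciPy. The sketch below uses only the standard library's `statistics.NormalDist` and applies the standardization $Z = (X-\mu)/\sigma$ explicitly. `prob_between` is a hypothetical helper name, and an upper limit of `float("inf")` stands in for "the area to the right of".

```python
from statistics import NormalDist

def prob_between(mu, sigma, lo, hi):
    """P(lo < X < hi) for X normal with mean mu and std dev sigma, via Z."""
    Z = NormalDist()                 # standard normal: mu = 0, sigma = 1
    z_lo = (lo - mu) / sigma         # Z = (X - mu) / sigma
    z_hi = (hi - mu) / sigma
    return Z.cdf(z_hi) - Z.cdf(z_lo)

# Area to the right of x = 17 when mu = 30, sigma = 6
right_of_17 = prob_between(30, 6, 17, float("inf"))
# Piston rings: mu = 10 cm, sigma = 0.03 cm, between 9.97 and 10.03 cm
piston = prob_between(10, 0.03, 9.97, 10.03)

print(round(right_of_17, 4), round(piston, 4))
```

Rounded to four places this prints 0.9849 and 0.6827. The first value uses the exact $z = -13/6$ rather than a value rounded for a printed table, so it can differ slightly in the fourth decimal from a table lookup.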