class: center, middle, inverse, title-slide

# Margin of Error
## Chapter 5
### Colin Gillespie (@csgillespie)

---
layout: true
background-image: url(assets/white_logo.png)

<div class="jr-header-inverse"> <img class="logo" src="assets/white_logo_full.png"/> <span class="social"><table><tr><td><img src="assets/twitter.gif"/></td><td> @jumping_uk</td></tr></table></span> </div>
<div class="jr-footer-inverse"><span>© 2019 Jumping Rivers (jumpingrivers.com)</span><div></div></div>

---
class: inverse, center, middle

### Testing leads to failure, and failure leads to understanding

### __Burt Rutan__

---
layout: true

<div class="jr-header"> <img class="logo" src="assets/white_logo_full.png"/> <span class="social"><table><tr><td><img src="assets/twitter.gif"/></td><td> @jumping_uk</td></tr></table></span> </div>
<div class="jr-footer"><span>© 2019 Jumping Rivers (jumpingrivers.com)</span><div></div></div>

---

# Big picture

> By assuming an underlying normal distribution, we can use information from a sample to inform us about the population

This sounds like a big assumption, but it's not too bad.

---

# Introduction & motivating example

We want to compare two advert designs

- At great expense, it has been decided to change the font to Comic Sans
- Does this change work?

--

* Being (data) scientists, we decide to (humanely) experiment on people by randomly showing them the advert

--

* From past experience, we know that customers spend 45 seconds (on average) on the site

--

* After switching to Comic Sans, we recorded the time spent on the site by 20 customers

```
34 51 30 79 54 31 57 62 59 41 77 55 35  3 69 46 47 66 63 59
```

Should we consider switching to Comic Sans?

---

# Introduction & motivating example

* Clearly, time will vary visit-by-visit
* The average time is

`$$\bar x = \frac{34 + 51 + 30 + \ldots + 59}{20} = 50.9$$`

--

* The new website does seem to perform slightly better (compared to 45 seconds)
* But we have a very small sample

--

* How do we account for this variability?
  - Hypothesis test

---

# One sample test

* One-sample z-test
* Compares the mean of a set of sample observations to a target value

--

* The mean in our sample is denoted by `\(\bar x\)`
* The population mean is denoted by `\(\mu\)` (pronounced "mu")
* `\(\bar x\)` is our sample _estimate_ of `\(\mu\)`

---

# Hypothesis testing

Null hypothesis

`$$H_0: \mu = 45$$`

--

We usually test against a general alternative hypothesis `\(H_1\)`

`$$H_1: \mu \ne 45$$`

which says that `\(\mu\)` is not equal to 45

--

* The null hypothesis is always the dull/boring one

---

# Hypothesis testing

* When performing the hypothesis test, we _assume_ `\(H_0\)` to be true
* We then ask ourselves the question

> How likely is it that we would observe the data we have, or
> indeed anything more extreme than this, if the null hypothesis is true?

---

# Hypothesis testing

* Use the [Central Limit Theorem](https://en.wikipedia.org/wiki/Central_limit_theorem)
* Although we will not go into the details here, this result tells us that the quantity

`$$Z = \frac{\bar x - \mu}{s/\sqrt{n}}$$`

approximately follows a standard normal distribution (when `\(n\)` is reasonably large), where

* `\(\bar x\)` is our sample mean
* `\(\mu\)` is the assumed value of the population mean under the null hypothesis `\(H_0\)`
* `\(s\)` is the sample standard deviation
* `\(n\)` is the sample size

---

# Hypothesis testing

If the null hypothesis is true, then `\(\mu = 45\)`, so

`$$Z = \frac{\bar x - \mu}{s/\sqrt{n}} = \frac{50.9 - 45}{18.2/\sqrt{20}} = 1.45$$`

How likely is it that we would observe this value?

---

# How likely is this value?

<img src="chapter5_files/figure-html/unnamed-chunk-1-1.svg" width="80%" style="display: block; margin: auto;" />

---

# How likely is this value?
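As a minimal sketch, the test statistic `\(Z\)` can be computed from the raw `comic` data rather than from the rounded summaries `\(\bar x = 50.9\)` and `\(s = 18.2\)`, together with the two-sided tail area (the `\(p\)`-value). The helper `z_stat()` is hypothetical, not a base R function.

```r
# Hypothetical helper (not a base R function): the one-sample z statistic
# Z = (xbar - mu) / (s / sqrt(n)), using the sample standard deviation s
z_stat = function(x, mu) {
  n = length(x)
  (mean(x) - mu) / (sd(x) / sqrt(n))
}

comic = c(34, 51, 30, 79, 54, 31, 57, 62, 59, 41,
          77, 55, 35, 3, 69, 46, 47, 66, 63, 59)
z = z_stat(comic, mu = 45)
z  # approximately 1.45

# Two-sided p-value: the area in both tails beyond |z|
2 * pnorm(abs(z), lower.tail = FALSE)  # approximately 0.147
```

Using `abs(z)` means the same line works whether the sample mean falls above or below 45.

---

# How likely is this value?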
* Since the normal distribution is symmetric, `\(Z = 1.45\)` is just as extreme as `\(Z = -1.45\)`
* The shaded region in the diagram illustrates the `\(p\)`-value
* This is the probability of observing data at least as extreme as ours, assuming `\(H_0\)` is true
* The closer the area of the shaded region (the `\(p\)`-value) is to 0
  - The less plausible `\(H_0\)` is, and the more evidence we have to reject it

---

# How likely is this value?

So, we need to work out the area of the shaded region under the curve in the diagram, which can be done using R

```r
pnorm(1.45, lower.tail = FALSE) * 2
#> [1] 0.147
```

So the `\(p\)`-value is about 0.15.

---

# How likely is this value?

Earlier, we said that the smaller this `\(p\)`-value is, the more evidence we have to reject `\(H_0\)`. The question now is:

> What constitutes a `\(p\)`-value small enough to reject `\(H_0\)`?

The convention (but by no means a hard-and-fast cut-off) is to reject `\(H_0\)` if the `\(p\)`-value is smaller than 5%. Thus, here we would say:

--

* Our `\(p\)`-value is greater than 5% (in fact, it's larger than 10%; R tells us that it's about 14.7%)
* Thus, we do not reject `\(H_0\)`
* There is insufficient evidence to suggest a real deviation from the previous value of 45 seconds

---

# How likely is this value?

> Absence of evidence is not evidence of absence

---

# Example R

```r
comic = c(34, 51, 30, 79, 54, 31, 57, 62, 59, 41,
          77, 55, 35, 3, 69, 46, 47, 66, 63, 59)
t.test(comic, mu = 45)
#> 
#>     One Sample t-test
#> 
#> data:  comic
#> t = 1, df = 20, p-value = 0.2
#> alternative hypothesis: true mean is not equal to 45
#> 95 percent confidence interval:
#>  42.4 59.4
#> sample estimates:
#> mean of x 
#>      50.9 
```

`t.test()` uses the `\(t\)`-distribution rather than the normal, so its `\(p\)`-value is slightly larger than the z-test value. The printed output is also rounded: here `\(t \approx 1.45\)` and the degrees of freedom are `\(n - 1 = 19\)`.

---

# Example: OKCupid

* The OKCupid dataset provides the heights of its users
* How consistent are the heights given by users with the average height across the USA?
* From the [CDC](https://www.cdc.gov/nchs/data/series/sr_11/sr11_252.pdf) paper, we discover that the average height of men in the USA is 69.3 inches

---

# Example: OKCupid

```r
## Select males
height = cupid$height[cupid$sex == "m"]
## Remove missing values
height = height[!is.na(height)]
mean(height)
#> [1] 70.4
```

---

# Example: OKCupid

```r
t.test(height, mu = 69.3)
#> 
#>     One Sample t-test
#> 
#> data:  height
#> t = 70, df = 40000, p-value <2e-16
#> alternative hypothesis: true mean is not equal to 69.3
#> 95 percent confidence interval:
#>  70.4 70.5
#> sample estimates:
#> mean of x 
#>      70.4 
```

---

# Exercise

* From the CDC, the average female height is 63.8 inches
* Test whether the females in the OKCupid sample are taller than this

---

# Errors

<img src="graphics/type1and2.jpg" width="80%" style="display: block; margin: auto;" />

---

# Two sample z-test

* Suppose we want to test another improvement to our website
* We think that adding a [blink](https://en.wikipedia.org/wiki/Blink_element) tag would be a good way of attracting customers
* Monitoring the first twenty customers, we get

```
21 32 46 19 29 31 37 28 50 29 34 40 26 20 48  7 39 30 40 34
```

How do we compare the website that uses the Comic Sans font to the blinking site? We use a two-sample z-test!

---

# Two sample z-test

The null hypothesis is that the two pages have the same mean time on site,

`$$H_0: \mu_1 = \mu_2$$`

while the alternative hypothesis is that the two pages differ, i.e.
`$$H_1: \mu_1 \ne \mu_2.$$`

--

The corresponding test statistic is

`$$Z = \frac{\bar x_1 - \bar x_2}{s \sqrt{1/n_1 + 1/n_2}},$$`

where `\(s\)` is the pooled standard deviation of the two samples, with sample standard deviations `\(s_1\)` and `\(s_2\)`,

`$$s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}.$$`

---

# Two sample z-test

```r
blink = c(21, 32, 46, 19, 29, 31, 37, 28, 50, 29,
          34, 40, 26, 20, 48, 7, 39, 30, 40, 34)
t.test(comic, blink, var.equal = TRUE)
#> 
#>     Two Sample t-test
#> 
#> data:  comic and blink
#> t = 4, df = 40, p-value = 3e-04
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>   9.37 28.43
#> sample estimates:
#> mean of x mean of y 
#>      50.9      32.0 
```

In this example, since the `\(p\)`-value is small (about 0.0003), we can conclude that the two web designs do appear to be different, with the Comic Sans site having the higher mean time.

---

# Confidence intervals

* When we get an answer, we don't just want a point estimate, i.e. a single number; we want a plausible range
* Confidence intervals provide an alternative to hypothesis tests for assessing questions about the population mean (or population means in two-sample problems)
* Recall that the sample mean `\(\bar x\)` is an estimate of the population mean `\(\mu\)`
* The problem is that if we were to take many samples from the population, and so calculate many `\(\bar x\)`'s, they would all likely be different from each other. Which one would we trust the most?

--

Central to the idea of the margin of error is the [_central limit theorem_](https://en.wikipedia.org/wiki/Central_limit_theorem).

---

# Construction

1. Find the mean in our sample, `\(\bar x\)`
1. Subtract some amount from `\(\bar x\)` to obtain the _lower bound_ of our confidence interval
1. Add the same amount as in step 2 to our sample mean `\(\bar x\)` to obtain the _upper bound_ of our confidence interval

---

# Formula

`$$\left(\bar{x}-z \times \frac{s}{\sqrt{n}}, \hspace{0.5cm} \bar{x}+z\times \frac{s}{\sqrt{n}}\right),$$`

often condensed to just

`$$\bar{x} \pm z \times \frac{s}{\sqrt{n}}$$`

where `\(z\)` is a critical value from the standard normal distribution.
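A minimal sketch of the formula above in R, assuming the `comic` data from earlier. The critical value comes from `qnorm()`, and the helper `z_ci()` is hypothetical, not a base R function.

```r
# Hypothetical helper (not a base R function): a normal-approximation
# confidence interval, xbar +/- z * s / sqrt(n)
z_ci = function(x, level = 0.95) {
  z = qnorm(1 - (1 - level) / 2)         # critical value, e.g. 1.96 for 95%
  margin = z * sd(x) / sqrt(length(x))   # the margin of error
  c(lower = mean(x) - margin, upper = mean(x) + margin)
}

# Critical values for 90%, 95% and 99% intervals
round(qnorm(c(0.95, 0.975, 0.995)), 3)
#> [1] 1.645 1.960 2.576

comic = c(34, 51, 30, 79, 54, 31, 57, 62, 59, 41,
          77, 55, 35, 3, 69, 46, 47, 66, 63, 59)
round(z_ci(comic), 2)  # approximately (42.92, 58.88)
```

Changing `level` changes only the critical value `\(z\)`; the standard error `\(s/\sqrt{n}\)` stays the same.

---

# Formula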
--

For the standard 95% confidence interval, the `\(z\)` value is 1.96, often rounded to 2. So the interval becomes

`$$\bar{x} \pm \frac{2 s}{\sqrt{n}}$$`

If we wanted a 90% interval, we would use `\(z = 1.645\)`. For a 99% interval, we would use `\(z = 2.576\)`.

---

# Example: Comic Sans

Let's return to our Comic Sans example. The average time spent on the site was `\(\bar x = 50.9\)` with a standard deviation of `\(s = 18.2\)`. This gives a 95% confidence interval of

`$$50.9 \pm 1.96 \frac{18.2}{\sqrt{20}} = (42.92, 58.88)$$`

---

# Example: Comic Sans

Alternatively, we could use R and extract the confidence interval from

```r
t.test(comic)
#> 
#>     One Sample t-test
#> 
#> data:  comic
#> t = 10, df = 20, p-value = 1e-10
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#>  42.4 59.4
#> sample estimates:
#> mean of x 
#>      50.9 
```

to get the interval `\((42.38, 59.42)\)`. Notice this interval is slightly wider, since it uses the `\(t\)`-distribution (with `\(n - 1 = 19\)` degrees of freedom) rather than the normal approximation.

---

# Summary

The region

`$$\bar x \pm 2 s$$`

contains approximately 95% of the data (when the data are roughly normally distributed)

--

The region

`$$\bar x \pm 2 s/\sqrt{n}$$`

is an approximate 95% confidence interval for the mean

---

# Summary

```r
s = sd(cupid$age)
n = length(cupid$age)
m = mean(cupid$age)
```

```r
# Approx 95% of the data
c(m - 2 * s, m + 2 * s)
#> [1] 13.4 51.2
```

```r
# A confidence interval around the mean
c(m - 2 * s/sqrt(n), m + 2 * s/sqrt(n))
#> [1] 32.3 32.4
```

---
layout: true
background-image: url(assets/white_logo.png)

<div class="jr-header-inverse"> <img class="logo" src="assets/white_logo_full.png"/> <span class="social"><table><tr><td><img src="assets/twitter.gif"/></td><td> @jumping_uk</td></tr></table></span> </div>
<div class="jr-footer-inverse"><span>© 2019 Jumping Rivers (jumpingrivers.com)</span><div></div></div>

---
class: inverse, center, middle

# Break time