class: center, middle, inverse, title-slide

# Distributions: what & why
## Chapter 4
### Colin Gillespie (@csgillespie)

---
layout:true

---
background-image: url(graphics/doll.jpg)
background-size: cover

# About me

.pull-right[
* Academic:
  * Senior [Statistics](http://www.mas.ncl.ac.uk/~ncsg3/) lecturer, [Newcastle University](https://en.wikipedia.org/wiki/Newcastle_University), UK
* Consultant at [Jumping Rivers](https://jumpingrivers.com)
  * Data science training & consultancy
  * R, [Stan](https://www.jumpingrivers.com/courses/13_introductions-to-bayesian-inference-using-rstan), Scala
  * Algorithm development and validation
  * Public courses and in-house training
* [Efficient R programming](http://shop.oreilly.com/product/0636920047995.do), O'Reilly
]

---
layout: true

<div class="jr-header"> <img class="logo" src="assets/white_logo_full.png"/> <span class="social"><table><tr><td><img src="assets/twitter.gif"/></td><td> @jumping_uk</td></tr></table></span> </div>
<div class="jr-footer"><span>© 2019 Jumping Rivers (jumpingrivers.com)</span><div></div></div>

---
# Probability Distribution

* A distribution is a mathematical function that describes how probability is spread across the possible values
* Probability _mass_ function: discrete

--

* Probability _density_ function: continuous

<img src="chapter4_files/figure-html/unnamed-chunk-1-1.svg" width="80%" style="display: block; margin: auto;" />

---
# Probability _Density_ function

Properties

* Non-negative everywhere
* The area under the curve is 1

---
# The Uniform Distribution

* Values from 0 to 1 are equally likely
* Values less than 0 or greater than 1 are impossible

<img src="chapter4_files/figure-html/unnamed-chunk-2-1.svg" width="80%" style="display: block; margin: auto;" />

---
# The Uniform Distribution

The area under the curve is 1

<img src="chapter4_files/figure-html/unnamed-chunk-3-1.svg" width="80%" style="display: block; margin: auto;" />

---
# The Uniform Distribution

* We don't tend to use the Uniform directly, __but__
* it's fundamental to everything we do
* If you've ever simulated __anything__ you have been using the uniform distribution
* Behind every 
random number is a uniform

---
# Aside: Random numbers

* Computers generate _pseudo random numbers_
* Pretend random numbers generated using an algorithm
* If there's an algorithm, they're not really random
* They just _appear_ random

---
# Algorithms

A basic algorithm is the [Linear congruential](https://en.wikipedia.org/wiki/Linear_congruential_generator) generator

`$$r_i = (a \times r_{i-1} + c) \bmod m$$`

where `\(a\)`, `\(c\)` and `\(m\)` are integers

--

For example, if `\(a = 7\)`, `\(c=0\)`, `\(m = 5\)` and `\(r_0 = 1\)`, then

`$$r_1 = 7 \times 1 \bmod 5 = 2$$`

--

`$$r_2 = 7 \times r_1 \bmod 5 = 7 \times 2 \bmod 5 = 4$$`

--

`$$r_3 = 7 \times r_2 \bmod 5 = \ldots$$`

---
# Choosing parameters is hard

* [RANDU](https://en.wikipedia.org/wiki/RANDU): `\(a = 65539\)`, `\(c = 0\)`, `\(m = 2^{31}\)`

<img src="graphics/randu.png" width="100%" style="display: block; margin: auto;" />

---
# Random number generators

* Never write your own - it's hard!
* For serial applications, use [Mersenne Twister](https://en.wikipedia.org/wiki/Mersenne_Twister)
  - Be careful how you _seed_ the application
* Parallel variants exist for parallel applications

---
# Exercise

```r
# n = number of random numbers to generate (at least 2)
# r0, a, c, m should be non-negative integers, with m > 0
lcg = function(r0, a, c, m, n = 100) {
  rngs = numeric(n)
  # The first value in the stream is the seed
  rngs[1] = r0
  for (i in 2:n) {
    # Linear congruential update: r_i = (a * r_{i-1} + c) mod m
    rngs[i] = (a * rngs[i - 1] + c) %% m
  }
  return(rngs)
}
rngs = lcg(5, a = 5, c = 7, m = 29)
# type = "l" - the letter "l"
plot(rngs, type = "l")
```

The above function generates a stream of pseudo-random numbers. Try different values of a, c and m.
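---
# Exercise: a quick check

A minimal sketch that re-checks the worked example from the Algorithms slide (`\(a = 7\)`, `\(c = 0\)`, `\(m = 5\)`, `\(r_0 = 1\)`). The helper name `lcg_sketch` is hypothetical, and dividing by `\(m\)` to land in the unit interval is an assumption about how such a stream is typically turned into uniform-looking values, not something stated in the slides.

```r
# Hypothetical helper: the recurrence r_i = (a * r_{i-1} + c) mod m
lcg_sketch = function(r0, a, c, m, n) {
  rngs = numeric(n)
  rngs[1] = r0
  for (i in seq_len(n - 1)) {
    rngs[i + 1] = (a * rngs[i] + c) %% m
  }
  rngs
}

# Worked example: r_0 = 1, then r_1 = 2 and r_2 = 4
r = lcg_sketch(r0 = 1, a = 7, c = 0, m = 5, n = 3)
r

# Assumption: dividing by m = 5 maps the stream into [0, 1)
u = r / 5
u
```

With such a small `\(m\)` the stream repeats almost immediately, which is one reason the choice of parameters matters.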
---
layout: true
background-image: url(assets/white_logo.png)

<div class="jr-header-inverse"> <img class="logo" src="assets/white_logo_full.png"/> <span class="social"><table><tr><td><img src="assets/twitter.gif"/></td><td> @jumping_uk</td></tr></table></span> </div>
<div class="jr-footer-inverse"><span>© 2019 Jumping Rivers (jumpingrivers.com)</span><div></div></div>

---
class: inverse, center, middle

# Back to business

---
layout: true

<div class="jr-header"> <img class="logo" src="assets/white_logo_full.png"/> <span class="social"><table><tr><td><img src="assets/twitter.gif"/></td><td> @jumping_uk</td></tr></table></span> </div>
<div class="jr-footer"><span>© 2019 Jumping Rivers (jumpingrivers.com)</span><div></div></div>

---
# The Bernoulli distribution

* Discrete [probability](https://en.wikipedia.org/wiki/Bernoulli_distribution) distribution, e.g. tossing a coin
  - Heads probability 0.5, tails probability 0.5
  - During his internment, [John Kerrich](https://en.wikipedia.org/wiki/John_Edmund_Kerrich) tossed a coin 10,000 times

--

* Tails represented as 0, Heads as 1

<img src="chapter4_files/figure-html/unnamed-chunk-6-1.svg" width="60%" style="display: block; margin: auto;" />

---
# The Binomial distribution

* The Binomial distribution concerns __sums__ of Bernoulli random variables

--

* E.g. we toss a coin `\(n=5\)` times and get: 0, 1, 0, 1, 0 so `\(x = 2\)`
* E.g. we toss a coin `\(n=5\)` times and get: 1, 1, 0, 1, 0 so `\(x = 3\)`
* E.g. we toss a coin `\(n=5\)` times and get: 0, 0, 1, 0, 0 so `\(x = 1\)`

--

* What's the probability of observing
  * No heads in five throws
  * One head in five throws
  * Two heads in five throws
  * ...
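---
# The Binomial distribution

A sketch of how the question on the previous slide could be answered in R, assuming a fair coin (heads probability 0.5, as on the Bernoulli slide) and independent tosses. `dbinom()` is base R's Binomial probability mass function; the variable names are illustrative only.

```r
# Assumption: fair coin, n = 5 independent tosses
n = 5
p = 0.5
heads = 0:n

# P(X = x) for x = 0, 1, ..., 5
probs = dbinom(heads, size = n, prob = p)
data.frame(heads, probs)
# The probabilities are 1/32, 5/32, 10/32, 10/32, 5/32, 1/32

# A probability mass function sums to 1
sum(probs)
```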
---
# The Binomial distribution

<img src="chapter4_files/figure-html/unnamed-chunk-7-1.svg" width="100%" style="display: block; margin: auto;" />

---
# The Binomial distribution

<img src="chapter4_files/figure-html/unnamed-chunk-8-1.svg" width="100%" style="display: block; margin: auto;" />

---
# The Binomial distribution

<img src="chapter4_files/figure-html/unnamed-chunk-9-1.svg" width="100%" style="display: block; margin: auto;" />

---
# The Binomial distribution

<img src="chapter4_files/figure-html/unnamed-chunk-10-1.svg" width="100%" style="display: block; margin: auto;" />

---
# Limiting

> As `\(n\)` increases, the Binomial distribution approaches a limiting shape.
>
> What's the limit?

---
# The normal/Gaussian distribution

<img src="chapter4_files/figure-html/unnamed-chunk-11-1.svg" width="100%" style="display: block; margin: auto;" />

---
# The normal distribution

* The normal/Gaussian distribution is the most famous distribution
* It has two parameters
  * mean & variance
* Remember all those nice mathematical properties?
--

* Symmetric about the mean
* The standard normal has mean 0 and variance 1

---
# The standard normal

<img src="chapter4_files/figure-html/unnamed-chunk-12-1.svg" width="100%" style="display: block; margin: auto;" />

---
# The standard normal

<img src="chapter4_files/figure-html/unnamed-chunk-13-1.svg" width="100%" style="display: block; margin: auto;" />

---
# The standard normal

<img src="chapter4_files/figure-html/unnamed-chunk-14-1.svg" width="100%" style="display: block; margin: auto;" />

---
# The standard normal

<img src="chapter4_files/figure-html/unnamed-chunk-15-1.svg" width="100%" style="display: block; margin: auto;" />

---
# The standard normal

Mean `\(\pm\)` 2 standard deviations

<img src="chapter4_files/figure-html/unnamed-chunk-16-1.svg" width="100%" style="display: block; margin: auto;" />

---
layout: true
background-image: url(assets/white_logo.png)

<div class="jr-header-inverse"> <img class="logo" src="assets/white_logo_full.png"/> <span class="social"><table><tr><td><img src="assets/twitter.gif"/></td><td> @jumping_uk</td></tr></table></span> </div>
<div class="jr-footer-inverse"><span>© 2019 Jumping Rivers (jumpingrivers.com)</span><div></div></div>

---
class: inverse, center, middle

# Break time