{"id":233,"date":"2018-12-09T05:09:38","date_gmt":"2018-12-09T05:09:38","guid":{"rendered":"http:\/\/statisticalbiophysicsblog.org\/?p=233"},"modified":"2020-04-26T15:32:17","modified_gmt":"2020-04-26T15:32:17","slug":"absolutely-the-simplest-introduction-to-bayesian-statistics","status":"publish","type":"post","link":"https:\/\/statisticalbiophysicsblog.org\/?p=233","title":{"rendered":"Absolutely the simplest introduction to Bayesian statistics"},"content":{"rendered":"<p>I realized that I owe you something.  In a <a href=\"https:\/\/statisticalbiophysicsblog.org\/?p=213\">prior post<\/a>, I invoked some Bayesian ideas to contrast with bootstrapping analysis of high-variance data.  (More precisely, it was high <em>log-<\/em>variance data for which there was a problem, as described in <a href=\"https:\/\/arxiv.org\/abs\/1806.01998\">our preprint<\/a>.)  But the Bayesian discussion in my earlier post was pretty quick.  Although there are a number of good, brief introductions to Bayesian statistics, many get quite technical.<\/p>\n<p>Here, I\u2019d like to introduce Bayesian thinking in absolutely the simplest way possible.  We want to understand the point of it, and get a better grip on those mysterious priors.<\/p>\n<p><!--more--><\/p>\n<p>Let\u2019s look at the classic example of coin flipping.  Say we flipped a coin three times and got the sequence HHT (heads-heads-tails).  Was our coin fair?  Was the probability of heads, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ca508916eb515722deba44f70908ffef_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#72;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"39\" style=\"vertical-align: -4px;\"\/>, exactly half?  
And how confident can we be about our answer?<\/p>\n<p>First notice that the question, \u201cWas our coin fair?,\u201d inquires about the underlying system itself, not the data we got.  This is a key point \u2026<\/p>\n<p><strong>Key Observation 1: <\/strong>Bayesian analysis attempts to characterize the underlying model or distribution, rather than characterizing the observed data.  Of course, the data is used to characterize the model.<\/p>\n<p>If our Bayesian analysis really can tell us about <em>all<\/em> possible <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ca508916eb515722deba44f70908ffef_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#72;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"39\" style=\"vertical-align: -4px;\"\/> values, then we can assess whether we should believe the value 1\/2 or 2\/3 or whether we really can\u2019t tell the difference.  
To start on this process, let\u2019s define the probability of heads by the symbol <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-1092cee9df034dd61ca299efdadbd8a1_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;&#32;&#61;&#32;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#80;&#114;&#111;&#98;&#125;&#40;&#72;&#41;&#32;&#92;&#101;&#113;&#117;&#105;&#118;&#32;&#112;&#40;&#72;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"161\" style=\"vertical-align: -4px;\"\/>, so that <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> represents the underlying distribution, which we\u2019ll call the \u201cmodel\u201d.<\/p>\n<p>The goal of Bayesian analysis is to estimate the <em>conditional<\/em> probability of a model (a <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> value) <em>given<\/em> the particular data (HHT) that was obtained, denoted by <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-3bf85f1087e9fbed3a319341134ac1a2_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"10\" style=\"vertical-align: -4px;\"\/>(model | data) \u2026 and possibly incorporating prior information about the model.  
The function <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-3bf85f1087e9fbed3a319341134ac1a2_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"10\" style=\"vertical-align: -4px;\"\/>(model | data) is called the <em>posterior distribution<\/em> because it is obtained after the data.<\/p>\n<p>In our case, the posterior distribution we want is <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-12c609807cd76a756901880c0dfd3b95_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#113;&#124;&#72;&#72;&#84;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"81\" style=\"vertical-align: -4px;\"\/>, an estimate of relative probabilities for different <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> values given the sequence of flips HHT.  
If we know the probability of each <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/>, not only can we examine the special value <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-4e07e24cfc45a4ecf497cb6d24d4cbbb_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;&#61;&#49;&#47;&#50;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"58\" style=\"vertical-align: -5px;\"\/> but, more importantly, we can see how <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-4e07e24cfc45a4ecf497cb6d24d4cbbb_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;&#61;&#49;&#47;&#50;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"58\" style=\"vertical-align: -5px;\"\/> compares to <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-9cab43929f024663f64b5da531ea71df_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;&#61;&#50;&#47;&#51;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"59\" style=\"vertical-align: -5px;\"\/> or any other value of interest.<\/p>\n<p>Bayesian analysis can be seen as a simple (though confusing to the beginner!) application of elementary probability theory, so let\u2019s start with a common-sense approach.  
One thing that we can obviously do is <em>assume<\/em> a model (a <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> value) and then calculate the probability of our data (HHT) in that model.  To do this we don\u2019t even need to assign a numerical <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> value.  We can use elementary probability theory: the probability of a sequence of independent events is simply the product of the individual probabilities.  Thus, we have <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-3ecbab71445a2a6240a7a3e1250fa863_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#72;&#72;&#84;&#124;&#113;&#41;&#32;&#61;&#32;&#113;&#94;&#50;&#32;&#40;&#49;&#45;&#113;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"174\" style=\"vertical-align: -4px;\"\/>, which is true for <em>any<\/em> <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/>.  
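<\/p>\n<p>This likelihood is easy to evaluate numerically.  The short Python sketch below was written for this explanation and is not code from the original analysis.  It evaluates p(HHT|q) = q<sup>2<\/sup>(1 - q) on an evenly spaced grid of q values and picks out the grid point where the likelihood is largest.  The grid resolution is an arbitrary choice, and the flips are assumed independent, as in the formula.<\/p>
```python
# Likelihood of the flip sequence HHT for a coin with heads probability q.
# Independent flips are assumed, so the individual probabilities multiply.
def likelihood_hht(q):
    return q * q * (1 - q)


# Evenly spaced grid over [0, 1]; 1001 points is an arbitrary resolution.
grid = [i / 1000 for i in range(1001)]

# Grid point with the largest likelihood (the peak of the curve).
q_best = max(grid, key=likelihood_hht)

print(f'peak at q = {q_best:.3f}, p(HHT|q) = {likelihood_hht(q_best):.4f}')
print(f'p(HHT|q=0.5)  = {likelihood_hht(0.5):.4f}')
print(f'p(HHT|q=0.01) = {likelihood_hht(0.01):.6f}')
print(f'p(HHT|q=0.99) = {likelihood_hht(0.99):.6f}')
```
<p>The same peak follows analytically.  The derivative 2q - 3q<sup>2<\/sup> vanishes at q = 2\/3 (apart from the trivial root q = 0), and the grid search lands on the nearest grid point, 0.667.<\/p>\n<p>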
In other words, we have an explicit function for the probability of the data given a model embodied in <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1096\" height=\"658\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2018\/12\/word-image-4.png\" class=\"wp-image-248\" srcset=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2018\/12\/word-image-4.png 1096w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2018\/12\/word-image-4-300x180.png 300w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2018\/12\/word-image-4-768x461.png 768w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2018\/12\/word-image-4-1024x615.png 1024w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2018\/12\/word-image-4-788x473.png 788w\" sizes=\"auto, (max-width: 1096px) 100vw, 1096px\" \/><\/p>\n<p>The function <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-3bf85f1087e9fbed3a319341134ac1a2_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"10\" style=\"vertical-align: -4px;\"\/>(HHT|q) is plotted here and looks like what we would expect, with a peak at <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-9cab43929f024663f64b5da531ea71df_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;&#61;&#50;&#47;&#51;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"59\" style=\"vertical-align: 
-5px;\"\/>.  That is, there is no other <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> value that yields higher probability for the sequence of flips HHT.  Other features also make intuitive sense: the probability of observing HHT becomes vanishingly small for <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> values near zero or one.  All the preceding comments apply explicitly to the probability of seeing the data <em>given<\/em> the model embodied in <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/>.<\/p>\n<p>And yet, all the observations could be \u201cturned around\u201d and, qualitatively, <em>they also apply to the posterior distribution we\u2019re seeking.<\/em>  The posterior is the probability of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> given the data, <img loading=\"lazy\" decoding=\"async\" 
src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-3bf85f1087e9fbed3a319341134ac1a2_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"10\" style=\"vertical-align: -4px;\"\/>(q|HHT).  Clearly the most likely <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> (in the absence of any other information) must be 2\/3.  And since we have seen both H and T values in our data, the probability that <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-4154b50cd718de7df1237b73bb5bebbb_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;&#61;&#48;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"41\" style=\"vertical-align: -4px;\"\/> or 1 must vanish.  So we already seem to be on the right track.  
At least in our case, we can draw a tentative conclusion \u2026<\/p>\n<p><strong>Key Observation 2: <\/strong>Qualitatively, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-3bf85f1087e9fbed3a319341134ac1a2_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"10\" style=\"vertical-align: -4px;\"\/>( model | data ) <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-73b509b2aabc7a32202cf25d81424aa1_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#115;&#105;&#109;&#32;&#112;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"27\" style=\"vertical-align: -4px;\"\/>( data | model ).  That is, it seems we can use easy-to-calculate <em>data probabilities<\/em> to estimate <em>model probabilities<\/em>.  This point will be formalized below.<\/p>\n<p>We are finally ready to jump into the formal math of the Bayesian approach, which should be a bit easier to appreciate with the HHT example in mind.  The Bayesian framework requires us only to understand the following rule of probability for a two-dimensional probability distribution: <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-7e5279df273e3513312c63c0020b4d07_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#120;&#44;&#121;&#41;&#32;&#61;&#32;&#112;&#40;&#121;&#124;&#120;&#41;&#32;&#92;&#44;&#32;&#112;&#40;&#120;&#41;&#32;&#61;&#32;&#112;&#40;&#120;&#124;&#121;&#41;&#32;&#92;&#44;&#32;&#112;&#40;&#121;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"262\" style=\"vertical-align: -4px;\"\/>.  
Here <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ede05c264bba0eda080918aaa09c4658_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/> and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-0af556714940c351c933bba8cf840796_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#121;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"9\" style=\"vertical-align: -4px;\"\/> are any variables of interest (continuous or discrete-valued like H and T), <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-1c95ac3533820f2fc10ff2962627bff5_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#120;&#44;&#121;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"50\" style=\"vertical-align: -4px;\"\/> is the <em>joint<\/em> distribution over both variables, meaning that <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-7d417651fcb87d11fbac5f9a48621352_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#32;&#92;&#44;&#32;&#100;&#120;&#32;&#92;&#44;&#32;&#100;&#121;\" title=\"Rendered by QuickLaTeX.com\" height=\"17\" width=\"53\" style=\"vertical-align: -4px;\"\/> is the probability in a <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-e3c0f69e9c96a72ff617e895fc7db546_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#100;&#120;&#100;&#121;\" title=\"Rendered by QuickLaTeX.com\" height=\"17\" width=\"37\" style=\"vertical-align: -4px;\"\/> square for a continuous 
distribution and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-3bf85f1087e9fbed3a319341134ac1a2_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"10\" style=\"vertical-align: -4px;\"\/> is simply the absolute probability in a discrete system.  Since <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-1c95ac3533820f2fc10ff2962627bff5_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#120;&#44;&#121;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"50\" style=\"vertical-align: -4px;\"\/> is a two-dimensional probability distribution, it must be normalized: integrating (or summing) it over all <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ede05c264bba0eda080918aaa09c4658_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/> and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-0af556714940c351c933bba8cf840796_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#121;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"9\" style=\"vertical-align: -4px;\"\/> values gives one.  
Also, we can integrate (sum) over all <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-0af556714940c351c933bba8cf840796_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#121;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"9\" style=\"vertical-align: -4px;\"\/> values for each <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ede05c264bba0eda080918aaa09c4658_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/> to get the \u201cmarginal\u201d distribution <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-281d25eadace5f1ac42638e934e3eff1_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#120;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"33\" style=\"vertical-align: -4px;\"\/>, or <em>vice versa<\/em> to get the other marginal <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-64e9ecc7b0b5fea0d3b25f7001a4cc71_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#121;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"32\" style=\"vertical-align: -4px;\"\/>.  See any basic probability reference for more information on these issues.<\/p>\n<p>The Bayesian picture employs the two-dimensional joint distribution of data <em>and<\/em> models.  <em>This is the key conceptual leap<\/em>: it presumes that probabilities can be assigned to models, jointly with data, in the first place.  We can therefore refine our first observation.<\/p>\n<p><strong>Key Observation 1a.  
<\/strong>The Bayesian picture assumes that there is a true mathematical distribution of models (not just data), and not all models are equally likely.<\/p>\n<p>This is a rather abstract point, and you may question it.  After all, logically, it seems that there must have been only a <em>single<\/em> model that generated our data \u2013 we just don\u2019t know what it was.  True, but that single model is assumed to be unknowable (with absolute certainty).  Instead, from the statistical perspective, we hope only to characterize the relative probabilities of different underlying models based on the information we have.<\/p>\n<p>Back to coin flipping.  To make progress, we simply need to manipulate the rules for two-dimensional probability distributions discussed above in order to get the posterior (distribution for <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/>) we want.  
We write<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 42px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-2e7e03eeb39cc07c14bf08519193f369_l3.png\" height=\"42\" width=\"232\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#91;&#112;&#40;&#113;&#124;&#72;&#72;&#84;&#41;&#32;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#32;&#112;&#40;&#72;&#72;&#84;&#124;&#113;&#41;&#32;&#92;&#44;&#32;&#112;&#40;&#113;&#41;&#32;&#125;&#123;&#32;&#112;&#40;&#72;&#72;&#84;&#41;&#32;&#125;&#32;&#92;&#44;&#32;&#44;&#92;&#93;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p> which is called Bayes\u2019 rule or formula.<\/p>\n<p>Here, the posterior of interest is on the left.  On the right are factors that may be familiar or not.  The factor <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-22956f9e142dce0417940abaa4cfa6b7_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#72;&#72;&#84;&#124;&#113;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"81\" style=\"vertical-align: -4px;\"\/> is the simplest \u2013 it\u2019s the probability of HHT given a specific <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> value, as we already discussed.  
The factor <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-0808828b61a506c3fdff300336a575ea_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#72;&#72;&#84;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"68\" style=\"vertical-align: -4px;\"\/> is technically the overall probability of HHT averaged over all possible models (each weighted by its prior probability), but conveniently it does not depend on q at all: it is the same constant for every <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> value on the left, so we can treat it as an uninteresting normalization constant.<\/p>\n<p>Most intriguing here is the so-called \u201cprior\u201d distribution of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> values, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-26e09a38f403e8c233f55669b1accb2d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#113;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"31\" style=\"vertical-align: -4px;\"\/>.  
This factor represents our knowledge about which <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> values are likely &#8211; <em>independent of the data.<\/em>  For instance, if our HHT data was generated by a real physical coin, perhaps it\u2019s much harder to make a coin exhibiting very small or large <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> values (near 0 or 1), so those probabilities might be smaller.  Or perhaps the flips came from your big brother\u2019s magic set, and you know that he has only two coins, with the values <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-9280573bb382111ec481c6551e07d3df_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;&#61;&#32;&#48;&#46;&#51;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"55\" style=\"vertical-align: -4px;\"\/> and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-b7ab7f7da74090c2370accce55f96e3b_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#48;&#46;&#53;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"22\" style=\"vertical-align: 0px;\"\/>.<\/p>\n<p>Let\u2019s look more closely at the example of your big brother, who\u2019s tried to fool you lots of times \u2013 and so you know he\u2019s 90% likely to choose the 
unfair coin (<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-337f103dec33a92c26a8c659b5b261ad_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;&#61;&#48;&#46;&#51;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"55\" style=\"vertical-align: -4px;\"\/>) and only 10% likely to choose the fair coin.  In that case, given just two coin flips to keep things simple, we can make the following table listing all the joint and marginal probabilities.<\/p>\n<table>\n<tbody>\n<tr>\n<td><\/td>\n<td><strong>HH<\/strong><\/td>\n<td><strong>HT<\/strong><\/td>\n<td><strong>TH<\/strong><\/td>\n<td><strong>TT<\/strong><\/td>\n<td><strong>Marginal: p(q)<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>q = 0.3<\/strong><\/td>\n<td>\n  0.3*0.3*0.9 = 0.081<\/td>\n<td>\n  0.3*0.7*0.9 = 0.189<\/td>\n<td>\n  0.7*0.3*0.9 = 0.189<\/td>\n<td>\n  0.7*0.7*0.9 = 0.441<\/td>\n<td>\n  0.9<\/td>\n<\/tr>\n<tr>\n<td><strong>q = 0.5<\/strong><\/td>\n<td>\n  0.5*0.5*0.1 = 0.025<\/td>\n<td>\n  0.5*0.5*0.1 = 0.025<\/td>\n<td>\n  0.5*0.5*0.1 = 0.025<\/td>\n<td>\n  0.5*0.5*0.1 = 0.025<\/td>\n<td>\n  0.1<\/td>\n<\/tr>\n<tr>\n<td><strong>Marginal: p(data)<\/strong><\/td>\n<td>\n  0.106<\/td>\n<td>\n  0.214<\/td>\n<td>\n  0.214<\/td>\n<td>\n  0.466<\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>This table is worth staring at for a while.  The eight entries in the middle of the table are the joint probabilities p(data, q) = p(data|q) p(q), which together sum to one.  Dividing any entry by the marginal p(data) at the bottom of its column gives the posterior probability p(q|data).  
The marginal <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-26e09a38f403e8c233f55669b1accb2d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#113;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"31\" style=\"vertical-align: -4px;\"\/> in the right column is the prior.<\/p>\n<p>Try picking a given data value (e.g., HH or TT) and see what the estimated <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> probabilities are.  This will tell you both the best guess for <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> given any particular data and the chance that you\u2019re wrong, given the assumed prior.  If the two flips were TT, for example, then the table shows there is a 95% probability (0.441\/0.466) that <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> = 0.3.  The key thing here is that this differs from the 90% prior: the data have helped point you toward the more likely outcome.  
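<\/p>\n<p>The table and the TT calculation can be reproduced in a few lines.  The Python sketch below is an illustration written for this post, and its function and variable names are made up.  Each joint entry is computed as p(data|q) p(q) using the 90%\/10% prior assumed above.  Each column is then summed to get the marginal p(data), and the posterior p(q = 0.3|data) is that joint entry divided by the marginal.<\/p>
```python
from itertools import product

# Prior over the two coins in the big-brother example: p(q).
prior = {0.3: 0.9, 0.5: 0.1}


def likelihood(seq, q):
    # p(seq|q) for a string of 'H'/'T' flips, assuming independent flips.
    p = 1.0
    for flip in seq:
        p *= q if flip == 'H' else 1 - q
    return p


# The four possible outcomes of two flips: HH, HT, TH, TT.
outcomes = [''.join(pair) for pair in product('HT', repeat=2)]

for seq in outcomes:
    # One column of the table: joint p(seq, q) = p(seq|q) p(q) for each coin.
    joint = {q: likelihood(seq, q) * pq for q, pq in prior.items()}
    # Column total: the marginal p(seq) in the bottom row.
    evidence = sum(joint.values())
    # Bayes' rule: posterior p(q=0.3|seq) = joint entry / marginal.
    post_unfair = joint[0.3] / evidence
    print(f'{seq}: p(data) = {evidence:.3f}, p(q=0.3|data) = {post_unfair:.3f}')
```
<p>For TT the printed posterior is 0.441\/0.466, about 0.946, which is the 95% figure quoted above.<\/p>\n<p>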
(Other pairs of flips make <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> = 0.3 less likely than its prior, as you can check.)  More data will help a lot, not surprisingly, as described below.<\/p>\n<p>Let\u2019s build on Bayes\u2019 rule to re-examine the case where <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> can take on any value between 0 and 1.  The posterior, our estimated probability distribution for <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/>, is given by <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-c8840aea3d1d1baa1a5de1be251be3b1_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#113;&#124;&#72;&#72;&#84;&#41;&#32;&#92;&#112;&#114;&#111;&#112;&#116;&#111;&#32;&#112;&#40;&#72;&#72;&#84;&#124;&#113;&#41;&#32;&#92;&#44;&#32;&#112;&#40;&#113;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"221\" style=\"vertical-align: -4px;\"\/>, where we have omitted the normalizing constant <img loading=\"lazy\" decoding=\"async\" 
src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-0808828b61a506c3fdff300336a575ea_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#72;&#72;&#84;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"68\" style=\"vertical-align: -4px;\"\/>.  The relation is almost what we hypothesized before from common sense: the probability of the model given the data is proportional to the probability of the data given the model, <em>aside from any prior information that corrects it<\/em>.  But in hindsight, it\u2019s also clear that prior information about possible or likely <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> values must correct our estimate.  Thus, we can make another observation.<\/p>\n<p><strong>Key Observation 3: <\/strong>Bayes\u2019 formula tells us, mathematically, how to incorporate \u201cprior\u201d knowledge about the set of (im)possible models.  The formula makes intuitive sense, in that the intrinsic, data-independent prior probability of any model acts as a corrective pre-factor or weight for the overall \u201cposterior\u201d estimate of a model\u2019s probability.  
If we have no prior information about models, then the posterior model probability is simply proportional to the data probability <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-22956f9e142dce0417940abaa4cfa6b7_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#72;&#72;&#84;&#124;&#113;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"81\" style=\"vertical-align: -4px;\"\/>.<\/p>\n<p>Let\u2019s consider whether and when the prior really matters.  Perhaps you\u2019ve heard the choice of prior is extremely important \u2013 or maybe you\u2019ve heard the prior doesn\u2019t really matter.  Well, both are true in different situations.<\/p>\n<p>We need to look more closely at the data probability, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-39658ce77b596a0766c0ab758fd82925_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#72;&#72;&#84;&#32;&#124;&#32;&#113;&#41;&#32;&#61;&#32;&#113;&#94;&#50;&#32;&#40;&#49;&#45;&#113;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"174\" style=\"vertical-align: -4px;\"\/>.  
Because the flips are independent, this is just the product of the probabilities for each event, so more generally, for independent events, we have<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 18px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-56df802354dccbc760303b4b318a8a20_l3.png\" height=\"18\" width=\"441\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#91;&#112;&#40;&#120;&#95;&#49;&#44;&#32;&#120;&#95;&#50;&#44;&#32;&#120;&#95;&#51;&#44;&#32;&#92;&#108;&#100;&#111;&#116;&#115;&#44;&#32;&#120;&#95;&#78;&#32;&#124;&#32;&#92;&#109;&#98;&#111;&#120;&#123;&#109;&#111;&#100;&#101;&#108;&#125;&#41;&#32;&#61;&#32;&#112;&#40;&#120;&#95;&#49;&#41;&#32;&#92;&#44;&#32;&#112;&#40;&#120;&#95;&#50;&#41;&#32;&#92;&#44;&#32;&#112;&#40;&#120;&#95;&#51;&#41;&#32;&#92;&#44;&#32;&#92;&#99;&#100;&#111;&#116;&#115;&#32;&#92;&#44;&#32;&#112;&#40;&#120;&#95;&#78;&#41;&#44;&#92;&#93;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>where each factor on the right side is really <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-be0ccfb044f7edd0b577eddefd4bda4e_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#120;&#95;&#105;&#32;&#124;&#32;&#32;&#92;&#109;&#98;&#111;&#120;&#123;&#109;&#111;&#100;&#101;&#108;&#125;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"90\" style=\"vertical-align: -4px;\"\/>.<\/p>\n<p>If each event is a coin toss <em>and there are many of them<\/em>, then in contrast to the <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-d148c99cbc07bb7b65cbc6332ce2eae9_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;&#94;&#50;&#32;&#40;&#49;&#45;&#113;&#41;\" title=\"Rendered by 
QuickLaTeX.com\" height=\"19\" width=\"68\" style=\"vertical-align: -4px;\"\/> function shown above, the full product of probabilities will become a <em>very sharply peaked <\/em>function of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/>.  (You should check this for a sequence containing arbitrarily large numbers of H and T values.)  Thus, in the case of large <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-5793832f979c2268e3694c246d53b1bb_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#78;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"16\" style=\"vertical-align: 0px;\"\/>, really only a small range of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> values will be probable.  If the peak in <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> is sharp enough, just a tiny range will be probable <em>regardless of the prior<\/em>.  
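<\/p>\n<p>To see the sharpening numerically, here is a minimal Python sketch that evaluates the posterior for <em>q<\/em> on a grid as the number of flips <em>N<\/em> grows. Several choices are illustrative assumptions rather than part of the discussion above: heads make up 2\/3 of the flips, the grid has 1000 points, and the second prior, proportional to (1 - <em>q<\/em>)<sup>4<\/sup>, is a hypothetical one that leans toward tails. Each printed row gives <em>N<\/em>, the posterior mean under the constant prior, the posterior mean under the tails-leaning prior, and the posterior standard deviation under the constant prior. The two means should drift together, and the spread should shrink, as <em>N<\/em> grows.<\/p>

```python
# Sketch: how the posterior for q sharpens as the number of flips N grows.
# Illustrative assumptions: independent flips, heads in 2/3 of the flips,
# and q evaluated on a 1000-point grid strictly inside (0, 1).
import math

GRID = [(i + 0.5) / 1000 for i in range(1000)]


def posterior(heads, tails, log_prior):
    # log p(q | data) up to a constant: heads*log q + tails*log(1 - q) + log p(q)
    logs = [heads * math.log(q) + tails * math.log(1 - q) + log_prior(q) for q in GRID]
    top = max(logs)  # shift by the maximum so exp() cannot underflow to all zeros
    w = [math.exp(v - top) for v in logs]
    z = sum(w)
    return [x / z for x in w]  # normalized weights on the grid


def mean_and_sd(post):
    mean = sum(q * p for q, p in zip(GRID, post))
    var = sum((q - mean) ** 2 * p for q, p in zip(GRID, post))
    return mean, math.sqrt(var)


def flat(q):
    return 0.0  # constant prior: log of a constant, dropped


def tails_leaning(q):
    return 4 * math.log(1 - q)  # hypothetical prior proportional to (1 - q)**4


for n in (3, 30, 300, 3000):
    h = 2 * n // 3
    m_flat, sd_flat = mean_and_sd(posterior(h, n - h, flat))
    m_tail, _ = mean_and_sd(posterior(h, n - h, tails_leaning))
    print(n, round(m_flat, 3), round(m_tail, 3), round(sd_flat, 4))
```
<p>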
So we can add to our list of observations.<\/p>\n<p><strong>Key Observation 4:<\/strong> The prior distribution (assumption about models) is more important when there is less data and decreasingly important as the amount of data grows.<\/p>\n<p>Maybe your brain is getting tired, even in this briefest of introductions, but there is one more key point about Bayesian statistics.<\/p>\n<p>One of the most important features of the Bayesian formulation is that it inherently brings with it the ability to characterize the range of uncertainty associated with a certain model.  In other words, it\u2019s easy to find the most likely <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> value (which was 2\/3 for our HHT example), but our original question was whether we think the coin was fair or not.<\/p>\n<p>So what do we think \u2026 was the coin fair?!?  
Of course it\u2019s impossible to answer yes or no based on the data alone, but we can make a simple calculation to quantify the situation.<\/p>\n<p>Let\u2019s take the simple case of a constant prior for now, in which case the posterior distribution of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> is <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-cd69f6a5deb61c1a1540bde8a6d59096_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#113;&#32;&#124;&#32;&#72;&#72;&#84;&#41;&#32;&#92;&#112;&#114;&#111;&#112;&#116;&#111;&#32;&#113;&#94;&#50;&#32;&#40;&#49;&#45;&#113;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"174\" style=\"vertical-align: -4px;\"\/>.  Since we know the distribution for any <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> value, we can simply evaluate it at the two values of interest.  
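<\/p>\n<p>The evaluation is easy by hand, but a minimal Python sketch with exact fractions confirms the ratio and also locates the most probable <em>q<\/em> with a crude grid search. It assumes the constant prior just described. The function name <code>unnormalized_posterior<\/code> is illustrative, and the grid search is a simple stand-in rather than a method from this post.<\/p>

```python
# Sketch: evaluate the constant-prior posterior for HHT at q = 1/2 and q = 2/3.
# Assumption: constant prior, so p(q | HHT) is proportional to q**2 * (1 - q).
from fractions import Fraction


def unnormalized_posterior(q, heads=2, tails=1):
    # p(HHT | q) times a constant prior; the normalization cancels in any ratio
    return q**heads * (1 - q) ** tails


fair = unnormalized_posterior(Fraction(1, 2))  # (1/2)**2 * (1/2) = 1/8
best = unnormalized_posterior(Fraction(2, 3))  # (2/3)**2 * (1/3) = 4/27
ratio = fair / best  # exactly 27/32 = 0.84375
print(ratio, float(ratio))

# Crude grid search for the most probable q; it should land next to 2/3
grid = [i / 1000 for i in range(1001)]
q_max = max(grid, key=unnormalized_posterior)
print(q_max)
```
<p>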
We find<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 43px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-bcb3f82cea3bf036516d252cf71206a6_l3.png\" height=\"43\" width=\"196\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#91;&#92;&#102;&#114;&#97;&#99;&#32;&#123;&#32;&#112;&#40;&#113;&#61;&#49;&#47;&#50;&#32;&#92;&#44;&#32;&#124;&#32;&#92;&#44;&#32;&#72;&#72;&#84;&#41;&#32;&#125;&#123;&#32;&#112;&#40;&#113;&#61;&#50;&#47;&#51;&#32;&#92;&#44;&#32;&#124;&#32;&#92;&#44;&#32;&#72;&#72;&#84;&#41;&#32;&#125;&#32;&#92;&#97;&#112;&#112;&#114;&#111;&#120;&#32;&#48;&#46;&#56;&#52;&#92;&#93;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>That is, with a constant prior, the likelihood that the HHT data were generated by a fair coin is 84% (exactly 27\/32) as high as for the simple guess <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-9cab43929f024663f64b5da531ea71df_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;&#61;&#50;&#47;&#51;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"59\" style=\"vertical-align: -5px;\"\/>.  
There\u2019s quite a good chance the coin was fair.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"851\" height=\"509\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2018\/12\/c-users-zuckermd-box-sync-blog-figs-bayesian-prob-2.png\" class=\"wp-image-249\" alt=\"Bayesian probability of q given HHT\" srcset=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2018\/12\/c-users-zuckermd-box-sync-blog-figs-bayesian-prob-2.png 851w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2018\/12\/c-users-zuckermd-box-sync-blog-figs-bayesian-prob-2-300x179.png 300w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2018\/12\/c-users-zuckermd-box-sync-blog-figs-bayesian-prob-2-768x459.png 768w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2018\/12\/c-users-zuckermd-box-sync-blog-figs-bayesian-prob-2-788x471.png 788w\" sizes=\"auto, (max-width: 851px) 100vw, 851px\" \/><\/p>\n<p>You can take this type of analysis further by determining a range of reasonably probable <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> values as sketched above.  What range you want to use will depend on what question you are trying to answer.  Quantifying a range, however, is a topic beyond the scope of this post.  In any case, we can make our final observation.<\/p>\n<p><strong>Key Observation 5:<\/strong> The Bayesian formulation intrinsically quantifies uncertainty in the underlying model.  
This is because the posterior distribution assigns a probability to every model, so all models can be compared directly.<\/p>\n<p>Note that standard confidence intervals from frequentist statistics are not direct characterizations of the underlying <em>model<\/em> in the way that a Bayesian posterior is.  Rather, frequentist analysis characterizes ranges of outcomes, as emphasized in an <a href=\"https:\/\/statisticalbiophysicsblog.org\/?p=213\">earlier post<\/a>.  To take an extreme case, if you know a coin can be heads or tails but with an unknown <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> value &#8211; and only a <em>single coin flip <\/em>is performed yielding H &#8211; frequentist statistics really cannot characterize anything about the situation whereas Bayesian analysis can.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I realized that I owe you something. In a prior post, I invoked some Bayesian ideas to contrast with bootstrapping analysis of high-variance data. (More precisely, it was high log-variance data for which there was a problem, as described in our preprint.) But the Bayesian discussion in my earlier post was pretty quick. 
Although there [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":252,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20,18],"tags":[],"class_list":["post-233","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bayesian-statistics","category-statistical-uncertainty"],"_links":{"self":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts\/233","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=233"}],"version-history":[{"count":11,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts\/233\/revisions"}],"predecessor-version":[{"id":444,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts\/233\/revisions\/444"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/media\/252"}],"wp:attachment":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=233"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=233"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=233"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}