{"id":180,"date":"2017-12-21T02:05:45","date_gmt":"2017-12-21T02:05:45","guid":{"rendered":"http:\/\/statisticalbiophysicsblog.org\/?p=180"},"modified":"2019-01-29T18:28:28","modified_gmt":"2019-01-29T18:28:28","slug":"lets-stop-being-sloppy-about-uncertainty","status":"publish","type":"post","link":"https:\/\/statisticalbiophysicsblog.org\/?p=180","title":{"rendered":"Let\u2019s stop being sloppy about uncertainty"},"content":{"rendered":"<p>Let\u2019s draw a line.  Across the calendar, I mean.  Let\u2019s all pledge that from today on we\u2019re going to give honest accounting of the uncertainty in our data.  I mean \u2018honest\u2019 in the sense that if someone tried to reproduce our data in the future, their confidence interval and ours would overlap.<\/p>\n<p>There are a few conceptual issues to address up front.  Let\u2019s set up our discussion in terms of some variable <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> which we measure in a molecular dynamics (MD) simulation at successive configurations: <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-146877c5d2194c5d703b68f699c57c1b_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;&#95;&#48;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"15\" style=\"vertical-align: -4px;\"\/>, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-246601121f687545f8d4443aabcad088_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;&#95;&#49;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"14\" style=\"vertical-align: -4px;\"\/>, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-441931bb19f13f9cdcb228a2c93e5a34_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;&#95;&#50;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"15\" style=\"vertical-align: -4px;\"\/>, and so on.  Regardless of the length of our simulation, we can measure the average of all the values <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-69910680bec0f4d26f2ef0e5d86cf577_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#113;&#125;&#61;&#32;&#92;&#100;&#105;&#115;&#112;&#108;&#97;&#121;&#115;&#116;&#121;&#108;&#101;&#92;&#115;&#117;&#109;&#95;&#123;&#105;&#61;&#49;&#125;&#94;&#123;&#77;&#125;&#32;&#113;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"53\" width=\"74\" style=\"vertical-align: -22px;\"\/>.  We can also calculate the standard deviation \u03c3 of these values in the usual way as the square root of the variance.  Both of these quantities will approach their \u201ctrue\u201d values (based on the simulation protocol) with enough sampling \u2013 with large enough <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-10ebb71bad275c1815a8f2a8c5dea0be_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#77;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"19\" style=\"vertical-align: 0px;\"\/>.<\/p>\n<p><!--more--><\/p>\n<p>The first key point to get straight is one that you probably know already: the difference between the natural scale of variation of the variable <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-ac7da57d7f507262338bb5168feb3e06_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: -4px;\"\/> (see blue curve below, whose width is characterized by the standard deviation \u03c3) and the uncertainty in the estimate of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-78d1749fa0cea58ed5ee3c09b6d0dfe9_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#113;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"9\" style=\"vertical-align: -4px;\"\/> (see red curve below).  The standard deviation describes the range of values one expects to see in the list of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-082fbf97a093c57232fd760b0ec58542_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#113;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"13\" style=\"vertical-align: -4px;\"\/> values, but the variability in the estimate of the mean can be much smaller \u2013 if there are many <em>independent<\/em> measurements.<\/p>\n<p><a href=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/12\/image8.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/12\/image8.png\" alt=\"\" width=\"589\" height=\"417\" class=\"aligncenter size-full wp-image-192\" srcset=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/12\/image8.png 589w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/12\/image8-300x212.png 300w\" sizes=\"auto, (max-width: 589px) 100vw, 589px\" \/><\/a><\/p>\n<p>The uncertainty in <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-78d1749fa0cea58ed5ee3c09b6d0dfe9_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#113;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"9\" style=\"vertical-align: -4px;\"\/> is best framed in terms of a <em>confidence interval<\/em>.  Understanding confidence intervals, which characterize statistical uncertainty, is essential and can be accomplished with a simple \u201cthought experiment.\u201d  Imagine you repeat your simulation of length M many times, each time generating an average q (from the red distribution in the figure).  If you consider the distribution of these <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-78d1749fa0cea58ed5ee3c09b6d0dfe9_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#113;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"9\" style=\"vertical-align: -4px;\"\/> values there is a 90% chance that any future simulation (performed in an identical fashion, aside from stochasticity) will fall within the 5th and 95th percentiles of the <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-78d1749fa0cea58ed5ee3c09b6d0dfe9_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#113;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"9\" style=\"vertical-align: -4px;\"\/> distribution \u2013 regardless of whether all <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-10ebb71bad275c1815a8f2a8c5dea0be_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#77;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"19\" style=\"vertical-align: 0px;\"\/> values are independent.  This range of percentiles <em>is<\/em> the 90% confidence interval \u2013 even if we don\u2019t know how to obtain them in practice.<\/p>\n<p>The confidence interval or another estimate of statistical uncertainty in principle can be smaller, even much smaller, than the standard deviation if the number of <em>independent samples<\/em> among the <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/ql-cache\/quicklatex.com-10ebb71bad275c1815a8f2a8c5dea0be_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#77;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"19\" style=\"vertical-align: 0px;\"\/> values is extremely large.  That is, if a simulation is very long compared to system timescales, the estimate of an average from a single simulation should be very close to the \u201ctrue\u201d mean, based on infinite simulation, and the difference can be very small compared to \u03c3.  So, with more data, the red curve in the figure above would get narrower, as would the corresponding confidence interval.  <\/p>\n<p>As you may know, a confidence interval can also be estimated based on the <em>standard error of the mean<\/em> (an estimator of the more official \u201cstandard uncertainty\u201d noted in the VIM reference) but this assumes a symmetric Gaussian distribution of the variable and requires knowing the number of independent samples in your data.  In MD simulation, only a small fraction of the frames generated may be independent.  And it is a challenging if not impossible task to quantify this, as described below.  <\/p>\n<p>Error bars really only make sense when you\u2019re sure multiple independent samples have been gathered.  <em>Without independent samples, we really have no way to understand the statistical uncertainty in our measurements.  <\/em><\/p>\n<p>In molecular simulation, we would want to know how many of the \u2018zillions\u2019 of trajectory frames are truly independent.  Generally, this can be estimated by examining the autocorrelation time for the observable of interest (or implicitly via block-averaging) &#8230; with the risk that subtle correlations with slower processes may be masked by such an analysis.  The transient \u201cequilibration\u201d time needed to relax the system initially also must be accounted for.<\/p>\n<p>To examine sample size using a global correlation time (aka \u201cdecorrelation\u201d time), Ed Lyman and I proposed examining trajectory frames at increasingly large time intervals.  The data at each interval were compared to behavior expected for independent samples in order to infer the interval which led to quasi-independent behavior.  This was a way to estimate an effective sample size for the whole system \u2013 i.e., the total simulation time divided by the minimum interval required for independence.  The analysis can also show if the minimum interval is close to the overall simulation time, which would rule out good sampling.<\/p>\n<p>How much of this is really relevant for the types of systems that usually are simulated?<\/p>\n<p>If we\u2019re honest, we\u2019ll admit that for simulations of large and complex systems, we may never reach the timescales necessary to generate independent samples.  What then?  How do we even think about error in such cases?<\/p>\n<p>To make things concrete, let\u2019s assume we\u2019ve run a one microsecond simulation of a system with a slow timescale of 10 \u03bcs.  In this case, all of the trajectory frames are correlated.  Of course, there\u2019s nothing to stop us from calculating the standard deviation \u03c3 of any observable based on the trajectory, but we can expect that even the value of \u03c3 is highly biased \u2013 probably too small because we have not sampled important parts of the configuration space that require 10 \u03bcs to see.  Certainly the standard error\/uncertainty would be meaningless because there is not even one representative (independent) sample due to the short sampling time.<\/p>\n<p>Well, there\u2019s no getting around inadequate data, but statistical mechanics does have something to teach us here.  We can view our single too-short trajectory, as in our prior thought experiment, as one sample of a distribution of 1 \u03bcs trajectories, all started from the same configuration.  This is precisely the \u201c<a href=\"https:\/\/statisticalbiophysicsblog.org\/?p=103\">initialized trajectory ensemble<\/a>\u201d discussed in an earlier post, which must obey a non-equilibrium <a href=\"https:\/\/en.wikipedia.org\/wiki\/Fokker%E2%80%93Planck_equation\">Fokker-Planck equation<\/a>.  If we follow the ensemble of time-evolving trajectories, we can get the non-equilibrium configuration space distribution (ensemble) at any time point.  This distribution will slowly relax toward equilibrium under typical conditions \u2013 or possibly to a non-equilibrium steady state under other constraints.<\/p>\n<p>That conceptual picture is very powerful, but is it useful?  Maybe or maybe not.  If you just have the single too-short trajectory \u2013 sorry, you\u2019re out of luck.  However, if you have several too-short trajectories generated independently, then you can at least estimate the uncertainty in your observable of interest <u>given the time elapsed and the initial state or initial distribution<\/u> \u2013 call this an <em>ensemble transient average<\/em>.  See the sketch below.  For example, you could calculate the average of a quantity in each trajectory based on the interval from 95 \u2013 100 ns.  Then the multiple independent trajectories would enable you to estimate uncertainty in this explicitly transient quantity.  It\u2019s not an equilibrium estimate, but hey, it\u2019s a genuine statistical mechanics observable that could also be measured by another group.  Such an observable also might be compared among related systems (e.g., different ligands interacting with the same protein) in a statistically meaningful way.<\/p>\n<p><a href=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/12\/image9.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/12\/image9.png\" alt=\"\" width=\"615\" height=\"484\" class=\"aligncenter size-full wp-image-193\" srcset=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/12\/image9.png 615w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/12\/image9-300x236.png 300w\" sizes=\"auto, (max-width: 615px) 100vw, 615px\" \/><\/a><\/p>\n<p>If you want to learn more about uncertainty analysis \u2013 and figure out how to make meaningful error bars for your own data \u2013 consider looking at the article published in the \u201c<a href=\"http:\/\/www.livecomsjournal.org\/\">Living Journal of Computational Molecular Science<\/a>,\u201d a non-profit, community-run, low-cost, open-access venue for updateable articles oriented toward education and best practices.  (Full disclosure: I helped to found the journal.)  Feel free to look at the article and make suggestions directly on the <a href=\"https:\/\/github.com\/dmzuckerman\/Sampling-Uncertainty\">article\u2019s github page<\/a>.<\/p>\n<p><strong>Further Reading<\/strong><\/p>\n<p>Chodera JD. \u201cA simple method for automated equilibration detection in molecular simulations,\u201d <a href=\"http:\/\/pubs.acs.org\/doi\/abs\/10.1021\/acs.jctc.5b00784\">Journal of chemical theory and computation. 2016; 12(4):1799\u20131805.<\/a><\/p>\n<p>Grossfield A, Patrone PN, Roe DR, Schultz AJ, Siderius DW, Zuckerman DM. &#8220;Best Practices for Quantification of Uncertainty and Sampling Quality in Molecular Simulations [Article v1. 0],&#8221; <a href=\"https:\/\/www.livecomsjournal.org\/article\/5067-best-practices-for-quantification-of-uncertainty-and-sampling-quality-in-molecular-simulations-article-v1-0\">Living journal of computational molecular science. 2018;1(1)<\/a>.<\/p>\n<p>Grossfield A, Zuckerman DM. \u201cQuantifying uncertainty and sampling quality in biomolecular simulations,\u201d Annu Rep Comput Chem. 2009. 5:23\u201348.<br \/>\nJCGM. JCGM 200: \u201cInternational vocabulary of metrology \u2013 Basic and general concepts and associated terms (VIM)\u201d. <a href=\"https:\/\/www.bipm.org\/en\/publications\/guides\/#vim\">Joint Committee for Guides in Metrology; 2012.<\/a> <\/p>\n<p>Lyman E, Zuckerman DM. \u201cOn the Structural Convergence of Biomolecular Simulations by Determination of the Effective Sample Size,\u201d <a href=\"http:\/\/pubs.acs.org\/doi\/abs\/10.1021\/jp073061t\">J. Phys Chem B. 2007; 111(44):12876\u201312882.<\/a><\/p>\n<p>Yang W, Bitetti-Putzer R, Karplus M. \u201cFree energy simulations: Use of reverse cumulative averaging to determine the equilibrated region and the time required for convergence,\u201d <a href=\"http:\/\/aip.scitation.org\/doi\/abs\/10.1063\/1.1638996\">Journal of Chemical Physics. 2004; 120(6):2618\u20132628.<\/a><\/p>\n<p>DM Zuckerman, <em>Statistical Physics of Biomolecules: An Introduction<\/em>, CRC Press, 2010.<\/p>\n[This post edited on Jan. 28, 2019 to reflect publication status of Grossfield et al.]\n","protected":false},"excerpt":{"rendered":"<p>Let\u2019s draw a line. Across the calendar, I mean. Let\u2019s all pledge that from today on we\u2019re going to give honest accounting of the uncertainty in our data. I mean \u2018honest\u2019 in the sense that if someone tried to reproduce our data in the future, their confidence interval and ours would overlap. There are a [&hellip;]<\/p>\n","protected":false},"author":5,"featured_media":207,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18,8],"tags":[],"class_list":["post-180","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-statistical-uncertainty","category-trajectory-physics"],"_links":{"self":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts\/180","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=180"}],"version-history":[{"count":22,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts\/180\/revisions"}],"predecessor-version":[{"id":257,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts\/180\/revisions\/257"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/media\/207"}],"wp:attachment":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=180"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=180"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=180"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}