{"id":258,"date":"2019-01-29T18:46:19","date_gmt":"2019-01-29T18:46:19","guid":{"rendered":"http:\/\/statisticalbiophysicsblog.org\/?p=258"},"modified":"2019-01-29T18:46:19","modified_gmt":"2019-01-29T18:46:19","slug":"rules-of-thumb-for-sampling-assessment","status":"publish","type":"post","link":"https:\/\/statisticalbiophysicsblog.org\/?p=258","title":{"rendered":"Rules of Thumb for Sampling Assessment"},"content":{"rendered":"<p>\n  Some quick guidance for analyzing molecular dynamics (MD) or Markov-chain Monte Carlo (MC) data in hard-to-sample systems &#8211; e.g., biomolecules.  I can summarize the advice this way: <u>Ask not how to compute error bars.  Ask first whether error bars are even appropriate.<\/u>  A meaningless error bar is more dangerous (to you and the community) than no error bar at all.  This guidance is essentially abstracted from our recent <a href=\"https:\/\/www.livecomsjournal.org\/article\/5067-best-practices-for-quantification-of-uncertainty-and-sampling-quality-in-molecular-simulations-article-v1-0\">Best Practices paper<\/a>, and I hope it will set in context some of the theory discussed in an <a href=\"https:\/\/statisticalbiophysicsblog.org\/?p=180\">earlier post<\/a>.\n<\/p>\n<p>\n  <!--more-->\n<\/p>\n<p><em>Rule Zero &#8211; Plan your study with explicit awareness of sampling needs.<\/em>  Has anyone ever convincingly sampled a system as complex as the one you\u2019re choosing to study?  Do you know the timescales associated with your system and is there hope to access them \u2026 multiple times?\n<\/p>\n<p><em>Rule One &#8211; Assume your data is <u>not<\/u> well-sampled, until you\u2019re convinced to the contrary.<\/em>  There are good <a href=\"https:\/\/www.livecomsjournal.org\/article\/5067-best-practices-for-quantification-of-uncertainty-and-sampling-quality-in-molecular-simulations-article-v1-0\">qualitative tests<\/a> for ruling out good sampling: use them.  
If there is evidence that an important state has been visited only once, or if you see continuing drift in any observable that is supposed to be in a steady state, your system is not well sampled.  <u>Every<\/u> important steady-state observable should be fluctuating about a mean after a transient \u201cequilibration\/burn-in\u201d period.  If you have multiple trajectories, also compare the distributions obtained from each.\n<\/p>\n<p><em>Rule Two &#8211; Do not cherry-pick data.<\/em>  If some of your data makes a nice story for any reason while other data does not, excluding the disagreeable data means biasing your results.  That\u2019s not good science.\n<\/p>\n<p><em>Rule Three &#8211; Be extremely cautious when assessing data from an enhanced sampling approach.<\/em>  Be cynical at first and assume that the only thing your fancy method does is smooth otherwise poor data.  At a minimum, perform multiple <u>completely independent<\/u> runs to gauge variance.  If your results depend on starting configuration(s), then you have not sampled well.\n<\/p>\n<p>\n  What are some examples of good sampling?  Most obviously I can point to the MD <a href=\"http:\/\/science.sciencemag.org\/content\/334\/6055\/517\">protein folding study<\/a> by Shaw and coworkers, where we see multiple folding and unfolding events; this is what good sampling looks like in a single trajectory.  In more modest systems, we carefully analyzed MC-based <a href=\"https:\/\/pubs.acs.org\/doi\/abs\/10.1021\/jp910112d\">peptide equilibrium sampling<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Some quick guidance for analyzing molecular dynamics (MD) or Markov-chain Monte Carlo (MC) data in hard-to-sample systems &#8211; e.g., biomolecules. I can summarize the advice this way: Ask not how to compute error bars. Ask first whether error bars are even appropriate. 
A meaningless error bar is more dangerous (to you and the community) than [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":262,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18],"tags":[],"class_list":["post-258","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-statistical-uncertainty"],"_links":{"self":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts\/258","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=258"}],"version-history":[{"count":6,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts\/258\/revisions"}],"predecessor-version":[{"id":265,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts\/258\/revisions\/265"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/media\/262"}],"wp:attachment":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=258"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=258"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=258"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}