{"id":160,"date":"2017-05-21T07:00:32","date_gmt":"2017-05-21T07:00:32","guid":{"rendered":"http:\/\/statisticalbiophysicsblog.org\/?p=160"},"modified":"2022-11-02T21:06:41","modified_gmt":"2022-11-02T21:06:41","slug":"what-i-have-against-most-pmf-calculations","status":"publish","type":"post","link":"https:\/\/statisticalbiophysicsblog.org\/?p=160","title":{"rendered":"What I have against (most) PMF calculations"},"content":{"rendered":"<p>Such a beautiful thing, the PMF.  The potential of mean force is a \u2018free energy landscape\u2019 \u2013 the energy-like function whose Boltzmann factor exp[ -PMF(x) \/ kT ] gives the relative probability* for any coordinate (or coordinate set) x by integrating out (averaging over) all other coordinates.  For example, x could be the angle between two domains in a protein or the distance of a ligand from a binding site.<\/p>\n<p>The PMF\u2019s basis in statistical mechanics is clear.  When visualized, its basins and barriers cry out \u201cMechanism!\u201d and kinetics are often inferred from the heights of these features.<\/p>\n<p>Yet aside from the probabilistic meaning of the PMF, the rest is largely speculative and subjective \u2026 and that\u2019s assuming the PMF is well-sampled, which I strongly doubt in most biomolecular cases of interest.<\/p>\n<p><!--more--><\/p>\n<p>Let\u2019s deal with each of the issues in turn: mechanism, kinetics, and sampling.<\/p>\n<p>[* Note that a precise interpretation of the PMF requires knowledge of the Jacobian for the chosen coordinate(s): some intervals (x, x+dx) may contain more Cartesian configuration space than others.]<\/p>\n<p><em>Mechanism and the PMF<\/em><\/p>\n<p>To my knowledge, David Chandler and coworkers were the first to highlight the dangers of the PMF.  
They described energy landscapes \u2013 potential energy functions \u2013 with mechanisms that could not be captured by obvious coordinate choices for PMF calculations (thus motivating <a href=\"https:\/\/statisticalbiophysicsblog.org\/?p=115\">path-sampling techniques<\/a>).  Expanding on those ideas, consider the two-dimensional landscapes below.<\/p>\n<p><a href=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/saddlepoint.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/saddlepoint-300x210.png\" alt=\"\" class=\"alignnone size-medium wp-image-163\" width=\"300\" height=\"210\" srcset=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/saddlepoint-300x210.png 300w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/saddlepoint.png 574w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><a href=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/saddlepoint2.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/saddlepoint2-300x197.png\" alt=\"\" class=\"alignnone size-medium wp-image-162\" width=\"300\" height=\"197\" srcset=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/saddlepoint2-300x197.png 300w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/saddlepoint2.png 612w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><a href=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/saddlepoint3.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/saddlepoint3-300x202.png\" alt=\"\" class=\"aligncenter size-medium wp-image-161\" width=\"300\" height=\"202\"><\/a><\/p>\n<p>Although some of these landscapes could perhaps be described by a (tortuous) one-dimensional coordinate, 
in other cases that simply wouldn\u2019t be possible.  Think how much worse the situation could be in a system like a protein, with thousands of degrees of freedom.<\/p>\n<p>Now take one more step on the road to skepticism and consider how PMFs tend to be constructed in computational studies.  Because the calculations are so expensive, we tend to choose one or two coordinates for study in advance.  These coordinate choices represent our preconceived ideas about how a system might function.<\/p>\n<p>Based on the features of the PMF landscape, we build a story about mechanism \u2026 the dwells are here and the transition states there.  And yet we know that not all basins are alike \u2013 some may be stabilized by energy and others by entropy.  The heights and widths of barriers, and likewise the dimensions of basins, are all influenced by the particular coordinate(s) we chose in advance, as illustrated by the two-dimensional examples above.<\/p>\n<p>So I would say that quantitative analysis of a PMF is inherently misleading, but the \u201cstories\u201d we build are arguably even more dangerous.  Because our minds are naturally drawn to stories, such narratives are easy to retain.  This would not be a problem if the stories were not biased by subjective coordinate choices, and perhaps also by inadequate sampling.<\/p>\n<p>Is there a good way to think about mechanism?  I believe the starting point has to be <a href=\"https:\/\/statisticalbiophysicsblog.org\/?p=92\">a trajectory ensemble of continuous unbiased transition events<\/a>.  Trajectories tell us the full sequence of events from an initial to a final state.  And the ensemble reports on the diversity of mechanisms, something a PMF could never do.  
It\u2019s true that <a href=\"https:\/\/statisticalbiophysicsblog.org\/?p=115\">trajectory ensembles are difficult to obtain<\/a>, but at least they offer a potentially unbiased way to describe mechanism.<\/p>\n<p>As a final perspective on mechanism, think about the goal of discovery.  We use large amounts of computing resources, and we would like to discover something we did not know before.  But if we only look at coordinates we presumed from the start to be important, we severely impair our ability to discover novel phenomena.<\/p>\n<p>Perhaps it\u2019s worth a moment to consider that our subjective coordinate choices are typically based on a further assumption \u2013 that we have good or complete knowledge of our system\u2019s biological function.  Is this really true?<\/p>\n<p>Ideally, although it\u2019s a challenge, we would perform unbiased analyses of path\/trajectory ensembles to discover important coordinates.  Significant work has already been done on automated discovery of coordinates, and I think such methodologies should play an increasingly important role in the future.<\/p>\n<p><em>Does a PMF predict kinetics?<\/em><\/p>\n<p>We\u2019ve all done it \u2013 looked at a PMF (free energy landscape), estimated a barrier height, and tried to guess a rate constant for a process.  If we haven\u2019t tried to guess an absolute rate, we\u2019ve at least said to ourselves something like, \u201cWell, that barrier is higher than the other, so one process is slower than the other.\u201d<\/p>\n<p>But in principle, the PMF may not yield any information about kinetics at all!  Look again at the two-dimensional landscapes above.  Would you trust a one-dimensional PMF of any of these to give reliable rates?  So why trust a projection from 1,000+ dimensions to one or two, as is usually done for biomolecular systems?<\/p>\n<p>Again, I think part of the reason we over-interpret PMFs is that free energy landscapes speak to us like stories.  
We can\u2019t resist the narrative that arises in the stat-mech part of our brains, even when we know better.<\/p>\n<p>To drive home the point further, note that it\u2019s even possible to construct a potential whose PMF is exactly constant in x and yet whose dynamics along x are not simple diffusion.  One way would be a potential consisting of \u2018anti-parallel\u2019 valleys as in landscape (c) above, with the well depths and widths tuned at each x value so that every x has exactly the same probability when integrated over y.<\/p>\n<p>Another potential yielding a constant PMF(x), this time non-trivial in both x and y, is a double-well potential in x modulated by a harmonic y component whose width varies with x:<\/p>\n<p><a href=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_equation.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_equation-300x52.png\" alt=\"\" class=\"aligncenter size-medium wp-image-170\" width=\"300\" height=\"52\" srcset=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_equation-300x52.png 300w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_equation.png 544w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_equation2.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_equation2-300x42.png\" alt=\"\" class=\"aligncenter size-medium wp-image-169\" width=\"300\" height=\"42\" srcset=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_equation2-300x42.png 300w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_equation2.png 456w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_equation3.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_equation3-300x26.png\" alt=\"\" class=\"aligncenter size-medium wp-image-168\" width=\"300\" height=\"26\" srcset=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_equation3-300x26.png 300w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_equation3.png 535w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>You can check that this yields a constant PMF: at any x value, integrate the Boltzmann factor of the harmonic y component over y, then multiply the result by the Boltzmann factor of the energy minimum at that x; the product does not depend on x.  Below is a sample trajectory of x values (y not shown) simulated with overdamped Langevin (a.k.a. Brownian) dynamics.<\/p>\n<p><a href=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_Langevin.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_Langevin.png\" alt=\"\" class=\"aligncenter size-full wp-image-172\" width=\"1014\" height=\"648\" srcset=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_Langevin.png 1014w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_Langevin-300x192.png 300w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_Langevin-768x491.png 768w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2017\/05\/bias_Langevin-788x504.png 788w\" sizes=\"auto, (max-width: 1014px) 100vw, 1014px\" \/><\/a><\/p>\n<p>Clearly the behavior is not simple diffusion in x, even though PMF(x) is constant.  (Histogramming many such trajectories into x bins confirms numerically that the PMF is flat.)  And from a theoretical standpoint, that\u2019s no surprise.  
The PMF generally does not govern dynamics, except in rare cases where an ideal reaction coordinate has been used \u2026 and suitable effective dynamics have been specifically derived for the selected coordinate.<\/p>\n<p><em>Are biomolecular PMFs well sampled?<\/em><\/p>\n<p>Very sophisticated simulation and analysis techniques are used to calculate PMFs.  But should we trust the results?  To be clear, I\u2019m not questioning the rigor behind the methods, but rather the likelihood that the PMFs are well-sampled.<\/p>\n<p>Consider the popular analysis approach WHAM (weighted histogram analysis method).  WHAM seeks to provide the best possible PMF given the data available from the different simulation windows.  This is a very different goal from assessing whether sufficient sampling has been performed.  For better or worse, WHAM almost always provides a smooth estimate of the free energy profile.  Our minds tend to confuse smoothness with reliable, well-sampled data, but the two things are completely independent.<\/p>\n<p>Let\u2019s be more concrete.  How many nsec are typically simulated in a single window of a protein WHAM calculation?  Often the answer seems to be 10 nsec or less.  Is that really enough time to sample even a constrained protein process?  Careful studies of small peptides suggest that tens of nsec are needed for good sampling of even these tiny systems!<\/p>\n<p>Now consider a thought experiment.  If every window of a WHAM-like PMF calculation is indeed well sampled, then we have an equilibrium ensemble for each window.  And if the PMF is accurate, we also know the relative free energy of each window\u2019s coordinate value.  We can therefore generate an overall equilibrium ensemble by combining all the window ensembles, weighting each by the Boltzmann factor of the window-specific PMF.  Would you trust this equilibrium ensemble?  Further, an equilibrium ensemble can be projected onto any coordinate to generate a new PMF.  
Would you trust that new PMF?<\/p>\n<p><em>Pessimism<\/em><\/p>\n<p>To sum up: Be careful!  I am doubtful that protein PMFs are well-sampled.  But even when a PMF is perfectly accurate, it reveals a landscape based on a subjectively chosen coordinate set.  Physical scientists are adept at building a story from a landscape, but any given landscape is likely to be deceptive in terms of both mechanism and kinetics.  Perhaps it\u2019s time for us to move on from over-reliance on the PMF.<\/p>\n<p><em>Acknowledgement<\/em><\/p>\n<p>I very much appreciate Alan Grossfield\u2019s comments on a draft.<\/p>\n<p><strong>Further reading<\/strong><\/p>\n<p>Dellago, C.; Bolhuis, P. G.; Csajka, F. S. &amp; Chandler, D., Transition path sampling and the calculation of rate constants, <a href=\"http:\/\/aip.scitation.org\/doi\/abs\/10.1063\/1.475562\">J. Chem. Phys., 1998, 108, 1964-1977<\/a><\/p>\n<p>Grossfield, A. &amp; Zuckerman, D. M., Quantifying uncertainty and sampling quality in biomolecular simulations, <a href=\"http:\/\/www.sciencedirect.com\/science\/article\/pii\/S1574140009005027\">Annu. Rep. Comput. Chem., 2009, 5, 23-48<\/a><\/p>\n<p>Kumar, S.; Rosenberg, J. M.; Bouzida, D.; Swendsen, R. H. &amp; Kollman, P. A., Multidimensional free-energy calculations using the weighted histogram analysis method, J. Comput. Chem., 1995, 16, 1339-1350<\/p>\n<p>Lyman, E. &amp; Zuckerman, D. M., On the Structural Convergence of Biomolecular Simulations by Determination of the Effective Sample Size, <a href=\"http:\/\/pubs.acs.org\/doi\/abs\/10.1021\/jp073061t\">J. Phys. Chem. B, 2007, 111, 12876-12882<\/a><\/p>\n<p>McGibbon, R. T.; Husic, B. E. &amp; Pande, V. S., Identification of simple reaction coordinates from complex dynamics, <a href=\"http:\/\/aip.scitation.org\/doi\/abs\/10.1063\/1.4974306\">J. Chem. Phys., 2017, 146, 044109<\/a><\/p>\n<p>Zuckerman, D. M., <a href=\"https:\/\/www.crcpress.com\/Statistical-Physics-of-Biomolecules-An-Introduction\/Zuckerman\/p\/book\/9781420073782\"><em>Statistical Physics of Biomolecules: An Introduction<\/em><\/a>, CRC Press, 2010.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Such a beautiful thing, the PMF. The potential of mean force is a \u2018free energy landscape\u2019 \u2013 the energy-like function whose Boltzmann factor exp[ -PMF(x) \/ kT ] gives the relative probability* for any coordinate (or coordinate set) x by integrating out (averaging over) all other coordinates. For example, x could be the angle between two [&hellip;]<\/p>\n","protected":false},"author":5,"featured_media":161,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11,14,8],"tags":[],"class_list":["post-160","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-general-biophysics","category-path-sampling","category-trajectory-physics"],"_links":{"self":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts\/160","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=160"}],"version-history":[{"count":10,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts\/160\/
revisions"}],"predecessor-version":[{"id":475,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts\/160\/revisions\/475"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/media\/161"}],"wp:attachment":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=160"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=160"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=160"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}