{"id":115,"date":"2016-03-07T15:11:09","date_gmt":"2016-03-07T15:11:09","guid":{"rendered":"http:\/\/statisticalbiophysicsblog.org\/?p=115"},"modified":"2016-06-23T18:05:36","modified_gmt":"2016-06-23T18:05:36","slug":"so-you-want-to-do-some-path-sampling","status":"publish","type":"post","link":"https:\/\/statisticalbiophysicsblog.org\/?p=115","title":{"rendered":"So you want to do some path sampling\u2026"},"content":{"rendered":"<h3><em><strong>Basic strategies, timescales, and limitations<\/strong><\/em><\/h3>\n<p>Key biomolecular events \u2013 such as conformational changes, folding, and binding \u2013 that are challenging to study using straightforward simulation may be amenable to study using \u201cpath sampling\u201d methods.\u00a0 But there are a few things you should think about before getting started on path sampling.\u00a0 <em>There are fairly generic features and limitations<\/em> that govern all the path sampling methods I\u2019m aware of.<\/p>\n<p><em>Path sampling<\/em> refers to a large family of methods that, rather than having the goal of generating an ensemble of system configurations, attempt to generate an ensemble of dynamical <em>trajectories<\/em>.\u00a0 Here we are talking about trajectory ensembles that are precisely defined in statistical mechanics.\u00a0 As we have noted in <a href=\"https:\/\/statisticalbiophysicsblog.org\/?p=103\">another post<\/a>, there are different kinds of trajectory ensembles \u2013 most importantly, the equilibrium ensemble, non-equilibrium steady states, and the initialized ensemble which will relax to steady state.\u00a0 Typically, one wants to generate trajectories exhibiting events of interest \u2013 e.g., binding, folding, conformational change.<\/p>\n<p><!--more--><\/p>\n<p>A trajectory can be considered a list of configurations (possibly with velocities) for all system coordinates recorded with a fixed time increment.\u00a0 Note that there is indeed a path ensemble even in one dimension: because 
displacements\/velocities will vary along a trajectory, there are an infinite number of trajectories connecting any two points.\u00a0 In principle, a trajectory ensemble of transition events could be obtained by collecting transitions of interest from a very long trajectory \u2013 for example, the red segments below, possibly with their preceding blue segments, which together make up the first-passage times (FPTs) for the system to transition from low to high x values.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-119  aligncenter\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2016\/03\/path_smp1-e1457363230599.jpg\" alt=\"path_smp1\" width=\"385\" height=\"287\" srcset=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2016\/03\/path_smp1-e1457363230599.jpg 813w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2016\/03\/path_smp1-e1457363230599-300x223.jpg 300w\" sizes=\"auto, (max-width: 385px) 100vw, 385px\" \/><\/p>\n<p>Path sampling methods can focus computing resources on a subset of trajectories (e.g., the red transition events, above) and have been developed using a variety of strategies.\u00a0 We\u2019ll mention three that work with continuous trajectories rather than disconnected segments.\u00a0 (1) Lawrence Pratt suggested that, because the probability for a given trajectory to occur could be calculated, one could perform Metropolis Monte Carlo in path (trajectory) space; this was the \u201ctransition path sampling\u201d idea later taken up by Chandler and co-workers.\u00a0 The basic idea, however, has its roots in path-integral Monte Carlo.\u00a0 (2) Huber and Kim suggested that an ensemble of trajectories could be orchestrated using replication and pruning steps in a way that could encourage sampling of rare processes.\u00a0 This \u201cweighted ensemble\u201d strategy was really a re-discovery of the \u201csplitting and Russian roulette\u201d strategy published by Los Alamos theorists 
Herman Kahn &amp; coworkers in the 1950s.\u00a0 (3) \u201cDynamic importance sampling\u201d was proposed by Woolf, based on prior work by Ottinger, in which trajectories could be biased toward rare events of interest, with reweighting performed after the fact to ensure conformance with statistical principles.<\/p>\n<p>The preceding are three basic approaches that generate ensembles of <em>continuous<\/em> trajectories.\u00a0 It is fair to note that many sophisticated variants and improvements on the basic strategies have been developed, in addition to many approaches using collections of discontinuous segments (see review by Elber noted below); these are quite valuable but a distraction from the main points of this post.<\/p>\n<p><em>Generic limitations of path sampling<\/em><\/p>\n<p>All the continuous-trajectory approaches share two fundamental limitations.\u00a0 One arises from intrinsic system-specific transition timescales, and the other is a consequence of intrinsic sampling limitations.<\/p>\n<p>To understand the limitations, let\u2019s assume <em>our goal is to sample 100 statistically independent transition events<\/em>.\u00a0 Although every individual trajectory is time-correlated because configurations are generated sequentially, trajectories can be statistically independent \u2013 for example, if you started 100 independent simulations in an initial state and simply waited for 100 transitions.\u00a0 Of course, that strategy generally is prohibitive and motivates path sampling in the first place, but truly independent simulations would be a gold standard for independent transition-event trajectories.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-116  alignleft\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2016\/03\/path_smp2-e1457363280345.jpg\" alt=\"path_smp2\" width=\"257\" height=\"202\" \/><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-120 \" 
src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2016\/03\/path_smp3-e1457363321818.jpg\" alt=\"path_smp3\" width=\"292\" height=\"202\" \/><strong>\u00a0 \u00a0 \u00a0<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p><em>System-specific timescales<\/em><\/p>\n<p>Generally speaking, there are two kinds of transition events.\u00a0 As shown in our original long one-dimensional trajectory and immediately above in magnified view, in a simple \u201cactivated process\u201d characterized by a dominant energy barrier, the duration of a transition <em>t<sub>b<\/sub><\/em> will be much shorter than the waiting time (a.k.a. dwell time) in the initial state.\u00a0 The sum of the average dwell and event times is called the mean first-passage time (MFPT).\u00a0 Although <em>t<sub>b<\/sub><\/em> may be short and much less than the MFPT, it is still finite.\u00a0 A more challenging scenario is depicted below in the figure with many intermediate states: each intermediate can lead to a separate, possibly lengthy dwell \u2013 and don\u2019t forget that trajectories can reverse many times leading to more dwells than there are intermediates.\u00a0 In such a case, the transition-event duration <em>t<sub>b<\/sub><\/em> may be similar to the overall MFPT.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-117  aligncenter\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2016\/03\/path_smp4-e1457363397835.jpg\" alt=\"path_smp4\" width=\"343\" height=\"248\" srcset=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2016\/03\/path_smp4-e1457363397835.jpg 680w, https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2016\/03\/path_smp4-e1457363397835-300x216.jpg 300w\" sizes=\"auto, (max-width: 343px) 100vw, 343px\" \/><\/p>\n<p>With this understanding, let\u2019s get back to our goal of simulating 100 independent transitions.\u00a0 The minimum cost for doing this with fully continuous trajectories is 100 
<em>t<sub>b<\/sub><\/em>.\u00a0 If <em>t<sub>b<\/sub><\/em> ~ 10 ns, then at least 1 \u03bcs is needed for our trajectory ensemble.\u00a0 And there is no guarantee that <em>t<sub>b<\/sub><\/em> will be short (compared to timescales that can easily be simulated).\u00a0 So start worrying now \u2026 and things only get worse.<\/p>\n<p><em>\u00a0<\/em><\/p>\n<p><em>Intrinsic limitations of sampling<\/em><\/p>\n<p>Path sampling is desirable when system timescales (MFPTs for processes of interest) are too long to simulate.\u00a0 In other words, by definition of the problem, we cannot afford 100*MFPT.\u00a0 Algorithms such as the ones sketched above have the potential to limit computational effort to the transition events themselves.\u00a0 But generating <em>independent<\/em> transition-event trajectories is not a trivial matter!\u00a0 Although starting separate \u201cbrute force\u201d (i.e., standard) trajectories is simple to do, if one wants computational effort to be focused on transition events, there are additional costs.<\/p>\n<p>Let\u2019s look in turn at each of the three path-sampling strategies described above.<\/p>\n<p>In <em>transition path sampling,<\/em> Metropolis Monte Carlo in path space requires perturbing the preceding trajectory in a sample to create a trial trajectory (that is correlated and, in fact, typically partially coincident with the prior trajectory) which then is accepted or rejected according to a suitable Metropolis criterion.\u00a0 The sequence of trajectories is significantly correlated, and indeed rejections amount to having the same trajectory twice in the ensemble which is generated.\u00a0 In other words, there is a kind of \u201ccorrelation number\u201d <em>n<\/em><sub>corr<\/sub> (akin to a Monte Carlo correlation \u201ctime\u201d) measuring the average number of trajectories which must be sampled before a new statistically independent trajectory is sampled.\u00a0 There is no reason why <em>n<\/em><sub>corr<\/sub> should be small and 
indeed, just as in a rough energy landscape, one can imagine that the effective landscape for paths is highly corrugated and requires significant sampling \u201ctime.\u201d\u00a0 The bottom line is that our 100 independent trajectories will cost a total of 100*<em>n<\/em><sub>corr<\/sub>*<em>t<sub>b<\/sub><\/em> simulation time: one hopes that this will be less than the \u201cbrute force\u201d cost of 100*MFPT!\u00a0 This sounds bad, but other approaches share similar limitations.<\/p>\n<p>Consider the <em>weighted ensemble<\/em> strategy.\u00a0 Trajectories in the ensemble are run independently but occasionally pruned or replicated \u2013 and both operations intrinsically reduce information content and hence increase correlations.\u00a0 When a trajectory is pruned, the prior computing effort which generated it now gets wasted (at least partially).\u00a0 When a trajectory is replicated, say midway through the simulation, then the \u201cdaughter\u201d or replica trajectories actually were the very same trajectory for half of their existence \u2013 and clearly correlated.\u00a0 So once again, there are significant correlations and we can again describe them with an effective correlation number <em>n<\/em><sub>corr<\/sub>.\u00a0 Whether these correlations are stronger or weaker in the two methods is not our concern here (and indeed would depend on the system and specifics of the implementation of the path-sampling algorithm).<\/p>\n<p>The <em>dynamic importance sampling<\/em> strategy is strictly based on independent trajectories and so does not suffer from correlations \u2026 but it has its own challenges.\u00a0 Specifically, trajectories are biased and do not evolve according to the correct physical dynamics.\u00a0 Although a probabilistic description of stochastic trajectories enables one to calculate a weight for each of the biased trajectories and thus correct for the bias, these weights degrade the statistical quality of the resulting trajectory ensemble.\u00a0 
Specifically, the non-uniformity of weights guarantees that only a fraction of the trajectories (say, 1\/<em>n<sub>w<\/sub><\/em>, with <em>n<sub>w<\/sub><\/em> &gt; 1) will contribute significantly to calculations of any observable, such as a rate.\u00a0 The size of <em>n<sub>w<\/sub><\/em> will depend on system and implementation specifics, but it\u2019s clear the approach qualitatively suffers from sampling limitations analogous to the two other strategies we just discussed.<\/p>\n<p>Bottom line: The cost per continuous transition trajectory is <em>n<\/em>*<em>t<sub>b<\/sub><\/em>, where <em>n<\/em> &gt;&gt; 1 is an integer quantifying the efficiency of the path sampling method.\u00a0 Of course, experts in each method strive to reduce <em>n<\/em> but there are no guarantees for any challenging system.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"  wp-image-118 aligncenter\" src=\"https:\/\/statisticalbiophysicsblog.org\/wp-content\/uploads\/2016\/03\/path_smp5.jpg\" alt=\"path_smp5\" width=\"356\" height=\"499\" \/><\/p>\n<p><em>Meeting the challenge of multiple intermediates<\/em><\/p>\n<p>The issue of multiple intermediates (meta-stable on- or off-pathway states) is worth some additional discussion in the context of path sampling.\u00a0 Recall that in such a case, <em>t<sub>b<\/sub><\/em> ~ MFPT itself may seem prohibitive \u2013 at least, if we insist on having fully continuous trajectories.<\/p>\n<p>The good news is that all three strategies described above can side-step the problem of intermediate states (as can a number of other approaches based on trajectory segments).\u00a0 One example is the non-Markovian post-analysis suggested by Suarez et al., but this post will not go into the details.\u00a0 Qualitatively, it turns out that the limiting timescale for path sampling is not <em>t<sub>b<\/sub><\/em> but a quantity we can call \u00a0which represents the sum of all the event durations for transitions among the intermediates \u2013 
<em>excluding<\/em> the intermediate dwell times.\u00a0 This doesn\u2019t solve the problem of trajectory correlations or weights, but at least offers some hope for obtaining useful results.<\/p>\n<p>The papers noted below are only a very small subset of the path sampling literature.<\/p>\n<p><strong>Further reading<\/strong><\/p>\n<p>Elber, R. \u201cPerspective: Computer simulations of long time dynamics,\u201d The Journal of Chemical Physics, AIP Publishing, 2016, 144, 060901<\/p>\n<p>Huber, G. A. &amp; Kim, S. \u201cWeighted-ensemble Brownian dynamics simulations for protein association reactions,\u201d Biophys. J., 1996, 70, 97-110<\/p>\n<p>Pratt, L. R. \u201cA statistical method for identifying transition states in high dimensional problems,\u201d J. Chem. Phys., 1986, 85, 5045-5048<\/p>\n<p>Su\u00e1rez, E.; Lettieri, S.; Zwier, M. C.; Stringer, C. A.; Subramanian, S. R.; Chong, L. T. &amp; Zuckerman, D. M. \u201cSimultaneous Computation of Dynamical and Equilibrium Information Using a Weighted Ensemble of Trajectories,\u201d J Chem Theory Comput, 2014, 10, 2658-266<\/p>\n<p>Woolf, T. B. \u201cPath corrected functionals of stochastic trajectories: towards relative free energy and reaction coordinate calculations.\u201d Chem. Phys. Lett., 1998, 289, 433-441<\/p>\n<p>Zuckerman, D. M. &amp; Woolf, T. B. \u201cDynamic reaction paths and rates through importance-sampled stochastic dynamics.\u201d J. Chem. Phys., 1999, 111, 9475-9484<\/p>\n<p>Zuckerman, D. M. &amp; Woolf, T. B. \u201cTransition events in butane simulations: similarities across models.\u201d J. Chem. Phys., 2002, 116, 2586-2591<\/p>\n<p>Zwier, M. C. &amp; Chong, L. T. 
\u201cReaching biological timescales with all-atom molecular dynamics simulations,\u201d Curr Opin Pharmacol, 2010, 10, 745-752<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Basic strategies, timescales, and limitations Key biomolecular events \u2013 such as conformational changes, folding, and binding \u2013 that are challenging to study using straightforward simulation may be amenable to study using \u201cpath sampling\u201d methods.\u00a0 But there are a few things you should think about before getting started on path sampling.\u00a0 There are fairly generic features [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9,14,8],"tags":[],"class_list":["post-115","post","type-post","status-publish","format-standard","hentry","category-first-passage-times","category-path-sampling","category-trajectory-physics"],"_links":{"self":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts\/115","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=115"}],"version-history":[{"count":7,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts\/115\/revisions"}],"predecessor-version":[{"id":137,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=\/wp\/v2\/posts\/115\/revisions\/137"}],"wp:attachment":[{"href":"https:\/\/statisticalbiophy
sicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=115"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=115"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/statisticalbiophysicsblog.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=115"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}