It was at breakfast during a recent conference that Prof. X leaned toward me and quietly said, “We tried using weighted ensemble, but it didn’t work.” I got the sense he was keeping his voice down as a courtesy, so as not to broadcast this to the other attendees.
Today I want to return to the essential topic of analyzing statistical uncertainty – i.e., making error bars – but I want to frame the discussion in terms of a larger theme: our community’s often insufficiently critical adoption of elegant and sophisticated ideas. I touched on this issue previously in the context of PMF calculations. To spare you reading on just to find out, the technical problem addressed here is estimating statistical uncertainty for high-variance data with small(ish) sample sizes.
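For concreteness, the textbook error bar is the standard error of the mean, s/√n, where s is the sample standard deviation and n the number of samples. The sketch below computes it for several small samples drawn from a lognormal distribution, an assumed stand-in for "high-variance data" (the function name and the distribution parameters are illustrative choices, not from this post). For a lognormal with parameters (0, 3), the true mean is exp(4.5) ≈ 90, so running the loop shows how much both the estimated mean and its error bar can shift from one small sample to the next.

```python
import math
import random
import statistics


def mean_and_standard_error(samples):
    """Return the sample mean and the textbook standard error of the mean."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a standard error")
    mean = statistics.fmean(samples)
    # statistics.stdev uses the Bessel-corrected (n - 1) denominator
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean, sem


# Hypothetical high-variance data: lognormal values spanning orders of magnitude.
# The true mean of lognormvariate(0, 3) is exp(0 + 3**2 / 2) = exp(4.5), about 90.
rng = random.Random(0)
for trial in range(5):
    small_sample = [rng.lognormvariate(0.0, 3.0) for _ in range(10)]
    m, se = mean_and_standard_error(small_sample)
    print(f"trial {trial}: mean = {m:10.3g}, SEM = {se:10.3g}")
```

With only ten samples, a single rare, very large value can dominate both the mean and s. This is one reason a naive s/√n error bar deserves scrutiny in this regime.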
Such a beautiful thing, the PMF. The potential of mean force is a ‘free energy landscape’: the energy-like function whose Boltzmann factor, exp[−PMF(x)/kT], gives the relative probability* of any value of a coordinate (or set of coordinates) x after all other coordinates have been integrated out (averaged over). For example, x could be the angle between two domains in a protein or the distance of a ligand from a binding site.
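Inverting the Boltzmann factor gives PMF(x) = −kT ln p(x) + constant, where p(x) is the probability density of x. Below is a minimal histogram-based sketch of that relation, assuming you already have samples of x from an equilibrium simulation. The function name, bin count, and units (kT = 1) are illustrative assumptions. The usage lines draw Gaussian samples, for which the exact PMF is harmonic, x²/2 in units of kT.

```python
import numpy as np


def pmf_from_samples(x, bins=50, kT=1.0):
    """Histogram estimate of PMF(x) = -kT ln p(x), defined only up to an additive constant."""
    counts, edges = np.histogram(x, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    prob = counts / counts.sum()
    with np.errstate(divide="ignore"):
        pmf = -kT * np.log(prob)  # empty (never-sampled) bins become +inf
    pmf -= pmf[np.isfinite(pmf)].min()  # shift so the deepest basin sits at zero
    return centers, pmf


# Usage: Gaussian samples with unit variance correspond to a harmonic PMF of x**2 / 2 (kT = 1)
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 200_000)
centers, pmf = pmf_from_samples(x, bins=50, kT=1.0)
```

The sketch also makes the sampling caveat concrete. Sparsely populated bins yield noisy PMF values, and unvisited regions yield no estimate at all, so the apparent basins and barriers are only as trustworthy as the sampling behind them.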
The PMF’s basis in statistical mechanics is clear. When visualized, its basins and barriers cry out “Mechanism!” and kinetics are often inferred from the heights of these features.
Yet aside from the probability part of the preceding paragraphs, the rest is largely speculative and subjective … and that’s assuming the PMF is well-sampled, which I strongly doubt in most biomolecular cases of interest.
Basic strategies, timescales, and limitations
Key biomolecular events – such as conformational changes, folding, and binding – that are challenging to study using straightforward simulation may be amenable to study using “path sampling” methods. Before getting started on path sampling, though, you should think about a few fairly generic features and limitations that govern all the path sampling methods I’m aware of.
Path sampling refers to a large family of methods that, rather than generating an ensemble of system configurations, aim to generate an ensemble of dynamical trajectories. Here we mean trajectory ensembles that are precisely defined in statistical mechanics. As we have noted in another post, there are different kinds of trajectory ensembles – most importantly, the equilibrium ensemble, non-equilibrium steady states, and initialized ensembles, which relax toward a steady state. Typically, one wants to generate trajectories exhibiting events of interest – e.g., binding, folding, conformational change.
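To make "an ensemble of trajectories" and "an initialized ensemble relaxing to steady state" concrete, here is a toy sketch. It uses a hypothetical three-state Markov model (A ↔ I ↔ B) in place of real molecular dynamics, and the transition probabilities are invented for illustration. Every trajectory starts in A, and we track the fraction of the ensemble in B at each time step, along with which trajectories exhibit the event of interest (reaching B). This model has no sources or sinks, so the steady state that the initialized ensemble approaches is the equilibrium one. Here that gives a fraction 5/12 ≈ 0.417 in B, by detailed balance. A non-equilibrium steady state would instead require something like recycling trajectories from B back to A, which this sketch does not do.

```python
import numpy as np

# Hypothetical three-state model: A (0) <-> I (1) <-> B (2).
# Row i holds the probabilities of moving from state i to each state in one time step.
T = np.array([
    [0.98, 0.02, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.02, 0.98],
])


def run_trajectories(T, start_state, n_traj, n_steps, rng):
    """Generate n_traj independent trajectories, all initialized in start_state."""
    n_states = T.shape[0]
    cumulative = np.cumsum(T, axis=1)
    traj = np.empty((n_traj, n_steps + 1), dtype=int)
    traj[:, 0] = start_state
    for t in range(n_steps):
        u = rng.random(n_traj)
        # Next state = number of cumulative probabilities <= u (inverse-CDF sampling)
        nxt = (u[:, None] >= cumulative[traj[:, t]]).sum(axis=1)
        traj[:, t + 1] = np.minimum(nxt, n_states - 1)  # guard against round-off
    return traj


def stationary_distribution(T):
    """Left eigenvector of T with eigenvalue 1, normalized to sum to one."""
    evals, evecs = np.linalg.eig(T.T)
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return v / v.sum()


rng = np.random.default_rng(0)
traj = run_trajectories(T, start_state=0, n_traj=5000, n_steps=300, rng=rng)
frac_B = (traj == 2).mean(axis=0)     # fraction of the ensemble in B at each step
reached_B = (traj == 2).any(axis=1)  # trajectories exhibiting an A -> B event
print("fraction in B at t = 0, 50, 300:", frac_B[[0, 50, 300]])
print("steady-state fraction in B:", stationary_distribution(T)[2])
print("fraction of trajectories that reached B:", reached_B.mean())
```

In a real biomolecular system the A-to-B transitions are rare on simulation timescales, so few or none of the brute-force trajectories would show the event. That is the gap path sampling methods aim to fill.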