Markov state models (MSMs) are very popular and rest on a rigorous theoretical basis, but they must be applied with great caution in practice. For complex systems of typical interest, there is no guarantee the results will be reliable unless an enormous amount of data is available and significant expertise and validation go into building the MSM. Even when those conditions are met, certain observables will likely be biased.

I do not make these points lightly; they are based on a set of published papers, including one that I participated in. The papers referenced here are special in that they compared MSM predictions to extremely long ordinary simulations, i.e., essentially to exact results. These are the papers to pay attention to.

To be clear, I am not critiquing the theory underpinning MSMs. The issue is what happens in practice. Indeed, the MSM community is well aware that MSMs are approximations based on coarse-graining in both space and time. These approximations can lead to accurate estimation of some observables under the right conditions and with enough data. However, in typical molecular dynamics situations, we are limited in our ability to create ideal data that underpin the ideal mathematical description.
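To make the two levels of coarse-graining concrete, here is a minimal sketch of the simplest MSM estimate, assuming the trajectories have already been discretized into integer state indices (the coarse-graining in space) and that transitions are counted at a fixed lag (the coarse-graining in time). The function names are my own for illustration and do not come from any MSM package. This is the plain row-normalized estimator, which does not enforce detailed balance.

```python
import numpy as np

def count_matrix(dtrajs, n_states, lag):
    """Count transitions i -> j separated by `lag` frames (sliding window)."""
    counts = np.zeros((n_states, n_states))
    for dtraj in dtrajs:
        dtraj = np.asarray(dtraj, dtype=int)
        # Accumulate every pair (x_t, x_{t+lag}) found in this trajectory.
        np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts

def transition_matrix(counts):
    """Row-normalize the counts (simple estimate, no detailed balance imposed)."""
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("some state has no outgoing transitions")
    return counts / row_sums

def stationary_distribution(T):
    """Left eigenvector of T for the largest eigenvalue, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(T.T)
    idx = np.argmax(evals.real)
    pi = np.abs(evecs[:, idx].real)
    return pi / pi.sum()

# Toy usage with one short, made-up discrete trajectory over two states.
dtrajs = [[0, 0, 1, 1, 0, 0, 1, 1, 0]]
T = transition_matrix(count_matrix(dtrajs, n_states=2, lag=1))
pi = stationary_distribution(T)
```

The stationary distribution `pi` is the MSM's estimate of the equilibrium state probabilities. Everything in it comes from the counts, so whatever the trajectories failed to sample, or sampled in a biased way, carries straight through to the estimate.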

Vitalis and coworkers published a paper in 2019 examining the effects of initial state bias on MSMs. They examined the effects of building MSMs from shorter and longer trajectories started from single or multiple start points. Most importantly, they found that MSMs can yield significantly biased estimates for equilibrium properties – e.g., the probability distribution – with finite amounts of data. This bias is defined by comparison to distributions determined from extensive sampling. Surprisingly, probability distribution estimates can be worse when the MSMs are constructed to enforce detailed balance. In our own unpublished work, we have also seen biased equilibrium distributions from MSMs.

In a collaboration with John Chodera, Frank Noé and their groups, we found that MSMs built from the full DE Shaw folding trajectories had successes and failures. At sufficiently long lag times (~100 ns) the MSMs could correctly characterize the kinetics of folding and unfolding. However, the MSMs were demonstrably biased for mechanistic observables (pathways) as compared to the long MD trajectories. Most notably, long lag times cannot resolve the mechanistic steps of interest, yet the bias persisted even at short lag times. All these results stemmed from extremely large data sets: tens of (un)folding first-passage times per system. Thus, any problems were most likely due to the models and not the sampling. On a related mechanistic point, in a preprint with other collaborators, we showed that MSM estimation of committor values can be biased compared to an exactly solvable reference model.
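For readers unfamiliar with how a committor is obtained from an MSM, the following is a minimal sketch, assuming a row-stochastic transition matrix `T` and user-chosen source and target state sets. The function name and the toy matrix are hypothetical and are not taken from the papers mentioned here. The committor is fixed at 0 on the source and 1 on the target, and for every other state it solves q_i = sum_j T_ij q_j.

```python
import numpy as np

def forward_committor(T, source, target):
    """Probability of reaching `target` before `source`, from each state."""
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    source, target = list(source), list(target)
    q = np.zeros(n)
    q[target] = 1.0  # boundary condition on the target set
    boundary = set(source) | set(target)
    inter = [i for i in range(n) if i not in boundary]
    if inter:
        # Solve (I - T_II) q_I = sum over target states of T_I,target.
        A = np.eye(len(inter)) - T[np.ix_(inter, inter)]
        b = T[np.ix_(inter, target)].sum(axis=1)
        q[inter] = np.linalg.solve(A, b)
    return q

# Toy three-state chain: state 0 is the source, state 2 is the target.
T_toy = np.array([[0.50, 0.50, 0.00],
                  [0.25, 0.50, 0.25],
                  [0.00, 0.50, 0.50]])
q_toy = forward_committor(T_toy, source=[0], target=[2])
```

Because the committor is computed entirely from the estimated transition matrix, any error in that matrix, whether from the Markov assumption at the chosen lag time or from the discretization, shows up directly in the committor values.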

Most recently, Roux and coworkers published a ‘critical perspective’ on MSMs. In studying a coarse-grained protein association system, they found that several kinds of MSMs provided reliable equilibrium and kinetic observables when a large amount of data was used. However, they indicate that the data requirement is extremely high – many times the binding residence time. They report that smaller data sets based on trajectories comparable to a single residence time (a timescale usually inaccessible in atomistic studies) led to notably biased MSM observables. Further, they note that their system could be adequately described by a single center-of-mass distance, which suggests to me that the coarse-grained model did not retain the complexity we normally face in atomistic systems.

My suggestion: read these papers yourself and decide. But I would say these studies stand out for their comparison to reliable reference data. (If you know of other such studies, let me know.)

Are MSMs a lost cause? I wouldn’t say that. I am optimistic that approaches incorporating history information, i.e., more than just the two-time-point correlations of a traditional MSM, will be a significant improvement. These include approaches using a last-interface label, a last-macrostate label, and a memory kernel. Note also approaches based on observable operator models, first-passage times, Markov renewal processes, and the dynamical Galerkin approximation.
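As a rough illustration of what a last-macrostate label means, here is a minimal sketch, assuming a discrete trajectory and two user-defined macrostates A and B given as sets of state indices. The function name and the `"?"` placeholder are my own choices and are not taken from any of the methods mentioned above. Each frame is tagged with the macrostate most recently visited, so a history-augmented model can treat the pair (state, label) as its state instead of the state alone.

```python
def last_macrostate_labels(dtraj, states_a, states_b, unknown="?"):
    """Label each frame with the macrostate ("A" or "B") visited most recently."""
    labels = []
    current = unknown  # no macrostate has been visited yet
    for s in dtraj:
        if s in states_a:
            current = "A"
        elif s in states_b:
            current = "B"
        labels.append(current)
    return labels

# Toy usage: state 0 defines macrostate A, state 2 defines macrostate B.
dtraj = [1, 0, 1, 2, 1, 1, 0]
labels = last_macrostate_labels(dtraj, states_a={0}, states_b={2})
# Pair each state with its label to form history-augmented states.
augmented = list(zip(dtraj, labels))
```

In this toy trajectory the intermediate state 1 appears with both labels, which is the point: frames that look identical in configuration space are kept apart according to where the trajectory last came from.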

Stay tuned. In the meantime, tread carefully.