Counting is not enough: A weakness of MSMs inherited by RiteWeight

Can’t live with them, can’t live without them. That just about sums up my relationship with Markov state models (MSMs). They have known limitations, but they sure can be useful (sometimes) and also misleading (sometimes).

A few years ago I wrote about some of the limitations of MSMs, and the aim of this post is to expand on one point from that post: the tendency of MSMs to simply recapitulate the distribution of the raw count data when only one or a few trajectories are used to train the model. That is, as first pointed out by Scalco and Caflisch, when you analyze a trajectory that revisits the same basins, you won’t learn anything new from the MSM, because it essentially spits back the quantities you would get from simple analysis, e.g., counting populations.
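
A toy illustration of the counting point, as a minimal sketch in Python with NumPy. It assumes the simplest estimator (row-normalized transition counts at lag 1) applied to one synthetic “sticky” three-state trajectory; it is not the reversible estimators used in MSM packages, nor RiteWeight, nor the analysis of Scalco and Caflisch. The helper names are hypothetical. When the trajectory starts and ends in the same state, the stationary distribution of this MSM equals the raw state populations exactly, so the model hands back what the counts already contained.

```python
import numpy as np

def simulate_sticky_chain(n_steps, n_states, p_stay, rng):
    """Hypothetical toy trajectory: stay put with prob p_stay, else jump to a random state."""
    dtraj = np.empty(n_steps, dtype=int)
    dtraj[0] = 0
    for t in range(1, n_steps):
        if rng.random() < p_stay:
            dtraj[t] = dtraj[t - 1]
        else:
            dtraj[t] = rng.integers(n_states)
    return dtraj

def count_matrix(dtraj, n_states, lag=1):
    """C[i, j] = number of observed i -> j transitions at the given lag."""
    C = np.zeros((n_states, n_states))
    np.add.at(C, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return C

def stationary_distribution(T):
    """Left eigenvector of the row-stochastic matrix T with eigenvalue 1."""
    evals, evecs = np.linalg.eig(T.T)
    k = np.argmin(np.abs(evals - 1.0))
    pi = np.real(evecs[:, k])
    return pi / pi.sum()

rng = np.random.default_rng(0)
dtraj = simulate_sticky_chain(5000, 3, 0.9, rng)
dtraj[-1] = dtraj[0]  # close the trajectory so row and column counts match exactly

C = count_matrix(dtraj, 3)
T = C / C.sum(axis=1, keepdims=True)  # non-reversible maximum-likelihood estimate
pi_msm = stationary_distribution(T)
pi_counts = np.bincount(dtraj[:-1], minlength=3) / (len(dtraj) - 1)
print(np.round(pi_msm, 4), np.round(pi_counts, 4))
```

If the trajectory is not closed, the two distributions typically differ only by a small end effect; the agreement comes from the estimator, not from any information the model has added.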

Delay Embedding: Learning hidden coordinates in cell biology

Trajectories are golden in the study of dynamical systems like cells. If we can follow individual cells in time, then we can learn which events precede others and, in principle, infer all kinds of wonderful mechanistic information.

The problem is that measurements revealing high-dimensional omics information for single cells require killing the cells, whereas observing cells in real time via video microscopy reveals relatively little molecular information. Although modern live-cell microscopy techniques can label specific cellular proteins to reveal their spatial and temporal intracellular behavior, this is limited to a literal handful of proteins at once. And there’s always the worry that the labels may alter the behavior of cells.

So the question is: how can we learn detailed information about single cells as they change and move in time? In other words, can we learn about (apparently) hidden information?
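
To make “recovering hidden coordinates” concrete, here is a minimal sketch of a generic time-delay (Takens-style) embedding in Python with NumPy. The excerpt does not say exactly which construction the post uses, so treat this as an assumption; delay_embed is a hypothetical helper. A toy oscillator with state (cos t, sin t) is “observed” only through cos t, and stacking that signal with a copy delayed by a quarter period recovers the hidden sin t coordinate, up to sign.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Row t is (x[t], x[t + tau], ..., x[t + (dim - 1) * tau])."""
    x = np.asarray(x)
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this dim and tau")
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

# Toy system: state (cos t, sin t), but we only "measure" the first coordinate.
t = np.linspace(0, 20 * np.pi, 4000)
x_observed = np.cos(t)
dt = t[1] - t[0]
tau = int(round((np.pi / 2) / dt))  # delay of about a quarter period, in samples

emb = delay_embed(x_observed, dim=2, tau=tau)
# cos(t + pi/2) = -sin(t), so the delayed column tracks the hidden coordinate (sign flipped)
hidden = np.sin(t[: len(emb)])
print(np.max(np.abs(emb[:, 1] + hidden)))
```

In real data the embedding dimension and delay are not known in advance and have to be chosen, for example by checking how well the embedded trajectory unfolds; the quarter-period delay here works only because the toy system is a pure oscillator.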

Machine Learning for Outsiders 2 – Cross validation and learning curves

Having covered the very basics of machine learning (ML) in a previous post, I want to introduce you to an essential conceptual framework embodied in a simple graph called the “learning curve” or “training curve.” This graph can aid the evaluation of any ML project and is invaluable for non-experts evaluating a project. The learning curve is closely related to cross-validation, which we will also discuss.
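
A minimal sketch of k-fold cross-validation used to build a learning curve, in Python with NumPy only. The model (a cubic polynomial fit with np.polyfit), the synthetic data, and the helper names are illustrative assumptions, not taken from the post. For each training-set size, the model is fit on a subset of the non-held-out folds and scored on the held-out fold; averaging over folds gives one point on each of the training and validation curves.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and split into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)

def mse(y, yhat):
    return float(np.mean((y - yhat) ** 2))

def learning_curve(x, y, train_sizes, k=5, degree=3, seed=0):
    rng = np.random.default_rng(seed)
    folds = kfold_indices(len(x), k, rng)
    train_err, val_err = [], []
    for m in train_sizes:
        tr_scores, va_scores = [], []
        for i in range(k):
            val = folds[i]  # held-out fold
            pool = np.concatenate([folds[j] for j in range(k) if j != i])
            train = pool[:m]  # use only m of the remaining points
            coef = np.polyfit(x[train], y[train], degree)
            tr_scores.append(mse(y[train], np.polyval(coef, x[train])))
            va_scores.append(mse(y[val], np.polyval(coef, x[val])))
        train_err.append(np.mean(tr_scores))
        val_err.append(np.mean(va_scores))
    return np.array(train_err), np.array(val_err)

# Hypothetical data: a noisy nonlinear function of one feature.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x) + 0.1 * rng.normal(size=200)

sizes = [10, 20, 40, 80, 160]  # 160 = all points outside one of 5 folds
train_err, val_err = learning_curve(x, y, sizes)
for m, tr, va in zip(sizes, train_err, val_err):
    print(f"n_train={m:4d}  train MSE={tr:.4f}  validation MSE={va:.4f}")
```

Plotting train_err and val_err against sizes gives the learning curve: a persistent gap between the two curves suggests overfitting (high variance), while two curves that converge at a high error suggest the model is too simple (high bias).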

Machine Learning for Outsiders 1 – Very basics

I want to introduce machine learning (ML) to people outside the field, both non-mathematical scientists totally new to ML and quantitative folks who are novices. My qualification for this is that I’m an outsider to ML myself, maybe an “advanced beginner” with several years of experience. As I have learned ML, I have tried to keep an eye out for what’s important and what isn’t.

What they don’t know about us – the daily thrill of science

It was a perfect fall Sunday on the trail years ago, and I remember very clearly bumping into a husband-and-wife pair of colleagues. They were much older, on the verge of formal retirement. Chatting about the beautiful day, they told me, “After this we’re heading to the university. We decided to work every day. It keeps us happier.” For me, as a young father treading water in life with no free time at all, this sounded at least a little strange.

Cell Biophysics Primer for Molecular Biophysicists

So you know about molecular biophysics and want to think about cells. How should you get started? Let’s take the first few steps.

We’ll start with some concepts that should be familiar: coordinates and equilibrium vs. nonequilibrium behavior. We can frame our discussion by comparing something familiar, a protein, to a cell.
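
One concrete way to tell equilibrium from nonequilibrium behavior is a detailed-balance check. Here is a minimal sketch in Python with NumPy for a hypothetical three-state cycle with made-up rates; the rates and helper names are assumptions for illustration only. In the first rate set, the product of rates around the cycle is the same in both directions, so detailed balance holds and the steady-state net flux vanishes. Raising one rate breaks that balance and produces a steady circulating flux, a toy stand-in for driven, nonequilibrium behavior.

```python
import numpy as np

def rate_matrix(k):
    """k[i][j] = rate i -> j; diagonal set so each row sums to zero."""
    K = np.array(k, dtype=float)
    np.fill_diagonal(K, 0.0)
    np.fill_diagonal(K, -K.sum(axis=1))
    return K

def steady_state(K):
    """Solve p K = 0 with sum(p) = 1 (least squares on the stacked system)."""
    n = K.shape[0]
    A = np.vstack([K.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

def net_flux(K, p, i, j):
    """Net steady-state probability flux from state i to state j."""
    return p[i] * K[i, j] - p[j] * K[j, i]

# Balanced cycle: k01*k12*k20 = 1*1*1 equals k02*k21*k10 = 0.5*1*2.
K_eq = rate_matrix([[0, 1.0, 0.5],
                    [2.0, 0, 1.0],
                    [1.0, 1.0, 0]])
# Driven cycle: raise k20 from 1 to 5, breaking the balance.
K_dr = rate_matrix([[0, 1.0, 0.5],
                    [2.0, 0, 1.0],
                    [5.0, 1.0, 0]])

p_eq, p_dr = steady_state(K_eq), steady_state(K_dr)
print("equilibrium flux 0->1:", net_flux(K_eq, p_eq, 0, 1))
print("driven flux 0->1:     ", net_flux(K_dr, p_dr, 0, 1))
```

Both systems reach a time-independent steady state; only the driven one carries a net current around the cycle, which is why “steady” and “equilibrium” are not the same thing.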

Maximum Likelihood vs. Bayesian estimation of uncertainty

When we want to estimate parameters from data (e.g., from binding, kinetics, or electrophysiology experiments), there are two tasks: (i) estimate the most likely values, and (ii) equally importantly, estimate the uncertainty in those values. After all, if the uncertainty is huge, it’s hard to say we really know the parameters. We also need to choose the model in the first place, which is an extremely important task, but that is beyond the scope of this discussion.
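
A minimal sketch contrasting the two approaches on a simple case, in Python with NumPy: estimating a single rate constant k from simulated, exponentially distributed dwell times. The maximum-likelihood estimate is k_mle = n / sum(t), with an approximate 95% confidence interval from the Fisher information (standard error k_mle / sqrt(n)). The Bayesian version assumes a conjugate Gamma(a0, b0) prior (shape, rate), whose posterior is Gamma(a0 + n, b0 + sum(t)), and reports a 95% credible interval from posterior samples. The true rate, sample size, and prior parameters are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(42)
k_true = 2.0  # hypothetical true rate, used only to simulate data
t = rng.exponential(scale=1.0 / k_true, size=50)  # simulated dwell times
n, S = len(t), t.sum()

# Maximum likelihood: log L(k) = n log k - k S, maximized at k = n / S.
k_mle = n / S
se_mle = k_mle / np.sqrt(n)  # from Fisher information I(k) = n / k^2
ci_mle = (k_mle - 1.96 * se_mle, k_mle + 1.96 * se_mle)

# Bayesian: weak Gamma prior (assumed), conjugate to the exponential likelihood.
a0, b0 = 1.0, 0.1
a_post, b_post = a0 + n, b0 + S
draws = rng.gamma(shape=a_post, scale=1.0 / b_post, size=100_000)
ci_bayes = tuple(np.percentile(draws, [2.5, 97.5]))

print(f"MLE:   k = {k_mle:.3f}, 95% CI ({ci_mle[0]:.3f}, {ci_mle[1]:.3f})")
print(f"Bayes: posterior mean = {a_post / b_post:.3f}, "
      f"95% credible interval ({ci_bayes[0]:.3f}, {ci_bayes[1]:.3f})")
```

With plentiful data and a weak prior the two intervals nearly coincide; they diverge when data are scarce or the prior is informative.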

Confidence and Anxiety – Doing Science While Human

I think self-confidence is an essential ingredient of doing well in science, but it’s not discussed enough. I have thought about confidence a lot because I don’t always have it. I’m over 50 years old and have published plenty of papers, but often enough I doubt myself. I have this intermittent, but deep-seated worry that maybe my science isn’t so great. Sometimes I get pretty nervous before giving a talk (which I do my best to hide) even though I enjoy lecturing. This long-lived impostor syndrome is frustrating, but it’s part of who I am.

Caution Regarding Markov State Models

Markov state models (MSMs) are very popular and have a rigorous basis in principle, but they must be applied in practice with great caution. There is no guarantee that the results will be reliable for complex systems of typical interest unless there is an enormous amount of data and significant expertise and validation go into building the MSM. And even when those conditions are met, certain observables will likely be biased.
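
One widely used validation step is the implied-timescale test: build the MSM at several lag times and check that the slowest timescale, -lag / ln(lambda_2), stays roughly constant. Here is a minimal sketch in Python with NumPy on a hypothetical two-state chain whose true transition matrix is known; the helper names and parameters are assumptions for illustration. A flat curve is a necessary check, not a sufficient one, and it does not rule out biased observables.

```python
import numpy as np

def simulate_chain(P, n_steps, rng):
    """Sample a discrete trajectory from a known transition matrix P (toy data)."""
    P = np.asarray(P, dtype=float)
    cum = np.cumsum(P, axis=1)
    u = rng.random(n_steps)
    dtraj = np.empty(n_steps, dtype=int)
    dtraj[0] = 0
    for t in range(1, n_steps):
        # first state whose cumulative probability reaches u; guard against round-off
        dtraj[t] = min(np.searchsorted(cum[dtraj[t - 1]], u[t]), len(P) - 1)
    return dtraj

def count_matrix(dtraj, n_states, lag):
    C = np.zeros((n_states, n_states))
    np.add.at(C, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return C

def implied_timescales(dtraj, n_states, lags):
    """Slowest implied timescale (in steps) of the row-normalized MSM at each lag."""
    its = []
    for lag in lags:
        C = count_matrix(dtraj, n_states, lag)
        T = C / C.sum(axis=1, keepdims=True)
        evals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
        its.append(-lag / np.log(evals[1]))
    return np.array(its)

P = np.array([[0.95, 0.05],
              [0.10, 0.90]])
rng = np.random.default_rng(1)
dtraj = simulate_chain(P, 100_000, rng)

lags = [1, 2, 5, 10]
its = implied_timescales(dtraj, 2, lags)
t_true = -1.0 / np.log(0.85)  # second eigenvalue of P is 1 - 0.05 - 0.10 = 0.85
print(np.round(its, 2), round(t_true, 2))
```

Because this toy trajectory is Markovian by construction, the estimates stay near the true value at every lag; for states carved out of a real molecular simulation, the timescales often drift with lag, which is one sign that the Markov assumption is failing.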