Ten days after I made this post, the New York Times published a related piece suggesting that one gender was more likely to up-sell their work with self-congratulatory description. (Guess which one?) This made me think more about the gender issue in our own field. Have something to say on this issue? Write me: zuckermd@ohsu.edu. I hope to do a post on this in the future, ideally relating the experiences of several individuals, anonymously if they wish. I will be at the Biophysics meeting in San Diego Monday and Tuesday if anyone wants to talk about it.

It’s my view that we must become *statistical* biophysicists. Why statistical? Because microscopic behaviors must be repeated zillions of times to create macroscopic effects. Can you help me shift the thinking in our community? See below for a collaboration opportunity.

How many times has our community solved the sampling problem? I think it’s a fair question. You know I’m talking about claims rather than actual solutions. And many if not most of those claims are made in the abstracts of papers, even when the data paints a more limited story. I think our abstracts are the problem.

It was at breakfast during a recent conference that Prof. X leaned toward me and quietly said, “We tried using weighted ensemble but it didn’t work.” I got the sense he was trying not to broadcast this to other conference attendees, as a courtesy.

This is yet another one of those things where, after reading this, you’re supposed to say, “Oh, that’s obvious.” And I admit it is kind of obvious … after you think about it for a few minutes! So spend those few minutes now to learn one more cool thing about non-equilibrium trajectory physics.

In non-equilibrium calculations of transition processes, we often wish to estimate a rate constant, which can be quantified as the inverse of the mean first-passage time (MFPT). That is, one way to define a rate constant is just the reciprocal of the average time it takes for a transition. The Hill relation tells us that probability flow per second into a target state of interest (state “B”, defined by us) is *exactly* the inverse MFPT … so long as we measure that flow in the A-to-B steady state. That steady state is set up by initializing trajectories outside state B according to some distribution (state “A”, defined by us), removing trajectories that reach state B, and re-initializing them in A according to our chosen distribution.
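To make the Hill relation concrete, here is a minimal sketch on a toy discrete-time Markov chain. Everything in it is an assumption made for illustration: the nearest-neighbor chain, the hopping probabilities, and the function names `toy_chain` and `mfpt_and_steady_flux` are hypothetical. State A is a single state (index 0), state B is the last state, and time is counted in steps rather than seconds.

```python
import numpy as np


def toy_chain(n=6, p_forward=0.3, p_back=0.5):
    """Hypothetical nearest-neighbor chain; each row of T sums to 1."""
    T = np.zeros((n, n))
    for i in range(n):
        if i + 1 < n:
            T[i, i + 1] = p_forward
        if i > 0:
            T[i, i - 1] = p_back
        T[i, i] = 1.0 - T[i].sum()  # remaining probability: stay put
    return T


def mfpt_and_steady_flux(T, source, target):
    """Return (MFPT from source to target, steady-state flux into target).

    The steady state is the recycling one: any probability that would
    enter `target` is fed straight back into `source`.
    """
    n = T.shape[0]
    keep = [i for i in range(n) if i != target]  # all states outside B
    Q = T[np.ix_(keep, keep)]                    # transitions among non-B states
    into_B = T[keep, target]                     # one-step probability of entering B
    src = keep.index(source)

    # MFPT from every non-B state: m = 1 + Q m
    m = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))

    # Recycling dynamics: arrivals in B are re-initialized in A
    R = Q.copy()
    R[:, src] += into_B

    # Stationary distribution: p R = p together with sum(p) = 1
    A = np.vstack([R.T - np.eye(n - 1), np.ones(n - 1)])
    b = np.zeros(n)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)

    flux = p @ into_B  # probability arriving in B per step
    return m[src], flux


if __name__ == "__main__":
    T = toy_chain()
    mfpt, flux = mfpt_and_steady_flux(T, source=0, target=T.shape[0] - 1)
    print("MFPT (steps):", mfpt)
    print("1 / flux    :", 1.0 / flux)
```

The two printed numbers should agree to numerical precision. Drop the recycling step (the `R[:, src] += into_B` line) and there is no A-to-B steady state to measure, which is the whole point of the fine print.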

Here’s a true story from a number of years ago. A postdoc in the group comes to me in frustration. He has built a cool “semi-atomistic” coarse-grained protein model that has generated disappointing results. An alpha helix that’s clearly resolved in the X-ray structure of his protein completely unravels. Disappointment. But playing the optimistic supervisor, I ask, “Are we sure you’re wrong? Could that helix be marginally stable?” Further digging revealed an isoform of the protein where the helix in question was not resolvable via X-ray. Relief! I was pretty pleased with myself, I must say.

But now I’m disappointed that I was pleased.

Some quick guidance for analyzing molecular dynamics (MD) or Markov-chain Monte Carlo (MC) data in hard-to-sample systems – e.g., biomolecules. I can summarize the advice this way: __Ask not how to compute error bars. Ask first whether error bars are even appropriate.__ A meaningless error bar is more dangerous (to you and the community) than no error bar at all. This guidance is essentially abstracted from our recent Best Practices paper, and I hope it will set in context some of the theory discussed in an earlier post.
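One way to ask whether an error bar is even appropriate is to estimate how many effectively independent samples a trajectory contains. The sketch below uses a crude block-averaging estimate. It is my illustration, not code from the Best Practices paper, and the function names, the choice of 10 blocks, and the AR(1) stand-in for a correlated MD/MC time series are all assumptions.

```python
import numpy as np


def statistical_inefficiency(x, n_blocks=10):
    """Block-based variance of the mean divided by the naive (i.i.d.) value."""
    x = np.asarray(x, dtype=float)
    block_len = len(x) // n_blocks
    blocks = x[: block_len * n_blocks].reshape(n_blocks, block_len).mean(axis=1)
    var_mean_block = blocks.var(ddof=1) / n_blocks
    var_mean_naive = x.var(ddof=1) / len(x)
    return var_mean_block / var_mean_naive


def effective_sample_size(x, n_blocks=10):
    """Rough count of independent samples; never more than len(x)."""
    g = max(1.0, statistical_inefficiency(x, n_blocks))
    return len(x) / g


def ar1_series(n, phi, rng):
    """Hypothetical correlated series standing in for an MD observable."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    slow = ar1_series(20000, phi=0.99, rng=rng)
    print("frames:", len(slow))
    print("effective samples (rough):", effective_sample_size(slow))
```

If the effective sample size comes out in the single digits, that is a hint that no error bar will mean much. The block estimate itself is only trustworthy when each block is much longer than the correlation time, so treat a small value as a warning rather than a measurement.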

I realized that I owe you something. In a prior post, I invoked some Bayesian ideas to contrast with bootstrapping analysis of high-variance data. (More precisely, it was high *log*-variance data for which there was a problem, as described in our preprint.) But the Bayesian discussion in my earlier post was pretty quick. Although there are a number of good, brief introductions to Bayesian statistics, many get quite technical.

Here, I’d like to introduce Bayesian thinking in absolutely the simplest way possible. We want to understand the point of it, and get a better grip on those mysterious priors.
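As a taste of what a prior does, here is about the smallest example I can think of: estimating a probability from a handful of yes/no outcomes using a Beta prior. This is a standard textbook setup, not necessarily the example developed in the full post, and the function names and the particular priors are assumptions chosen for illustration.

```python
def beta_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Beta(prior_a, prior_b) prior + binomial data -> Beta posterior parameters."""
    post_a = prior_a + successes
    post_b = prior_b + (trials - successes)
    return post_a, post_b


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    successes, trials = 3, 4  # hypothetical data: 3 "yes" out of 4 tries
    priors = {
        "flat Beta(1, 1)": (1.0, 1.0),
        "strong Beta(50, 50)": (50.0, 50.0),
    }
    for label, (a0, b0) in priors.items():
        a, b = beta_posterior(successes, trials, a0, b0)
        print(label, "-> posterior mean", beta_mean(a, b))
```

With only four data points the two priors give noticeably different answers. Feed in hundreds of observations and the difference shrinks, which is one way to get comfortable with those mysterious priors: they matter most exactly when the data are scarce.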

I don’t know about you but I grew up on *equilibrium* statistical mechanics. The beauty of a partition function, an ensemble, the ability to understand thermodynamic principles from microscopic rules. I love that stuff.

But what if we want to understand biology? Is a partition function really the most important object? This Fall, I’m going to lecture on biophysics for an assortment of biology and biomedical engineering students for just a few weeks; and for the first time in my teaching career, I’m planning to omit a partition-function based description of molecular behavior. I’m just not convinced it’s important enough for an abbreviated set of lectures.

I want to talk again today about the essential topic of analyzing statistical uncertainty – i.e., making error bars – but I want to frame the discussion in terms of a larger theme: our community’s often insufficiently critical adoption of elegant and sophisticated ideas. I discussed this issue a bit previously in the context of PMF calculations. To save you the trouble of reading on, the technical problem to be addressed is statistical uncertainty for high-variance data with small(ish) sample sizes.
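To see why high variance plus a small sample is trouble, here is a toy experiment, sketched under assumptions of my own choosing: log-normal data, 20 samples per "study", and a plain percentile bootstrap for the mean. The function names and every parameter are hypothetical, and this is not the analysis from the preprint.

```python
import numpy as np


def bootstrap_ci_mean(data, n_resamples=1000, alpha=0.05, rng=None):
    """Percentile-bootstrap confidence interval for the mean."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    idx = rng.integers(0, len(data), size=(n_resamples, len(data)))
    means = data[idx].mean(axis=1)  # one mean per resampled data set
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return lo, hi


def coverage(sigma_log=3.0, n=20, n_trials=200, seed=0):
    """Fraction of nominal 95% intervals that contain the true mean."""
    rng = np.random.default_rng(seed)
    true_mean = np.exp(0.5 * sigma_log**2)  # mean of lognormal(0, sigma_log)
    hits = 0
    for _ in range(n_trials):
        sample = rng.lognormal(mean=0.0, sigma=sigma_log, size=n)
        lo, hi = bootstrap_ci_mean(sample, rng=rng)
        hits += int(lo <= true_mean <= hi)
    return hits / n_trials


if __name__ == "__main__":
    print("coverage, low log-variance :", coverage(sigma_log=0.3))
    print("coverage, high log-variance:", coverage(sigma_log=3.0))
```

In the low log-variance case the intervals should land reasonably close to the advertised 95%. In the high log-variance case they tend to miss the true mean far more often, because the rare large values that dominate the mean are usually absent from a sample of 20, and resampling cannot conjure data that were never observed.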

© Daniel M. Zuckerman, 2015 - 2021
