Do Foundation Models Keep the Brain's Nonlinear Dynamics?

Here is a question I have been circling for a while, and I think it is the one to ask about brain foundation models before we ask any other.

A model that reconstructs EEG almost perfectly, window by window, might still be throwing away the thing that makes the signal a brain signal.

To see why that is possible, you have to take seriously what EEG actually is, beyond a wiggly line.

EEG is not a spectrum with noise on top

A resting EEG trace looks, to the eye, like filtered noise. It is not. It carries structure that lives specifically in how the signal relates to itself across time.

The clearest example is long-range temporal correlation. If you take the amplitude fluctuations of an oscillation and ask how correlated a fluctuation now is with a fluctuation later, the answer does not fall to zero after a fixed lag the way it would for a simple random process. It decays as a power law, slowly, over seconds to minutes. You measure it with detrended fluctuation analysis (DFA) and summarize it with a scaling exponent, a Hurst-like number that sits near 0.5 for an uncorrelated signal and between 0.5 and 1 when long-range correlations are present. A signal with long-range correlations is one whose past reaches a long way into its future.
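To make the measurement concrete, here is a minimal sketch of detrended fluctuation analysis in NumPy. It is an illustration under stated assumptions, not a reference implementation: the function name `dfa_exponent`, the window-length range, and the linear detrending order are my choices for the example. In EEG practice the input would usually be the amplitude envelope of a band-filtered oscillation; that preprocessing step is not shown.

```python
import numpy as np

def dfa_exponent(x, scales=None, order=1):
    """Return the DFA scaling exponent of a 1-D series (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())  # integrate the mean-removed series
    n = profile.size
    if scales is None:
        # Log-spaced window lengths; this range is an assumption, not a standard.
        scales = np.unique(
            np.logspace(np.log10(16), np.log10(n // 8), 20).astype(int)
        )
    fluct = []
    for s in scales:
        n_win = n // s
        segs = profile[: n_win * s].reshape(n_win, s).T  # shape (s, n_win)
        t = np.arange(s, dtype=float)
        coeffs = np.polyfit(t, segs, order)        # one polynomial per window
        trend = np.vander(t, order + 1) @ coeffs   # fitted trends, shape (s, n_win)
        fluct.append(np.sqrt(np.mean((segs - trend) ** 2)))
    # The slope of log F(s) against log s is the scaling exponent.
    return float(np.polyfit(np.log(scales), np.log(fluct), 1)[0])

rng = np.random.default_rng(0)
white = rng.standard_normal(2**14)
alpha_white = dfa_exponent(white)             # theory: 0.5 for uncorrelated noise
alpha_brown = dfa_exponent(np.cumsum(white))  # theory: 1.5 for integrated noise
```

The two synthetic signals are sanity checks only: an estimator that cannot separate white noise from its running sum should not be trusted on a recording.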

Alongside it sits the aperiodic, or 1/f, part of the spectrum: power that falls off with frequency along a straight line on a log-log plot, with a slope that is itself a meaningful quantity, not a background to be subtracted away.
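As a rough sketch, the slope can be read off with a straight-line fit to an averaged periodogram in log-log coordinates. The function name `aperiodic_slope`, the 2 to 40 Hz fitting band, and the segment length are assumptions made for illustration. The sketch also ignores oscillatory peaks, which a real analysis would have to separate from the aperiodic component before fitting.

```python
import numpy as np

def aperiodic_slope(x, fs, fmin=2.0, fmax=40.0, nperseg=1024):
    """Fit log10(power) against log10(frequency) and return the slope (sketch)."""
    x = np.asarray(x, dtype=float)
    n_seg = x.size // nperseg
    segs = x[: n_seg * nperseg].reshape(n_seg, nperseg)
    segs = segs - segs.mean(axis=1, keepdims=True)  # remove each segment's mean
    window = np.hanning(nperseg)
    # Averaged windowed periodogram; the scale is left unnormalized because
    # a constant factor does not change the slope.
    psd = np.mean(np.abs(np.fft.rfft(segs * window, axis=1)) ** 2, axis=0)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    slope, _ = np.polyfit(np.log10(freqs[band]), np.log10(psd[band]), 1)
    return float(slope)

rng = np.random.default_rng(1)
white = rng.standard_normal(2**16)
slope_white = aperiodic_slope(white, fs=250.0)             # flat spectrum, slope near 0
slope_brown = aperiodic_slope(np.cumsum(white), fs=250.0)  # 1/f^2 spectrum, slope near -2
```

The sign convention here is that a steeper 1/f fall-off gives a more negative slope; some papers report the positive exponent instead.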

These are not curiosities. Long-range correlations, the 1/f slope, the branching ratio, and the distribution of neuronal avalanches all point to the same underlying idea: a brain that sits near criticality, close to a phase transition between order and disorder, where it is maximally sensitive, maximally integrative, and scale-free. That regime is not a metaphor. It is measurable, and it moves. The scaling changes across development, across sleep and arousal, across disease. Much of my own work has been about reading those dynamical markers, because they carry clinical information that band power alone does not.

So the dynamics are part of the signal’s identity. Now ask what a foundation model is trained to do with them.

Read the loss function literally

A masked-reconstruction model is trained to minimize something close to the squared error between the true signal and its reconstruction of the masked parts. A contrastive model is trained to make two augmented views of the same segment land near each other in representation space and different segments land apart.
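Written out in generic form, the two objectives look roughly like this. These are textbook shapes, not the loss of any particular model: the masked index set M, the reconstruction \hat{x}_t, the embeddings z, the similarity function sim, and the temperature \tau are notation I am assuming for the sketch, and the sum in the contrastive denominator runs over the positive and all negatives in the batch.

```latex
% Masked reconstruction: mean squared error over the masked index set M
\mathcal{L}_{\mathrm{rec}} = \frac{1}{|M|} \sum_{t \in M} \left( x_t - \hat{x}_t \right)^2

% Contrastive (InfoNCE-style): z_i and z_i^{+} are two augmented views of one segment
\mathcal{L}_{\mathrm{con}} = -\log \frac{\exp\left( \mathrm{sim}(z_i, z_i^{+}) / \tau \right)}{\sum_{j} \exp\left( \mathrm{sim}(z_i, z_j) / \tau \right)}
```

Every term in the first expression compares one sample with its own reconstruction, and every term in the second compares embeddings of short segments.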

Read those objectives literally and notice what is in them and what is not.

Reconstruction error rewards getting the amplitude and the local spectral shape right, sample by sample, window by window. It is a pointwise, or near-pointwise, criterion. Contrastive loss rewards whatever invariances your augmentations encoded, and nothing else.

Neither loss contains a single term for the long-range correlation structure. There is nothing in the objective that says: preserve the DFA exponent. Nothing that says: keep the aperiodic slope. Nothing that says: match the branching ratio, or the avalanche distribution, or the scaling across time. The dynamical invariants that make EEG brain-like are simply not quantities the training ever looks at.

The uncomfortable consequence

Here is the consequence, and I will flag it plainly as a hypothesis I am testing rather than a result I am reporting.

[hypothesis] A foundation model can minimize its reconstruction loss, produce EEG that looks correct in any two-second window, match the spectrum locally, and still get the long-range structure wrong. You can be right everywhere locally and wrong about the scaling, because the scaling is a property of the correlations across windows, exactly the relationship the loss never scores. A model can be a faithful mimic of the signal’s surface and a poor custodian of its dynamics.
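A toy illustration of the logic, not of any model: take a signal whose amplitude drifts slowly, cut it into two-second windows, and shuffle their order. Every window is bit-for-bit unchanged, so any score computed one window at a time cannot tell the two versions apart, yet the correlation between neighbouring windows is gone. The synthetic signal, the 250 Hz sampling rate, and the use of lag-1 autocorrelation of per-window RMS as a stand-in for a proper scaling estimate are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, win_sec, n_win = 250, 2, 400
win = fs * win_sec

# Toy signal: a white-noise carrier whose log-amplitude follows a random walk,
# so the amplitude of one window strongly predicts the amplitude of the next.
log_amp = np.cumsum(rng.normal(scale=0.05, size=n_win))
log_amp -= log_amp.mean()
envelope = np.repeat(np.exp(log_amp), win)
x = envelope * rng.standard_normal(n_win * win)

windows = x.reshape(n_win, win)
shuffled = windows[rng.permutation(n_win)]  # same windows, scrambled order

def window_rms(w):
    """Root-mean-square amplitude of each window."""
    return np.sqrt(np.mean(w ** 2, axis=1))

def lag1_autocorr(v):
    """Lag-1 autocorrelation of a 1-D sequence."""
    v = v - v.mean()
    return float(np.dot(v[:-1], v[1:]) / np.dot(v, v))

# Locally nothing changed: the set of per-window amplitudes is identical.
same_local = np.allclose(np.sort(window_rms(windows)),
                         np.sort(window_rms(shuffled)))

# Across windows everything changed: amplitude memory is high before, near zero after.
r_orig = lag1_autocorr(window_rms(windows))
r_shuf = lag1_autocorr(window_rms(shuffled))
```

Shuffling is a far cruder failure than anything a trained model would produce. It only shows that perfect window-level agreement and broken cross-window structure can coexist.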

If that is true, and it is testable, then several comfortable assumptions come apart.

A model’s reconstructions and generations would not be physiologically faithful even when they pass visual and spectral inspection, because faithfulness at the level of dynamics is a different property from faithfulness at the level of shape.

Many diseases express themselves through altered long-range correlations and shifted criticality rather than through a single spectral peak. Downstream tasks that depend on those dynamical biomarkers could be built on representations that discarded the very markers they need.

And the benchmarks would not notice. Reconstruction error and downstream accuracy are the standard yardsticks, and neither of them measures whether the model preserved the scaling. A model could top every leaderboard and still be dynamically hollow.

I am not the only one looking. A recent critical review of EEG foundation models flags exactly this gap among the field’s open problems (arXiv:2507.11783), and separate work on aperiodic dynamics finds that reconstructing EEG that keeps its 1/f profile requires modeling the self-similarity exponent explicitly, rather than getting it for free from a standard reconstruction loss (arXiv:2505.19009).

The test is concrete

This is not a philosophical worry, which is what makes it worth writing down. It suggests a direct measurement.

Take a trained foundation model. Feed it real EEG. Then compute the dynamical invariants on what comes out, whether that is its reconstructions or the trajectories in its latent space, the same way you would on a real recording: the DFA exponent, the aperiodic slope, the branching ratio, the avalanche statistics. Compare them to the input. If the model preserves the dynamics, those numbers travel through it intact. If they collapse, or drift toward the trivial values of a memoryless process, then the model is capturing the wrong invariants, no matter how good its reconstruction loss looks.
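The shape of that comparison can be sketched in a few lines. Everything here is a placeholder: `compare_invariants` is a hypothetical helper, `memoryless_model` is a stand-in that replaces the signal with variance-matched white noise instead of a real foundation model, and lag-1 autocorrelation is used only because it is short to write. In an actual test the estimator dictionary would hold the DFA exponent, the aperiodic slope, and the avalanche measures.

```python
import numpy as np

def compare_invariants(x, x_hat, estimators):
    """Apply each estimator to input and output and report the drift between them."""
    report = {}
    for name, estimate in estimators.items():
        before, after = float(estimate(x)), float(estimate(x_hat))
        report[name] = {"input": before, "output": after, "drift": after - before}
    return report

def lag1_autocorr(v):
    """Placeholder invariant: lag-1 autocorrelation of a 1-D series."""
    v = np.asarray(v, dtype=float)
    v = v - v.mean()
    return float(np.dot(v[:-1], v[1:]) / np.dot(v, v))

rng = np.random.default_rng(0)

# Synthetic input with memory: an AR(1) process (assumed here, not real EEG).
n, phi = 20000, 0.9
noise = rng.standard_normal(n)
x = np.empty(n)
x[0] = noise[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + noise[t]

def memoryless_model(signal):
    """Hypothetical stand-in 'model': matches the variance, discards all memory."""
    return rng.standard_normal(signal.size) * signal.std()

estimators = {"lag1": lag1_autocorr}
report_identity = compare_invariants(x, x.copy(), estimators)
report_memoryless = compare_invariants(x, memoryless_model(x), estimators)
```

An identity mapping reports zero drift, while the memoryless stand-in drags the invariant toward zero. A real model would land somewhere between those two reference points, and where it lands is the measurement.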

That is the direction I am pursuing, and I am deliberately not reporting a verdict here, because the measurement is the point and I would rather show it than assert it.

A different yardstick

The deeper issue is that we are evaluating brain foundation models with the metrics that language modeling handed us: reconstruction loss, next-token likelihood, downstream accuracy. Those metrics were fitted to a domain where the units are stable and the structure that matters is largely local and combinatorial. Brains are not that domain. The structure that matters in EEG is substantially in the long-range correlations and the nearness to criticality, and none of the imported metrics can see it.

So the question in the title is not rhetorical, and it is not settled. It is the one I think the field should be asking out loud: not only whether a model reconstructs the signal or classifies the patient, but whether it keeps the dynamics that made the signal worth recording in the first place. A model that ignores them is optimizing for the wrong likeness, and it will look right until precisely the moment the dynamics are what you needed.