2025 AIChE Annual Meeting

(206e) Data-Driven Analysis of Heterogeneous Population Dynamics: A Case for Optimal Shrinkage

Authors

Pei-Chun Su, Yale University
Nikolaos Evangelou, Johns Hopkins University
Ronald Coifman, Yale University
We present a method for learning stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) using optimal shrinkage, a multiscale denoising technique. Our approach treats noisy time-evolution data as corrupted observations of the drift and diffusivity functions of the underlying stochastic differential equation, which we estimate nonparametrically using wavelet thresholding.

By expanding observations in a two-dimensional Haar wavelet basis and applying nonlinear shrinkage, we denoise function estimates without requiring prior knowledge of smoothness. The shrinkage process penalizes small coefficients, effectively suppressing noise while preserving signal structure. We apply this strategy to discrete-time approximations of stochastic processes, using Euler-Maruyama-type schemes to cast learning as a regression problem with Gaussian noise.
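
As a rough illustration of the regression view described above, the following minimal sketch builds Euler-Maruyama increments for a scalar SDE and denoises them with a tensor-product Haar transform. Everything specific here is an assumption made for illustration and is not taken from the authors' implementation: the cubic drift `x - x**3`, the design of one-step replicates started from a grid of states, the parameter values, and the use of soft thresholding at the universal threshold as a stand-in for the shrinkage rule.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar analysis matrix; n must be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        m = h.shape[0]
        coarse = np.kron(h, [1.0, 1.0])            # sums over neighboring pairs
        detail = np.kron(np.eye(m), [1.0, -1.0])   # differences of neighboring pairs
        h = np.vstack([coarse, detail]) / np.sqrt(2.0)
    return h

def haar_shrink_2d(y, noise_std):
    """Soft-threshold the tensor-product Haar coefficients of a 2-D array."""
    h_rows, h_cols = haar_matrix(y.shape[0]), haar_matrix(y.shape[1])
    coeffs = h_rows @ y @ h_cols.T
    # Universal threshold: an assumed, standard choice, not the authors' rule.
    thr = noise_std * np.sqrt(2.0 * np.log(y.size))
    shrunk = np.sign(coeffs) * np.maximum(np.abs(coeffs) - thr, 0.0)
    shrunk[0, 0] = coeffs[0, 0]                    # leave the overall mean unshrunk
    return h_rows.T @ shrunk @ h_cols              # inverse of an orthonormal transform

def drift(x):
    return x - x**3                                # hypothetical cubic drift

rng = np.random.default_rng(0)
dt, sigma = 0.01, 0.2                              # assumed step size and noise level
x0 = np.linspace(-1.5, 1.5, 64)                    # grid of starting states
n_rep = 256                                        # one-step replicates per state

# One Euler-Maruyama step from every starting state, for every replicate.
noise = rng.standard_normal((x0.size, n_rep))
x1 = x0[:, None] + drift(x0)[:, None] * dt + sigma * np.sqrt(dt) * noise

# Regression targets: y = drift(x0) + Gaussian noise with std sigma / sqrt(dt).
y = (x1 - x0[:, None]) / dt
noise_std = sigma / np.sqrt(dt)

drift_hat = haar_shrink_2d(y, noise_std).mean(axis=1)

raw_err = np.sqrt(np.mean((y[:, 0] - drift(x0)) ** 2))   # one raw replicate
fit_err = np.sqrt(np.mean((drift_hat - drift(x0)) ** 2)) # after shrinkage
print(f"raw RMSE {raw_err:.3f}, shrunk RMSE {fit_err:.3f}")
```

The scaled increment `(x1 - x0) / dt` equals the drift plus Gaussian noise of standard deviation `sigma / sqrt(dt)`, which is why the estimation problem can be treated as denoising. The known noise level is passed in directly here; with real data it would have to be estimated.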

We demonstrate the method’s versatility across first-order SDEs, second-order Langevin dynamics, and SPDEs. In each case, the stochastic term is treated as additive noise, enabling consistent estimation of drift and diffusion. For SPDEs, we adapt finite difference schemes to fit within the same shrinkage framework, enabling the recovery of spatially varying forcing terms from noisy spatiotemporal data.
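
To make the SPDE case concrete, the sketch below shows one way a finite-difference scheme can yield regression targets of the same "signal plus Gaussian noise" form. It is an assumption-laden toy, not the authors' setup: it uses a stochastic heat equation with periodic boundaries rather than a wave equation, an explicit Euler time step, independent Gaussian noise at each grid point, and a hypothetical piecewise-constant forcing and parameter values.

```python
import numpy as np

def forcing(x):
    return np.where(x < 0.5, 1.0, -1.0)    # hypothetical spatially varying forcing

def laplacian(u, dx):
    """Second-order central difference along the last axis, periodic boundaries."""
    return (np.roll(u, -1, axis=-1) - 2.0 * u + np.roll(u, 1, axis=-1)) / dx**2

rng = np.random.default_rng(1)
nx, nt = 64, 512                           # grid points in space, steps in time
dx = 1.0 / nx
nu, sigma = 0.01, 0.05                     # assumed diffusivity and noise amplitude
dt = 0.2 * dx**2 / nu                      # respects the explicit limit nu*dt/dx^2 <= 1/2
x = np.arange(nx) * dx

# Simulate u_t = nu * u_xx + f(x) + noise with an explicit Euler-Maruyama step.
u = np.zeros((nt + 1, nx))
for n in range(nt):
    xi = rng.standard_normal(nx)
    u[n + 1] = (u[n]
                + dt * (nu * laplacian(u[n], dx) + forcing(x))
                + sigma * np.sqrt(dt) * xi)

# Regression targets: subtract the known diffusion term from the time difference.
# By construction, residual[n, i] = f(x_i) + (sigma / sqrt(dt)) * Gaussian noise.
residual = (u[1:] - u[:-1]) / dt - nu * laplacian(u[:-1], dx)

# A plain time average already gives a crude estimate of the forcing.
forcing_mean = residual.mean(axis=0)
print(np.max(np.abs(forcing_mean - forcing(x))))
```

The `residual` array has shape `(nt, nx)`, so it is a two-dimensional noisy observation of the forcing that a wavelet shrinkage step could take as input in place of the plain time average used here.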

Our experiments, ranging from toy models with cubic drift to spatially varying stochastic wave equations, show that wavelet shrinkage accurately recovers both deterministic and stochastic components of the dynamics. The method is nonparametric, computationally efficient, and robust to noise, making it a promising tool for learning complex stochastic systems directly from data.

We also discuss applying these techniques to experimental observations of cellular motility, in regimes where cell motion is well approximated by a stochastic process whose statistics vary across members of the cell population (e.g., due to age or differential exposure to chemical agents).