2025 AIChE Annual Meeting

(267g) Towards a Transferable Open Force Field Including Lipids: Development and Validation

Introduction: Biological membranes are central to a wide range of processes in biophysics, biochemistry, and biotechnology, including protein-lipid interactions, cellular transport, signal transduction, and the delivery of therapeutics via lipid nanoparticles. Understanding the structural and dynamic behavior of lipid assemblies is essential for advancing these applications. However, experimental characterization of lipid bilayers, especially multicomponent and compositionally complex systems, remains challenging due to limitations in resolution, timescale, and interpretability.

Molecular mechanics (MM) modeling offers a powerful complementary approach, capable of resolving atomistic details and probing membrane properties under a range of conditions. At its core, MM seeks to reproduce the true quantum mechanical potential energy surface using classical approximations. Achieving accurate simulations requires careful parameterization of both bonded (e.g., bonds, angles, torsions) and nonbonded (e.g., van der Waals, electrostatics) interactions. These components collectively define the system’s free energy landscape, where small inaccuracies in individual terms can propagate into large deviations in predicted behavior. Optimizing force field parameters to simultaneously capture thermodynamic, structural, and dynamic properties is a complex task, especially for anisotropic systems such as lipids.
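For orientation, the bonded and nonbonded terms listed above are commonly combined in a functional form like the one sketched below. This is the generic textbook expression, not one quoted from this abstract, and the symbols (force constants k, reference values r_0 and θ_0, torsion phases γ_n, Lennard-Jones ε and σ, partial charges q) are introduced here for illustration only.

```latex
\begin{aligned}
U(\mathbf{r}) ={}& \sum_{\text{bonds}} \frac{k_b}{2}\,(r - r_0)^2
 + \sum_{\text{angles}} \frac{k_\theta}{2}\,(\theta - \theta_0)^2
 + \sum_{\text{torsions}} \sum_{n} k_n \left[ 1 + \cos(n\phi - \gamma_n) \right] \\
 &+ \sum_{i<j} 4\varepsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12}
 - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right]
 + \sum_{i<j} \frac{q_i q_j}{4\pi\epsilon_0\, r_{ij}}
\end{aligned}
```

The first three sums are the valence (bonded) terms; the last two are the van der Waals and electrostatic (nonbonded) terms.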

The Open Force Field (OpenFF) initiative is a collaboration between academic and industry groups with the shared mission of developing accurate, reproducible, and open-source force fields for small molecules. Their small molecule force field, Sage, supports the parameterization of diverse chemical space and includes parameters for functional groups commonly found in lipids. Although Sage was not initially trained or validated on lipid-specific chemistries, OpenFF maintains a flexible framework for developing force fields from the ground up, which allows force fields to be inherently compatible across chemistries and provides an ideal foundation for extension. This work aims to expand the Sage force field to better represent lipid molecules and evaluate its ability to reproduce key membrane properties. These efforts may bridge the gap between general-purpose small molecule force fields and the specialized needs of lipid simulations, paving the way for a truly generalizable, biophysically accurate force field.

Sage underperforms for lipid chemistries: To determine the accuracy of the OpenFF Sage parameters for lipids, biologically relevant membrane compositions were simulated under varying conditions. Structural and kinetic properties of the bilayers, such as area per lipid, bilayer thickness, diffusion coefficients, and order parameters, were calculated and compared to experimental data and to well-performing lipid simulations. Simulations using the OpenFF Sage force field revealed consistent deviations from experimental benchmarks. Tail order parameters were overestimated, indicating excessive chain rigidity. Diffusion coefficients were underestimated, so the bilayer behaved more like a dense fluid than a fluid-phase bilayer. Additionally, Sage underestimated area per lipid and overestimated bilayer thickness across compositions, suggesting an overly compact membrane structure. These discrepancies point to limitations in the current parameterization of lipid-like chemistries and highlight the need for targeted improvements to better capture bilayer dynamics and structure.
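As a rough illustration of how three of these bilayer metrics are usually defined, the sketch below computes area per lipid, a C-H bond order parameter, and a lateral diffusion coefficient from already-extracted arrays. The function names, argument layout, and unit choices are assumptions made for this example and do not describe the analysis pipeline used in this work.

```python
import numpy as np


def area_per_lipid(box_x, box_y, lipids_per_leaflet):
    """Lateral box area divided by the number of lipids in one leaflet."""
    return box_x * box_y / lipids_per_leaflet


def order_parameter(ch_vectors, normal=(0.0, 0.0, 1.0)):
    """S_CD = <(3 cos^2(theta) - 1) / 2>, theta between C-H bond and bilayer normal."""
    v = np.asarray(ch_vectors, dtype=float)
    n = np.asarray(normal, dtype=float)
    cos_theta = (v @ n) / (np.linalg.norm(v, axis=1) * np.linalg.norm(n))
    return float(np.mean(1.5 * cos_theta**2 - 0.5))


def lateral_diffusion_coefficient(times, msd):
    """Einstein relation in two dimensions: MSD(t) = 4 D t, so D = slope / 4."""
    slope, _intercept = np.polyfit(times, msd, 1)
    return slope / 4.0
```

Experimental order parameters are often reported as absolute values or as -S_CD, so the sign convention should be checked before comparing numbers. In practice the diffusion fit is restricted to the linear regime of the mean squared displacement.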

Expand Sage’s Training Data: To adapt the Sage force field for lipid simulations, lipid-relevant chemistries were incorporated into the training dataset in the form of quantum mechanical data targets, specifically optimized geometries and torsion drive energy profiles. Model compounds were selected to represent key functional groups in phospholipids, including alkanes (tail groups), amines, and phosphates (head groups). These additions addressed a critical gap in Sage’s original training data, which lacked sufficient representation of long-chain hydrocarbons and polar phosphate-containing fragments that are central to lipid behavior. The updated force field was then refit, allowing the valence terms (bond, angle, torsion) to deviate from original parameters while keeping the van der Waals and electrostatic terms constant. The training process incorporated both the original geometries in Sage and the newly generated quantum mechanical data to enhance the force field’s accuracy in capturing lipid-specific interactions.
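To make the idea of refitting torsion terms against torsion drive profiles concrete, here is a minimal sketch that fits periodic torsion force constants to a relative energy profile by linear least squares, with the phases held fixed. It is a toy stand-in, not the fitting procedure used in this work: the function names, the choice of periodicities and phases, and the assumption that the input is the QM profile with all other MM contributions already subtracted are all illustrative.

```python
import numpy as np


def torsion_basis(phi, periodicities, phases):
    """Columns of 1 + cos(n * phi - gamma) for each periodic torsion term."""
    phi = np.asarray(phi, dtype=float)
    return np.column_stack(
        [1.0 + np.cos(n * phi - g) for n, g in zip(periodicities, phases)]
    )


def fit_torsion_k(phi, residual_energy, periodicities=(1, 2, 3),
                  phases=(0.0, np.pi, 0.0)):
    """Least-squares fit of force constants k_n plus a constant energy offset.

    residual_energy is assumed to be the QM torsion profile minus every MM
    term except the torsion being fit (hypothetical preprocessing step).
    """
    basis = torsion_basis(phi, periodicities, phases)
    # Relative profiles have an arbitrary zero, so fit a constant offset too.
    design = np.column_stack([basis, np.ones(len(basis))])
    coef, *_ = np.linalg.lstsq(
        design, np.asarray(residual_energy, dtype=float), rcond=None
    )
    return coef[:-1], coef[-1]
```

Because the phases are fixed, the problem is linear in the force constants, which is why a single least-squares solve suffices here. A real refit optimizes many coupled valence parameters at once against both geometries and energies.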

Validation of the fitted force field was performed on full lipid bilayer systems, where structural and dynamic metrics were compared to experimental data and results from established lipid force fields. In parallel, quantum mechanical (QM) potential energy surfaces for representative lipid-like molecules were compared against the corresponding molecular mechanics (MM) energy profiles to assess the force field’s ability to reproduce QM energetics. These refinements represent an important first step toward generalizing the Sage force field for biologically relevant lipid environments, but they also revealed the need to incorporate additional QM data for key chemical moieties such as the glycerol backbone and ester linkages, which are critical for accurately modeling the structural flexibility and polarity of phospholipids.

Lipid-representative molecules: Because full lipid molecules contain a complex combination of diverse functional groups, it can be challenging to diagnose and iteratively improve force field performance at the whole-molecule level. To address this, we employed a modular approach using small, well-characterized model compounds that represent chemically distinct regions of lipids. Linear alkanes were used as proxies for the hydrophobic tail groups, while methyl acetate served as a surrogate for the glycerol backbone and ester linkages.

For these representative molecules, experimental data on density, heat of vaporization, and diffusion coefficients were used as benchmarks. Simulations revealed that Sage systematically underestimated density and diffusion coefficients, while overestimating the heat of vaporization for both alkanes and methyl acetate. These discrepancies pointed to limitations in the nonbonded parameters, particularly in capturing accurate van der Waals interactions and potentially partial charge distributions.
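For reference, the two thermodynamic benchmarks are typically obtained from simulation averages as sketched below. The heat of vaporization uses the common approximation ΔH_vap ≈ ⟨U_gas⟩ − ⟨U_liq⟩/N + RT, which assumes an ideal vapor and negligible liquid molar volume. The function names and the units (kJ/mol, nm³, g/mol) are assumptions for this example, not details taken from the abstract.

```python
GAS_CONSTANT_KJ_PER_MOL_K = 8.314462618e-3
AVOGADRO = 6.02214076e23
CM3_PER_NM3 = 1.0e-21


def heat_of_vaporization(u_gas_mean, u_liquid_mean, n_molecules, temperature):
    """dH_vap ~ <U_gas> - <U_liq> / N + R T, energies in kJ/mol, T in K.

    u_gas_mean is the mean potential energy of a single gas-phase molecule;
    u_liquid_mean is the mean potential energy of the whole liquid box.
    """
    return (u_gas_mean - u_liquid_mean / n_molecules
            + GAS_CONSTANT_KJ_PER_MOL_K * temperature)


def density_g_per_cm3(molar_mass, n_molecules, box_volume_nm3):
    """Mass density from molar mass (g/mol), molecule count, and mean box volume."""
    mass_g = molar_mass * n_molecules / AVOGADRO
    return mass_g / (box_volume_nm3 * CM3_PER_NM3)
```

For example, 1000 water molecules (18.015 g/mol) in a 30 nm³ box give a density just under 1 g/cm³.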

To improve agreement with experiment, nonbonded parameters were varied for these fragments, and validation focused on both thermodynamic and conformational properties. For alkanes, we further assessed the populations of trans and gauche rotamers, which are sensitive to the accuracy of torsional parameters. These results informed a targeted, ab initio-guided refinement of the torsional potential for alkanes within the Sage force field. Together, these efforts represent a modular and data-driven strategy to improve lipid-relevant chemistries in a general-purpose small molecule force field.
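The trans and gauche populations mentioned above can be estimated by binning sampled C-C-C-C dihedral angles. The sketch below assumes the convention that trans sits at 180° and uses a 120° cutoff to split the circle into three equal wells; both the cutoff and the function name are assumptions for illustration.

```python
import numpy as np


def rotamer_populations(dihedrals_deg, trans_cutoff=120.0):
    """Fractions of trans, gauche+, and gauche- states from dihedrals in degrees."""
    # Wrap every angle into [-180, 180) so the cutoff test is unambiguous.
    phi = (np.asarray(dihedrals_deg, dtype=float) + 180.0) % 360.0 - 180.0
    trans = np.abs(phi) >= trans_cutoff
    # Everything else is counted as gauche, including rare near-cis geometries.
    gauche_plus = (phi >= 0.0) & ~trans
    gauche_minus = (phi < 0.0) & ~trans
    return {
        "trans": float(trans.mean()),
        "gauche+": float(gauche_plus.mean()),
        "gauche-": float(gauche_minus.mean()),
    }
```

Calling `rotamer_populations` on the pooled dihedrals of all tail torsions gives a single set of fractions that can be compared across force field variants.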

Improving Sage Parameters: Systematic underestimation of density and diffusion coefficients and overestimation of heat of vaporization in the initial Sage simulations pointed to excessive dispersion interactions, high torsional barriers, and limited molecular mobility. Tuning the van der Waals ε and σ parameters for key functional groups may result in closer reproduction of experimental thermophysical properties for lipid-representative molecules, while proper curation and reweighting of ab initio quantum mechanical training data may make stubborn bonded parameters easier to tune. These improvements, observed in full bilayer simulations, may bring relevant lipid bilayer metrics closer to experimental benchmarks. Ongoing efforts are focused on evaluating electrostatic contributions to bilayer properties and augmenting the training dataset with additional quantum mechanical data for glycerol-containing compounds, to better capture the structure and flexibility of the lipid backbone. This presentation will demonstrate that, by improving bonded and nonbonded parameterization through careful selection of chemically relevant training data and expansion of the SMIRNOFF parameter space, Sage can be adapted to extend a general-purpose force field to specialized domains such as lipids. These efforts lay the groundwork for a truly generalizable, open-source force field capable of accurately modeling both small molecules and complex lipid assemblies.
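To show what ε and σ control, the following sketch evaluates a 12-6 Lennard-Jones pair energy, assuming that is the van der Waals form being tuned. The function names and the example values are hypothetical and are not fitted parameters from this work.

```python
def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones pair energy: 4 * eps * [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def well_minimum(sigma):
    """Separation at which the pair energy reaches its minimum value of -epsilon."""
    return 2.0 ** (1.0 / 6.0) * sigma
```

In this form ε sets the depth of the attractive well and σ sets the separation at which the energy crosses zero. Lowering ε weakens dispersion attraction, while changing σ shifts the effective size of the atom.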