2025 AIChE Annual Meeting

(58j) Maximizing the Data Efficiency of Experimental Planning Algorithms for Quantifying Material Structure-Property Relationships

Design-build-test-learn strategies based on simulations and experiments are increasingly employed to accelerate materials discovery and characterization. However, these design campaigns are limited by the available time and resources, which constrains both the range of problems to which they can be applied and the fidelity of the chosen simulation or experiment. It is therefore essential to use experimental planning algorithms that are as data-efficient as possible, so that data-driven materials design consumes as little time and as few resources as it can. Despite this need, there is no consensus on how best to choose experiments that maximize understanding of material structure-property relationships when data are limited.

To address this gap, we compare the performance of 100 data selection strategies for building machine learning models for materials property prediction on 95 tasks sourced from the literature. From these results, we identify attributes of top-performing strategies, rationalize the shortcomings of low-performing strategies, and characterize how properties of tasks (e.g., property landscape roughness) influence data efficiency. Specifically, we show that neural network-based active learning algorithms outperform alternatives even in low-data scenarios, that space-filling algorithms can rival active learning algorithms on certain tasks, and that molecular representations based on small sets of physico-chemical features tend to be more data-efficient than more sophisticated representations. Motivated by these insights, we investigate how incorporating additional factors, such as domain structure, meta-learning, and foundation models, affects the efficiency of experimental planning algorithms in data-scarce scenarios.
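To make the distinction between the two families of strategies concrete, the sketch below contrasts a space-filling selector with an active learning selector on a toy one-dimensional task. It is illustrative only and is not the implementation used in this work: the function names, the toy `oracle`, and the candidate `pool` are hypothetical, the space-filling rule is assumed to be greedy maximin (farthest-point) selection, and a committee of bootstrapped nearest-neighbor predictors stands in for the neural network models mentioned above.

```python
import math
import random
import statistics


def farthest_point_selection(pool, n_select, start=0):
    """Space-filling: greedily pick the candidate farthest from all chosen points.

    Uses only the inputs (no property labels), so the design is fixed up front.
    """
    chosen = [start]
    # Distance from each candidate to its nearest already-chosen point.
    min_dist = [math.dist(x, pool[start]) for x in pool]
    while len(chosen) < n_select:
        nxt = max(range(len(pool)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [min(d, math.dist(x, pool[nxt])) for d, x in zip(min_dist, pool)]
    return chosen


def nearest_neighbor_predict(train_x, train_y, x):
    """Predict the label of x as the label of its nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[nearest]


def committee_disagreement(train_x, train_y, x, rng, n_models=10):
    """Variance of predictions across models fit to bootstrap resamples."""
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(len(train_x)) for _ in train_x]  # bootstrap resample
        preds.append(
            nearest_neighbor_predict(
                [train_x[i] for i in idx], [train_y[i] for i in idx], x
            )
        )
    return statistics.pvariance(preds)


def active_learning_selection(pool, oracle, n_select, n_init=3, seed=0):
    """Active learning: label the candidate on which the committee disagrees most."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(pool)), n_init)
    labels = {i: oracle(pool[i]) for i in chosen}
    while len(chosen) < n_select:
        train_x = [pool[i] for i in chosen]
        train_y = [labels[i] for i in chosen]
        remaining = [i for i in range(len(pool)) if i not in labels]
        nxt = max(
            remaining,
            key=lambda i: committee_disagreement(train_x, train_y, pool[i], rng),
        )
        chosen.append(nxt)
        labels[nxt] = oracle(pool[nxt])  # "run the experiment" for this candidate
    return chosen


def oracle(x):
    """Hypothetical rough property landscape standing in for an experiment."""
    return math.sin(5 * x[0]) + 0.5 * math.sin(23 * x[0])


# Hypothetical candidate pool: 50 evenly spaced one-dimensional descriptors.
pool = [(i / 49,) for i in range(50)]
space_filling_idx = farthest_point_selection(pool, n_select=10)
active_idx = active_learning_selection(pool, oracle, n_select=10)
```

The key design difference is that `farthest_point_selection` never looks at measured properties, whereas `active_learning_selection` refits its committee after every new label and uses the disagreement to decide what to measure next. Which selector yields the more accurate model for a given budget depends on the task, which is the kind of comparison the benchmark described above is meant to quantify.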

Overall, we provide a comprehensive survey of data-efficient experimental planning strategies, recommend best practices for future materials design campaigns, and suggest avenues for further study.