2025 AIChE Annual Meeting
(58j) Maximizing the Data Efficiency of Experimental Planning Algorithms for Quantifying Material Structure-Property Relationships
To address this gap, we compare the performance of 100 data selection strategies for building machine learning models for materials property prediction across 95 tasks sourced from the literature. From these results, we identify attributes of top-performing strategies, rationalize the shortcomings of low-performing strategies, and characterize how task properties (e.g., property landscape roughness) influence data efficiency. Specifically, we show that neural network-based active learning algorithms outperform alternatives even in low-data scenarios, that space-filling algorithms can rival active learning algorithms on certain tasks, and that molecular representations based on small sets of physico-chemical features tend to be more data-efficient than more sophisticated representations. Motivated by these insights, we investigate how incorporating additional factors, such as domain structure, meta-learning, and foundation models, affects the efficiency of experimental planning algorithms in data-scarce scenarios.
Overall, we provide a comprehensive survey of data-efficient experimental planning strategies, recommend best practices for future materials design campaigns, and suggest avenues for further study.
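
For readers unfamiliar with the two families of strategies contrasted above, the following is a minimal sketch of one space-filling strategy (greedy maximin selection) and one active learning strategy (uncertainty sampling). It is illustrative only and is not the implementation evaluated in this work. All function names (`maximin_select`, `ensemble_std`, `active_select`) and parameters are hypothetical, and a bootstrap ensemble of linear ridge regressors stands in for the neural network models named in the abstract.

```python
import numpy as np

def maximin_select(X, n_select, first=0):
    """Space-filling: greedily pick the candidate farthest from all chosen points.
    Uses only the feature matrix X, never the property labels."""
    chosen = [first]
    # Distance from every candidate to its nearest already-chosen point.
    d = np.linalg.norm(X - X[first], axis=1)
    while len(chosen) < n_select:
        nxt = int(np.argmax(d))
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return chosen

def ensemble_std(X_train, y_train, X_pool, n_models=20, ridge=1e-3, seed=0):
    """Predictive spread of a bootstrap ensemble of linear ridge regressors.
    ASSUMPTION: a simple stand-in for a neural network uncertainty estimate."""
    rng = np.random.default_rng(seed)
    A_pool = np.hstack([X_pool, np.ones((len(X_pool), 1))])  # add bias column
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_train), len(X_train))  # bootstrap resample
        A = np.hstack([X_train[idx], np.ones((len(idx), 1))])
        w = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]),
                            A.T @ y_train[idx])
        preds.append(A_pool @ w)
    return np.std(preds, axis=0)

def active_select(X, oracle, n_init, n_select):
    """Active learning: seed with a space-filling set, then repeatedly
    measure the candidate on which the ensemble disagrees most."""
    labeled = maximin_select(X, n_init)
    y = {i: oracle(X[i]) for i in labeled}  # oracle = hypothetical experiment
    while len(labeled) < n_select:
        y_train = np.array([y[i] for i in labeled])
        std = ensemble_std(X[labeled], y_train, X)
        std[labeled] = -np.inf  # never re-select a measured candidate
        nxt = int(np.argmax(std))
        labeled.append(nxt)
        y[nxt] = oracle(X[nxt])
    return labeled
```

Here `X` would hold one feature vector per candidate material and `oracle` would wrap the experiment or simulation that returns its property value. The space-filling routine fixes its selections before any measurement is made, whereas the active learning loop chooses each new measurement using the labels collected so far.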