Quality-Controlled Active Learning via Gaussian Processes for Robust Structure-Property Learning in Autonomous Microscopy
Summary
This paper introduces a novel gated active learning framework designed to overcome the limitations of noisy data in autonomous experimental systems, particularly in structure-property learning tasks such as Image-to-Spectrum and Spectrum-to-Image translation. Standard active learning often misinterprets noise as uncertainty, which leads it to acquire poor-quality measurements. The proposed framework combines curiosity-driven sampling with a physics-informed quality control filter, based on simple harmonic oscillator (SHO) model fits, that automatically excludes low-fidelity data during acquisition. Evaluations on a pre-acquired dataset of band-excitation piezoresponse spectroscopy data from PbTiO₃ thin films show that this method significantly outperforms random sampling, standard active learning, and multitask learning strategies. Its effectiveness was further validated in real-time experiments on BiFeO₃ thin films, demonstrating its applicability to real autonomous microscopy. This work advocates a shift towards hybrid autonomy in self-driving labs, integrating physics-informed quality assessment with active decision-making for more reliable scientific discovery.
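As a rough illustration of the physics-informed filter, the sketch below fits a driven SHO amplitude model to a single spectrum with SciPy's `curve_fit` and accepts the measurement only if the coefficient of determination R² clears a threshold. The amplitude-only model form, the initial-guess heuristic, the parameter bounds, and the 0.9 threshold are assumptions made for illustration; the helper names `sho_amplitude` and `sho_quality_gate` are hypothetical and not taken from the paper, which may also use phase information and different acceptance criteria.

```python
import numpy as np
from scipy.optimize import curve_fit


def sho_amplitude(omega, a0, omega0, q):
    """Amplitude response of a driven simple harmonic oscillator."""
    return a0 * omega0**2 / np.sqrt((omega0**2 - omega**2) ** 2 + (omega * omega0 / q) ** 2)


def sho_quality_gate(omega, amplitude, r2_threshold=0.9):
    """Fit the SHO model to one spectrum and decide whether to keep it.

    Returns (accepted, r2, params). The threshold is an illustrative
    assumption, not a value reported in the paper.
    """
    # Initial guesses: resonance at the amplitude peak, Q from the
    # half-power bandwidth (roughly omega0 / Q for a resonant peak).
    i_peak = int(np.argmax(amplitude))
    above = omega[amplitude >= amplitude[i_peak] / np.sqrt(2)]
    width = max(above.max() - above.min(), omega[1] - omega[0])
    q_guess = float(np.clip(omega[i_peak] / width, 1.0, 1e4))
    p0 = [amplitude[i_peak] / q_guess, omega[i_peak], q_guess]
    try:
        params, _ = curve_fit(
            sho_amplitude, omega, amplitude, p0=p0,
            bounds=([0.0, omega.min(), 1.0], [np.inf, omega.max(), 1e4]),
            max_nfev=5000,
        )
    except RuntimeError:
        # The optimizer did not converge: treat the spectrum as low fidelity.
        return False, float("-inf"), None
    residuals = amplitude - sho_amplitude(omega, *params)
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum((amplitude - amplitude.mean()) ** 2)
    if ss_tot == 0.0:
        # A flat spectrum carries no resonance information.
        return False, float("-inf"), params
    r2 = float(1.0 - ss_res / ss_tot)
    return bool(r2 >= r2_threshold), r2, params
```

Rejecting on non-convergence as well as on low R² is a deliberate choice here: in a gated loop, a spectrum the physics model cannot describe at all is treated the same as one it describes poorly.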
Technical Impact
This research addresses a critical challenge in autonomous experimental systems: in materials-science structure-property learning, such as Image-to-Spectrum (Im2Spec) and Spectrum-to-Image (Spec2Im) translation, noisy data degrades active learning performance. Traditional active learners misinterpret noise as uncertainty and therefore tend to acquire poor samples. The proposed gated active learning framework counters this by pairing curiosity-driven sampling with a physics-informed quality control filter, based on simple harmonic oscillator (SHO) model fits, that automatically excludes low-fidelity measurements during data acquisition, which significantly improves the robustness and reliability of learning.

For development stacks, this means building real-time data quality assessment into active learning pipelines: physics model fitting (e.g., with SciPy's optimization routines) running alongside machine learning models such as Gaussian processes (e.g., scikit-learn's GaussianProcessRegressor, or GPyTorch and GPflow built on PyTorch and TensorFlow, respectively). The framework thus offers an architectural pattern for autonomous labs and AI-driven experimental systems that improves data efficiency and the reliability of scientific discovery, and it exemplifies a hybrid approach in which physics-based reasoning complements data-driven machine learning.
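The overall acquisition loop can be sketched with scikit-learn's `GaussianProcessRegressor`. In this sketch, maximum predictive standard deviation stands in for the paper's curiosity-driven criterion, which is an assumption; `gated_active_learning`, `measure`, and `passes_quality` are hypothetical names, with the latter two representing the instrument call and a physics-based check such as an SHO-fit R² threshold. Rejected locations are excluded both from the training set and from future queries, so high predictive uncertainty over a region that only yields rejected data cannot keep drawing the sampler back there.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def gated_active_learning(candidates, measure, passes_quality, n_steps, n_seed=5, seed=0):
    """Uncertainty-driven GP active learning with a quality gate (sketch).

    candidates: (N, d) array of locations that could be measured.
    measure(i) -> (target, raw): hypothetical instrument call for candidate i.
    passes_quality(raw) -> bool: hypothetical physics-based quality check.
    """
    rng = np.random.default_rng(seed)
    available = np.ones(len(candidates), dtype=bool)
    X, y, rejected = [], [], []

    def acquire(i):
        available[i] = False  # never query the same location twice
        target, raw = measure(i)
        if passes_quality(raw):
            X.append(candidates[i])
            y.append(target)
        else:
            rejected.append(i)  # gated out: not used for training

    # Random seed measurements to initialize the surrogate model.
    for i in rng.choice(len(candidates), size=n_seed, replace=False):
        acquire(int(i))

    gp = GaussianProcessRegressor(
        kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2),
        normalize_y=True,
    )
    for _ in range(n_steps):
        if not available.any():
            break
        if len(X) < 2:
            # Too few accepted points to fit a GP: fall back to a random choice.
            acquire(int(rng.choice(np.flatnonzero(available))))
            continue
        gp.fit(np.asarray(X), np.asarray(y))
        _, std = gp.predict(candidates, return_std=True)
        # Query the most uncertain location that has not been measured yet.
        score = np.where(available, std, -np.inf)
        acquire(int(np.argmax(score)))
    return np.asarray(X), np.asarray(y), rejected
```

In practice, `passes_quality` would wrap a physics model fit of the raw spectrum, and `measure` would trigger the microscope at the chosen pixel; both are left abstract here.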