Transfer learning (TL) is integral to developing and updating data-driven machine learning (ML) operational models for the autonomous operation of membrane-based distributed wellhead water treatment and desalination (DWTD) systems. DWTD systems are vital for delivering safe potable water to remote and underserved communities in the USA and globally. Their operation is inherently intermittent, requiring autonomous control to handle multi-mode operation (startup, production, shutdown, flushing). Rapid commissioning of new systems in diverse geographical areas helps build confidence in their reliability and supports regulatory permitting. This study reports TL implementation in three DWTD systems deployed in Salinas Valley, California. Baseline models for permeate flux, salinity, and nitrate were first developed for site A using Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) and attention mechanisms. Trained on multi-year data (~6M points from 22 sensors), the models showed excellent performance (R² = 0.95, average absolute relative error (AARE) < ~4%). Systems B and C (located ~15-20 miles from site A) were similar in design to A, but differed in components (valves, pumps, sensor manufacturers), source water quality, operational capacity, and environmental conditions. Multi-mode operational models were developed for B and C under the following scenarios: (i) TL from A to B (3 membranes in series for each system), (ii) TL from A to C (3- and 5-membrane trains for sites A and C, respectively), and (iii) sequential TL from A to B, then B to C. TL-based models enabled rapid model building for the new systems, with a forecasting horizon of ~2-4 months while maintaining R² ≥ ~0.92 and prediction error < ~5% prior to model retraining. The TL approach led to accurate data-driven ML models that contributed to improved system management and control, increased confidence in system reliability, and supported a shorter commissioning period and approvals for delivering safe potable water to the study communities.
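For readers who want to reproduce the two headline metrics, the following is a minimal sketch of how R² and AARE could be computed from paired observed and predicted values. It assumes AARE is the mean of |observed − predicted| / |observed| expressed in percent, and that R² is the usual coefficient of determination; the study's exact definitions are not stated here. The function names and the `within_reported_bounds` helper are hypothetical, and the default thresholds simply mirror the approximate bounds quoted above (R² ≥ ~0.92, error < ~5%).

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def aare_percent(observed, predicted):
    """Average absolute relative error in percent (assumed definition).

    Observed values must be non-zero, since each error is scaled by |observed|.
    """
    return 100.0 * mean(abs(o - p) / abs(o) for o, p in zip(observed, predicted))


def within_reported_bounds(r2, aare, r2_min=0.92, aare_max=5.0):
    """Hypothetical check against the approximate bounds quoted in the text."""
    return r2 >= r2_min and aare < aare_max
```

As a usage example with made-up numbers (not data from the study), `observed = [10.0, 20.0, 30.0, 40.0]` and `predicted = [10.5, 19.0, 30.0, 41.0]` can be passed to `r_squared` and `aare_percent`, and the two results handed to `within_reported_bounds` to flag when a transferred model might be due for retraining.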