Membrane-based distributed water treatment and desalination (DWTD) systems are crucial for upgrading impaired inland groundwater for potable use in small remote communities. Because water demand fluctuates hourly and daily, such systems must operate in multiple modes (startup, production, shutdown, and flushing) while maintaining strict control of product water quality. Accordingly, operational graph attention convolutional network (GATConv) models were developed to support the autonomous operation of wellhead membrane-based DWTD systems (water production capacity of ~2,500–5,800 gallons/day) deployed in three small, underserved communities in Salinas Valley, CA. Time-series data from the DWTD systems, comprising readings from 33 process sensors, storage tank levels, and process tags, were transmitted to the local system controller and data storage, and to a DWTD cyberinfrastructure that also served for remote monitoring and system management. The GATConv models for permeate flux, permeate (product water) salinity, and nitrate level demonstrated excellent predictive performance over forecasting horizons of 2-4 months, and were effective for forecasting system performance as well as for detecting sensor faults and membrane performance degradation. In addition, interpretable and potentially causal relationships were extracted from the models by aggregating and visualizing the learned weights within the graph neural network structure. This analysis revealed that temperature had a significant impact on permeate flux and solute passage, in addition to the expected dependence on feed salinity and nitrate levels, the concentrate recycle ratio, and the flow rate and pressure profile along the RO elements.
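To illustrate what aggregating learned weights in a graph attention model can look like, the following minimal Python sketch averages per-head attention coefficients on each edge and normalizes the incoming weights for one target node. It is not the implementation used in this work: the function names, node names, edge list, and attention values are hypothetical placeholders chosen for illustration and are not results from the deployed systems.

```python
from collections import defaultdict

def aggregate_attention(edges, attention, node_names):
    """Average attention over heads and accumulate it per (source, target) pair.

    edges:      list of (src_index, dst_index) pairs
    attention:  list of per-head coefficients, one list per edge
    node_names: list mapping node index -> sensor/variable name
    """
    scores = defaultdict(float)
    for (src, dst), heads in zip(edges, attention):
        mean_alpha = sum(heads) / len(heads)  # average across attention heads
        scores[(node_names[src], node_names[dst])] += mean_alpha
    return dict(scores)

def source_importance(edge_scores, target):
    """Normalize the aggregated weights of all edges pointing at `target`."""
    incoming = {src: w for (src, dst), w in edge_scores.items() if dst == target}
    total = sum(incoming.values())
    return {src: w / total for src, w in incoming.items()} if total else {}

# Hypothetical graph: three input nodes feeding one output node.
node_names = ["feed_temperature", "feed_salinity", "feed_pressure", "permeate_flux"]
edges = [(0, 3), (1, 3), (2, 3)]
# Made-up attention coefficients for two heads per edge (illustrative only).
attention = [[0.5, 0.3], [0.3, 0.3], [0.2, 0.4]]

edge_scores = aggregate_attention(edges, attention, node_names)
importance = source_importance(edge_scores, "permeate_flux")

# Rank inputs by their normalized aggregated attention toward the target.
for name, weight in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {weight:.2f}")
```

In practice the per-edge coefficients would be read from a trained model rather than typed in by hand, and a ranking of this kind indicates association learned by the model, which does not by itself establish causation.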