Low-resolution and implicit-solvent coarse-grained (CG) modeling remains an effective way to extend the accessible time and length scales of macromolecular simulations. However, the mapping degeneracy arising from lost degrees of freedom (e.g., from integrated-out solvent or degenerate fine-grained configurations within CG sites) can introduce both dynamical and thermodynamic inaccuracies, especially since these models are likely to exhibit non-Markovian behavior. Recent innovations in machine learning, such as large language models (LLMs) and Bayesian modeling, may provide a means to improve the accuracy of low-resolution CG models. Here, we propose a bottom-up (i.e., derived from atomistic data through statistical mechanical principles) CG modeling approach, which we denote Probabilistic Forecasting for Coarse-Graining (PFCG), that utilizes elements from recurrent neural networks and transformers to learn spatiotemporal relationships along CG-mapped trajectories and thereby forecast probabilities of future integration steps. We test our approach on fast-folding mini-proteins with varying folding kinetics and compare predicted free energy surfaces and dynamical descriptors to those of other CG approaches and the reference atomistic data. Our results demonstrate that our approach faithfully recapitulates both stationary and dynamical properties, the latter of which is notoriously difficult for current CG models. We attribute the success of the PFCG approach to implicit learning of the underlying CG equations of motion, rather than the conventional approach of learning effective CG interactions and forces. Finally, we discuss future directions, including scaling to larger protein systems and strategies to incorporate chemical and temperature transferability.
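To make the forecasting idea concrete, the following is a minimal, hypothetical sketch (not the authors' PFCG architecture) of an autoregressive recurrent model that, given a history of CG-mapped coordinates, outputs a Gaussian over the next integration step and samples from it to roll out a trajectory. All layer sizes, weight names, and the single-recurrent-cell design are illustrative assumptions; in practice the parameters would be trained on atomistic reference trajectories.

```python
import numpy as np

# Illustrative placeholder dimensions (assumptions, not from the paper):
# number of CG sites, spatial dimension, recurrent hidden size.
rng = np.random.default_rng(0)
n_sites, dim, hidden = 4, 3, 16
d_in = n_sites * dim

# Untrained random weights stand in for parameters that would be
# learned from CG-mapped atomistic trajectories.
W_xh = rng.normal(scale=0.1, size=(d_in, hidden))
W_hh = rng.normal(scale=0.1, size=(hidden, hidden))
W_mu = rng.normal(scale=0.1, size=(hidden, d_in))
W_ls = rng.normal(scale=0.1, size=(hidden, d_in))

def step(h, x):
    """One recurrent update: new hidden state from the current CG frame."""
    return np.tanh(x @ W_xh + h @ W_hh)

def forecast(h):
    """Gaussian over the next-frame displacement: (mean, stddev)."""
    mu = h @ W_mu
    sigma = np.exp(h @ W_ls)  # log-stddev head guarantees sigma > 0
    return mu, sigma

def rollout(x0, n_steps):
    """Sample a CG trajectory autoregressively from initial frame x0."""
    h = np.zeros(hidden)
    x, traj = x0.copy(), [x0.copy()]
    for _ in range(n_steps):
        h = step(h, x)
        mu, sigma = forecast(h)
        x = x + rng.normal(mu, sigma)  # probabilistic integration step
        traj.append(x.copy())
    return np.array(traj)

traj = rollout(rng.normal(size=d_in), n_steps=50)
print(traj.shape)  # (51, 12): initial frame plus 50 sampled steps
```

Because each step samples from a predicted distribution rather than applying deterministic forces, repeated rollouts from the same initial frame yield an ensemble of trajectories, which is what allows such a model to target both stationary and dynamical statistics.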