2025 AIChE Annual Meeting

Reinforcement Learning-Based Control for Inverted Pendulum System – Computational Study and Hardware Validation

A control system regulates the behavior of a device or process through feedback control loops, with the goal of driving the system's state to a desired set point. In real-world applications, however, system complexity, nonlinearity, and uncertainty can make this difficult. To address these challenges, reinforcement learning (RL) offers a promising alternative by learning a control policy through direct interaction with the environment. The RL agent observes the system's state and outputs a control effort. A reward is defined based on the deviation from the target set point, and the learning process iteratively improves the control actions taken by the agent. More specifically, the RL agent comprises two neural networks: an actor network and a critic network. The actor network generates the control policy, while the critic network evaluates the quality of the actions taken, and this evaluation guides the actor toward an effective control strategy.
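As a rough illustration of a reward based on deviation from the set point, the sketch below penalizes the squared angle error, the angular velocity, and the control effort. The function names (`wrap_angle`, `setpoint_reward`) and the weights are assumptions chosen for illustration; they follow the general quadratic form used by common pendulum benchmarks and are not necessarily the reward used in this study.

```python
import math


def wrap_angle(theta: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def setpoint_reward(
    theta: float,
    theta_dot: float,
    effort: float,
    target: float = 0.0,      # assumed set point: upright position at 0 rad
    w_angle: float = 1.0,     # illustrative weight on squared angle error
    w_rate: float = 0.1,      # illustrative weight on squared angular velocity
    w_effort: float = 0.001,  # illustrative weight on squared control effort
) -> float:
    """Return a non-positive reward that is largest (zero) at the set point."""
    error = wrap_angle(theta - target)
    cost = w_angle * error**2 + w_rate * theta_dot**2 + w_effort * effort**2
    return -cost


if __name__ == "__main__":
    # Reward at the set point versus a pendulum hanging straight down.
    print(setpoint_reward(0.0, 0.0, 0.0))
    print(setpoint_reward(math.pi, 0.0, 0.0))
```

Because the reward is zero only at the set point with no motion and no effort, maximizing the cumulative reward pushes the agent toward holding the pendulum at the target while using small control actions.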

In this study, RL-based controllers were first trained and tested on an inverted pendulum system in Gymnasium simulation environments using Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3). DDPG is prone to training instability and to overestimation of the value function; TD3 improves on DDPG by reducing this overestimation bias. Simulation results showed that both DDPG and TD3 could learn policies for the inverted pendulum, but TD3 outperformed DDPG, achieving faster convergence. Based on these findings, TD3 was applied to the Quanser Qube-Servo 3, an inverted pendulum module used for control research and education. Unlike in simulation, on hardware the reward must be computed from sensor measurements, which may be noisy or incomplete. While the TD3 controller has not yet achieved full stabilization of the pendulum, the experiments provide valuable insight into the challenges of applying RL to real-world hardware.
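The difference between the two algorithms' value targets can be sketched in a few lines. The example below is a minimal, framework-free illustration and not the implementation used in this study: the target actor and critics are passed in as plain callables, and all names and hyperparameter values (`gamma`, `noise_std`, `noise_clip`, `action_limit`) are assumptions. DDPG bootstraps from a single target critic, whereas TD3 adds clipped noise to the target action and takes the minimum of two target critics, which is what reduces the overestimation bias. TD3 also delays actor updates relative to critic updates, which is not shown here.

```python
import random
from typing import Any, Callable, Optional

Actor = Callable[[Any], float]           # maps state -> action
Critic = Callable[[Any, float], float]   # maps (state, action) -> Q-value


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def ddpg_target(reward: float, next_state: Any, done: float,
                target_actor: Actor, target_critic: Critic,
                gamma: float = 0.99) -> float:
    """DDPG target: bootstrap from a single target critic."""
    next_action = target_actor(next_state)
    return reward + gamma * (1.0 - done) * target_critic(next_state, next_action)


def td3_target(reward: float, next_state: Any, done: float,
               target_actor: Actor,
               target_critic_1: Critic, target_critic_2: Critic,
               gamma: float = 0.99, noise_std: float = 0.2,
               noise_clip: float = 0.5, action_limit: float = 1.0,
               rng: Optional[random.Random] = None) -> float:
    """TD3 target: smoothed target action and minimum of two target critics."""
    if rng is None:
        rng = random.Random()
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise,
                       -action_limit, action_limit)
    # Clipped double-Q: use the more pessimistic of the two estimates.
    q_min = min(target_critic_1(next_state, next_action),
                target_critic_2(next_state, next_action))
    return reward + gamma * (1.0 - done) * q_min


if __name__ == "__main__":
    # Toy critics that disagree: one optimistic, one pessimistic.
    actor = lambda s: 0.0
    q_high = lambda s, a: 10.0
    q_low = lambda s, a: 4.0
    print(ddpg_target(1.0, 0.0, 0.0, actor, q_high, gamma=0.5))
    print(td3_target(1.0, 0.0, 0.0, actor, q_high, q_low, gamma=0.5))
```

With these toy critics, the DDPG target follows the optimistic estimate, while the TD3 target follows the lower one, so a single overestimating critic cannot inflate the learning target on its own.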