2025 AIChE Annual Meeting
Reinforcement Learning-Based Control for Inverted Pendulum System – Computational Study and Hardware Validation
In this study, RL-based controllers for an inverted pendulum system were first trained and tested in Gymnasium simulation environments using Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3). DDPG is prone to training instability and to overestimation bias in its learned value function; TD3 extends DDPG specifically to reduce this overestimation bias. Simulation results showed that both DDPG and TD3 could learn policies for the inverted pendulum, but TD3 outperformed DDPG by converging faster. Based on these findings, TD3 was applied to the Quanser Qube-Servo 3, an inverted pendulum module used for control research and education. Unlike in simulation, on the hardware the reward must be computed from sensor measurements, which may be noisy or incomplete. Although the TD3 controller has not yet achieved full stabilization of the pendulum, the experiments provide valuable insight into the challenges of applying RL to real-world hardware.
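To illustrate how TD3 reduces the overestimation bias of DDPG, the following minimal sketch contrasts the two critic targets. DDPG bootstraps from a single target critic, while TD3 takes the minimum of two target critics (clipped double-Q learning) and adds clipped Gaussian noise to the target action (target policy smoothing). This is not the implementation used in this study: the actor and critics are hypothetical plain Python callables standing in for neural networks, and the defaults (discount 0.99, noise standard deviation 0.2, noise clip 0.5, actions bounded to [-1, 1]) are assumptions rather than values reported here. TD3's third ingredient, delayed actor updates, is not shown.

```python
import random
from typing import Callable, Optional, Sequence

State = Sequence[float]
Actor = Callable[[State], float]            # hypothetical target policy mu'(s)
Critic = Callable[[State, float], float]    # hypothetical target critic Q'(s, a)


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def ddpg_target(r: float, s_next: State, done: float,
                actor_targ: Actor, q_targ: Critic, gamma: float = 0.99) -> float:
    """DDPG target: y = r + gamma * (1 - done) * Q'(s', mu'(s'))."""
    a_next = actor_targ(s_next)
    return r + gamma * (1.0 - done) * q_targ(s_next, a_next)


def td3_target(r: float, s_next: State, done: float,
               actor_targ: Actor, q1_targ: Critic, q2_targ: Critic,
               gamma: float = 0.99, noise_std: float = 0.2,
               noise_clip: float = 0.5, a_low: float = -1.0,
               a_high: float = 1.0,
               rng: Optional[random.Random] = None) -> float:
    """TD3 target with target policy smoothing and clipped double-Q."""
    rng = rng if rng is not None else random.Random()
    # Target policy smoothing: perturb the target action with clipped noise,
    # then keep the result inside the (assumed) action bounds.
    eps = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    a_next = clip(actor_targ(s_next) + eps, a_low, a_high)
    # Clipped double-Q: bootstrap from the smaller of the two critic estimates,
    # which counteracts the upward bias of a single maximizing critic.
    q_min = min(q1_targ(s_next, a_next), q2_targ(s_next, a_next))
    return r + gamma * (1.0 - done) * q_min


if __name__ == "__main__":
    # Toy critics: the first overestimates, the second is more conservative.
    q1 = lambda s, a: 10.0
    q2 = lambda s, a: 8.0
    actor = lambda s: 0.0
    y_ddpg = ddpg_target(1.0, [0.0], 0.0, actor, q1)
    y_td3 = td3_target(1.0, [0.0], 0.0, actor, q1, q2, noise_std=0.0)
    print(f"DDPG target: {y_ddpg:.2f}, TD3 target: {y_td3:.2f}")
```

With these toy critics the script prints a DDPG target of 10.90, which propagates the overestimating critic, and a TD3 target of 8.92, which bootstraps from the smaller estimate. Taking the minimum can underestimate instead, but underestimated actions tend not to be reinforced by the policy update, which is the design rationale behind clipped double-Q learning.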