2019 AIChE Annual Meeting
(642a) Two-Phase Optimal Design of Pharmaceutical Separation: Sampling-Based Uncertainty Analysis and Reinforcement Learning
Property process modelling and uncertainty analysis
Screening an optimal solvent or solvent mixture is an integral step in formulating an efficient pharmaceutical process. Solubility modelling is most often used as the screening criterion and requires an appropriate thermodynamic model. In this study, the eNRTL-SAC model is applied, from which regressed segment parameters and activity coefficients are obtained. The activity coefficients determine the distribution coefficient, also known as the partition coefficient: the equilibrium ratio of a compound's concentration between the two phases of a mixture. The extraction factor combines the distribution coefficient with the flow rates of the feed and solvent streams in the LLE process, and the Kremser-Souders-Brown theoretical-stage equation uses the distribution coefficient and the extraction factor directly to estimate the number of stages in the extraction column, which ultimately drives the operating cost. Uncertainty analysis captures the parameter uncertainties and their influence, giving the decision maker a guideline for an optimal process design. Before the uncertainty analysis, a sampling-based model solution is carried out to explore and identify the optimal process variables; the Monte Carlo method is then exploited to illustrate the influence of the parameter uncertainties.
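The stage estimate and the sampling-based uncertainty propagation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the distribution coefficient, flow rates, solute specifications, and the lognormal uncertainty assumed for `K` are all hypothetical placeholder values, and pure solvent is assumed in the Kremser-Souders-Brown form used.

```python
import numpy as np

def kremser_stages(x_in, x_out, K, F, S):
    """Kremser-Souders-Brown estimate of theoretical extraction stages.

    x_in, x_out : solute fraction entering the column / leaving in the raffinate
    K           : distribution (partition) coefficient at equilibrium
    F, S        : feed and solvent flow rates (pure entering solvent assumed)
    """
    E = K * S / F                          # extraction factor
    if np.isclose(E, 1.0):                 # limiting form of the KSB equation
        return (x_in - x_out) / x_out
    return np.log((x_in / x_out) * (1.0 - 1.0 / E) + 1.0 / E) / np.log(E)

# --- nominal design point (hypothetical numbers) ---
x_in, x_out = 0.10, 0.001                  # 99% solute recovery target
F, S = 100.0, 50.0                         # feed and solvent flows, kg/h
K_nominal = 4.0                            # from eNRTL-SAC activity coefficients
n_nominal = kremser_stages(x_in, x_out, K_nominal, F, S)

# --- Monte Carlo propagation of the parameter uncertainty in K ---
rng = np.random.default_rng(0)
K_samples = K_nominal * rng.lognormal(mean=0.0, sigma=0.15, size=5000)
n_samples = np.array([kremser_stages(x_in, x_out, K, F, S) for K in K_samples])

p5, p50, p95 = np.percentile(n_samples, [5, 50, 95])
print(f"nominal stages: {n_nominal:.1f}, 5-95% range: {p5:.1f} - {p95:.1f}")
```

Sizing the column to an upper percentile of the sampled stage count, rather than to the nominal value alone, is one way such an analysis can guide a conservative design decision.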
Off-policy control with deep Q-learning
Reinforcement learning generally comprises five components: the state, the state-transition probability matrix, the action, the reward, and the discount factor. The action is selected by a policy embedded in the agent, and the conventional objective is to maximize the reward function under that policy; finding the optimal policy is therefore the central goal of reinforcement learning. The off-policy strategy is tied to the trade-off between exploration and exploitation. An on-policy strategy can improve a specific policy through agent training, but it has the weakness of exploring only solutions in the vicinity of the current policy. At the expense of additional computational cost, an off-policy strategy can flexibly explore distinct candidates and approach the global solution more closely. The agent is designed with deep learning algorithms to suggest the optimal action in every episode. The state information and the reward value from the environment serve as the agent's inputs, and suitable hyperparameters of the deep learning algorithm must be set to improve the agent's efficiency.
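The off-policy idea can be illustrated with a minimal, self-contained sketch. For brevity this uses tabular Q-learning rather than a deep network, on a toy five-state chain environment invented for this example; an ε-greedy behaviour policy explores, while the update bootstraps from the greedy (max) action, which is what makes the method off-policy:

```python
import numpy as np

# Toy chain environment: states 0..4, actions 0 = left, 1 = right.
# Reaching state 4 pays reward 1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(s, a):
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward, s_next == GOAL

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.9, 0.2      # learning rate, discount, exploration

for episode in range(500):
    s, done = 0, False
    while not done:
        # Behaviour policy: epsilon-greedy, so the agent keeps exploring.
        a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Off-policy update: the target uses max over next actions (the greedy
        # policy), regardless of which action the behaviour policy took.
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

greedy_policy = Q.argmax(axis=1)
print("greedy policy (0=left, 1=right):", greedy_policy[:GOAL])
```

In the framework described here, the Q-table would be replaced by a deep network acting as the agent, and the toy chain by the LLE process environment, with state, action, and reward defined from the process variables and costs.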
This study aims to design a novel extraction process for the pharmaceutical industry based on uncertainty analysis and reinforcement learning. Thermodynamic property modelling, solubility modelling, solvent screening, and uncertainty analysis are examined consecutively to design a feasible LLE process for pharmaceuticals. Reinforcement learning is then applied to suggest the optimal operating plan based on the results of the uncertainty analysis. Moreover, the proposed two-phase framework for the LLE process could be extended to other downstream processes.