2023 AIChE Annual Meeting
(305c) Extracting Meaningful Features from Industrial Text Data
Authors
Recent advances in Natural Language Processing (NLP) [1] have enabled the extraction of features from text data that go beyond the frequency counting of Bag of Words (BoW) [2] approaches. NLP models can encode the meaning of text as numerical features, which can then be used for further analysis. However, NLP models remain difficult to interpret and are still used primarily as black boxes. Moreover, the power and robustness of text feature extraction methods have not yet been explored in the chemical process industries (CPI) context. We therefore evaluated several text feature extraction methods, including BoW and NLP-based methods, using both unsupervised and supervised approaches [3] to assess their power and robustness.
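To make the BoW baseline concrete, the following is a minimal sketch of frequency-count feature extraction using only the Python standard library. It is an illustration under stated assumptions, not the pipeline used in this study: the function names `tokenize` and `bag_of_words` and the sample log entries are hypothetical, and the tokenizer is deliberately simplistic.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, count_matrix) for a list of text documents.

    Each row of count_matrix holds the term frequencies of one document,
    with columns ordered as in the sorted vocabulary.
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)  # missing terms count as 0
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical log entries, invented for illustration only.
documents = [
    "Pump P-101 tripped on high vibration",
    "Replaced seal on pump P-101",
    "High level alarm in tank",
]
vocabulary, matrix = bag_of_words(documents)
```

Each document becomes a vector of term counts, so word order and context are discarded. That loss of context is the limitation that meaning-encoding NLP features are intended to address.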
We applied exploratory text data analysis to a real case study from a Dow Chemical Company site to assess what information can be extracted from industrial text data to predict the probability of an event occurring. Our findings show that the context described in the text data is relatively sparse, which may be related to the functional aggregation level reported in the texts. Overall, our study demonstrates the potential of text data for process analysis and monitoring in the CPI.
References
[1] D. Antons, E. Grünwald, P. Cichy, and T. O. Salge, "The application of text mining methods in innovation research: current state, evolution patterns, and development priorities", R&D Management, vol. 50, no. 3, pp. 329–351, Jun. 2020, doi: 10.1111/radm.12408.
[2] A. Zheng and A. Casari, Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists, 1st ed. Beijing; Boston: O'Reilly Media, 2018.
[3] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York, NY: Springer, 2009.