2025 Global Conference on Process Safety and Big Data

Evolving Theme Analysis to Identify Early Indicators of Process Safety Risk Using Observation Data from North Sea Oil & Gas Operators

Introduction

Our company has developed a new tool, ‘Evolving Theme Analysis’, to identify key safety themes from health and safety observation data. This technique has been implemented in a multi-stage project with a major North Sea operator to extract insight from their observation dataset. The outputs of this project empowered the operator to proactively identify safety risks before an incident occurs, and prioritise corrective actions. It also enabled them to monitor the effectiveness of interventions and provide timely, meaningful feedback to the workforce, to foster engagement and accountability.

Example in practice

When the tool was applied for a single operator, their feedback was that the project “generated insights and trends that would just not have been possible using traditional techniques. This work has enabled us to identify the top issues that are influencing process and personal safety on the installation, and to put in place an action plan to sustainably improve these aspects. Equally importantly, the work by [Our Company] has enabled us to feedback to the offshore workforce on the contribution that their observations are making to close the gap between work as imagined onshore and work as done offshore.”

This tool builds upon a technique called ‘Dynamic Topic Modelling’. The dynamic aspect allows the underlying model to track the evolution of existing themes over time and detect the emergence of new themes, helping to identify behavioural patterns, systemic weaknesses, and early indicators of process safety risk across an asset.

Industry benchmarking

Our company has applied this tool not only to a single operator but also to the wider North Sea Oil & Gas industry, using data combined from multiple operators. Themes prevalent across the industry have been identified, interpreted and collated in the Industry 2024 Data Report(1). Since this report was published, additional model iterations have been trained; the theme analysis model has therefore evolved, and new themes have emerged as distinct semantic groupings. Training a model on data collected across the industry sector makes it possible to benchmark individual companies against the industry. These findings underline the value of treating observations as a unified data source.

What can it be used for?

The identification of key safety themes within an observation dataset can highlight shared workforce concerns, patterns of behaviour and consistently occurring issues and challenges. By exploring these themes, operators can identify key issues to target and use these to inform interventions, leading to a safer, more reliable asset. Once interventions have been implemented, theme prevalence over time can be used to track safety performance and, crucially, to monitor the effectiveness of targeted interventions. For behavioural themes (relating to unsafe acts), where observations may be the only data source capturing what is happening, theme prevalence may be the only indicator of intervention effectiveness.

“Evolving Theme Analysis” therefore provides operators with a way to prioritise pressing issues, and then to monitor the effectiveness of interventions over time, ensuring that corrective actions are both data-driven and adaptive to emerging trends.
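As an illustration of theme prevalence over time, the following minimal Python sketch computes, for each reporting period, the share of observations assigned to each theme. It is not the production implementation: the function name `theme_prevalence`, the quarterly period labels, the ‘Housekeeping’ theme and the toy records are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def theme_prevalence(observations):
    """Return {period: {theme: share of that period's observations}}.

    `observations` is an iterable of (period, theme) pairs. The pair layout
    is an assumption made for this sketch, not the tool's real data model.
    """
    counts = defaultdict(Counter)
    for period, theme in observations:
        counts[period][theme] += 1

    prevalence = {}
    for period, theme_counts in counts.items():
        total = sum(theme_counts.values())
        prevalence[period] = {
            theme: n / total for theme, n in theme_counts.items()
        }
    return prevalence


# Toy records (hypothetical): eight observations across two quarters.
observations = [
    ("2024-Q1", "Leak Detection"),
    ("2024-Q1", "Leak Detection"),
    ("2024-Q1", "Housekeeping"),
    ("2024-Q1", "Housekeeping"),
    ("2024-Q2", "Leak Detection"),
    ("2024-Q2", "Housekeeping"),
    ("2024-Q2", "Housekeeping"),
    ("2024-Q2", "Housekeeping"),
]

prevalence = theme_prevalence(observations)
# Change in the share of 'Leak Detection' between the two quarters.
leak_change = (
    prevalence["2024-Q2"]["Leak Detection"]
    - prevalence["2024-Q1"]["Leak Detection"]
)
```

Shares are used rather than raw counts so that a quarter with more reporting overall does not look like a rise in every theme. A fall in a theme's share after an intervention would then be one signal, among others, that the intervention is having an effect.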

Technical

Topic modelling is a multi-stage process. The data is first embedded into a numerical format using an open-source text embedding model which captures textual semantics. The embedded text vectors then undergo a dimensionality reduction stage, followed by a ‘clustering’ stage which groups semantically similar observations. These clusters form Themes: distinct patterns of similar observations. For example, observations that mention ‘Leak Detection’ may be (and have been) recognised as a cohesive theme. The themes then undergo a meta-analysis stage in which text-generative large language models (LLMs) extract a suitable name, summary and key points. Topic modelling therefore combines several natural language processing techniques with the ultimate aim of organising a varied dataset into a navigable and valuable data asset.

Information extraction represents an ideal use case for text-generative LLMs. The actual assignment of an observation to a theme comes from a mathematical similarity score between the embedded observation and a dense embedding vector that represents the centroid of the theme, which is a consistent and explainable process. The LLM is then used for what it does well: summarising large volumes of information into a digestible short form, to aid comprehension of the extracted themes.
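The similarity-based assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed code: cosine similarity is assumed as the similarity score, the three-dimensional vectors stand in for real embedding-model outputs, and the `min_similarity` cut-off and the ‘Housekeeping’ theme are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def assign_theme(vector, centroids, min_similarity=0.5):
    """Return (theme, score) for the most similar centroid.

    If the best score is below `min_similarity` (a hypothetical cut-off),
    the observation is left unassigned and (None, score) is returned.
    """
    best_theme, best_score = None, -1.0
    for theme, centre in centroids.items():
        score = cosine_similarity(vector, centre)
        if score > best_score:
            best_theme, best_score = theme, score
    if best_score < min_similarity:
        return None, best_score
    return best_theme, best_score


# Placeholder embeddings: real vectors would come from the embedding model.
theme_members = {
    "Leak Detection": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "Housekeeping": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.2]],
}
centroids = {name: centroid(vecs) for name, vecs in theme_members.items()}

theme, score = assign_theme([0.85, 0.15, 0.05], centroids)
outlier_theme, outlier_score = assign_theme([0.0, 0.0, 1.0], centroids)
```

Because the score is a plain function of two vectors, the same observation always receives the same theme for a given set of centroids, and the score itself can be shown to a user as the reason for the assignment.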

“Evolving Theme Analysis” enhances traditional topic modelling by enabling the detection of emerging themes over time. Our company has developed an algorithm that trains separate topic models on data from different time periods and then integrates these into a single, dynamic model. This approach allows the model to be periodically updated while maintaining continuity and consistency.

The process begins with a base topic model trained on data from an initial time window (typically one year). Subsequent models are then trained at regular intervals (e.g., quarterly). Newly identified topics are compared with existing ones to determine whether they are novel or previously observed. In this way, the model preserves and tracks existing themes over time while incorporating new ones as they arise, resulting in a continuously evolving thematic landscape that reflects current and emerging concerns.
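The comparison step is not specified in detail here, so the sketch below shows one plausible way to do it rather than the algorithm actually used. It assumes each topic is represented by a centroid vector, that a new topic counts as ‘previously observed’ when its cosine similarity to an existing centroid reaches a hypothetical `match_threshold`, and that unmatched topics are added as new themes. All names and vectors are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_topics(existing, new_topics, match_threshold=0.8):
    """Fold one period's topics into the running set of themes.

    `existing` and `new_topics` map topic names to centroid vectors.
    Returns (updated, mapping): `updated` is the theme set after the merge,
    and `mapping` records which theme each new topic is tracked under.
    New topics are compared only against `existing`, not against each other,
    and new topic names are assumed not to clash with existing ones.
    """
    updated = dict(existing)
    mapping = {}
    for new_name, new_centre in new_topics.items():
        best_name, best_score = None, -1.0
        for name, centre in existing.items():
            score = cosine_similarity(new_centre, centre)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None and best_score >= match_threshold:
            # Previously observed: keep tracking it under the existing theme.
            mapping[new_name] = best_name
        else:
            # Novel: register it as a new theme.
            updated[new_name] = new_centre
            mapping[new_name] = new_name
    return updated, mapping


# Base model from the initial window, then one quarterly update (toy vectors).
base_themes = {
    "Leak Detection": [1.0, 0.0, 0.0],
    "Housekeeping": [0.0, 1.0, 0.0],
}
quarterly_topics = {
    "q2_topic_0": [0.95, 0.05, 0.0],  # close to 'Leak Detection'
    "q2_topic_1": [0.0, 0.1, 0.99],   # unlike any existing theme
}

themes, mapping = merge_topics(base_themes, quarterly_topics)
```

In this toy update, `q2_topic_0` is tracked as a continuation of ‘Leak Detection’, while `q2_topic_1` is registered as an emerging theme. The choice of threshold controls how readily themes are merged versus split, and would need tuning on real data.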

The outputs of “Evolving Theme Analysis” are shown in a bespoke dashboard that allows safety professionals to navigate themes, starting from a high-level view of theme prevalence. The dashboard combines evolving theme analysis with previously developed functionality to assign categories (LSR, Hazard) to observations, allowing themes to be explored and broken down by different industrially relevant categories. This can be used to see, for example, that the ‘Leak Detection’ theme is highly associated with process containment issues. The dashboard allows the user to drill down and view the individual observations that belong to each theme, giving a granular view of observation data with a temporal aspect.

This tool has been deployed into a production-ready cloud-hosted environment. This is an example of a project that has quickly progressed from exploratory data science to a scalable, secure offering, combining expertise in data science, data engineering and process safety.

Summary

In summary, we propose to present a newly developed tool that applies cutting-edge text-processing techniques to extract meaningful and actionable insights from workforce observations: a rich but often underutilised source of information. This method has already been successfully implemented in collaboration with individual companies and industry bodies, demonstrating both its practical applicability and its value. In this presentation, our company will share findings from industrial applications of Evolving Theme Analysis, providing not only real-world context but also an opportunity for shared learning across the industry.