When an industrial incident occurs (whether a ransomware event reaches Level 2 of the control network, an unauthorized configuration change disrupts a production process, or an anomalous communication pattern suggests active adversary presence), the quality of the forensic investigation that follows determines everything downstream: the accuracy of root cause analysis, the completeness of remediation, the defensibility of the incident report, and the confidence with which operations can resume.
The 13 Complete Steps for OT Forensics After an Incident in this article provide a structured, operationally realistic framework for conducting forensic investigations in OT/ICS environments, where the constraints of process safety, legacy systems, uptime requirements, and vendor dependencies make the direct application of IT forensic procedures both impractical and potentially dangerous.
OT forensics is not IT forensics in a factory. It is a distinct discipline with distinct constraints, distinct evidence sources, distinct stakeholder obligations, and distinct consequences for errors in technique. Understanding that distinction, and building forensic capability around it, is the foundation of effective incident response in industrial environments.
Step 1 – Activate the Incident Response Team and Establish Command Structure
What this step involves: The first action after an OT security incident is confirmed is activating the designated incident response team and establishing clear command structure, defining who has authority over security decisions, who is responsible for operational continuity, and who owns communications with leadership, regulators, and external parties.
Why it matters: OT incidents require simultaneous management of security response, operational continuity, and safety assurance. Without a clear command structure, these priorities compete rather than coordinate, resulting in response decisions that optimize for one dimension at the cost of another.
OT-specific consideration: The incident response commander for an OT event should have authority that spans both security and operations, or there must be a defined escalation path between the security lead and operations management. Security decisions that affect production must be made with operations input, not unilaterally.
Practical action: Activate your documented OT incident response playbook. If no playbook exists, document every decision as it is made; this response becomes the baseline for your first one.
Step 2 – Define Scope and Confirm the Incident Nature
What this step involves: Before any containment or forensic action, define the known scope of the incident: which systems are confirmed affected, which are potentially affected, and which are unaffected. Confirm whether the incident is an active threat, a completed event, or an anomaly under investigation.
Why it matters: Forensic and containment actions appropriate for an active ransomware event are different from those appropriate for a completed unauthorized configuration change discovered in a post-maintenance audit. Scope definition prevents over-reaction that disrupts unaffected systems and under-reaction that allows an active threat to continue.
OT-specific consideration: In OT environments, scope definition must include the safety consequence dimension: which affected systems have safety functions, and what is the risk of those safety functions being compromised.
Step 3 – Notify Operations and Safety Teams Immediately
What this step involves: Concurrent with scope definition, formally notify operations engineering and safety teams, providing them with the known scope of the incident, the potential safety implications, and the forensic and containment actions under consideration.
Why it matters: Operations and safety teams have context about process state, equipment condition, and operational constraints that the security team does not. Forensic decisions made without this context can create safety consequences that the security team did not anticipate.
Practical action: Establish a secure, dedicated communication channel for the incident response team that includes operations and safety representation from the first notification. Every forensic and containment decision that could affect process operations must be reviewed by operations before execution.
Step 4 – Contain the Incident Without Disrupting Safe Operations
What this step involves: Implement containment measures that limit the adversary’s ability to continue operating or cause further damage, while ensuring that containment actions do not interrupt safety-critical process control functions.
Why it matters: In IT forensics, containment typically means network isolation or system shutdown. In OT forensics, these same actions can cause the very operational disruption the adversary intended, or trigger safety system activations with physical consequences.
OT-specific approach: Containment in OT should be implemented incrementally and with operational validation at each step. Network isolation of a compromised engineering workstation is typically safe. Isolating a PLC from its SCADA server without understanding the process consequences is not.
Common approach: Where possible, implement monitoring-based containment first (enhanced network visibility and communication logging) before physical isolation, allowing the forensic team to understand the incident scope before applying controls that could affect process operations.
Step 5 – Preserve Evidence Before It Is Lost
What this step involves: Immediately identify and preserve all volatile evidence sources (memory contents, network traffic, temporary files, and log data with short retention windows) before containment or recovery actions overwrite them.
Why it matters: In OT environments, evidence preservation is particularly time-critical because many OT devices have very limited local storage and overwrite logs rapidly. Engineering workstation temporary files, PLC audit logs, and network traffic captures can be lost within hours of an incident if not actively preserved.
Priority evidence sources:
- Network traffic captures (PCAP) from monitoring sensors at zone boundaries
- Engineering workstation memory and running process state (if safe to capture)
- PLC and RTU event logs and audit trails
- Historian data covering the incident timeframe
- SCADA server application logs and operator interaction records
- Authentication and access logs from all OT systems
- Physical access records from controlled areas
Chain of custody: From the moment evidence is collected, maintain a documented chain of custody, recording who collected each artifact, when, from which system, and by what method.
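Chain-of-custody recording can be automated from the first artifact onward. A minimal Python sketch; the field names and the JSON-lines log format are illustrative, not a prescribed standard:

```python
import hashlib
import json
from datetime import datetime, timezone

def custody_record(path, collector, source_system, method):
    """Build a chain-of-custody entry for one collected artifact.

    Hashing the artifact at collection time lets any later reviewer
    verify that the evidence has not been altered since collection.
    """
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha256.update(chunk)
    return {
        "artifact": path,
        "sha256": sha256.hexdigest(),
        "collected_by": collector,
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "source_system": source_system,
        "method": method,
    }

def append_to_custody_log(record, log_path="custody_log.jsonl"):
    """Append the record to an append-only JSON-lines custody log."""
    with open(log_path, "a") as log:
        log.write(json.dumps(record) + "\n")
```

Recording the hash at collection time, rather than later, is what makes the custody log useful: any recomputed hash that matches proves the artifact is unchanged.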
Step 6 – Capture Network Traffic and Communications
What this step involves: Collect and preserve network traffic captures from the OT monitoring infrastructure, network switches, and any passive sensors deployed in the affected zones, covering the incident timeframe and a sufficient pre-incident window to establish baseline behavior.
Why it matters: Network traffic is often the most complete and reliable evidence source in OT forensics, capturing communication patterns, protocol content, timing, and device behavior that application logs may not record.
OT-specific consideration: Network capture in OT environments should use passive methods (SPAN port mirroring or network TAPs), never active network injection or scanning that could disrupt control system communications.
What to look for: unusual communication patterns, connections to unexpected external addresses, protocol anomalies (unexpected function codes, out-of-sequence communications), authentication events, and timing deviations from established baselines.
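The pattern checks above can be sketched against already-parsed flow records. This assumes your monitoring platform can export flows as simple records with src, dst, and function_code fields; the field names are illustrative and should be adapted to your platform's export format:

```python
def find_anomalies(flows, baseline_pairs, baseline_function_codes):
    """Flag parsed OT flow records that deviate from the pre-incident
    baseline: either a source/destination pair never seen before, or a
    protocol function code outside the expected set."""
    anomalies = []
    for flow in flows:
        pair = (flow["src"], flow["dst"])
        if pair not in baseline_pairs:
            anomalies.append({**flow, "reason": "new communication pair"})
        elif flow.get("function_code") not in baseline_function_codes:
            anomalies.append({**flow, "reason": "unexpected function code"})
    return anomalies
```

The same two-level check (who talks to whom, then what they say) generalizes across industrial protocols; only the function-code vocabulary changes.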
Step 7 – Collect Logs from All Relevant OT Systems
What this step involves: Systematically collect log data from every system with a potential role in the incident, expanding outward from confirmed affected systems to cover the full potential scope.
Log sources to prioritize:
- Windows event logs from all OT workstations and servers (Security, System, Application)
- Domain controller authentication logs
- VPN and remote access gateway logs
- Industrial protocol logs from network monitoring platforms
- PLC and RTU event logs (where available)
- Historian data and configuration change records
- Firewall and network device logs covering OT zone boundaries
- Jump host and privileged access management session logs
- Physical access control system records
OT-specific challenge: Many legacy OT devices generate minimal logs or store them locally with short retention windows. For these devices, network-level evidence is the primary forensic source, making the Step 6 network capture particularly critical.
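One way to operationalize the retention problem is to order collection by retention window, so the most volatile sources are pulled first. The windows below are purely illustrative; confirm real values for each device model and logging platform in your environment:

```python
from datetime import timedelta

# Illustrative retention windows; confirm the real values per
# device model and logging platform in your environment.
LOG_SOURCES = [
    {"source": "Windows Security log", "retention": timedelta(days=7)},
    {"source": "PLC event log",        "retention": timedelta(hours=4)},
    {"source": "Historian archive",    "retention": timedelta(days=365)},
    {"source": "Sensor PCAP buffer",   "retention": timedelta(hours=24)},
]

def collection_order(sources):
    """Order sources shortest-retention-first so the most volatile
    evidence is collected before it rolls over."""
    return sorted(sources, key=lambda s: s["retention"])
```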
Step 8 – Conduct Host Analysis on OT Workstations and Servers
What this step involves: Perform forensic analysis on the Windows-based systems in the OT environment (engineering workstations, SCADA servers, historian servers, and jump hosts), using standard Windows forensic methods adapted for the operational constraints of the OT environment.
What to examine:
- Process execution history (prefetch, Windows event logs, Sysmon data if deployed)
- User account activity and authentication events
- File system changes and new or modified files in the incident timeframe
- Registry modifications relevant to persistence mechanisms
- Network connection history
- Installed software changes
- Scheduled task and service modifications
OT-specific consideration: Host forensic tools that require agent installation or active scanning should be validated in a test environment before use on production OT hosts. Some OT application software is sensitive to resource contention that forensic tools can introduce.
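A common first triage pass for the "file system changes" item is listing files modified inside the incident window. A minimal sketch, intended to run against a mounted forensic copy, never a live production host:

```python
import os
from datetime import datetime, timezone

def files_modified_between(root, start, end):
    """Walk a mounted forensic copy (never a live production host) and
    list files whose modification time falls inside the incident window."""
    hits = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                mtime = datetime.fromtimestamp(os.path.getmtime(path),
                                               timezone.utc)
            except OSError:
                continue  # unreadable entry; note it separately
            if start <= mtime <= end:
                hits.append((path, mtime))
    return sorted(hits, key=lambda item: item[1])
```

Modification timestamps can be manipulated by an adversary, so treat this as a triage filter that narrows the review set, not as proof of absence.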
Step 9 – Analyze Controller and Field Device Evidence
What this step involves: Examine PLC, RTU, and IED evidence, including configuration change logs, ladder logic version history, setpoint modification records, and any device event logs, to determine whether control system logic or configuration was modified during the incident.
Why it matters: In OT incidents where the adversary reached Level 1 or Level 2 of the control network, the most critical forensic question is whether control logic, safety parameters, or device configurations were modified. Identifying unauthorized changes, and reverting them safely, is essential to safe operational recovery.
OT-specific approach: Controller analysis should be conducted by OT engineers with knowledge of the specific controller platform and process. Security forensics professionals should support the collection process and document findings, but controller logic analysis requires process domain expertise.
Evidence to collect: Export the current controller configuration and compare it against the last known-good backup, documenting any differences, including changes to ladder logic, function blocks, setpoints, communications parameters, and safety system configuration.
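For configurations that can be exported as text, the comparison can start with a plain unified diff. A sketch under that assumption; binary ladder logic comparisons require the vendor's engineering tooling instead:

```python
import difflib

def config_diff(known_good, current):
    """Unified diff between the last validated controller configuration
    export and the current export. Any output means the configuration
    has drifted and must be reviewed by an OT engineer who knows the
    process, not by the security team alone."""
    return list(difflib.unified_diff(
        known_good.splitlines(), current.splitlines(),
        fromfile="known_good", tofile="current", lineterm=""))
```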
Step 10 – Reconstruct the Incident Timeline
What this step involves: Using all collected evidence (network traffic, host logs, controller records, authentication events, and physical access records), construct a chronological timeline of the incident from initial access or anomaly to detection and response.
Why it matters: Timeline reconstruction is the analytical foundation of root cause analysis. It reveals the sequence of events, identifies the initial access vector, maps the adversary’s lateral movement, and determines the full scope of systems accessed or affected.
OT-specific challenge: Time synchronization is a critical dependency for timeline reconstruction in OT environments. Many OT devices have independent time references; if they are not synchronized to a common NTP source, event timestamps from different systems may be offset, making timeline correlation difficult or impossible.
Practical approach: Build the timeline using a structured analysis tool, even a spreadsheet with timestamped events from each evidence source, and identify any time reference discrepancies between systems before interpreting the timeline.
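The offset-correction step can be folded directly into the timeline merge. A minimal sketch, assuming each source's clock offset has already been measured against a reference clock (for example, the NTP-synchronized monitoring sensor):

```python
from datetime import timedelta

def build_timeline(events, clock_offsets):
    """Merge timestamped events from multiple evidence sources into one
    chronological timeline, correcting each source's timestamps by its
    measured offset from the reference clock.

    events:        list of {'source', 'timestamp' (datetime), 'detail'}
    clock_offsets: {source_name: timedelta to add to that source's clock}
    """
    corrected = []
    for ev in events:
        offset = clock_offsets.get(ev["source"], timedelta(0))
        corrected.append({**ev, "timestamp": ev["timestamp"] + offset})
    return sorted(corrected, key=lambda ev: ev["timestamp"])
```

Note that an uncorrected offset of even a few minutes can reverse the apparent order of cause and effect, which is why offsets must be measured before the timeline is interpreted.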
Step 11 – Conduct Root Cause Analysis
What this step involves: Using the reconstructed timeline and collected evidence, identify the root cause of the incident: the initial access vector, the vulnerability or weakness exploited, and the control failures that allowed the incident to progress to the scope observed.
Root cause categories in OT incidents:
- Exploited remote access pathway (VPN, vendor connection, engineering workstation)
- Phishing or credential compromise on IT-connected OT workstation
- Supply chain compromise (software update, vendor-installed software)
- Insider action (intentional or accidental configuration change)
- Lateral movement from IT network through inadequate IT/OT boundary
Why it matters: Root cause analysis drives remediation, addressing the specific vulnerability, control failure, or architectural weakness that enabled the incident. Remediation without confirmed root cause may address symptoms while leaving the underlying vulnerability unaddressed.
Step 12 – Document Findings and Maintain Chain of Custody
What this step involves: Produce a structured incident investigation report documenting the incident timeline, scope, root cause, evidence collected and preserved, forensic methods used, findings, and remediation recommendations, while maintaining chain of custody documentation for all collected evidence.
Why it matters: The incident report serves multiple purposes: internal learning, regulatory reporting, insurance claims, potential legal proceedings, and the organizational knowledge base that informs future security investment. Reports that lack documentation of forensic method, chain of custody, or evidentiary basis are not defensible in any of these contexts.
For regulated environments: Critical infrastructure operators in regulated sectors (energy, water, transportation) have specific incident reporting obligations that the investigation report must satisfy. Confirm regulatory reporting requirements and timelines at the incident activation stage, not after the investigation is complete.
Step 13 – Validate Recovery and Implement Post-Incident Hardening
What this step involves: Before declaring the incident resolved and returning to full operational status, validate that all affected systems have been restored to a known-good configuration, all identified vulnerabilities have been remediated, and the specific control failures that allowed the incident have been addressed.
Recovery validation checklist:
- All modified controller configurations restored to validated known-good state
- All compromised credentials revoked and replaced
- All identified persistence mechanisms removed
- All affected systems verified against clean baseline
- Network monitoring confirmed active and generating expected telemetry
- Operations team confirmation of normal process behavior
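The "verified against clean baseline" item in the checklist can be checked mechanically where a hash manifest of the clean system exists. A sketch, assuming such a manifest was captured before the incident:

```python
def verify_against_baseline(current_hashes, baseline_manifest):
    """Compare current file hashes on a restored system against the
    clean-baseline manifest. Any modified, missing, or unexpected file
    blocks the declaration of recovery until it is explained."""
    return {
        "modified": sorted(p for p, h in current_hashes.items()
                           if p in baseline_manifest and baseline_manifest[p] != h),
        "missing": sorted(p for p in baseline_manifest
                          if p not in current_hashes),
        "unexpected": sorted(p for p in current_hashes
                             if p not in baseline_manifest),
    }
```

Capturing and storing the baseline manifest before an incident is itself one of the pre-incident investments the conclusion argues for.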
Post-incident hardening: Implement the specific security improvements identified in root cause analysis, closing the vulnerability, patching the system, segmenting the network pathway, or implementing the control that was absent during the incident. Document these changes as part of the incident record.
Conclusion
The 13 Complete Steps for OT Forensics After an Incident in this article reflect the operational reality of conducting defensible forensic investigations in environments where safety, availability, and process continuity are non-negotiable constraints. OT forensics is not a simplified version of IT forensics; it is a discipline that requires its own framework, its own tool selection criteria, and its own stakeholder coordination model.
Organizations that invest in OT forensic capability before they need it, through documented playbooks, pre-deployed monitoring infrastructure, established chain of custody procedures, and joint security/operations response training, consistently produce better investigation outcomes, faster recovery timelines, and more defensible incident reports than those building the capability during an active incident.
Need expert insight after an OT incident – or ready to share your own industrial cybersecurity perspective?
If your organization is working on OT forensics, incident response, or industrial resilience, OT Ecosystem offers a platform to share practical expertise with the right audience.
Share your story. Strengthen your authority. Be part of the OT security conversation.
📩 Email: info@otecosystem.com
📞 Call: +91 9490056002
💬 WhatsApp: https://wa.me/919490056002