The Problem: Workflow Hell in Energy Operations
Every utility we work with runs on a patchwork of systems that don't talk to each other. SCADA alarms come through one vendor's proprietary interface. Work orders live in a legacy CMMS that hasn't been updated since 2008. Compliance reports get assembled manually in Excel by engineers who should be doing actual engineering. Asset data exists in three different databases with no single source of truth.
The typical response is to buy an enterprise integration platform that costs seven figures and requires a consulting army to configure. Or teams write custom Python scripts that work until the person who wrote them leaves. We've watched utilities spend eighteen months and $2M trying to get MuleSoft to route SCADA alarms to work orders.
We needed workflow automation that could run air-gapped, handle OT protocols, integrate with existing systems without vendor lock-in, and be maintained by utility engineers—not expensive consultants. After deploying variations across three utilities, we settled on n8n as the workflow engine paired with ERPNext for the operational data backbone.
Architecture: Two-Layer Automation
Our stack separates workflow orchestration from data persistence. n8n handles all the routing, transformation, and integration logic. ERPNext provides the structured data layer for assets, maintenance, inventory, and compliance records. This separation is critical in OT environments where you need deterministic behavior and clear audit trails.
Layer One: n8n Workflow Engine
n8n runs in Docker on a hardened Linux VM inside the utility's DMZ. We use version 1.x with self-hosted deployment—no cloud, no telemetry, no external dependencies. The instance connects to the utility's internal network via a single firewall-controlled interface.
We deploy n8n with PostgreSQL as the execution database and Redis for queue management. Standard configuration: 4 vCPUs, 16GB RAM, 100GB SSD. This handles 200-300 workflow executions per hour with sub-second response times for most automation tasks.
Critical configuration: we disable all external node packages and whitelist exactly which nodes can be used. In NERC CIP (Critical Infrastructure Protection) environments, you cannot have workflow automation pulling arbitrary npm packages from the internet. We maintain an internal registry of approved nodes that have been security-reviewed.
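One way to back the whitelist with an automated check is to audit exported workflow JSON before it is imported into production. The sketch below is a minimal illustration, not the registry tooling described above. It assumes the standard n8n export shape (a top-level "nodes" array whose entries carry a "type" string). The allowlist contents and the community node name are hypothetical.

```python
import json

# Hypothetical allowlist; a real deployment would load this from the
# security-reviewed internal registry rather than hard-coding it.
APPROVED_NODE_TYPES = {
    "n8n-nodes-base.microsoftSql",
    "n8n-nodes-base.httpRequest",
    "n8n-nodes-base.if",
}


def unapproved_nodes(workflow_json: str, approved: set[str]) -> list[str]:
    """Return the sorted node types used by a workflow that are not approved."""
    workflow = json.loads(workflow_json)
    used = {node["type"] for node in workflow.get("nodes", [])}
    return sorted(used - approved)


# Illustrative export containing one node that is not on the allowlist.
sample = json.dumps({
    "name": "substation-alarm-response",
    "nodes": [
        {"name": "Poll alarms", "type": "n8n-nodes-base.microsoftSql"},
        {"name": "Fetch asset", "type": "n8n-nodes-base.httpRequest"},
        {"name": "Scrape", "type": "n8n-nodes-community.randomScraper"},
    ],
})
violations = unapproved_nodes(sample, APPROVED_NODE_TYPES)
print(violations)
```

A check like this can run as a gate in whatever review process promotes workflows from a staging instance to production.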
Layer Two: ERPNext Data Backbone
ERPNext 15.x provides the structured data layer. We use it for asset registry, maintenance work orders, inventory management, procurement tracking, and compliance documentation. It replaces the legacy CMMS, procurement system, and half a dozen Excel-based processes.
Deployment: separate VM, 8 vCPUs, 32GB RAM, MariaDB backend. ERPNext is Python-based (Frappe framework) and runs behind nginx. We enable the REST API but lock it down to internal network access only with API key authentication.
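For readers who have not used the Frappe REST API, the sketch below shows the shape of an authenticated read of a single document. It relies on two Frappe conventions: the "/api/resource/<doctype>/<name>" route and the "token <api_key>:<api_secret>" Authorization header. The host name, asset name, and credentials are placeholders, and the request is only constructed, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def erpnext_get_request(base_url, doctype, name, api_key, api_secret):
    """Build (but do not send) a GET request for one ERPNext document."""
    # Doctype and document names can contain spaces, so both are URL-encoded.
    url = (
        f"{base_url.rstrip('/')}/api/resource/"
        f"{quote(doctype, safe='')}/{quote(name, safe='')}"
    )
    headers = {
        "Authorization": f"token {api_key}:{api_secret}",
        "Accept": "application/json",
    }
    return Request(url, headers=headers)


# Placeholder host, asset name, and credentials for illustration only.
req = erpnext_get_request(
    "https://erp.internal.example/", "Asset", "XFMR 42", "key", "secret"
)
print(req.full_url)
```

Passing the request to urllib.request.urlopen would perform the call. In n8n the equivalent is an HTTP Request node with the same header.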
The key insight: ERPNext isn't just an ERP—it's a flexible data model you can customize for energy operations. We've extended the Asset doctype to include SCADA tag mappings, protection relay settings, and transformer load profiles. The Work Order doctype links directly to NERC CIP maintenance requirements and generates compliance evidence automatically.
Component Interactions: How Data Flows
Here's a real workflow we deployed at a transmission utility: automated substation alarm response.
Trigger: SCADA alarm fires (DNP3 or IEC 61850 protocol). The utility's SCADA historian (OSIsoft PI) receives the alarm and writes to a SQL Server table we monitor.
Step 1: n8n polls the SQL Server table every 15 seconds using the MSSQL node. When a new critical alarm appears, the workflow triggers.
Step 2: n8n queries ERPNext via REST API to retrieve the asset record for the substation equipment. This includes the equipment criticality rating, maintenance history, assigned field crew, and current outage status.
Step 3: Decision logic in n8n: if the equipment is critical (transmission voltage) and no planned outage is active, create an emergency work order. If it's a known nuisance alarm (we maintain a list), log it but don't escalate.
Step 4: n8n creates a work order in ERPNext with priority automatically set based on voltage level and time of day. The work order includes the SCADA alarm details, equipment history, and suggested troubleshooting steps from our knowledge base.
Step 5: n8n sends SMS to on-call field crew via Twilio integration and posts to the utility's Mattermost channel. Both notifications include a direct link to the work order.
Step 6: The field crew acknowledges via the ERPNext mobile interface. n8n waits up to 10 minutes for the acknowledgment. If none arrives, it escalates to the supervisor.
Step 7: When work order is completed in ERPNext, n8n automatically generates the NERC CIP compliance record with timestamps, personnel involved, and actions taken. This goes into a dedicated ERPNext doctype we created called 'CIP Maintenance Evidence'.
End-to-end latency from SCADA alarm to field crew notification: 45-90 seconds. Manual process before automation: 15-45 minutes depending on who was monitoring the SCADA console.
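The logic in Steps 1, 3, and 6 is simple enough to sketch outside n8n. The Python below is an illustration under stated assumptions, not the deployed workflow. An in-memory SQLite table stands in for the monitored SQL Server table. The table layout, alarm codes, and nuisance list are hypothetical. The "manual_review" branch is an assumption, because the steps above do not say what happens to alarms that are neither nuisance nor emergency.

```python
import sqlite3
from datetime import datetime, timedelta

NUISANCE_ALARMS = {"COMM_FLAP"}      # hypothetical nuisance list
ACK_TIMEOUT = timedelta(minutes=10)  # acknowledgment window from Step 6


def fetch_new_critical_alarms(conn, last_seen_id):
    """Step 1: return critical alarms newer than the watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, code FROM alarms "
        "WHERE id > ? AND severity = 'critical' ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    return rows, (rows[-1][0] if rows else last_seen_id)


def decide(code, is_transmission_voltage, planned_outage_active):
    """Step 3: choose an action for one alarm."""
    if code in NUISANCE_ALARMS:
        return "log_only"
    if is_transmission_voltage and not planned_outage_active:
        return "emergency_work_order"
    return "manual_review"  # assumption: this branch is not defined in the steps


def needs_escalation(notified_at, acknowledged_at, now):
    """Step 6: escalate when no acknowledgment has arrived within the timeout."""
    return acknowledged_at is None and now - notified_at >= ACK_TIMEOUT


# Demo against a stand-in table (SQLite here, SQL Server in the deployment).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alarms (id INTEGER PRIMARY KEY, code TEXT, severity TEXT)")
conn.executemany(
    "INSERT INTO alarms VALUES (?, ?, ?)",
    [(1, "BKR_TRIP", "critical"), (2, "TEMP_HIGH", "warning"), (3, "COMM_FLAP", "critical")],
)
alarms, watermark = fetch_new_critical_alarms(conn, 0)
actions = [decide(code, True, False) for _, code in alarms]
notified = datetime(2024, 1, 1, 2, 0)
escalate = needs_escalation(notified, None, notified + timedelta(minutes=11))
```

Persisting the watermark between polls is what keeps a 15-second polling loop from re-triggering on alarms it has already handled.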
Operational Reality: What Breaks and Why
n8n Failure Modes
The webhook receiver node crashes under load if you get a burst of simultaneous triggers. We saw this during a storm event when 40+ SCADA alarms fired within 60 seconds. Our fix was to put a Redis-based queue in front of n8n: external systems write to Redis, and n8n pulls from the queue at a controlled rate.
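The buffering idea can be illustrated without Redis. In the sketch below a collections.deque stands in for the Redis list, and the consumer takes at most a fixed number of items per tick. The batch size of 5 and the alarm identifiers are hypothetical. In a real deployment the producer and consumer would use Redis list operations against a shared key instead of an in-process deque.

```python
from collections import deque


def drain(queue, handler, max_per_tick):
    """Process at most max_per_tick queued items and return how many were handled."""
    handled = 0
    while queue and handled < max_per_tick:
        handler(queue.popleft())  # oldest item first, preserving arrival order
        handled += 1
    return handled


# A storm-sized burst: 40 alarms arrive at once.
burst = deque(f"alarm-{i}" for i in range(40))
processed = []
ticks = 0
while burst:
    drain(burst, processed.append, max_per_tick=5)
    ticks += 1
print(ticks, len(processed))
```

The burst is absorbed by the queue, and the downstream consumer only ever sees a bounded number of items per tick.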
n8n's credential management is weak for OT environments. Credentials are encrypted in PostgreSQL, but the encryption key is in a config file on the same server. For NERC CIP compliance, we had to implement HashiCorp Vault and modify n8n to retrieve credentials from Vault at runtime. This required patching the n8n source and maintaining a custom Docker image.
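For orientation, the sketch below shows what a runtime credential lookup against Vault's KV version 2 HTTP API looks like. It is not the n8n patch described above. It assumes a KV v2 engine mounted at "secret". The Vault address, secret path, token, and field names are placeholders. The URL layout ("/v1/<mount>/data/<path>"), the X-Vault-Token header, and the nested "data.data" response shape follow the KV v2 API.

```python
import json


def vault_kv2_read(vault_addr, mount, path, token):
    """Return the URL and headers for reading one KV v2 secret."""
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/data/{path}"
    return url, {"X-Vault-Token": token}


def extract_secret(response_body):
    """KV v2 wraps the secret payload in data.data next to version metadata."""
    return json.loads(response_body)["data"]["data"]


# Placeholder address, path, and token.
url, headers = vault_kv2_read(
    "https://vault.internal.example:8200", "secret", "n8n/erpnext", "example-token"
)

# Example response body in the KV v2 shape, with made-up values.
body = json.dumps({
    "data": {
        "data": {"api_key": "abc", "api_secret": "xyz"},
        "metadata": {"version": 3},
    }
})
creds = extract_secret(body)
```

Fetching at runtime means the secret never has to sit in the workflow engine's own database or config file.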
Error handling in complex workflows gets messy. n8n's built-in error workflow is global—if you want different error behavior for different workflow types, you're writing a lot of conditional logic. We ended up creating a standardized error-handling pattern as a sub-workflow that all production workflows call.
ERPNext Integration Challenges
ERPNext's REST API is well-documented but has quirks. Creating linked doctypes (like a Work Order with a linked Asset) requires multiple API calls in a specific order. If you try to create everything in one call with nested JSON, it silently fails with no error message. We wrote wrapper functions in n8n that handle the multi-step creation properly.
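The ordering constraint can be sketched as a small wrapper. This is an illustration in Python rather than the n8n wrapper functions themselves. The transport is injected as a post(doctype, payload) callable so that the ordering logic can be exercised without a server. The link field name "asset" and the payload fields are hypothetical. The {"data": {...}} response envelope follows the Frappe REST API convention for created documents.

```python
def create_linked_work_order(post, asset_payload, work_order_payload, link_field="asset"):
    """Create the parent Asset first, then the Work Order that links to it."""
    # Step 1: the linked document must exist before anything can reference it.
    asset = post("Asset", asset_payload)["data"]
    # Step 2: reference the asset by its server-assigned name, not nested JSON.
    order = dict(work_order_payload)
    order[link_field] = asset["name"]
    return post("Work Order", order)["data"]


# Fake transport that records calls and mimics the response envelope.
calls = []


def fake_post(doctype, payload):
    calls.append(doctype)
    return {"data": {"name": f"{doctype}-0001", **payload}}


result = create_linked_work_order(
    fake_post,
    {"asset_name": "Transformer T1"},
    {"subject": "Breaker trip at Substation 4"},
)
```

When the asset already exists, as in the alarm workflow, the first step becomes a lookup rather than a create, but the ordering principle is the same.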
Performance degrades if you store large binary files (PDFs, images) directly in ERPNext doctypes. For compliance evidence that includes photos from field inspections, we store files in Nextcloud and keep only the file reference in ERPNext. n8n handles the file upload to Nextcloud and updates ERPNext with the link.
ERPNext's permission system is role-based, which doesn't map cleanly to NERC CIP access controls that are asset-specific. We had to extend the permission framework with custom Python code that checks asset criticality before allowing access to related work orders and maintenance records.
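The core of a criticality-aware check can be shown as a pure function. The sketch below is a hypothetical illustration, not the custom permission code described above. The criticality tiers, user names, and the shape of the clearance table are all assumptions. In a Frappe app a function like this would be called from the app's permission extension point, and that wiring is not shown.

```python
# Hypothetical criticality tiers, ordered from least to most sensitive.
TIERS = {"low": 1, "medium": 2, "high": 3}

# Hypothetical per-user clearance; a real system would read this from ERPNext.
USER_CLEARANCE = {
    "relay.tech@example.com": "high",
    "contractor@example.com": "low",
}


def can_access_work_order(user, asset_criticality, clearance=USER_CLEARANCE):
    """Allow access only if the user's clearance covers the asset's criticality."""
    user_tier = TIERS.get(clearance.get(user, ""), 0)  # unknown users get no access
    return user_tier >= TIERS[asset_criticality]


allowed = can_access_work_order("relay.tech@example.com", "high")
denied = can_access_work_order("contractor@example.com", "medium")
```

The check is keyed to the asset behind the work order rather than to the user's role alone, which is the gap the role-based model leaves open.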
Deployment Considerations
Air-Gapped Operations
In true air-gapped environments (no internet connectivity), you must pre-stage everything. We maintain an internal mirror of n8n Docker images, node dependencies, and ERPNext installation packages. Updates happen via USB transfer to a designated admin workstation that can reach both the internet and the isolated network (not simultaneously).
n8n's community nodes cannot be used air-gapped unless you vendor them into your internal registry. We maintain about 15 custom nodes specific to energy protocols (Modbus, DNP3, IEC 61850) that we developed in-house.
Monitoring and Observability
n8n exposes metrics via Prometheus endpoint. We monitor workflow execution count, error rate, execution duration, and queue depth. Alerting triggers if error rate exceeds 5% or if queue depth grows beyond 50.
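The two alert conditions reduce to a few lines. The sketch below restates them in Python for clarity. In practice they would live in Prometheus alerting rules, and the function and parameter names here are illustrative.

```python
def fired_alerts(executions, errors, queue_depth,
                 max_error_rate=0.05, max_queue_depth=50):
    """Return the names of the alert conditions that are currently true."""
    fired = []
    # Guard against division by zero when nothing has executed yet.
    error_rate = errors / executions if executions else 0.0
    if error_rate > max_error_rate:
        fired.append("error_rate")
    if queue_depth > max_queue_depth:
        fired.append("queue_depth")
    return fired


print(fired_alerts(executions=200, errors=11, queue_depth=10))
```

Both comparisons are strict, matching the wording above: exactly 5% errors or exactly 50 queued items does not alert.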
ERPNext doesn't expose metrics natively. We wrote a custom Frappe app that publishes key metrics (API response time, database query performance, background job queue length) to Prometheus. This runs as a scheduled job every 30 seconds.
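Whatever collects the numbers, they end up in the Prometheus text exposition format. The sketch below renders a dictionary of gauges in that format. The metric names and values are hypothetical, and how the text reaches Prometheus (a scraped endpoint or a push-style gateway) is left open because it is not specified above.

```python
def render_gauges(metrics):
    """Render {name: value} pairs as Prometheus text-format gauges."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric names and values.
text = render_gauges({
    "erpnext_background_jobs_queued": 7,
    "erpnext_api_response_seconds": 0.182,
})
print(text)
```

Sorting the names keeps the output stable between runs, which makes diffs and tests easier to read.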
Log aggregation is critical for audit trails. All n8n execution logs and ERPNext API access logs go to a local Graylog instance. For NERC CIP compliance, we retain workflow execution logs for 90 days with immutable storage.
Scaling Patterns
A single n8n instance handles up to 300 executions/hour. Beyond that, you need to run multiple n8n workers with a Redis queue. We haven't needed this yet: most utility automation workloads are bursty (storm events, shift changes) rather than sustained high volume.
ERPNext scales vertically to about 50 concurrent users before you need to think about horizontal scaling. For larger utilities, Frappe supports multi-site deployment where different business units run separate ERPNext instances behind a load balancer. We haven't implemented this—utilities we work with top out at 30 concurrent users.
Cost and Maintenance Reality
Our total infrastructure cost per utility: $8K in server hardware (two VMs on existing infrastructure), zero software licensing fees, $40K in initial configuration and custom development (4 weeks of engineering time). Compare this to a Siemens or ABB integration platform that starts at $500K plus annual maintenance.
Ongoing maintenance: utility engineers handle workflow modifications in n8n's visual interface with minimal training. We provide two days of initial training and then they're autonomous. ERPNext customization requires Python knowledge—utilities either train internal developers or contract us for major schema changes.
We spend about 4 hours per month per utility on maintenance: reviewing error logs, updating custom nodes for protocol changes, adding new workflow templates as operational needs evolve.
The Verdict
n8n paired with ERPNext gives you enterprise workflow automation capability at roughly a tenth of the entry price of commercial platforms (about $48K per utility against $500K and up, before annual maintenance), with full control over your data and code. It works in air-gapped OT environments, handles energy-specific protocols with custom nodes, and can be maintained by utility engineers without a consulting dependency.
The learning curve is real. n8n's visual interface is intuitive for simple workflows, but complex multi-step automation with error handling and state management requires programming thinking. ERPNext customization definitely requires Python skills and understanding of the Frappe framework.
Where this stack excels: utilities with 50-500 employees, moderate automation needs (dozens of workflows, not thousands), technical staff who can learn new tools, and NERC CIP compliance requirements that prohibit cloud solutions. If you need sub-second latency for real-time control loops, look elsewhere—this is for operational workflows, not real-time automation.
What we'd change: n8n needs better credential management out of the box. ERPNext's mobile interface is functional but clunky—field crews tolerate it but don't love it. The integration between the two requires custom code that we wish was standardized.
Bottom line: we deploy this stack repeatedly because it works, costs little, and utilities can own it completely. In an industry where vendor lock-in is the norm and commercial software treats you like a permanent revenue stream, that matters.