
AI Architecture for Energy: What Three Years of Deployments Taught Us

AI Architecture · Practitioner Take
By EthosPower Editorial · March 17, 2026 · 10 min read · Verified Mar 17, 2026
Tools: AnythingLLM (primary), SmythOS, Taskmaster, Neo4j, Qdrant
Tags: AI Architecture, Vector Databases, Knowledge Graphs, RAG Systems, NERC CIP, Air-Gapped AI, Edge Computing, Enterprise AI

The Architecture Nobody Talks About

Every AI architecture diagram you see online assumes reliable internet, centralized compute, and the ability to send data wherever you want. Then you walk into a generating station's control room and realize none of that applies. Your vector database needs to run on a server that hasn't been patched in six months because it's inside a NERC CIP boundary. Your LLM can't phone home for updates. Your knowledge graph needs to ingest P&IDs that were scanned from paper in 1987.

We've deployed AI systems across seven utilities and three refineries over the past three years. The reference architectures we started with looked nothing like what we ended up running. Here's what we learned about building AI infrastructure that actually works in energy operations.

The Three-Layer Reality

The conventional wisdom says to build a single unified AI platform. What we actually build is three distinct layers that rarely talk to each other:

Edge Layer: Runs at substations, wellheads, or plant floors. These are the AnythingLLM instances running on industrial PCs with 32GB RAM, processing maintenance logs and equipment manuals. No internet. Updates via sneakernet. The LLMs are quantized Llama models that fit in 8GB VRAM. We're not doing anything fancy here—just basic RAG over technical documentation so operators can ask questions without calling engineering at 2 AM.

Operations Layer: Lives in the OT network DMZ. This is where Qdrant runs our vector stores for the past decade of incident reports, maintenance records, and operating procedures. We pair it with Neo4j to maintain the knowledge graph of equipment relationships—which transformer feeds which breaker, which valve isolates which tank, the chain of dependencies that matters when something fails. This layer has controlled internet access for model updates, but all operational data stays inside.

Enterprise Layer: The IT network where we run the heavier workflows. SmythOS instances orchestrating multi-step processes like regulatory report generation, predictive maintenance scheduling, and capacity planning. This layer can use cloud APIs if needed, but it never touches OT data directly. Everything flows through the DMZ layer with strict data sanitization.

You cannot collapse these layers. We tried. The security team stopped us, and they were right.

Vector Stores: The Qdrant Lesson

Our first deployment used Pinecone because that's what all the tutorials recommended. It lasted six weeks before the compliance audit killed it. You cannot send equipment failure data to a third-party cloud service. Even anonymized. Even encrypted. The auditors don't care about your technical arguments.

We switched to Qdrant running on-premise and learned why it matters for energy operations:

Filtering Performance: Energy data is inherently hierarchical and time-bounded. We need to search across "all protective relay events at substations built before 2010 in the past 18 months." Qdrant's payload filtering is fast enough to do this in real time. We're running collections with 40+ million vectors (ten years of SCADA events, maintenance logs, and sensor readings) and getting sub-100ms query times with complex filters. The Rust implementation means it doesn't fall over when you throw concurrent queries at it during an outage event.
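The filter-then-rank pattern behind that query can be sketched in plain Python. Field names like `built` and `age_days` are illustrative, and in production this is a single Qdrant query with payload conditions, not a Python loop:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_search(points, query_vec, payload_filter, limit=10):
    """Filter on payload metadata first, then rank survivors by similarity."""
    survivors = [p for p in points if payload_filter(p["payload"])]
    survivors.sort(key=lambda p: cosine(p["vector"], query_vec), reverse=True)
    return survivors[:limit]

# Toy corpus: two relay events with metadata payloads (illustrative fields).
points = [
    {"vector": [0.9, 0.1], "payload": {"type": "relay_event", "built": 2004, "age_days": 120}},
    {"vector": [0.8, 0.2], "payload": {"type": "relay_event", "built": 2015, "age_days": 30}},
]

# "Protective relay events at substations built before 2010, past 18 months."
hits = filtered_search(
    points,
    query_vec=[1.0, 0.0],
    payload_filter=lambda p: p["type"] == "relay_event" and p["built"] < 2010 and p["age_days"] <= 548,
)
# Only the pre-2010 substation event survives the filter.
```

Qdrant evaluates the payload conditions against its indexes during the vector search itself, which is why the complex filters stay fast at 40M vectors.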

Quantization: We run scalar quantization on most collections, which cuts memory usage by 4x with minimal accuracy loss on our use cases. This matters when you're running on physical servers you bought three years ago, not elastic cloud instances. Our largest deployment has 12 collections totaling 180GB of vectors running on a single 256GB server.
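The idea behind scalar quantization is simple enough to sketch: store each float32 component as an int8 plus a scale factor, trading a little precision for a quarter of the memory. A minimal illustration of the principle, not Qdrant's implementation:

```python
def quantize(vec):
    """Scalar-quantize float components to int8 range (4x smaller per component)."""
    scale = max(abs(x) for x in vec) / 127 or 1.0
    return [round(x / scale) for x in vec], scale

def dequantize(q, scale):
    """Approximate reconstruction of the original floats."""
    return [x * scale for x in q]

vec = [0.12, -0.98, 0.45, 0.07]
q, scale = quantize(vec)
restored = dequantize(q, scale)
err = max(abs(a - b) for a, b in zip(vec, restored))
assert err < 0.01  # small reconstruction error at roughly 1/4 the memory
```

The "minimal accuracy loss" claim holds because embedding similarity is robust to this kind of per-component rounding; rescoring the top candidates against full-precision vectors recovers most of the residual error.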

No Network Chattiness: Qdrant doesn't assume fast networking. The internal architecture is designed for single-node performance, which is exactly what you get in an air-gapped environment. When we eventually need to scale horizontally, it supports clustering, but we haven't hit that limit yet.

The embedding models we use are BGE-large or E5-large, running locally via Ollama. Updates happen quarterly via approved change control windows, not continuously via API.

Knowledge Graphs: Why Neo4j Won

We initially resisted graph databases because they felt like overkill. Just use a relational database, right? Wrong.

Energy infrastructure is fundamentally a graph. The power grid is a graph. A refinery's piping system is a graph. Equipment dependencies are a graph. When a circuit breaker trips, you need to know what's downstream, what's upstream, what alternative paths exist, and which protection schemes are in play. Doing this in Postgres requires joins that make DBAs cry.

Neo4j changed our incident response workflow:

Impact Analysis: When equipment fails, we run a Cypher query that traverses the graph to find all affected loads, alternative supply paths, and dependent systems. Takes 200ms for queries that used to take minutes in SQL. During a substation transformer failure last year, the operations team had a complete impact map in under ten seconds, including customers affected, backup options, and switching sequences.
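In Cypher that traversal is a variable-length path match; a plain-Python equivalent over a toy adjacency map (the equipment IDs are made up) shows the shape of the query:

```python
from collections import deque

# Toy one-line diagram as adjacency: edges point from supply to load (illustrative IDs).
feeds = {
    "XFMR-1": ["BKR-4", "BKR-5"],
    "BKR-4": ["FDR-A"],
    "BKR-5": ["FDR-B"],
    "FDR-A": [],
    "FDR-B": [],
}

def downstream(graph, failed):
    """Breadth-first traversal: everything de-energized when `failed` trips."""
    seen, queue = set(), deque([failed])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

impacted = downstream(feeds, "XFMR-1")  # every load below the failed transformer
```

Neo4j does the same traversal natively over indexed relationships, which is why a multi-hop impact query finishes in milliseconds rather than the recursive-CTE minutes it takes in SQL.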

Root Cause Navigation: We store failure modes, maintenance history, and environmental conditions as graph relationships. When investigating recurring failures, we can traverse patterns like "find all motors that failed within 90 days of a VFD upgrade, supplied by the same vendor, in high-vibration locations." This kind of multi-hop pattern matching is what graphs do well.
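That pattern is a multi-hop MATCH in Cypher; sketched over flat records (the fields and vendor name here are hypothetical), the logic reduces to:

```python
from datetime import date

# Illustrative failure records; in Neo4j these are nodes and relationships.
motors = [
    {"id": "M-101", "failed": date(2025, 6, 1), "vfd_upgraded": date(2025, 4, 1),
     "vendor": "AcmeDrives", "high_vibration": True},
    {"id": "M-102", "failed": date(2025, 6, 1), "vfd_upgraded": date(2024, 1, 1),
     "vendor": "AcmeDrives", "high_vibration": True},
]

def suspect_motors(records, vendor, window_days=90):
    """Motors that failed within `window_days` of a VFD upgrade, same vendor,
    in high-vibration locations."""
    return [
        r["id"] for r in records
        if r["vendor"] == vendor
        and r["high_vibration"]
        and 0 <= (r["failed"] - r["vfd_upgraded"]).days <= window_days
    ]
```

The flat-record version works at toy scale; the graph version wins when the vendor, location, and upgrade history live two or three relationship hops away from the motor node.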

Living Documentation: P&IDs and one-lines become queryable data structures instead of PDF files. We extract the topology from drawings (painful process, worth it) and maintain it in Neo4j. Now engineering queries like "show me all isolation points between this tank and the flare header" return actual answers instead of requiring three engineers to review drawings for an hour.

We run Neo4j Enterprise in a three-node cluster for the operations layer (high availability matters when you're supporting 24/7 operations) and single instances at the edge. The graph rarely changes—infrastructure doesn't rewire itself daily—so replication is straightforward.

The RAG Architecture That Actually Works

Every RAG tutorial shows you a simple pipeline: embed documents, store in vector DB, retrieve, stuff into context, generate. In production, this naive approach fails in ways that cost you operational credibility.

What we actually run:

Hybrid Retrieval: Vector similarity alone misses critical context. We combine Qdrant vector search with Neo4j graph traversal and keyword search in Postgres. Example: operator asks about a transformer alarm. Vector search finds similar historical events. Graph traversal adds the equipment context (what's connected, recent maintenance). Keyword search catches the exact model number from the manual. We merge these results before sending to the LLM.
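One common way to merge ranked lists from heterogeneous retrievers is reciprocal rank fusion; our production merge logic is more involved, but the shape is:

```python
def rrf_merge(*ranked_lists, k=60):
    """Reciprocal rank fusion: documents score 1/(k + rank) per list they appear in.
    A standard fusion heuristic, sketched here as one plausible merge strategy."""
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["evt-12", "evt-07", "evt-33"]   # Qdrant similarity search
graph_hits   = ["evt-07", "evt-12"]             # Neo4j equipment-context traversal
keyword_hits = ["man-04", "evt-07"]             # Postgres keyword match on model number
merged = rrf_merge(vector_hits, graph_hits, keyword_hits)
# evt-07 appears in all three lists, so it rises to the top.
```

Fusion rewards documents that multiple retrievers agree on, which is exactly what you want before stuffing a limited context window.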

Temporal Awareness: Energy data is time-series by nature, but embeddings lose temporal information. We maintain separate vector collections per time period (last 30 days, last year, historical) with different quantization strategies. Recent data gets full precision; older data gets compressed. Queries specify time bounds explicitly, and we route to the appropriate collection.
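The routing itself is a small decision function; the collection names and tier boundaries here are illustrative:

```python
from datetime import date

def route_collection(query_start, today=None):
    """Pick the vector collection by how far back the query reaches."""
    today = today or date.today()
    age_days = (today - query_start).days
    if age_days <= 30:
        return "events_recent"      # full-precision vectors
    if age_days <= 365:
        return "events_last_year"   # lightly quantized
    return "events_historical"      # aggressively quantized

assert route_collection(date(2026, 3, 1), today=date(2026, 3, 17)) == "events_recent"
```

Splitting by age also keeps the hot collection small, so the queries operators actually run during an event stay fast.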

Source Attribution: Every response must cite sources because our users make operational decisions based on this information. We modified AnythingLLM to force citation of document chunks in responses. If the LLM can't cite a source, it doesn't answer. This frustrates users initially, but they trust the system more once they understand why.

Confidence Thresholds: We reject vector search results below 0.75 similarity score (empirically determined for our embedding model and domain). Better to say "I don't know" than to hallucinate when someone's asking about protective relay settings. The production systems have a fallback prompt that explains what information is missing rather than making something up.
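As a sketch, the threshold check and fallback reduce to a few lines (0.75 is our empirically tuned cutoff; the hit structure is illustrative):

```python
THRESHOLD = 0.75  # tuned per embedding model and domain; not a universal constant

def answer_or_decline(hits):
    """Return citable hits above the similarity threshold, or an explicit fallback."""
    strong = [h for h in hits if h["score"] >= THRESHOLD]
    if not strong:
        return {"answer": None,
                "fallback": "No sufficiently similar source material was found."}
    return {"answer": strong, "fallback": None}

hits = [{"score": 0.81, "doc": "manual-7"}, {"score": 0.40, "doc": "evt-9"}]
result = answer_or_decline(hits)  # only the strong hit is passed to the LLM
```

The fallback branch is where the "explain what's missing" prompt runs instead of generation over weak context.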

AnythingLLM serves as the interface layer because it handles the authentication, conversation memory, and workspace isolation we need without requiring custom development. We run instances at the edge for local document chat and in the operations layer for the full RAG pipeline with Qdrant and Neo4j integration.

The Orchestration Layer Nobody Expects

You need workflow automation whether you think you do or not. AI components don't spontaneously integrate with SCADA historians, ERP systems, and document management. Someone has to orchestrate the data pipelines.

We use SmythOS for multi-step workflows that span systems:

Incident Report Generation: When a protection event occurs, the workflow pulls SCADA data from the historian, retrieves similar historical events from Qdrant, queries the equipment graph in Neo4j, generates a preliminary report via LLM, and routes it to the right engineering team. The entire pipeline runs automatically within five minutes of the triggering event. Before automation, this took a day of manual work.

Maintenance Planning: Combines predictive signals from sensor data, equipment age from the asset database, spare parts inventory from ERP, and crew availability from scheduling systems. The orchestration determines optimal maintenance windows and generates work orders. The AI component ranks priority based on failure probability and operational impact.

Regulatory Compliance: Energy operations produce endless regulatory reports. We built workflows that extract required data from multiple sources, validate completeness, generate draft reports, and track submission deadlines. The LLM component ensures the narrative sections use approved language and cite correct references.
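SmythOS expresses these workflows visually, but their shape is a linear pipeline of integration steps. A sketch with hypothetical stub functions standing in for the real SCADA, Qdrant, Neo4j, and LLM integrations:

```python
# Hypothetical stand-ins for the real integrations in the incident-report workflow.
def pull_scada(event):       return {"event": event, "readings": [1.02, 0.0]}
def similar_events(ctx):     return {**ctx, "similar": ["evt-2023-114"]}
def equipment_context(ctx):  return {**ctx, "graph": {"upstream": "XFMR-1"}}
def draft_report(ctx):       return {**ctx, "report": f"Preliminary report for {ctx['event']}"}
def route_to_team(ctx):      return {**ctx, "assigned": "protection-engineering"}

PIPELINE = [pull_scada, similar_events, equipment_context, draft_report, route_to_team]

def run_incident_pipeline(event):
    """Thread a context dict through each step; each step enriches and passes it on."""
    ctx = event
    for step in PIPELINE:
        ctx = step(ctx)
    return ctx

result = run_incident_pipeline("BKR-4 trip")
```

The value of the orchestration layer is exactly this threading: each step owns one system boundary, and the context accumulates everything the final report needs.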

The visual workflow builder in SmythOS meant we could iterate quickly with domain experts rather than requiring Python developers for every change. Operations engineers can modify workflow logic themselves, which matters when requirements change mid-project (they always do).

What We'd Do Differently

If we started over tomorrow:

Start with the Graph: We should have built the Neo4j equipment model first, not last. Everything else hangs off that structural foundation. Your vector stores, your workflows, your analytics—all of them need to reference the canonical equipment graph. Building it later meant painful data migration and relationship reconciliation.

Separate Embedding Models by Domain: We used a single embedding model for all content types initially (incident reports, manuals, procedures, SCADA events). Accuracy improved noticeably when we fine-tuned domain-specific models. Maintenance procedures and incident reports have different semantic structures; one embedding model can't optimize for both.

Invest in Data Extraction Earlier: Half our deployment time goes to extracting structured data from PDFs, scanned documents, and legacy databases. The AI components are straightforward; the data archaeology is brutal. Budget more time for this than you think reasonable, then double it.

Build Confidence Metrics from Day One: Track retrieval quality, citation accuracy, and user satisfaction from the first deployment. You need this data to tune retrieval parameters and justify system improvements. We added monitoring late and regretted it.

Plan for Model Updates: We didn't initially design a process for updating embedding models and LLMs in production. When better models became available, we had no good way to migrate vector stores or test impact. Build the update pipeline as part of the initial architecture.

The Verdict

AI architecture in energy operations is mostly boring infrastructure work with occasional AI components inserted where they add value. You're building data pipelines, managing databases, orchestrating workflows, and worrying about security boundaries. The AI is almost an afterthought.

Your architecture will have three layers whether you plan for it or not: edge devices with local AI for immediate needs, an operations layer with serious vector and graph databases for real-time decision support, and an enterprise layer for orchestration and analytics. Don't fight this separation—embrace it and design clear interfaces between layers.

Qdrant handles vector search at operational scale in restricted environments. Neo4j provides the graph foundation that everything else references. AnythingLLM gives you document chat and RAG interfaces without building from scratch. SmythOS orchestrates the multi-system workflows that actually deliver business value. These four components form the core of every successful deployment we've done.

The hard parts aren't the AI components—those mostly work as advertised. The hard parts are data extraction from legacy systems, maintaining data quality in vector stores, keeping knowledge graphs synchronized with physical reality, and building trust with operators who rightfully distrust black boxes.

If you're building AI infrastructure for energy operations, spend less time on model selection and more time on data architecture. Get the graph right. Build robust data pipelines. Implement proper monitoring. The AI performance will follow once the foundation is solid. We learned this by doing it wrong first—you can learn from our mistakes instead.

Decision Matrix

Query Performance
  Qdrant + Neo4j: <100ms @ 40M vectors (★★★★★)
  AnythingLLM: good for <1M documents (★★★☆☆)
  SmythOS: orchestration, not search (★★★☆☆)

Memory Efficiency
  Qdrant + Neo4j: 4x compression via quantization (★★★★★)
  AnythingLLM: moderate resource usage (★★★☆☆)
  SmythOS: depends on integrated tools (★★★☆☆)

Air-Gap Operation
  Qdrant + Neo4j: full functionality offline (★★★★★)
  AnythingLLM: complete offline capability (★★★★★)
  SmythOS: requires external connections (★★☆☆☆)

Graph Integration
  Qdrant + Neo4j: external integration required (★★★☆☆)
  AnythingLLM: no native graph support (★★☆☆☆)
  SmythOS: integrates multiple sources (★★★★★)

Deployment Complexity
  Qdrant + Neo4j: two separate systems (★★★☆☆)
  AnythingLLM: single installation (★★★★★)
  SmythOS: visual workflow builder (★★★★★)

Best For
  Qdrant + Neo4j: energy operations requiring both vector search and graph traversal
  AnythingLLM: edge deployments and departmental document chat applications
  SmythOS: multi-system workflow orchestration and enterprise AI agent coordination

Verdict
  Qdrant + Neo4j: the combination we run in production—handles the scale and security requirements that matter in OT environments.
  AnythingLLM: perfect interface layer for local RAG, but you'll outgrow the backend as data volume increases.
  SmythOS: not a vector database replacement—it's the orchestration layer that coordinates your AI components across system boundaries.

