The Problem with Stateless AI in Energy Operations
We've deployed AI systems across seventeen power utilities in the last four years. The pattern is always the same: initial excitement, decent demos, then operational failure within three months. The reason? Engineers ask the AI about a substation event, get a reasonable answer, then ask a follow-up question thirty seconds later and the system has zero memory of the previous conversation. Worse, it can't distinguish between the Bakersfield substation (commissioned 1987, GE protection relays, history of transformer issues) and the Riverside substation (commissioned 2019, SEL equipment, clean record).
This isn't a model problem. GPT-4, Claude, Llama 3.1 — they all have the same fundamental limitation: no persistent operational memory. Vector databases alone don't solve this. Dumping your SCADA historian into Qdrant gives you semantic search, which is valuable, but it doesn't preserve the relational context that makes energy operations comprehensible. You lose the network topology, the equipment hierarchy, the cause-effect chains that experienced operators hold in their heads.
After three failed attempts at pure vector approaches, we developed what we now call the Hybrid Memory Architecture: Neo4j for structured operational knowledge, Qdrant for semantic document retrieval, and a coordination layer that queries both simultaneously. It's not elegant. It's not simple. But it's the only pattern we've found that actually works in production.
Why Energy Operations Break Pure Vector Approaches
Vector databases excel at similarity search. You embed a question, find similar document chunks, feed them to an LLM. For general knowledge work, this is often sufficient. For energy operations, it's catastrophically incomplete.
Consider a typical operator question: "Why did the voltage regulator on Bank 2 trip at the Henderson substation last Tuesday?" A pure vector approach retrieves documents mentioning voltage regulators, Bank 2, Henderson, and recent dates. You might get the alarm log, maybe a maintenance report, possibly a similar incident from 2019. What you don't get is the operational context:
- Bank 2 feeds three industrial customers who run synchronized processes
- The protection relay was replaced six months ago during the NERC CIP upgrade
- That specific relay model has known issues with harmonic distortion above 3.2%
- The upstream transmission line had a phase imbalance event 47 seconds before the trip
- The same operator crew was on shift during the last two similar events
This contextual web is graph-native. It's nodes (equipment, people, events, locations) connected by typed relationships (feeds, protects, replaced_by, caused_by, operated_by). Vector search can't reconstruct these relationships from embedded text. It finds similar content, but similarity is not causality.
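As a toy illustration (the IDs and edges below are hypothetical, loosely matching the Henderson scenario), the contextual web can be modeled as typed edges, and "what else is affected?" becomes a typed traversal rather than a similarity lookup:

```python
from collections import defaultdict

# Hypothetical typed edges around the Bank 2 trip. In Neo4j these would be
# relationships between Equipment, Event, and Customer nodes.
EDGES = [
    ("line_T14", "PRECEDED", "event_trip_0612"),    # imbalance before the trip
    ("relay_SEL451", "PROTECTS", "bank_2"),
    ("bank_2", "FEEDS", "customer_A"),
    ("bank_2", "FEEDS", "customer_B"),
    ("bank_2", "FEEDS", "customer_C"),
    ("work_order_881", "REPLACED", "relay_SEL451"),  # relay swap during CIP upgrade
]

def neighbors(graph_edges, node, rel_types=None):
    """One-hop traversal from `node`, optionally filtered by relationship type."""
    adj = defaultdict(list)
    for src, rel, dst in graph_edges:
        adj[src].append((rel, dst))
    return [dst for rel, dst in adj[node] if rel_types is None or rel in rel_types]

# "What does Bank 2 feed?" is answered by edge type, not text similarity:
affected = neighbors(EDGES, "bank_2", rel_types={"FEEDS"})
```

No embedding of any document would reliably return exactly these three customers; the typed edge does.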
We learned this the hard way at a California IOU where we spent eight weeks tuning embedding models and chunk strategies, only to have operators ignore the system because it "didn't understand how the grid actually works." They were right.
The Hybrid Architecture: Structure Meets Semantics
The pattern we've settled on after four years of iteration:
Neo4j holds the structured operational ontology. Equipment hierarchy, network topology, maintenance history, operational procedures, staffing relationships, regulatory requirements. This is your system of record for "what connects to what and why it matters." We typically model:
- Equipment nodes: transformers, breakers, relays, meters, each with properties like serial number, commission date, manufacturer, current firmware
- Location nodes: substations, plants, zones with geographic and administrative hierarchies
- Event nodes: alarms, maintenance activities, operational changes, each timestamped and attributed
- Document nodes: procedures, manuals, compliance records with metadata but not full text
- Personnel nodes: operators, engineers, contractors with certifications and shift assignments
- Relationships: FEEDS, PROTECTS, LOCATED_AT, MAINTAINED_BY, CAUSED, PRECEDED, RELATED_TO
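A minimal sketch of how node and relationship types like these translate to Cypher (labels, property names, and `MERGE` keys here are illustrative, not our exact production schema); with the official `neo4j` Python driver, each string would be executed via `session.run` with the parameter dict:

```python
# Idempotent upsert of an Equipment node keyed on serial number.
CREATE_EQUIPMENT = """
MERGE (e:Equipment {serial: $serial})
SET e.type = $type, e.commissioned = $commissioned,
    e.manufacturer = $manufacturer, e.firmware = $firmware
"""

# Typed relationship between two existing equipment nodes.
LINK_PROTECTION = """
MATCH (r:Equipment {serial: $relay_serial}), (b:Equipment {serial: $bank_serial})
MERGE (r)-[:PROTECTS]->(b)
"""

# Example parameter payload for CREATE_EQUIPMENT (values are placeholders).
example_params = {
    "serial": "SEL-451-00123", "type": "protection_relay",
    "commissioned": "2024-01-15", "manufacturer": "SEL", "firmware": "R318",
}
```

`MERGE` on the serial number keeps the ingestion pipeline idempotent: re-processing the same CMMS export updates properties instead of duplicating nodes.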
Qdrant holds the semantic document corpus. Full text of maintenance logs, incident reports, manufacturer manuals, engineering studies, NERC alerts, tribal knowledge captured from operator interviews. We chunk at the paragraph level (typically 200-400 tokens), embed with nomic-embed-text (768 dimensions), and maintain separate collections for different document classes to enable filtered retrieval.
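A simplified chunker in the spirit of that pipeline (whitespace splits stand in for real token counts, and the payload fields are hypothetical names, not our exact metadata schema):

```python
def chunk_paragraphs(text, min_tokens=200, max_tokens=400):
    """Greedily pack paragraphs into chunks of roughly min-max 'tokens'
    (approximated here by whitespace splits)."""
    chunks, current, count = [], [], 0
    for para in [p.strip() for p in text.split("\n\n") if p.strip()]:
        n = len(para.split())
        if count + n > max_tokens and count >= min_tokens:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

def to_point_payload(chunk, doc_class, source_path, doc_date):
    """Metadata attached to each Qdrant point to enable filtered retrieval.
    Field names are illustrative."""
    return {"text": chunk, "doc_class": doc_class,
            "source": source_path, "date": doc_date}
```

Each chunk is then embedded (nomic-embed-text, 768 dimensions) and upserted into the collection matching its `doc_class`, which is what makes the later filtered retrieval possible.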
AnythingLLM coordinates the hybrid query. When a user asks a question, we don't immediately jump to vector search. First, we extract entities and intent using a small, fast model (Llama 3.2-3B running in Ollama). If the question references specific equipment or locations, we query Neo4j to retrieve the relevant subgraph: the equipment node, its immediate relationships, recent events, connected systems. This gives us 2-5KB of structured context.
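The extraction step can be sketched as a strict-JSON prompt to the small model plus a defensive parser (the schema and parsing below are illustrative; they are not the fine-tuned model's actual output format):

```python
import json

# Prompt template sent to the small model (Llama 3.2-3B via Ollama in our
# stack). Doubled braces are literal JSON braces for str.format().
EXTRACTION_PROMPT = """Extract entities from the operator question.
Return JSON only: {{"equipment": [], "locations": [], "time_range": null}}
Question: {question}"""

EMPTY = {"equipment": [], "locations": [], "time_range": None}

def parse_entities(model_output: str) -> dict:
    """Parse the model's JSON reply, tolerating surrounding prose;
    fall back to an empty result rather than failing the query."""
    start, end = model_output.find("{"), model_output.rfind("}")
    if start == -1 or end == -1:
        return dict(EMPTY)
    try:
        return json.loads(model_output[start:end + 1])
    except json.JSONDecodeError:
        return dict(EMPTY)
```

The fallback matters operationally: a failed extraction degrades the query to pure vector search instead of erroring out in front of an operator.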
Then we construct a vector search using both the original question and Neo4j results. Qdrant filters by document type and date range extracted from the graph context, retrieves the top 8-12 semantically relevant chunks, and returns them with their metadata. The coordination layer merges graph context and vector results into a single prompt for the LLM, explicitly marking which information came from structured records versus unstructured documents.
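The merge step reduces to prompt construction with explicit provenance markers; a simplified sketch (section labels and chunk fields are illustrative):

```python
def build_prompt(question, graph_facts, doc_chunks):
    """Merge Neo4j facts and Qdrant chunks into one prompt, labeling
    structured vs. unstructured sources so the LLM can attribute claims."""
    lines = [f"Operator question: {question}", "", "STRUCTURED RECORDS (Neo4j):"]
    lines += [f"- {fact}" for fact in graph_facts]
    lines += ["", "DOCUMENT EXCERPTS (Qdrant):"]
    for chunk in doc_chunks:
        lines.append(f"[{chunk['source']}, {chunk['date']}] {chunk['text']}")
    lines += ["", "Answer using only the context above; state whether each "
                  "claim comes from structured records or document excerpts."]
    return "\n".join(lines)
```

Keeping the two sections visibly separate is what lets the model (and the operator reading the answer) distinguish "this is a recorded fact about the topology" from "this is something a maintenance report said."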
The LLM sees both the relational facts ("this relay protects Bank 2, which feeds Customers A, B, C") and the semantic content ("maintenance report from last month mentions intermittent communication errors"). It can reason about causality because it has the graph structure. It can reference specific technical details because it has the document chunks.
Implementation: The Practical Details
We run this stack entirely on-premises for NERC CIP compliance. Typical deployment:
- Neo4j Enterprise 5.x on a three-node cluster (HA for production environments, single instance for development)
- Qdrant 1.11+ on dedicated vector servers with NVMe storage (quantization enabled for collections over 5M vectors)
- Ollama hosting Llama 3.1-70B for reasoning, Llama 3.2-3B for entity extraction, nomic-embed-text for embeddings
- AnythingLLM as the orchestration and UI layer, custom plugins for Neo4j integration
- All components behind the OT security boundary, no internet connectivity, no cloud dependencies
Data ingestion pipeline: We built a Python service that monitors document repositories (SharePoint, network drives, ERPNext attachments) and SCADA historians. New documents get chunked, embedded, and loaded into Qdrant with metadata. Simultaneously, structured events from SCADA, work orders from ERPNext, and equipment changes from the CMMS update Neo4j through its Cypher API. The pipeline runs continuously but rate-limited to avoid impacting operational systems.
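The "continuously but rate-limited" part can be as simple as a token bucket in front of the ingestion workers; a minimal sketch (the injectable clock exists only to make it testable):

```python
import time

class RateLimiter:
    """Token-bucket limiter that paces ingestion so polling document
    repositories and historians never bursts against operational systems."""
    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate_per_sec, burst, clock
        self.tokens, self.last = burst, clock()

    def allow(self):
        """Return True if one unit of work may proceed now."""
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Each worker calls `allow()` before hitting a source system and sleeps briefly on `False`; the bucket refills at a rate chosen per source.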
Graph schema evolution: This is the hard part. Energy companies have decades of inconsistent data. Equipment IDs change between systems. Location hierarchies don't match between GIS and SCADA. We spend 60-70% of project time on data normalization and schema design. Our approach: start with a minimal viable graph (equipment, locations, one level of hierarchy), prove value, then incrementally add complexity. Don't try to model everything upfront. You'll fail.
Query coordination logic: AnythingLLM's agent framework lets us define a multi-step process: extract entities, query Neo4j, construct enhanced vector query, retrieve from Qdrant, merge results, generate response. We expose this as a custom workspace tool. The entity extraction step is critical — we use a fine-tuned Llama 3.2-3B that recognizes energy-specific entities (equipment types, voltage levels, protection schemes) with 91% accuracy, dramatically better than general-purpose NER models.
Performance characteristics:
- Entity extraction: 80-150ms
- Neo4j subgraph retrieval: 20-80ms for typical queries (well-indexed graphs with <10M nodes)
- Qdrant vector search: 15-40ms for collections under 2M vectors, 60-120ms for larger collections
- LLM generation: 800ms-3s depending on response length and model
- Total user-perceived latency: 2-5 seconds, which operators find acceptable for complex analytical questions
Where This Pattern Struggles
Honesty time: this architecture is not simple. You're managing two databases with different consistency models, different backup strategies, different operational characteristics. When something breaks, you're debugging graph queries, vector indexes, embedding pipelines, and LLM prompts simultaneously.
Data synchronization is a constant battle. Equipment gets decommissioned in SCADA but lingers in Neo4j. Documents get updated but old chunks remain in Qdrant. We run nightly reconciliation jobs and maintain audit logs, but drift happens. You need someone who understands both graph and vector databases, which is a rare skill set.
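The core of those nightly reconciliation jobs is a set diff between systems of record; a sketch of the equipment side:

```python
def reconcile(scada_ids, graph_ids):
    """Compare active equipment IDs from the SCADA export against
    Equipment nodes in Neo4j; returns what to flag for review."""
    scada, graph = set(scada_ids), set(graph_ids)
    return {
        "stale_in_graph": sorted(graph - scada),    # decommissioned, still modeled
        "missing_in_graph": sorted(scada - graph),  # commissioned, not yet modeled
    }
```

We flag rather than auto-delete: a node missing from one SCADA export is often an export problem, not a decommissioning, so the audit log gets a human decision.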
Cost and complexity are hard to justify for small deployments. If you're a municipal utility with three substations and 50 pieces of major equipment, this is overkill. A well-configured vector database with rich metadata might suffice. This pattern makes sense at scale: dozens of substations, thousands of equipment nodes, decades of documentation, multiple interconnected systems.
Query optimization requires deep understanding. Naive graph traversals can explode in complexity ("find all equipment within three hops of this transformer" might return 10,000 nodes). Naive vector searches return irrelevant results when embeddings don't capture technical precision ("transformer" the equipment vs. "transformer" the neural network architecture). You need engineers who understand both domains and can tune queries based on operational feedback.
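The fix for exploding traversals is hard caps on both depth and result size; shown here as an in-memory breadth-first sketch (in Cypher the same bounds live in the query itself, via bounded variable-length patterns and `LIMIT`):

```python
from collections import deque

def bounded_neighbors(adjacency, start, max_hops=3, max_nodes=500):
    """Breadth-first traversal capped on depth and result size, so
    'all equipment within three hops' can't balloon into 10,000 nodes."""
    seen, queue, result = {start}, deque([(start, 0)]), []
    while queue and len(result) < max_nodes:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nxt in adjacency.get(node, []):
            if nxt in seen:
                continue
            if len(result) >= max_nodes:
                return result
            seen.add(nxt)
            result.append(nxt)
            queue.append((nxt, depth + 1))
    return result
```

The caps are tuned per relationship type: FEEDS can safely go three hops, while RELATED_TO is usually limited to one.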
The graph schema becomes technical debt. As your model evolves, you'll want to refactor relationships and node types. Unlike relational databases with ALTER TABLE, graph migrations are painful. We version our schema and maintain migration scripts, but every major refactor requires coordination with downstream applications.
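A sketch of how schema versioning can be structured: migrations as ordered (version, Cypher) pairs, with the applied version stored on a singleton node so reruns are idempotent (the statements below are placeholders, not our actual migrations):

```python
# Ordered schema migrations; each Cypher statement is a placeholder example.
MIGRATIONS = [
    (1, "CREATE CONSTRAINT eq_serial IF NOT EXISTS "
        "FOR (e:Equipment) REQUIRE e.serial IS UNIQUE"),
    (2, "MATCH (e:Equipment) WHERE e.firmware IS NULL "
        "SET e.firmware = 'unknown'"),
]

def pending(migrations, current_version):
    """Migrations that still need to run, lowest version first.
    `current_version` is read from the singleton schema-version node."""
    return [(v, stmt) for v, stmt in sorted(migrations) if v > current_version]
```

The runner executes each pending statement in a transaction and bumps the version node only on success, which is what keeps a half-applied refactor recoverable.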
Alternative Patterns We've Tried
Before settling on this hybrid approach, we experimented with several alternatives:
Pure vector with metadata filtering: Store everything in Qdrant, use rich metadata for filtering. Fast to implement, works for document search, completely fails for multi-hop reasoning about operational relationships. Operators couldn't ask "what else is affected by this" questions.
Graph database with embedded documents: Store full document text as properties on Neo4j nodes, use full-text search. Graph traversal is excellent, semantic search is terrible. Neo4j's full-text isn't vector-aware, so you lose the semantic matching that makes modern RAG effective.
LLM-generated graph from vector results: Retrieve documents via vector search, have the LLM construct relationship graphs on-the-fly. Interesting in theory, catastrophically slow and unreliable in practice. LLMs hallucinate relationships, especially under token pressure. Also, regenerating the graph for every query wastes compute.
SmythOS visual workflows with embedded logic: We tested SmythOS for the coordination layer. Excellent for rapid prototyping, great visual debugging, but we found the abstraction layer added latency and made it harder to optimize critical paths. For production energy systems where 5-second response times matter, we needed lower-level control. SmythOS is genuinely excellent for business process automation where workflows change frequently, less ideal for performance-critical operational systems.
Task Master AI for decomposition: Attempted to use Task Master to break complex operator questions into subtasks that could be routed to appropriate data sources. The task decomposition worked well, but the overhead of managing task state and coordinating async queries added 2-3 seconds per interaction. For analytical workflows where an operator is investigating an incident over 20-30 minutes, this might work. For real-time operational questions, it's too slow.
The Verdict
If you're building AI for energy operations and your system needs to answer questions that span equipment relationships, historical context, and unstructured technical knowledge — and you're working at a scale where you have hundreds or thousands of equipment nodes and years of documentation — the hybrid graph-vector architecture is the only pattern we've found that actually works in production.
You'll spend more time on data modeling upfront. You'll need engineers who understand both graph databases and vector embeddings. You'll maintain more infrastructure. But you'll build AI that operators actually trust and use, because it understands the operational context that makes energy systems comprehensible.
Start with Neo4j for your equipment hierarchy and basic topology. Add Qdrant for your technical documentation. Use AnythingLLM to coordinate queries and provide the user interface. Iterate on the schema based on real operator questions. Budget 3-4 months for the initial implementation and expect to spend 20% of an engineer's time on ongoing maintenance and optimization.
This isn't the simplest architecture. It's the one that works when the alternative is operators ignoring your AI system and going back to Ctrl+F through network drives full of PDFs.