
The Hierarchical Memory Pattern: Why Energy AI Systems Need Layered Knowledge Architecture

By EthosPower Editorial · February 27, 2026 · 9 min read · Verified Feb 27, 2026

Stack: AnythingLLM (primary), SmythOS, Taskmaster, Neo4j, Qdrant

Tags: AI Architecture, Knowledge Graphs, Vector Databases, RAG Systems, NERC CIP, Memory Architecture, Graph Databases, Energy AI

Pattern Context

We've deployed AI document retrieval systems for three different utilities over the past eighteen months. Every single one started with the same architecture: dump everything into a vector database, embed it, retrieve top-k results, feed to LLM. Every single one hit the same wall within weeks.

The problem isn't the technology—Qdrant performs beautifully at semantic search. The problem is that energy operations require two fundamentally different types of knowledge retrieval that a single-layer architecture cannot satisfy. You need fast semantic matching for document fragments AND deep relational traversal for operational context. Trying to solve both with one database creates a system that does neither well.

This pattern emerged from failure. A generation control operator asked our RAG system about ramping constraints during a wind forecast change. The system retrieved five relevant procedure fragments but missed that this particular plant had a custom ramp rate due to a turbine modification documented in a maintenance record from eight months prior. The semantic similarity was perfect. The operational context was absent.

The Problem Statement

Energy sector AI systems face three architectural tensions that single-layer vector stores cannot resolve:

First, NERC CIP audit trails demand complete provenance chains. When an AI system recommends a switching operation, auditors need to trace not just which document section was retrieved, but which prior incidents informed that procedure, which equipment history applies, which regulatory changes modified the guidance. Vector similarity scores don't capture these causal relationships.

Second, operational context is relational, not semantic. A procedure for breaker maintenance semantically matches procedures for dozens of breakers. But only specific breakers connect to specific feeders serving specific critical infrastructure customers under specific reliability agreements. That network of relationships determines which procedure variant applies. Graph traversal finds this. Vector search does not.

Third, knowledge freshness requirements differ by type. Procedure documents change quarterly. Equipment sensor data changes every four seconds. Regulatory interpretations change when FERC issues orders. Personnel certifications expire annually. A single embedding model with a single refresh cadence cannot serve all these timescales.

The naive solution—make the vector database bigger, embed more metadata, increase top-k retrieval—makes everything slower without solving the core problem. We tried this. At k=50 with metadata filtering, query latency on our Qdrant cluster hit 340ms p99. Still missing operational context. Just slower.

Solution Architecture

The hierarchical memory pattern separates concerns across two persistence layers with different query semantics:

Layer 1: Fast Semantic Retrieval (Vector Database)

Qdrant serves as the first-pass filter. We embed document chunks at 512-token boundaries with metadata tags for document type, asset class, and regulatory scope. The vector layer answers: "What text fragments are semantically similar to this query?" Query latency stays under 15ms p99 at our current 2.3M vector scale.

Critically, we embed only the semantic content. We do NOT try to encode relationships, lineage, or operational constraints into vector metadata. That's not what vectors are for.

Collection structure:

```
Vectors:  2.3M chunks, 1536 dimensions (OpenAI ada-002)
Metadata: doc_id, chunk_idx, doc_type, asset_class, timestamp
Filters:  Pre-filter on asset_class and doc_type before vector search
Hardware: 3-node Qdrant cluster, 32GB RAM per node
```
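To make the pre-filtering concrete, here is a minimal sketch of the request payload for Qdrant's points-search endpoint, which applies the metadata filter before the vector comparison. The metadata keys match the collection structure above; the query vector and values are placeholders.

```python
# Build a Qdrant /points/search payload that pre-filters on metadata
# before running the vector similarity search. Field names mirror the
# collection structure described above; values are illustrative.

def build_search_request(query_vector, asset_class, doc_type, top_k=10):
    return {
        "vector": query_vector,
        "limit": top_k,
        "with_payload": ["doc_id", "chunk_idx", "timestamp"],
        "filter": {
            "must": [
                {"key": "asset_class", "match": {"value": asset_class}},
                {"key": "doc_type", "match": {"value": doc_type}},
            ]
        },
    }

request = build_search_request([0.0] * 1536, "breaker", "procedure")
```

Because the filter sits inside the search request, Qdrant never scores vectors outside the asset class, which is what keeps latency flat as the collection grows.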

Layer 2: Deep Relational Context (Knowledge Graph)

Neo4j stores the operational knowledge graph. Every document, equipment asset, procedure revision, incident report, and regulatory requirement becomes a node. Relationships capture causality, containment, applicability, succession, and authorization chains.

The graph layer answers: "Given these semantically relevant chunks, which ones apply to this specific operational context? What related information must be included for completeness?"

Core schema patterns:

```
(Procedure)-[:APPLIES_TO]->(Asset)
(Procedure)-[:SUPERSEDES]->(PriorVersion)
(Incident)-[:RESULTED_IN]->(ProcedureChange)
(Asset)-[:FEEDS]->(Customer)
(Employee)-[:CERTIFIED_FOR]->(Procedure)
(Regulation)-[:CONSTRAINS]->(Procedure)
```

The graph typically holds 50K-200K nodes for a mid-size utility. Query latency for 2-3 hop traversals runs 8-25ms depending on relationship fan-out.
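A representative context query against this schema might look like the following sketch. The relationship types come from the patterns above; the property names (`asset_id`) and the exact return shape are illustrative, not the production query.

```python
# Parameterized Cypher for a 2-hop context expansion: start from an asset,
# pull applicable procedures, downstream customers, and constraining
# regulations. Relationship types match the schema listed above.

CONTEXT_QUERY = """
MATCH (p:Procedure)-[:APPLIES_TO]->(a:Asset {asset_id: $asset_id})
OPTIONAL MATCH (a)-[:FEEDS]->(c:Customer)
OPTIONAL MATCH (r:Regulation)-[:CONSTRAINS]->(p)
RETURN p, a,
       collect(DISTINCT c) AS customers,
       collect(DISTINCT r) AS regulations
"""

def context_params(asset_id: str) -> dict:
    # Parameters are passed separately so the query plan is cached by Neo4j.
    return {"asset_id": asset_id}
```

Keeping the query parameterized (rather than string-interpolating asset IDs) lets Neo4j reuse the compiled plan across the high-frequency asset lookups.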

Query Orchestration

AnythingLLM orchestrates the two-layer retrieval:

  1. User query arrives: "Can we defer the breaker maintenance on CB-447?"
  2. Extract entities: CB-447, breaker maintenance, deferral
  3. Query Qdrant: Vector search for maintenance procedures + deferral policies (top-10, <15ms)
  4. Extract doc_ids from Qdrant results
  5. Query Neo4j: Starting from those doc_ids, traverse to related assets, check if CB-447 feeds critical load, retrieve applicable NERC CIP requirements, check maintenance history (<20ms)
  6. Combine semantic chunks with relational context
  7. Feed enriched context to LLM (Llama 3.1 70B via Ollama)
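The steps above can be sketched as a small pipeline. The `vector_search` and `graph_expand` callables stand in for the Qdrant and Neo4j calls, and the entity extraction here is a naive pattern match on asset tags, not the production extractor.

```python
# Minimal sketch of the two-layer retrieval flow. Layer functions are stubs;
# in production they wrap the Qdrant search and Neo4j traversal.
import re

def extract_entities(query: str) -> list[str]:
    # Matches asset tags like CB-447; a real system would use an NER model
    # or the asset registry instead of a regex.
    return re.findall(r"\b[A-Z]{2}-\d+\b", query)

def retrieve(query, vector_search, graph_expand, top_k=10):
    entities = extract_entities(query)
    chunks = vector_search(query, top_k)        # Layer 1: semantic match
    doc_ids = {c["doc_id"] for c in chunks}
    context = graph_expand(doc_ids, entities)   # Layer 2: relational expand
    return {"chunks": chunks, "graph_context": context, "entities": entities}

# Stub layers for illustration:
result = retrieve(
    "Can we defer the breaker maintenance on CB-447?",
    vector_search=lambda q, k: [{"doc_id": "proc-112", "text": "..."}],
    graph_expand=lambda ids, ents: {"critical_load": True, "docs": sorted(ids)},
)
```

The key property is that the graph query is seeded by the vector results, so the expensive traversal only ever starts from a handful of already-relevant documents.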

Total retrieval latency: 35-45ms for the full hierarchical query. This is faster than our previous single-layer approach at k=50 AND provides complete operational context.

Implementation Considerations

Data Synchronization

The two layers must stay synchronized, but they update at different cadences. We run:

  • Qdrant re-embedding: Nightly batch for updated documents
  • Neo4j graph updates: Real-time via change-data-capture from ERPNext (asset management) and document control systems
  • Relationship inference: Weekly batch job to detect new implicit relationships (e.g., if a procedure references an asset by description but not ID)

Do NOT attempt real-time sync of embeddings. The computational cost of re-embedding every document change would crush your infrastructure. Nightly batch is sufficient for energy operations—procedures don't change hourly.
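The nightly batch reduces to a change-detection pass: only documents modified since the previous run are re-chunked and re-embedded. A minimal sketch, assuming epoch-second timestamps and a hypothetical document-store record shape:

```python
# Select documents for nightly re-embedding: anything whose source changed
# after the previous batch run. Timestamps are epoch seconds; the record
# shape is illustrative.

def docs_to_reembed(documents: list[dict], last_run: float) -> list[str]:
    """Return doc_ids whose source changed after the previous nightly batch."""
    return [d["doc_id"] for d in documents if d["updated_at"] > last_run]

docs = [
    {"doc_id": "proc-112", "updated_at": 1_700_000_500.0},
    {"doc_id": "proc-113", "updated_at": 1_699_999_000.0},
]
changed = docs_to_reembed(docs, last_run=1_700_000_000.0)  # → ["proc-112"]
```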

Access Control Propagation

NERC CIP requires role-based access control down to the document section level. We encode access control in Neo4j as relationships:

```
(Employee)-[:HAS_ROLE]->(Role)-[:CAN_ACCESS]->(Procedure)
```

During graph traversal, we filter results to only nodes reachable via the user's role relationships. Qdrant collections remain unfiltered—access control happens in the graph layer. This centralizes security policy and makes auditing tractable.
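As a toy in-memory version of that filter: a procedure is visible only if it is reachable through the user's HAS_ROLE → CAN_ACCESS chain. In production this reachability check is a predicate inside the Cypher traversal rather than a post-filter; the names here are hypothetical.

```python
# Toy role-based reachability filter mirroring the graph pattern above:
# (Employee)-[:HAS_ROLE]->(Role)-[:CAN_ACCESS]->(Procedure).

def accessible_procedures(user, has_role, can_access, procedures):
    """has_role: user -> set of roles; can_access: role -> set of proc ids."""
    roles = has_role.get(user, set())
    allowed = set().union(*(can_access.get(r, set()) for r in roles))
    return [p for p in procedures if p in allowed]

visible = accessible_procedures(
    "op_jdoe",
    has_role={"op_jdoe": {"switching_operator"}},
    can_access={"switching_operator": {"proc-112"}},
    procedures=["proc-112", "proc-945"],
)  # → ["proc-112"]
```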

Embedding Model Selection

We tested three embedding models: OpenAI ada-002 (1536d), Sentence-BERT all-MiniLM-L6 (384d), and instructor-xl (768d). For energy domain technical documents:

  • ada-002: Best semantic accuracy on our evaluation set (87% relevance at k=10), but requires API calls
  • all-MiniLM: Runs locally, adequate accuracy (79% relevance), 4x faster embedding
  • instructor-xl: Domain-tunable, middle performance (82% relevance)

We chose ada-002 for production despite the API dependency. The accuracy gap was too significant. For air-gapped deployments, instructor-xl fine-tuned on energy procedures is the compromise.
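The relevance numbers above come from a relevance-at-k style measurement: the fraction of top-k retrieved chunks judged relevant, averaged over the evaluation queries. A sketch of that metric, with hypothetical binary judgments:

```python
# Relevance@k: fraction of the top-k retrieved chunks that appear in the
# human-judged relevant set, pooled across evaluation queries.

def relevance_at_k(retrieved: list[list[str]], relevant: list[set[str]],
                   k: int = 10) -> float:
    hits, total = 0, 0
    for ranked, gold in zip(retrieved, relevant):
        top = ranked[:k]
        hits += sum(1 for doc in top if doc in gold)
        total += len(top)
    return hits / total if total else 0.0

score = relevance_at_k(
    retrieved=[["a", "b", "c", "d"]],
    relevant=[{"a", "b", "d"}],
    k=4,
)  # → 0.75
```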

Graph Schema Evolution

The knowledge graph schema will evolve as you discover new relationship types. Budget for this. We started with 8 relationship types. Eighteen months later we have 23. The key architectural decision: use Neo4j's schema-optional graph model, not a rigid schema-on-write approach. This lets you add new relationship types without migrations.

Document your schema in code, version it in git, but don't enforce it at the database level. Energy operations are too dynamic for rigid schemas.

Cost and Scaling

Infrastructure for a mid-size utility (2.3M vectors, 120K graph nodes):

  • Qdrant cluster: 3x VMs, 32GB RAM, 500GB SSD each (~$800/month cloud equivalent)
  • Neo4j: Single instance, 64GB RAM, 1TB SSD (~$400/month)
  • Ollama (LLM inference): 2x GPU nodes, A100 40GB (~$3000/month)
  • AnythingLLM orchestration: 16GB RAM, minimal CPU (~$100/month)

Total: ~$4300/month infrastructure. Compare this to enterprise RAG platforms at $50K+/year with vendor lock-in and data exfiltration concerns.

Qdrant scales horizontally—add nodes as vector count grows. Neo4j scales vertically to ~10M nodes on a single instance, then you need Neo4j clustering (enterprise license required, ~$150K/year).

Real-World Trade-Offs

Complexity vs. Capability

This is not a simple architecture. You're running two databases with different query languages, different backup strategies, different failure modes. Your team needs Cypher skills for Neo4j and understanding of vector operations for Qdrant. If your team is two people and you need something working next week, start with AnythingLLM's built-in vector store and accept the limitations.

But if you're building a system that will serve 200+ operations personnel making real-time decisions with compliance implications, the complexity pays for itself. Our operators trust the hierarchical system because it explains WHY a procedure applies, not just that it semantically matches.

Query Latency vs. Context Depth

Every graph hop adds latency. We limit traversals to 3 hops maximum. Beyond that, query time explodes and context relevance drops. This means some distant relationships won't surface. That's acceptable—better to return highly relevant context quickly than everything eventually.

We added query caching at the Neo4j layer for common traversal patterns (e.g., "find all procedures applicable to this asset class"). Cache hit rate runs 40-50%, reducing p99 latency from 25ms to 8ms for cached queries.
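The caching layer reduces to keying results on the query shape plus its parameters, with a TTL so graph updates eventually propagate. A minimal sketch, assuming a 5-minute default TTL; a real deployment would also invalidate on CDC events rather than relying on expiry alone.

```python
# TTL cache for common traversal results, keyed on (pattern, params).
import time

class TraversalCache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry and entry[0] > now:
            return entry[1]          # fresh cache hit
        value = compute()            # miss or expired: run the traversal
        self._store[key] = (now + self.ttl, value)
        return value

cache = TraversalCache(ttl_seconds=60)
calls = []
fetch = lambda: calls.append(1) or ["proc-112"]
a = cache.get_or_compute(("procs_for", "breaker"), fetch)
b = cache.get_or_compute(("procs_for", "breaker"), fetch)  # served from cache
```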

Data Sovereignty and Air Gaps

The hierarchical pattern works in air-gapped environments if you swap OpenAI embeddings for local models. We deployed this exact architecture at a utility with NERC CIP High security requirements. Everything runs on-premises. No data leaves the facility.

The trade-off: embedding quality drops from 87% to 82% relevance with instructor-xl. Operators notice. They need to scan one or two additional retrieved chunks to find the right information. Still vastly better than keyword search or no AI assistance.

Maintenance Burden

You're now maintaining two specialized databases. Qdrant is Rust-based, lightweight, minimal operational overhead. Neo4j is Java-based, heavier, requires JVM tuning and occasional heap adjustments. Budget 4-6 hours per month for routine maintenance (backups, updates, performance monitoring).

We run Qdrant on Kubernetes with automated failover. Neo4j runs on a dedicated VM with daily backups to S3-compatible object storage. Total ops burden for our platform team: ~8 hours/month across all components.

The Verdict

The hierarchical memory pattern is the only architecture we've found that satisfies both the semantic search requirements and the relational context requirements of energy sector AI systems. Single-layer vector stores fail at operational context. Pure graph databases are too slow for semantic search. You need both.

Implement this pattern when:

  • You have >500K document chunks and complex operational relationships
  • NERC CIP compliance requires complete audit trails and provenance
  • Operational decisions depend on equipment history, regulatory constraints, and organizational relationships
  • You have engineering team capacity to operate two database systems

Stick with simpler single-layer architectures when:

  • You're under 100K document chunks with minimal relational complexity
  • You need to ship a proof-of-concept in two weeks
  • Your team has limited database operations experience
  • Query latency requirements are loose (>200ms acceptable)

We've deployed the hierarchical pattern at three utilities now. Each one saw 60-70% reduction in operator time spent searching for applicable procedures, plus significantly higher confidence in AI-retrieved information. The architecture complexity is real, but for production energy AI systems, it's complexity that earns its keep.

Decision Matrix

| Dimension | Qdrant + Neo4j | Qdrant Only | AnythingLLM Built-in |
|---|---|---|---|
| Query Latency | 35-45ms combined ★★★★☆ | 15ms vector search ★★★★★ | 50-120ms ★★★☆☆ |
| Relational Depth | 3-hop traversal ★★★★★ | Metadata filtering ★★☆☆☆ | No graph support ★☆☆☆☆ |
| NERC CIP Compliance | Full audit trails ★★★★★ | Limited provenance ★★★☆☆ | Basic access control ★★★☆☆ |
| Operational Complexity | Two databases ★★★☆☆ | Single database ★★★★★ | Turnkey setup ★★★★★ |
| Air-Gap Capable | Fully on-prem ★★★★★ | Fully on-prem ★★★★★ | Fully on-prem ★★★★★ |
| Best For | Production energy AI with compliance and context requirements | Fast proof-of-concepts under 100K documents with simple context | Rapid deployment teams with <50K documents and loose latency needs |
| Verdict | The only architecture we've found that delivers both semantic search speed and operational relationship depth for energy sector deployments. | Excellent for semantic search but fails when operational decisions require relationship traversal and complete lineage tracking. | Great for getting started quickly but architectural limitations surface fast in production energy operations. |

