
Vector Databases: The Infrastructure Reality Behind Enterprise RAG

By EthosPower Editorial · March 26, 2026 · 9 min read · Verified Mar 26, 2026

The Problem Vector Databases Actually Solve

Three years ago at EthosPower, we faced a problem that traditional databases couldn't touch: a utility client needed to search 40 years of incident reports, maintenance logs, and engineering drawings not by keywords, but by meaning. An operator typing "transformer overheating in summer" needed to find reports about "thermal runaway during peak load" and "cooling system failures in July." Traditional full-text search returned garbage. We needed semantic search, and that required a vector database.

Vector databases store high-dimensional embeddings—numerical representations of text, images, or other data—and perform similarity searches in milliseconds. When you convert "transformer overheating" into a 1024-dimensional vector using a model like BAAI/bge-large-en-v1.5, a vector database finds the closest matching vectors from millions of stored documents. This is the foundation of every RAG system, every AI memory implementation, and every semantic search application.

In the energy sector, this matters because our operational knowledge is locked in unstructured data: maintenance procedures written in the 1980s, engineering specs in PDF format, operator notes in plain text. Vector databases make this knowledge accessible to LLMs without moving data to cloud services. For NERC CIP compliance, that's non-negotiable.

Architecture: What's Actually Happening

A vector database deployment has four components working together:

Embedding Generation

Before vectors hit the database, you need embeddings. At EthosPower, we run sentence-transformers/all-MiniLM-L6-v2 for fast, lightweight embeddings (384 dimensions) or BAAI/bge-large-en-v1.5 for higher quality (1024 dimensions). This happens in Ollama or a dedicated embedding service. The choice of embedding model is effectively permanent—changing it means re-embedding your entire corpus.

Vector Storage and Indexing

The database stores vectors in specialized indexes. Most use HNSW (Hierarchical Navigable Small World) graphs, which sacrifice some accuracy for speed. Qdrant and Milvus both use HNSW. The index parameters (M and efConstruction) determine the trade-off between query speed and recall. In production, I set M=16 and efConstruction=200 for operational data that needs high recall.
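As a concrete illustration, here is what a collection-creation body with those parameters might look like over Qdrant's REST API. The collection name and the 768-dimension vector size are assumptions for the example; the field names follow Qdrant's collection API.

```python
# Qdrant collection config with the HNSW parameters discussed above.
# "m" = graph connectivity (higher -> better recall, more RAM);
# "ef_construct" = build-time search width (higher -> better index
# quality, slower ingestion).
collection_config = {
    "vectors": {
        "size": 768,           # must match the embedding model's output
        "distance": "Cosine",  # normalized text embeddings -> cosine
    },
    "hnsw_config": {
        "m": 16,               # high-recall setting for operational data
        "ef_construct": 200,
    },
}
# Sent as: PUT /collections/incident_reports (collection name illustrative)
```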

Similarity Search

Queries come in as vectors, the database performs approximate nearest neighbor (ANN) search, and returns the top-k most similar vectors. The distance metric matters: cosine similarity for normalized embeddings, L2/Euclidean for spatial data. Energy sector use cases almost always use cosine.
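The exact computation that ANN indexes approximate is easy to sketch in pure Python: a brute-force cosine top-k over a toy corpus. Document names and the 3-dimensional "embeddings" are invented for illustration; real embeddings have hundreds of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, corpus, k=2):
    """Exact nearest-neighbor ranking; ANN indexes approximate this."""
    scored = sorted(corpus.items(), key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

corpus = {
    "thermal-runaway-report": [0.9, 0.1, 0.0],
    "cooling-failure-july":   [0.8, 0.2, 0.1],
    "cafeteria-menu":         [0.0, 0.1, 0.9],
}
# Semantically close documents rank first:
print(top_k([1.0, 0.0, 0.0], corpus))
```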

Metadata Filtering

This is where vector databases separate from simple ANN libraries. You need to filter by date, asset ID, document type, security classification—then search within those results. Qdrant's filtering is fast because it uses inverted indexes for metadata. Milvus added decent filtering in 2.3, but it's still slower than Qdrant's implementation.
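A filtered search request might be shaped like this under Qdrant's REST filter DSL. The payload keys (`doc_type`, `asset_id`) and their values are hypothetical; the request structure follows Qdrant's points-search API.

```python
# Metadata filter combined with vector similarity in one request.
search_body = {
    "vector": [0.12, 0.48, 0.31],  # query embedding (truncated for brevity)
    "filter": {
        "must": [
            {"key": "doc_type", "match": {"value": "maintenance_log"}},
            {"key": "asset_id", "match": {"value": "XFMR-2041"}},
        ]
    },
    "limit": 5,
    "with_payload": True,
}
# Sent as: POST /collections/<collection>/points/search
```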

Deployment Reality: Air-Gapped and Compliant

At a midwestern utility running NERC CIP Critical Cyber Assets, we deployed Qdrant in air-gapped mode. The operational constraints were real:

Hardware Requirements

Qdrant runs efficiently on modest hardware. For 10 million vectors (768 dimensions), we used a single server with 32GB RAM and NVMe storage. Raw float32 vectors cost 4 bytes per dimension, so 10M vectors at 768 dimensions is ~31GB of vector data—roughly 60GB on disk once you add HNSW index overhead and payloads—and int8 scalar quantization cuts the in-RAM working set to a quarter of the raw size. Milvus needed more—it wants separate nodes for query and data, minimum three-node cluster for production.
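As a sanity check on sizing, a back-of-the-envelope helper, assuming float32 vectors at 4 bytes per dimension and int8 scalar quantization at 1 byte (index overhead and payloads come on top of these figures):

```python
def raw_vector_gb(num_vectors, dims, bytes_per_dim=4.0):
    """Raw vector storage in GB; float32 = 4 bytes/dim, int8 = 1 byte/dim."""
    return num_vectors * dims * bytes_per_dim / 1e9

# 10M vectors at 768 dimensions:
print(raw_vector_gb(10_000_000, 768))        # ~30.7 GB raw float32
print(raw_vector_gb(10_000_000, 768, 1.0))   # ~7.7 GB with int8 quantization
```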

Data Sovereignty

Every vector we store represents operational data. At EthosPower, we never send embeddings to external services. Qdrant runs entirely on-premises, no phone-home telemetry, no cloud dependencies. Weaviate can do this too, but its built-in vectorization modules want to call external APIs by default. ChromaDB works well for local development but isn't designed for distributed deployments.

Integration with RAG Pipelines

Our typical stack: n8n orchestrates document ingestion, Ollama generates embeddings, Qdrant stores vectors, AnythingLLM handles the RAG query interface. Qdrant's HTTP API makes integration straightforward. We push 50-100 documents per minute during bulk imports without performance degradation.
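A minimal sketch of the batch-upsert body pushed during bulk imports. Document ids, vectors, and payload fields are invented for illustration; the point structure follows Qdrant's points API.

```python
def make_upsert_batch(docs):
    """Build a Qdrant point-upsert body from (id, vector, metadata) tuples.
    Sent as: PUT /collections/<collection>/points"""
    return {
        "points": [
            {"id": doc_id, "vector": vec, "payload": meta}
            for doc_id, vec, meta in docs
        ]
    }

batch = make_upsert_batch([
    (1, [0.1, 0.2, 0.3], {"source": "maintenance_log", "year": 1987}),
    (2, [0.4, 0.5, 0.6], {"source": "incident_report", "year": 2003}),
])
```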

Backup and Recovery

Qdrant stores everything in a single data directory with write-ahead logs. We snapshot every six hours and stream WAL to backup storage. Recovery is straightforward—copy the data directory and restart. Milvus backup requires stopping the cluster and using MinIO, which is heavier operationally.

Scaling: When One Database Isn't Enough

At 50 million vectors, we hit Qdrant's practical single-node limit. Query latency jumped from 15ms to 200ms. We had three options:

Vertical Scaling

Adding RAM and faster NVMe helped, but only to a point. We maxed out at 128GB RAM and still saw degradation above 60 million vectors.

Horizontal Scaling with Sharding

Qdrant added distributed mode in 1.7. We sharded by asset type: one shard for generation assets, one for transmission, one for distribution. Query routing happens in application code. It works, but you lose the simplicity of a single database. Milvus handles this better—sharding is built-in and transparent to the application.
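The application-side routing can be as simple as a lookup table. The instance hostnames here are hypothetical; the point is that the application, not the database, decides which shard serves a query.

```python
# Per-domain Qdrant instances; unknown asset types fall back to a default.
SHARD_MAP = {
    "generation":   "qdrant-gen:6333",
    "transmission": "qdrant-tx:6333",
    "distribution": "qdrant-dist:6333",
}

def route_query(asset_type, default="qdrant-gen:6333"):
    """Pick the shard endpoint for an asset type."""
    return SHARD_MAP.get(asset_type, default)

print(route_query("transmission"))  # → "qdrant-tx:6333"
```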

Multiple Specialized Databases

We ended up running three Qdrant instances: one for real-time operational data (high write rate, 7-day retention), one for historical archives (100M+ vectors, read-only), and one for engineering documents (moderate size, highest quality embeddings). This matched the operational reality better than trying to force everything into one system.

Monitoring and Operations

Qdrant exposes Prometheus metrics at /metrics. The critical metrics:

  • Collection size: Number of vectors per collection. Watch for unexpected growth.
  • Index memory usage: If this approaches available RAM, queries will hit disk and latency will spike.
  • Query latency (p50, p95, p99): We target <50ms p99 for operational queries. Above 100ms, users complain.
  • Indexing lag: If ingestion outpaces indexing, you'll build a backlog. This happens during bulk imports.

I run alerting on p99 latency >100ms and index memory >80% of available RAM. These catch problems before users notice.
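Those two alert rules can be mirrored in a few lines. The thresholds come from the text; the function itself is an illustrative sketch, not our actual alerting configuration.

```python
def should_alert(p99_latency_ms, index_mem_bytes, ram_bytes):
    """Return the list of fired alert conditions:
    p99 latency > 100ms, or index memory > 80% of available RAM."""
    reasons = []
    if p99_latency_ms > 100:
        reasons.append("p99 latency above 100ms")
    if index_mem_bytes > 0.8 * ram_bytes:
        reasons.append("index memory above 80% of RAM")
    return reasons

print(should_alert(120, 20e9, 32e9))  # latency rule fires, memory rule does not
```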

Weaviate exposes similar metrics, though Prometheus monitoring has to be explicitly enabled rather than being on by default. Milvus ships Grafana dashboards, which are comprehensive but require running their full observability stack.

The Alternatives and Their Trade-offs

Weaviate: Hybrid Search and Generative Modules

Weaviate's hybrid search combines vector similarity with BM25 keyword search. This is valuable when you need both semantic and exact matching—searching maintenance procedures where specific part numbers matter. The built-in generative modules (question answering, summarization) are convenient but require external API access, which breaks air-gapped deployments.

Weaviate's GraphQL API is elegant but adds complexity if your team isn't familiar with GraphQL. At EthosPower, we prefer simple HTTP REST APIs that every developer understands.

Milvus: Trillion-Scale Performance

Milvus is built for massive scale. If you're storing billions of vectors, it's the right choice. The architecture separates compute and storage, allows GPU acceleration, and handles petabyte-scale deployments. But this comes with operational complexity: you're running etcd, MinIO, and Pulsar alongside Milvus itself. For energy sector deployments under 100 million vectors, Milvus is overengineered.

Milvus 2.4 added sparse vector support, which is interesting for hybrid dense-sparse retrieval. We haven't needed it yet.

ChromaDB: Developer-Friendly but Not Production-Ready

ChromaDB is excellent for development and testing. The Python API is simple, local mode runs without any infrastructure, and it handles embedding generation automatically. I use it for prototyping RAG applications.

But ChromaDB isn't designed for production at scale. The persistence layer is SQLite by default, which doesn't support concurrent writes well. The distributed mode exists but is immature. For energy sector production deployments, ChromaDB is a non-starter.

Neo4j: When Relationships Matter More Than Similarity

Neo4j is technically a graph database, not a vector database, but it added vector search in 5.13. If your data has rich relationships—this maintenance procedure applies to these specific transformers, installed by this contractor, with parts from this supplier—Neo4j's graph structure captures that better than pure vector similarity.

We use Neo4j alongside Qdrant in knowledge graph applications. Neo4j stores the relationships and metadata, Qdrant handles semantic search, and we join the results in application code. Neo4j's vector search is slower than Qdrant's (typically 3-5x), but the relationship queries are far more powerful.
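The application-side join can be sketched as a dictionary merge keyed on document id. Both result shapes are invented for illustration; in practice the Qdrant hits come from a semantic search and the Neo4j rows from a Cypher query.

```python
def join_results(qdrant_hits, neo4j_rows):
    """Merge semantic-search hits with graph metadata keyed on doc id."""
    meta = {row["doc_id"]: row for row in neo4j_rows}
    return [
        {**hit, "graph": meta.get(hit["doc_id"])}
        for hit in qdrant_hits
    ]

merged = join_results(
    [{"doc_id": "proc-17", "score": 0.91}],
    [{"doc_id": "proc-17", "applies_to": "XFMR-2041", "contractor": "ACME"}],
)
```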

Configuration Lessons from Production

After three years running Qdrant in production:

Use scalar quantization for embeddings: This reduces memory usage by 4x with minimal accuracy loss. Enable it from day one. In Qdrant: quantization_config={"scalar": {"type": "int8"}}.

Set aggressive on-disk payload: Store large text fields on disk, not in memory. Only the vectors need to be in RAM. This lets you index 2-3x more vectors on the same hardware.

Tune HNSW parameters for your workload: For high-recall scenarios (regulatory compliance, safety-critical searches), set ef=128 or higher at query time. For exploratory search, ef=32 is fine. The default ef=10 is too aggressive for energy sector use cases.
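At query time the recall knob is passed per request. A sketch of the search body, assuming Qdrant's `params.hnsw_ef` field; the ef values mirror the guidance above.

```python
def make_search_body(query_vector, high_recall=False):
    """Build a search request with a query-time ef setting:
    128 for compliance/safety-critical searches, 32 for exploratory ones."""
    return {
        "vector": query_vector,
        "limit": 10,
        "params": {"hnsw_ef": 128 if high_recall else 32},
    }

compliance_query = make_search_body([0.1, 0.2], high_recall=True)
```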

Don't over-shard: We initially sharded by month, thinking it would improve performance. It didn't. Sharding by operational domain (generation, transmission, distribution) worked better because queries rarely span domains.

Monitor index rebuild times: Qdrant rebuilds HNSW indexes in the background. If you're ingesting heavily, rebuilds can take hours. Plan maintenance windows accordingly.

Cost Reality

At EthosPower, we run Qdrant on-premises. Hardware cost for a 50M-vector deployment: $8K for a server (AMD EPYC 7543, 128GB RAM, 2TB NVMe). Hosting cost: $200/month for power and cooling. Staff time: roughly 4 hours per month for maintenance and monitoring.

Compare this to managed vector databases like Pinecone at $70 per million vectors per month. For 50M vectors, that's $3,500/month, or $42K/year—plus data egress fees, plus vendor lock-in. The economic case for self-hosted is clear.

The Verdict

For energy sector AI deployments, Qdrant is the pragmatic choice. It runs efficiently on modest hardware, handles air-gapped deployment without friction, and scales to 50-100 million vectors on a single node. The Rust implementation is fast and memory-efficient. The HTTP API integrates easily with n8n, Ollama, and AnythingLLM. For NERC CIP compliance, it checks every box: no external dependencies, no telemetry, full data sovereignty.

If you're building a RAG system for operational data, maintenance logs, or engineering documents, start with Qdrant. If your deployment grows beyond 100 million vectors and you have dedicated infrastructure staff, consider Milvus. If you need rich relationship queries alongside vector search, run Neo4j and Qdrant together. If you're prototyping, use ChromaDB locally.

Ignore Weaviate unless you specifically need hybrid search and are willing to accept its GraphQL complexity. Ignore managed services unless your organization has already moved operational data to the cloud—which, in the energy sector, most haven't and shouldn't.

The vector database landscape is maturing. In 2026, the question isn't whether to deploy one—it's which one fits your operational constraints. For most energy companies, that answer is Qdrant.

Decision Matrix

Dimension              | Qdrant                                   | Milvus                                    | ChromaDB
Query Latency (p99)    | 15-50ms at 50M vectors (★★★★★)           | 20-80ms at 50M vectors (★★★★☆)            | 100-300ms at 10M vectors (★★☆☆☆)
Memory Efficiency      | 30GB RAM w/ quantization (★★★★★)         | 45GB RAM for same dataset (★★★☆☆)         | 40GB RAM, inefficient (★★★☆☆)
Operational Complexity | Single binary, one config file (★★★★★)   | Multi-node cluster required (★★☆☆☆)       | Simple for dev, weak prod (★★★☆☆)
Air-Gapped Deployment  | Zero external dependencies (★★★★★)       | Possible but complex setup (★★★☆☆)        | Yes, local SQLite mode (★★★★☆)
Scale Ceiling          | 100M vectors single node (★★★★☆)         | Billions of vectors (★★★★★)               | 10-20M vectors practical (★★☆☆☆)
Best For               | Energy sector RAG, NERC CIP compliance, operational simplicity | Massive scale deployments with dedicated infrastructure teams | Local development and RAG prototyping
Verdict                | The pragmatic choice for most energy AI deployments under 100M vectors. | Overengineered for typical energy sector needs, but unmatched at billion-vector scale. | Excellent for testing, not suitable for production energy sector deployments.

