The Problem Vector Databases Actually Solve
Three years ago at EthosPower, we faced a problem that traditional databases couldn't touch: a utility client needed to search 40 years of incident reports, maintenance logs, and engineering drawings not by keywords, but by meaning. An operator typing "transformer overheating in summer" needed to find reports about "thermal runaway during peak load" and "cooling system failures in July." Traditional full-text search returned garbage. We needed semantic search, and that required a vector database.
Vector databases store high-dimensional embeddings—numerical representations of text, images, or other data—and perform similarity searches in milliseconds. When you convert "transformer overheating" into a 1024-dimensional vector using a model like BAAI/bge-large-en-v1.5, a vector database finds the closest matching vectors from millions of stored documents. This is the foundation of every RAG system, every AI memory implementation, and every semantic search application.
In the energy sector, this matters because our operational knowledge is locked in unstructured data: maintenance procedures written in the 1980s, engineering specs in PDF format, operator notes in plain text. Vector databases make this knowledge accessible to LLMs without moving data to cloud services. For NERC CIP compliance, that's non-negotiable.
Architecture: What's Actually Happening
A vector database deployment has four components working together:
Embedding Generation
Before vectors hit the database, you need embeddings. At EthosPower, we run sentence-transformers/all-MiniLM-L6-v2 for fast, lightweight embeddings (384 dimensions) or BAAI/bge-large-en-v1.5 for higher quality (1024 dimensions). This happens in Ollama or a dedicated embedding service. The choice of embedding model is permanent—changing it means re-embedding your entire corpus.
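Because the model choice is permanent, it pays to record the model name and dimensionality per collection and reject ingests that don't match. A minimal sketch of that guard (the registry and function names are illustrative, not part of any library API):

```python
# Guard against silently mixing embeddings from different models.
# The registry maps collection name -> (model id, dimensionality).
EMBEDDING_REGISTRY = {
    "incident_reports": ("sentence-transformers/all-MiniLM-L6-v2", 384),
    "engineering_docs": ("BAAI/bge-large-en-v1.5", 1024),
}

def validate_embedding(collection: str, model: str, vector: list[float]) -> None:
    """Raise if the vector came from the wrong model or has the wrong size."""
    expected_model, expected_dim = EMBEDDING_REGISTRY[collection]
    if model != expected_model:
        raise ValueError(f"{collection} expects {expected_model}, got {model}")
    if len(vector) != expected_dim:
        raise ValueError(f"{collection} expects {expected_dim} dims, got {len(vector)}")
```

Running this check at ingest time turns a corpus-corrupting mistake into an immediate error.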
Vector Storage and Indexing
The database stores vectors in specialized indexes. Most use HNSW (Hierarchical Navigable Small World) graphs, which sacrifice some accuracy for speed. Qdrant and Milvus both use HNSW. The index parameters (M and efConstruction) determine the trade-off between query speed and recall. In production, I set M=16 and efConstruction=200 for operational data that needs high recall.
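In Qdrant, those parameters are fixed at collection creation. A sketch of the JSON body sent to `PUT /collections/<name>` (field names follow Qdrant's collections API; the dimensionality matches our 768-dim embeddings, the collection name is illustrative):

```python
# Request body for creating a Qdrant collection with tuned HNSW parameters.
# Sent as JSON to PUT /collections/incident_reports on a running Qdrant node.
collection_config = {
    "vectors": {
        "size": 768,           # embedding dimensionality
        "distance": "Cosine",  # similarity metric
    },
    "hnsw_config": {
        "m": 16,               # graph connectivity: higher = better recall, more RAM
        "ef_construct": 200,   # build-time beam width: higher = better index quality
    },
}
```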
Similarity Search
Queries come in as vectors, the database performs approximate nearest neighbor (ANN) search, and returns the top-k most similar vectors. The distance metric matters: cosine similarity for normalized embeddings, L2/Euclidean for spatial data. Energy sector use cases almost always use cosine.
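ANN indexes approximate what a brute-force scan computes exactly. For intuition, exact cosine top-k over a small corpus looks like this (a reference implementation, not what the database actually runs):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query: list[float], corpus: dict[str, list[float]], k: int) -> list[str]:
    """Return ids of the k vectors most similar to the query (exact, O(n))."""
    scored = sorted(corpus.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```

An HNSW index returns (approximately) the same ids without touching every vector, which is the whole trade.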
Metadata Filtering
This is where vector databases separate from simple ANN libraries. You need to filter by date, asset ID, document type, security classification—then search within those results. Qdrant's filtering is fast because it uses inverted indexes for metadata. Milvus added decent filtering in 2.3, but it's still slower than Qdrant's implementation.
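The pattern is pre-filter on metadata, then rank the survivors by similarity. In Qdrant that's a `filter` clause in the search request; the equivalent logic, sketched in plain Python (field names are illustrative):

```python
def filtered_search(query_vec, points, must, k, score_fn):
    """points: list of (id, vector, payload) tuples.
    must: dict of payload field -> required value, applied before ranking.
    score_fn: similarity function, e.g. cosine or dot product."""
    survivors = [
        (pid, vec) for pid, vec, payload in points
        if all(payload.get(field) == value for field, value in must.items())
    ]
    survivors.sort(key=lambda p: score_fn(query_vec, p[1]), reverse=True)
    return [pid for pid, _ in survivors[:k]]
```

The inverted-index approach Qdrant uses makes the filtering step cheap even when the `must` clause eliminates most of the collection.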
Deployment Reality: Air-Gapped and Compliant
At a midwestern utility running NERC CIP Critical Cyber Assets, we deployed Qdrant in air-gapped mode. The operational constraints were real:
Hardware Requirements
Qdrant runs efficiently on modest hardware. For 10 million vectors (768 dimensions), we used a single server with 32GB RAM and NVMe storage. Raw float32 vectors cost 4 bytes per dimension, so 10M vectors at 768 dimensions is ~30GB of vector data; with HNSW links and payloads, that's ~60GB on disk, and int8 quantization cuts the in-RAM vector footprint to under 10GB. Milvus needed more—it wants separate nodes for query and data, minimum three-node cluster for production.
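The sizing arithmetic is worth making explicit, since it drives hardware purchases (float32 at 4 bytes per dimension; index overhead and quantization savings are rules of thumb, not guarantees):

```python
def raw_vector_bytes(n_vectors: int, dims: int, bytes_per_dim: int = 4) -> int:
    """Raw storage for float32 vectors, before index overhead or quantization."""
    return n_vectors * dims * bytes_per_dim

gb = raw_vector_bytes(10_000_000, 768) / 1e9
# ~30.7 GB of raw float32 vectors. int8 scalar quantization stores one byte
# per dimension instead of four, cutting the vector footprint to roughly a quarter.
```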
Data Sovereignty
Every vector we store represents operational data. At EthosPower, we never send embeddings to external services. Qdrant runs entirely on-premises, no phone-home telemetry, no cloud dependencies. Weaviate can do this too, but its built-in vectorization modules want to call external APIs by default. ChromaDB works well for local development but isn't designed for distributed deployments.
Integration with RAG Pipelines
Our typical stack: n8n orchestrates document ingestion, Ollama generates embeddings, Qdrant stores vectors, AnythingLLM handles the RAG query interface. Qdrant's HTTP API makes integration straightforward. We push 50-100 documents per minute during bulk imports without performance degradation.
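Bulk ingestion through Qdrant's HTTP API is a `PUT /collections/<name>/points` call with a batch of points. A sketch of the batch builder (sending it requires a running Qdrant node, so only the request body construction is shown; the doc fields are illustrative):

```python
def build_upsert_body(docs: list[dict]) -> dict:
    """Build the JSON body for PUT /collections/<name>/points.
    Each doc needs an id, an embedding, and whatever payload fields
    downstream filtering relies on (asset_id, doc_type, date, ...)."""
    return {
        "points": [
            {
                "id": doc["id"],
                "vector": doc["embedding"],
                "payload": {k: v for k, v in doc.items() if k not in ("id", "embedding")},
            }
            for doc in docs
        ]
    }
```

Batching 50-100 documents per request keeps the per-call overhead negligible during bulk imports.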
Backup and Recovery
Qdrant stores everything in a single data directory with write-ahead logs. We snapshot every six hours and stream WAL to backup storage. Recovery is straightforward—copy the data directory and restart. Milvus backup requires stopping the cluster and using minio, which is heavier operationally.
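Snapshots are triggered through the HTTP API, so the six-hour cycle is a cron job POSTing to one endpoint per collection. A sketch of the URL construction (the real job sends the request with `curl` or `requests` against the running node and copies the resulting snapshot off-box):

```python
def snapshot_url(base_url: str, collection: str) -> str:
    """Endpoint that tells Qdrant to create an on-disk snapshot of a collection.
    POSTing to it returns the snapshot name, which the backup job then copies
    to backup storage along with the streamed WAL."""
    return f"{base_url}/collections/{collection}/snapshots"
```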
Scaling: When One Database Isn't Enough
At 50 million vectors, we hit Qdrant's practical single-node limit. Query latency jumped from 15ms to 200ms. We had three options:
Vertical Scaling
Adding RAM and faster NVMe helped, but only to a point. We maxed out at 128GB RAM and still saw degradation above 60 million vectors.
Horizontal Scaling with Sharding
Qdrant added distributed mode in 1.7. We sharded by asset type: one shard for generation assets, one for transmission, one for distribution. Query routing happens in application code. It works, but you lose the simplicity of a single database. Milvus handles this better—sharding is built-in and transparent to the application.
Multiple Specialized Databases
We ended up running three Qdrant instances: one for real-time operational data (high write rate, 7-day retention), one for historical archives (100M+ vectors, read-only), and one for engineering documents (moderate size, highest quality embeddings). This matched the operational reality better than trying to force everything into one system.
Monitoring and Operations
Qdrant exposes Prometheus metrics at /metrics. The critical metrics:
- Collection size: Number of vectors per collection. Watch for unexpected growth.
- Index memory usage: If this approaches available RAM, queries will hit disk and latency will spike.
- Query latency (p50, p95, p99): We target <50ms p99 for operational queries. Above 100ms, users complain.
- Indexing lag: If ingestion outpaces indexing, you'll build a backlog. This happens during bulk imports.
I run alerting on p99 latency >100ms and index memory >80% of available RAM. These catch problems before users notice.
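Those alerting rules reduce to two threshold checks over scraped values. A sketch of the check itself (in production this lives in Prometheus alert rules rather than application code; the thresholds are the ones above):

```python
def check_alerts(p99_latency_ms: float, index_mem_bytes: int, ram_bytes: int) -> list[str]:
    """Return the list of alert conditions currently firing."""
    alerts = []
    if p99_latency_ms > 100:
        alerts.append(f"p99 latency {p99_latency_ms:.0f}ms > 100ms")
    if index_mem_bytes > 0.8 * ram_bytes:
        alerts.append("index memory above 80% of available RAM")
    return alerts
```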
Weaviate has similar metrics but uses its own format, not Prometheus-native. Milvus exposes Grafana dashboards, which are comprehensive but require running their full observability stack.
The Alternatives and Their Trade-offs
Weaviate: Hybrid Search and Generative Modules
Weaviate's hybrid search combines vector similarity with BM25 keyword search. This is valuable when you need both semantic and exact matching—searching maintenance procedures where specific part numbers matter. The built-in generative modules (question answering, summarization) are convenient but require external API access, which breaks air-gapped deployments.
Weaviate's GraphQL API is elegant but adds complexity if your team isn't familiar with GraphQL. At EthosPower, we prefer simple HTTP REST APIs that every developer understands.
Milvus: Trillion-Scale Performance
Milvus is built for massive scale. If you're storing billions of vectors, it's the right choice. The architecture separates compute and storage, allows GPU acceleration, and handles petabyte-scale deployments. But this comes with operational complexity: you're running etcd, MinIO, and Pulsar alongside Milvus itself. For energy sector deployments under 100 million vectors, Milvus is overengineered.
Milvus 2.4 added sparse vector support, which is interesting for hybrid dense-sparse retrieval. We haven't needed it yet.
ChromaDB: Developer-Friendly but Not Production-Ready
ChromaDB is excellent for development and testing. The Python API is simple, local mode runs without any infrastructure, and it handles embedding generation automatically. I use it for prototyping RAG applications.
But ChromaDB isn't designed for production at scale. The persistence layer is SQLite by default, which doesn't support concurrent writes well. The distributed mode exists but is immature. For energy sector production deployments, ChromaDB is a non-starter.
Neo4j: When Relationships Matter More Than Similarity
Neo4j is technically a graph database, not a vector database, but it added vector search in 5.13. If your data has rich relationships—this maintenance procedure applies to these specific transformers, installed by this contractor, with parts from this supplier—Neo4j's graph structure captures that better than pure vector similarity.
We use Neo4j alongside Qdrant in knowledge graph applications. Neo4j stores the relationships and metadata, Qdrant handles semantic search, and we join the results in application code. Neo4j's vector search is slower than Qdrant's (typically 3-5x), but the relationship queries are far more powerful.
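The application-side join is an intersection: Neo4j answers "which documents relate to this asset," Qdrant answers "which documents are semantically close," and the app keeps the overlap in similarity order. A sketch:

```python
def join_results(graph_doc_ids: set[str],
                 semantic_hits: list[tuple[str, float]]) -> list[str]:
    """graph_doc_ids: ids reachable from the asset in Neo4j.
    semantic_hits: (doc_id, score) pairs from Qdrant, best first.
    Returns semantically ranked ids restricted to the graph neighborhood."""
    return [doc_id for doc_id, _ in semantic_hits if doc_id in graph_doc_ids]
```

Over-fetching from Qdrant (say, top-100 instead of top-10) before intersecting keeps the joined result list from coming back empty.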
Configuration Lessons from Production
After three years running Qdrant in production:
Use scalar quantization for embeddings: This reduces memory usage by 4x with minimal accuracy loss. Enable it from day one. In Qdrant: quantization_config={"scalar": {"type": "int8"}}.
Set aggressive on-disk payload: Store large text fields on disk, not in memory. Only the vectors need to be in RAM. This lets you index 2-3x more vectors on the same hardware.
Tune HNSW parameters for your workload: For high-recall scenarios (regulatory compliance, safety-critical searches), set ef=128 or higher at query time. For exploratory search, ef=32 is fine. The default ef=10 trades away too much recall for speed in energy sector use cases.
Don't over-shard: We initially sharded by month, thinking it would improve performance. It didn't. Sharding by operational domain (generation, transmission, distribution) worked better because queries rarely span domains.
Monitor index rebuild times: Qdrant rebuilds HNSW indexes in the background. If you're ingesting heavily, rebuilds can take hours. Plan maintenance windows accordingly.
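The first three lessons translate directly into collection and query settings. A sketch of the relevant REST fragments (field names follow Qdrant's API; the collection dimensionality and query vector are illustrative):

```python
# Collection creation body: quantized vectors in RAM, payloads on disk.
create_body = {
    "vectors": {"size": 1024, "distance": "Cosine"},
    "on_disk_payload": True,  # keep large text payloads off the RAM budget
    "quantization_config": {
        "scalar": {"type": "int8", "always_ram": True}  # ~4x less vector RAM
    },
    "hnsw_config": {"m": 16, "ef_construct": 200},
}

# Search body: raise ef for high-recall (compliance/safety) queries.
search_body = {
    "vector": [0.0] * 1024,      # query embedding goes here
    "limit": 10,
    "params": {"hnsw_ef": 128},  # beam width at query time; 32 for exploratory search
}
```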
Cost Reality
At EthosPower, we run Qdrant on-premises. Hardware cost for a 50M-vector deployment: $8K for a server (AMD EPYC 7543, 128GB RAM, 2TB NVMe). Hosting cost: $200/month for power and cooling. Staff time: roughly 4 hours per month for maintenance and monitoring.
Compare this to managed vector databases like Pinecone at $70 per million vectors per month. For 50M vectors, that's $3,500/month, or $42K/year—plus data egress fees, plus vendor lock-in. The economic case for self-hosted is clear.
The Verdict
For energy sector AI deployments, Qdrant is the pragmatic choice. It runs efficiently on modest hardware, handles air-gapped deployment without friction, and scales to 50-100 million vectors on a single node. The Rust implementation is fast and memory-efficient. The HTTP API integrates easily with n8n, Ollama, and AnythingLLM. For NERC CIP compliance, it checks every box: no external dependencies, no telemetry, full data sovereignty.
If you're building a RAG system for operational data, maintenance logs, or engineering documents, start with Qdrant. If your deployment grows beyond 100 million vectors and you have dedicated infrastructure staff, consider Milvus. If you need rich relationship queries alongside vector search, run Neo4j and Qdrant together. If you're prototyping, use ChromaDB locally.
Ignore Weaviate unless you specifically need hybrid search and are willing to accept its GraphQL complexity. Ignore managed services unless your organization has already moved operational data to the cloud—which, in the energy sector, most haven't and shouldn't.
The vector database landscape is maturing. In 2025, the question isn't whether to deploy one—it's which one fits your operational constraints. For most energy companies, that answer is Qdrant.