The Real Problem
Every energy company we work with faces the same constraint: you cannot send operational data to OpenAI, Anthropic, or any cloud LLM provider. Period. NERC CIP-011 makes that explicit for bulk electric systems. For oil and gas, it's TSA Pipeline Security Directives and internal risk policies. For renewables, it's competitive intelligence about site performance and maintenance patterns.
The promise of LLMs for analyzing alarm logs, maintenance records, SCADA historian data, and engineering documentation is real. We've seen 40-60% reduction in mean time to diagnose grid disturbances when operations engineers can query decades of incident reports in natural language. The challenge is running this infrastructure inside your security perimeter without creating new vulnerabilities or dependencies on external services.
We've deployed five different LLM infrastructure stacks in production energy environments over the past eighteen months. Here's what actually matters when choosing your approach.
Decision Criteria That Matter
Forget the feature comparison matrices from vendor websites. In energy operations, your LLM infrastructure needs to satisfy four non-negotiable requirements and three operational preferences.
Non-Negotiable Requirements
Data sovereignty: Every byte of data and every model inference must stay within your network boundary. No telemetry, no phone-home, no "anonymous usage statistics." Many tools claim to be local but still ping external services for updates or analytics.
Air-gap capability: Can you run this completely disconnected from the internet? Not just theoretically—can you actually install, configure, and operate it on a network with zero external connectivity? We have three clients running LLM infrastructure on networks that have never touched the public internet.
Resource efficiency: Your OT environment isn't a hyperscaler datacenter. You're running on a Dell PowerEdge R750 with two RTX A5000 GPUs in a substation control house, not a rack of H100s. The infrastructure must deliver useful results within these constraints.
Audit trail: NERC CIP-010 requires you to document what software is running, how it's configured, and what it's accessing. Your LLM infrastructure needs proper logging, version control, and the ability to prove to auditors exactly what models processed what data.
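Whatever stack you choose, the audit requirement is straightforward to satisfy at the application layer with append-only structured logs. A minimal sketch of the kind of record that keeps auditors happy — the field names are illustrative, not a CIP-mandated schema:

```python
import json
import time

# Hypothetical audit writer: one append-only JSON line per inference, so an
# auditor can reconstruct exactly which model processed which documents, when,
# and on whose behalf. Field names are illustrative.
def log_inference(log_path, user, model, workspace, doc_ids):
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "user": user,
        "model": model,
        "workspace": workspace,
        "documents": doc_ids,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record

rec = log_inference("/tmp/llm_audit.jsonl", "jsmith", "llama3.1:70b",
                    "operations", ["relay-manual-rev4.pdf"])
```

Append-only JSON lines are easy to ship to whatever log aggregation you already run, and easy to hand to an auditor as-is.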
Operational Preferences
Multi-user support: Operations, engineering, and maintenance teams need simultaneous access with appropriate permissions. Single-user desktop apps create bottlenecks.
Document ingestion: Your valuable knowledge is in PDF maintenance manuals, Word procedures, Excel fault tables, and PPT training decks. The system needs to handle these formats without requiring a data engineering team.
Integration capability: Eventually you'll want to connect this to your CMMS, historian, or document management system. API access and programmatic interaction matter.
The Infrastructure Options
Let's evaluate the five realistic options for running LLM infrastructure in energy operations. We're excluding cloud services entirely—they don't meet requirement one.
Ollama: The Model Runtime
Ollama is not a complete solution. It's a model serving runtime that makes it trivially easy to run models like Llama 3.1, Mistral, Qwen, and 100+ others on your hardware. Think of it as the engine, not the car.
We run Ollama in every deployment because it solves the hardest problem: making models actually work on diverse hardware without spending days debugging CUDA versions and Python dependencies. Installation is a single command. Model download is ollama pull llama3.1:70b. Inference is a REST API call.
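The REST call really is that simple. A sketch against Ollama's native chat endpoint, assuming a default install on port 11434 (with stream set to false, the server returns one JSON object instead of chunked lines):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default Ollama port

def build_request(model, prompt):
    # Ollama's chat endpoint takes an OpenAI-style message list.
    # stream=False asks for a single JSON response rather than a stream.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(model, prompt):
    # Posts to a running Ollama instance and returns the assistant's reply.
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

payload = build_request("llama3.1:70b", "Summarize relay trip event 4412.")
```

Any tool that can make an HTTP POST — a script, a CMMS integration, a web interface — can use the same call.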
The limitation: Ollama gives you raw model access, nothing more. No user interface, no document processing, no conversation history, no access controls. You need to build or deploy additional layers for actual operational use.
In our deployments, Ollama runs as a systemd service on Ubuntu 22.04 LTS, typically allocating 48GB RAM and both GPUs. We use the OpenAI-compatible API endpoint so downstream applications don't need Ollama-specific code.
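A representative unit file for that setup — the binary path and keep-alive value are illustrative, so adjust for your install:

```ini
[Unit]
Description=Ollama model runtime
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
# Listen on all interfaces so downstream applications on the OT segment
# can reach the API; keep the model resident between queries.
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_KEEP_ALIVE=24h"
Restart=always

[Install]
WantedBy=multi-user.target
```

Running it as a dedicated service account under systemd also gives you the restart behavior and journal logging that the audit requirement expects.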
AnythingLLM: The Enterprise Knowledge Base
AnythingLLM is a complete RAG (Retrieval-Augmented Generation) platform designed for organizations that need document-based AI with proper access controls and multi-user support. It's the tool we deploy most often for energy sector clients.
The architecture is clean: Qdrant or Chroma for vector storage, Ollama for LLM inference, and a Node.js application server that handles document processing, user management, and workspace organization. You can run everything on a single server or split components across multiple machines.
What makes it work for energy operations:
- Workspace isolation: Create separate workspaces for operations, engineering, and maintenance with different document sets and access controls. Your compliance team can verify that relay protection documents never mix with financial data.
- Document connectors: Direct ingestion from file shares, web scraping, and API connections. We've connected it to SharePoint document libraries and network drives containing 20+ years of engineering records.
- Agent capabilities: Built-in agents can query databases, execute code, and chain multiple tools together. We've deployed agents that query SCADA historians, look up equipment in the CMMS, and retrieve relevant procedures—all in a single conversation.
- Audit logging: Every query, every document accessed, every response generated gets logged with user attribution and timestamps.
The deployment reality: expect 2-3 days for initial setup, including document ingestion pipeline configuration and workspace design. The web interface is production-ready; no custom development required. Memory usage scales with document corpus size—we typically allocate 64GB RAM for systems managing 50,000+ documents.
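The integration story works the same way: AnythingLLM exposes a developer API, so CMMS or historian tooling can query a workspace programmatically. A hedged sketch — the hostname and key are placeholders, and the endpoint and response shape should be verified against the API docs of your deployed version:

```python
import json
import urllib.request

ANYTHINGLLM_URL = "http://anythingllm.local:3001"  # placeholder internal host
API_KEY = "replace-with-api-key"                   # issued in the admin UI

def build_body(question, mode="query"):
    # "query" mode restricts answers to the workspace's ingested documents;
    # "chat" mode allows general conversation as well.
    return {"message": question, "mode": mode}

def workspace_query(slug, question, mode="query"):
    # Endpoint shape follows AnythingLLM's developer API; confirm the path
    # and response fields against your deployed version before relying on it.
    req = urllib.request.Request(
        f"{ANYTHINGLLM_URL}/api/v1/workspace/{slug}/chat",
        data=json.dumps(build_body(question, mode)).encode(),
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

body = build_body("What is the zone 2 trip logic on the SEL-421?")
```

This is the hook you use later to wire in the CMMS and historian integrations mentioned under operational preferences.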
AnythingLLM is what you deploy when multiple teams need shared access to organizational knowledge with proper governance.
Open WebUI: The Self-Service Interface
Open WebUI (formerly Ollama WebUI) is a lightweight web interface for interacting with local LLMs. Think of it as a self-hosted ChatGPT clone that connects to your Ollama instance.
The design philosophy is minimalist: give users a clean chat interface with conversation history, model selection, and basic document upload. No complex workspace management, no extensive administration, no built-in RAG pipeline.
Where it fits: pilot projects, individual power plants, small teams that need quick LLM access without enterprise complexity. We've deployed it at remote renewable energy sites where 5-10 people need to query operational procedures and equipment manuals.
The setup is genuinely simple—Docker Compose file, five environment variables, done. Integration with Ollama is seamless. Users can switch between models mid-conversation, upload documents for one-off questions, and save conversation threads.
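A minimal compose file of the kind we mean — hostnames are placeholders, and on an air-gapped network you'd pin the image to a tag mirrored in an internal registry:

```yaml
services:
  open-webui:
    # Mirror this image into an internal registry for air-gapped networks.
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama-host:11434  # your Ollama server
      - WEBUI_AUTH=True                           # require login
    volumes:
      - open-webui-data:/app/backend/data
    restart: unless-stopped

volumes:
  open-webui-data:
```

One `docker compose up -d` later, the team has a working chat interface against your local models.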
The limitations become apparent at scale: no sophisticated access controls, document uploads are per-conversation (not persistent knowledge base), and you're managing conversation history in a SQLite database. For 50+ users or sensitive multi-team deployments, you'll outgrow it.
LibreChat: The Kitchen Sink
LibreChat attempts to be a universal interface for every LLM provider—OpenAI, Anthropic, Google, Azure, and local models via Ollama. It includes agents, memory systems, multi-modal chat, and MCP (Model Context Protocol) integration.
The feature list is impressive. The operational reality in energy environments is complicated.
We've deployed LibreChat twice. Both times, the configuration complexity created friction with operations teams. The system assumes you want access to multiple LLM providers and need to manage API keys, rate limits, and provider-specific features. When your requirement is "run Llama 3.1 70B on our hardware with zero external dependencies," 80% of LibreChat's features become configuration overhead.
The agent framework and MCP integration are genuinely advanced—better than AnythingLLM's agent system in some respects. If you're building custom AI workflows that need to interact with multiple external systems and you have dedicated AI engineering resources, LibreChat's flexibility is valuable.
For most energy operations teams that need "talk to our documents and get answers," the complexity-to-value ratio doesn't work. Installation requires managing MongoDB, Redis, and multiple Node.js services. Updates can break configurations in subtle ways.
Msty: The Desktop Experience
Msty is a native desktop application for running local LLMs with GPU acceleration. It's beautiful, fast, and genuinely pleasant to use. The Shadow Personas feature (multiple AI personalities with different configurations) is clever. MCP integration works well.
The fundamental constraint: it's a single-user desktop app. In energy operations, that's a deployment dead-end.
We don't deploy Msty for operational use. We recommend it to individual engineers who want to experiment with local LLMs on their workstations before the organization commits to shared infrastructure. It's excellent for that purpose—the best desktop LLM experience available.
But when your operations center needs three shifts of engineers querying the same knowledge base, or when your compliance team needs to audit who accessed what information, a desktop app can't scale.
The Decision Framework
Here's how to choose based on your actual situation:
Start here: Install Ollama. Regardless of which interface layer you choose, you need a reliable model runtime. Get Ollama running, download Llama 3.1 70B and Qwen 2.5 32B, verify GPU acceleration works. This takes an afternoon.
For multi-user operational deployment with document RAG: Deploy AnythingLLM. It's the proven path for energy sector use cases. The workspace model maps cleanly to organizational structure. Document ingestion handles the messy reality of legacy file formats. Access controls satisfy compliance requirements.
For pilot projects and small sites: Open WebUI gives you a functional interface in hours, not days. Use it to prove value, then migrate to AnythingLLM when you need enterprise features. We've done this migration three times—conversation history doesn't transfer cleanly, but it's manageable.
For individual experimentation: Give engineers Msty on their workstations. It lowers the barrier to understanding what local LLMs can actually do. The insights from individual experimentation inform better requirements for organizational deployment.
For advanced custom AI workflows: Only choose LibreChat if you have dedicated AI engineering resources and need sophisticated agent capabilities that AnythingLLM doesn't provide. Be honest about your team's capacity to manage the complexity.
Deployment Architecture That Works
Our standard deployment for a medium-sized utility (2000 employees, three control centers):
- Hardware: Dell PowerEdge R750, dual Xeon Gold 6338, 256GB RAM, two RTX A5000 GPUs (24GB each)
- OS: Ubuntu 22.04 LTS, minimal installation
- Model runtime: Ollama 0.4.x serving Llama 3.1 70B (primary) and Qwen 2.5 32B (secondary)
- Vector database: Qdrant 1.11.x, dedicated volume for indexes
- Application layer: AnythingLLM with three workspaces (Operations, Engineering, Corporate)
- Network: Single NIC on OT network segment, no internet access, NTP via local stratum-1 server
- Backup: Daily snapshots of vector database and application config to isolated backup server
This handles 150-200 queries per day across three shifts with sub-10-second response times for most questions. Document corpus is currently 47,000 files totaling 280GB of processed content.
Total capital cost including hardware: $38,000. Annual operational cost (power, cooling, maintenance): approximately $4,200. Compare this to enterprise AI platform quotes we've seen ranging from $250,000 to $800,000 annually.
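The per-query economics are worth writing out. Assuming straight-line five-year hardware amortization (the amortization period is our assumption, not a quoted figure) and the midpoint of the query volume above:

```python
CAPEX = 38_000          # server + GPUs, one-time
OPEX_PER_YEAR = 4_200   # power, cooling, maintenance
QUERIES_PER_DAY = 175   # midpoint of the 150-200 range above
AMORT_YEARS = 5         # assumption: straight-line amortization

annual_cost = CAPEX / AMORT_YEARS + OPEX_PER_YEAR   # 11,800 per year
queries_per_year = QUERIES_PER_DAY * 365
cost_per_query = annual_cost / queries_per_year     # roughly $0.18 per query
```

Run the same arithmetic on a $250,000 annual platform quote at the same volume and you get roughly $3.90 per query, before the hardware it still needs.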
The Procurement Reality
Your procurement team will ask about support contracts and vendor liability. Here's what we tell them:
Ollama, AnythingLLM, and Open WebUI are open-source projects with active communities but no commercial support contracts. This is a feature, not a bug. Commercial "enterprise AI platforms" create vendor lock-in and ongoing license costs that will consume your budget for years.
The risk mitigation is operational: you deploy on hardware you control, you maintain the stack using standard Linux administration practices, and you build internal knowledge of how the components work. We provide our clients with detailed runbooks, but the systems are straightforward enough that competent infrastructure teams can maintain them.
For organizations that absolutely require commercial support, Red Hat offers InstructLab and model serving capabilities through OpenShift AI. It's more expensive and less flexible, but it comes with enterprise support contracts and someone to blame. We haven't needed to deploy it yet.
What About Model Selection?
The infrastructure discussion is incomplete without addressing which models to actually run. We default to Llama 3.1 70B Instruct for most deployments. It's large enough to handle complex reasoning about technical documents, small enough to run on practical hardware, and Meta's license permits commercial use.
For specialized applications, we've had good results with:
- Qwen 2.5 32B: Better at structured data extraction and technical writing than Llama models of similar size
- Mistral 7B: When you need faster responses and the questions are straightforward
- DeepSeek Coder: Specifically for generating Python scripts and SQL queries from natural language
The infrastructure we're discussing supports switching models without architectural changes. Start with Llama 3.1 70B, experiment with alternatives as specific needs emerge.
The Verdict
Deploy Ollama as your model runtime—this is non-negotiable and uncontroversial. For the application layer, choose AnythingLLM unless you have specific reasons not to. It's the only option that delivers complete RAG functionality, proper multi-user support, and audit capabilities in a package that energy operations teams can actually deploy and maintain.
Open WebUI is a reasonable choice for pilot projects under 20 users. Plan to migrate to AnythingLLM when you prove value and need to scale.
Avoid LibreChat unless you have AI engineering resources dedicated to managing it. The flexibility isn't worth the operational complexity for typical energy sector use cases.
Msty is great for individual engineers on workstations, wrong for operational deployment.
The harder truth: most organizations spend six months evaluating options when they should spend two weeks deploying AnythingLLM with Ollama and learning what their teams actually need. You'll discover requirements from production use that no amount of analysis will reveal. Get something running, iterate based on real feedback, and expand from there.
We've deployed this stack at utilities managing 40GW of generation capacity and refineries processing 500,000 barrels per day. It works. The question isn't whether local LLM infrastructure is ready for energy operations—it's whether your organization is ready to stop waiting for perfect and start delivering value.