
AI-Native Web Infrastructure: Building Autonomous Agents That Actually Work

By EthosPower Editorial · March 7, 2026 · 11 min read · Verified Mar 7, 2026
Tags: AI Infrastructure, Web Automation, Vector Databases, Open Source, Browser Automation, LLM Integration, Production Deployments, Energy Sector

The Problem We're Actually Solving

Every utility we work with has the same frustrating pattern: critical operational data lives scattered across vendor portals, manufacturer documentation sites, regulatory databases, and internal SharePoint hellscapes. Your operators are manually copy-pasting equipment specs from PDFs. Your engineers are screenshotting plots from OEM dashboards because there's no API. Your compliance team is checking regulatory updates on state commission websites like it's 1998.

We've tried the conventional approaches. Traditional web scraping breaks every time a vendor updates their portal. RPA tools like UiPath are expensive, fragile, and require constant babysitting. ETL pipelines can't handle dynamic JavaScript-heavy sites. And forget about making any of this work in air-gapped OT environments.

The AI-native web stack we're describing here solves a specific problem: giving LLM-based agents reliable, repeatable access to web data they can actually reason about. Not for marketing automation or social media scheduling—for real operational workflows where incorrect data means outages or safety incidents.

Architecture: Four Components That Do One Thing Well

Our production deployments use four open-source tools that compose cleanly without requiring a PhD to integrate. We're running this in utilities with NERC CIP requirements, which means we've pressure-tested the security model and air-gap deployment patterns.

Firecrawl: Web to LLM-Ready Markdown

Firecrawl is the newest piece of this stack and it's solved our biggest pain point: converting messy web content into clean markdown that LLMs can actually process. We used to chain together Beautiful Soup, Selenium, and custom parsing logic. Every site required different CSS selectors. JavaScript-rendered content was a nightmare. PDFs embedded in pages were just ignored.

Firecrawl handles all of this with a single API call. Point it at a URL and get back structured markdown with proper heading hierarchy, cleaned tables, and extracted metadata. It renders JavaScript, waits for dynamic content, follows pagination, and chunks long documents semantically—not just by character count.

In our latest deployment at a Western utility, we're using Firecrawl to monitor equipment manufacturer technical bulletins. When a new bulletin appears on a vendor portal, Firecrawl extracts it, our agent checks it against our equipment inventory (stored in ERPNext), and if there's a match, it creates a maintenance work order with the bulletin attached. Before Firecrawl, this was a manual weekly task for two engineers. Now it runs every six hours with zero human intervention.
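The match-and-ticket step can be sketched in a few lines. This is an illustration, not our production code: the inventory field names and the work-order payload shape are hypothetical stand-ins for what we actually pull from and push to ERPNext.

```python
import re

def match_bulletin_to_inventory(bulletin_text, inventory):
    """Return inventory items whose model number appears in a bulletin.

    `inventory` is a list of dicts with "model" and "asset_id" keys --
    a stand-in for what we'd actually query out of ERPNext.
    """
    hits = []
    for item in inventory:
        # Word-boundary match so "T-100" doesn't also match "T-1000".
        if re.search(rf"(?<![\w-]){re.escape(item['model'])}(?![\w-])", bulletin_text):
            hits.append(item)
    return hits

def build_work_order(bulletin_url, item):
    """Shape of a work-order payload (field names are illustrative)."""
    return {
        "doctype": "Maintenance Work Order",
        "asset": item["asset_id"],
        "description": f"Review vendor bulletin: {bulletin_url}",
        "attachments": [bulletin_url],
    }
```

The word-boundary regex matters more than it looks: naive substring matching on model numbers produces false work orders for every model family that shares a prefix.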

The critical feature: semantic chunking. When you're feeding 50-page maintenance manuals to an LLM, naive chunking by token count splits context mid-procedure. Firecrawl chunks by document structure, preserving complete sections. Our RAG accuracy improved by 40% after switching from character-based chunking.
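To make the difference concrete, here is a minimal structure-aware chunker: split at heading boundaries first, then pack whole sections up to a size budget. This is a simplification for illustration, not Firecrawl's actual algorithm.

```python
import re

def chunk_by_headings(markdown, max_chars=4000):
    """Split markdown at heading boundaries instead of raw character
    counts, so a procedure is never cut mid-step. A single oversized
    section still becomes its own chunk rather than being split."""
    # Zero-width split keeps each heading attached to the text below it.
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown)
    sections = [p for p in parts if p.strip()]

    chunks, current = [], ""
    for section in sections:
        if current and len(current) + len(section) > max_chars:
            chunks.append(current)
            current = section
        else:
            current += section
    if current:
        chunks.append(current)
    return chunks
```

Character-based chunking would happily emit "…then close the isolation valve and" as the end of one chunk; this version cannot, because section boundaries are the only cut points.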

Playwright: Browser Automation That Doesn't Hate You

Playwright is Microsoft's answer to Selenium, and it's what browser automation should have been a decade ago. We use it for two scenarios: authenticated vendor portals where Firecrawl's API approach doesn't work, and interactive workflows where an agent needs to fill forms or click through multi-step processes.

The reliability difference versus Selenium is stark. Playwright auto-waits for elements, handles flaky networks gracefully, and has built-in retry logic. More importantly for our use case: it integrates with Model Context Protocol (MCP), which means our LLM agents can drive browsers directly through function calling.

We deployed this at a Southwest utility to automate their NERC CIP evidence collection. Compliance requires screenshots and exports from a dozen different systems—firewall logs, access control systems, patch management consoles. Previously this was a three-day quarterly process. Now a Playwright script driven by an Ollama-based agent logs into each system, navigates to the right reports, takes screenshots, exports CSVs, and organizes everything into a folder structure matching their audit template. Total runtime: 45 minutes.

The security model works for OT environments. Playwright runs in a hardened container with no internet access except to defined destination systems. All credentials are pulled from Vault at runtime. The browser profile is ephemeral—destroyed after each run. This passed our toughest customer's security review.

ChromaDB: Vector Search Without the Ops Burden

We need vector search for RAG workflows. The question is which database won't become a maintenance nightmare. We've deployed Qdrant in production for high-scale scenarios, but ChromaDB has become our default for simpler deployments because the operational complexity is near zero.

ChromaDB is embedded. No separate server process, no clustering configuration, no Kubernetes operators. You import a Python library and you have a vector database. For development and small deployments (under 10M vectors), this is ideal. For larger production systems, ChromaDB can run in client-server mode, but we haven't needed that yet.

The API is deliberately simple. Three operations: add documents with embeddings, query by vector similarity, filter by metadata. That's it. No complex query DSL, no tuning parameters that require a PhD. This matters in energy environments where you're often handing off to operators who need to troubleshoot at 2 AM.
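The three operations are simple enough to illustrate with a stdlib stand-in. To be clear, this is not ChromaDB's API; it is a toy in-memory version showing the semantics an operator needs to understand at 2 AM: add, query by similarity, filter by metadata.

```python
import math

class MiniVectorStore:
    """Stdlib stand-in for the three-operation API described above.
    Illustrates the semantics only -- use chromadb itself in real code."""

    def __init__(self):
        self.records = []  # (id, embedding, document, metadata)

    def add(self, ids, embeddings, documents, metadatas):
        self.records += list(zip(ids, embeddings, documents, metadatas))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    def query(self, embedding, n_results=3, where=None):
        candidates = [
            r for r in self.records
            if not where or all(r[3].get(k) == v for k, v in where.items())
        ]
        candidates.sort(key=lambda r: self._cosine(r[1], embedding), reverse=True)
        return [{"id": r[0], "document": r[2], "metadata": r[3]}
                for r in candidates[:n_results]]
```

Note the metadata filter runs before similarity ranking, which is also how you should think about ChromaDB's `where` filters: narrow by equipment type or source first, then rank what remains.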

Our typical deployment: Firecrawl scrapes vendor documentation, we generate embeddings using Ollama's nomic-embed-text model, store in ChromaDB with metadata (document source, date, equipment type), then query at runtime when an operator asks a question. Latency is under 100ms for p99 queries on our 2M document collection. Storage is SQLite under the hood, so backups are just file copies.

One gotcha: ChromaDB doesn't do hybrid search (combining vector similarity with keyword matching) as cleanly as Qdrant. If you need that, use Qdrant. But for pure semantic search, ChromaDB's simplicity wins.

Ollama: Local LLM Inference

Ollama is how we run LLMs in environments where cloud APIs aren't an option. Most utilities we work with either have contractual restrictions on sending operational data to third parties or are running in air-gapped OT networks. Ollama lets us deploy Llama 3.1, Mistral, Qwen, and other open models on local hardware with a three-command setup.

The integration story is what makes this work. Ollama exposes an OpenAI-compatible API, which means every tool that works with GPT-4 works with Ollama with a one-line config change. Our Playwright agents use the same LangChain code whether they're hitting Ollama or OpenAI—just different endpoint configuration.
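The "one-line config change" looks roughly like this. The base URL is Ollama's documented OpenAI-compatible endpoint; the model names and the key placeholder are examples, not prescriptions.

```python
import json

def client_settings(backend):
    """Same client code, different base URL -- that's the whole swap.
    Model names here are examples."""
    if backend == "ollama":
        return {"base_url": "http://localhost:11434/v1",
                "api_key": "ollama",  # clients require a key; Ollama ignores it
                "model": "llama3.1:70b"}
    return {"base_url": "https://api.openai.com/v1",
            "api_key": "YOUR_KEY_FROM_VAULT",
            "model": "gpt-4o"}

def chat_payload(settings, user_msg):
    """Identical request body regardless of backend."""
    return json.dumps({
        "model": settings["model"],
        "messages": [{"role": "user", "content": user_msg}],
    })
```

Because the request body never changes, switching an agent between local and hosted inference is a deployment decision, not a code change.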

Performance is hardware-dependent but predictable. On our standard deployment (2x NVIDIA L40S GPUs, 96GB VRAM), we get 50-80 tokens/sec with Llama 3.1 70B, good enough for interactive agent workflows. For batch jobs like overnight document processing, we run quantized models (Q4_K_M) which are 3x faster with minimal accuracy loss on our tasks.

The operational win: model updates are atomic. Download a new model with 'ollama pull', test it, switch the config, restart. No Docker image rebuilds, no dependency hell. We've upgraded models in production with zero downtime by running two Ollama instances behind a load balancer.

Integration Pattern: The Autonomous Research Agent

Here's a real workflow we deployed last quarter. A Northeastern utility needed to monitor regulatory changes across six state public utility commissions. Each commission posts orders, notices, and dockets on different websites with different formats. The compliance team was spending 10+ hours per week manually checking and summarizing.

Our agent workflow:

  • Daily at 6 AM, a cron job triggers the agent
  • Agent queries ChromaDB for the list of monitored URLs (stored with metadata: commission name, last check date, content hash)
  • For each URL, agent calls Firecrawl to fetch current content as markdown
  • Agent compares content hash—if changed, content goes to Ollama with prompt: "Summarize changes since last version, highlight any compliance actions"
  • If changes are material (determined by second LLM call), agent opens a ticket in ERPNext, attaches the summary and diff, assigns to compliance lead
  • If changes require detailed investigation (multi-page orders), agent uses Playwright to navigate to linked documents, downloads PDFs, processes with Firecrawl
  • Final summary email goes to compliance team with links to tickets
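The hash-comparison step in the workflow above can be sketched as follows; the whitespace normalization is our own convention, there to keep formatting-only page changes from triggering an LLM summarization call.

```python
import hashlib

def detect_change(markdown, last_hash):
    """Hash the freshly fetched markdown and compare against the stored
    hash. Returns (changed, new_hash). Normalizing whitespace first means
    cosmetic re-renders of a commission page don't count as changes."""
    normalized = " ".join(markdown.split())
    new_hash = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return new_hash != last_hash, new_hash
```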

Total compute: 15 minutes on a single 8-core VM with one L40S GPU. Cost: $0 marginal (hardware already deployed for other workloads). Human time saved: 8-12 hours weekly. False positive rate after tuning: under 5%.

The failure modes matter. If Firecrawl can't parse a page, the agent logs the failure and moves on rather than halting the whole workflow. If Ollama times out, the agent retries with exponential backoff. If a commission website is down, the agent tries three times over an hour, then sends an alert. No silent failures.
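The backoff wrapper is a few lines of plain Python; this sketch injects the sleep function so tests don't actually wait.

```python
import time

def with_retry(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Retry with exponential backoff (1s, 2s, 4s, ...). In real code,
    narrow the except clause to timeout errors instead of Exception."""
    last_exc = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_exc
```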

Operational Reality: What We Actually Monitor

Production deployment isn't just about making something work once. Here's what we monitor and why:

Firecrawl Success Rate

Track percentage of successful scrapes per domain. If a vendor site drops below 95%, it means they changed their structure and we need to update selectors or switch to Playwright for that source. We alert at 90%.
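The two-threshold check is trivial but worth pinning down; this sketch uses our 95%/90% cutoffs, with booleans standing in for whatever your scrape log actually records.

```python
def scrape_health(results, warn=0.95, alert=0.90):
    """Per-domain success rate against the two thresholds above.
    `results` maps domain -> list of booleans (True = successful scrape)."""
    report = {}
    for domain, outcomes in results.items():
        rate = sum(outcomes) / len(outcomes)
        if rate < alert:
            status = "alert"
        elif rate < warn:
            status = "investigate"
        else:
            status = "ok"
        report[domain] = (round(rate, 3), status)
    return report
```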

ChromaDB Query Latency

P50, P95, P99 latency for vector queries. As the database grows, latency creeps up. We've seen p99 go from 50ms to 400ms at 5M documents on spinning disk. Migration to NVMe brought it back to 80ms. Monitor this before users complain.
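A nearest-rank percentile is all a latency gauge needs; anything fancier (interpolation, streaming sketches) is optional at this scale. A minimal version:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile over a batch of samples."""
    ranked = sorted(samples)
    idx = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[idx]

def latency_report(samples_ms):
    """P50/P95/P99 for a window of query latencies, in milliseconds."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```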

Ollama Token Throughput

Tokens per second averaged over 5-minute windows. Sudden drops indicate GPU memory issues, thermal throttling, or competing workloads. We've traced "slow agent" complaints to batch jobs starving Ollama of GPU cycles.

Playwright Error Rate

Percentage of browser automation runs that fail with exceptions. Authentication failures, timeouts, and element-not-found errors all go here. A spike usually means a vendor portal updated their interface. We keep Playwright versions pinned and test updates in staging before promoting.

End-to-End Workflow Success

The big one: did the entire agent workflow complete and produce correct output? We have unit tests, but production data is different. We randomly sample 10% of agent outputs weekly and have a human verify correctness. This catches subtle issues like LLM hallucinations or incomplete document extraction.

Deployment Pattern: Air-Gapped and Compliant

NERC CIP utilities need this stack running in OT environments with no internet access. Our reference architecture:

  • All components run in Podman containers on RHEL 8 (required by most utility security policies)
  • Ollama models are pre-downloaded and loaded from local registry
  • Firecrawl runs in proxy mode—allows outbound HTTPS to specific vendor domains only, controlled by firewall ACLs
  • Playwright uses a custom Firefox build with all telemetry stripped and update mechanisms disabled
  • ChromaDB data directory is on encrypted volume with daily snapshots to air-gapped backup storage
  • No component phones home, no crash reporting, no usage analytics

We maintain an internal mirror of PyPI packages so pip installs don't require internet access. Model updates and security patches go through the utility's standard change management process. This adds friction but it's non-negotiable in critical infrastructure.

Cost Comparison: Open Source vs. Vendor

A vendor recently quoted $180K/year for an "AI-powered compliance monitoring solution" that does less than what we built. Here's our actual cost breakdown:

  • Hardware: $25K one-time (refurbished server with 2x L40S GPUs)
  • Software: $0 (all open source)
  • Initial development: 120 hours engineering time
  • Ongoing maintenance: ~4 hours monthly

Break-even is under four months. After two years, total cost of ownership is under $50K versus $360K for the vendor solution. And we own the code, the data never leaves the utility's network, and we can modify functionality without waiting for vendor release cycles.
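The break-even arithmetic is easy to reproduce. The article gives hours, not dollars, so the hourly rate below is an assumption; plug in your own loaded engineering rate.

```python
def break_even_months(hardware, dev_hours, hourly_rate,
                      maint_hours_monthly, vendor_annual):
    """Months until the in-house build beats the vendor quote.
    Hourly rate is an assumption -- the article states hours only."""
    upfront = hardware + dev_hours * hourly_rate
    vendor_monthly = vendor_annual / 12
    in_house_monthly = maint_hours_monthly * hourly_rate
    return upfront / (vendor_monthly - in_house_monthly)
```

With the article's figures ($25K hardware, 120 dev hours, 4 maintenance hours/month, $180K/year vendor quote) and an assumed $150/hour rate, this lands at roughly three months, consistent with "under four months."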

The Verdict

This stack works in production for autonomous agent workflows that interact with web data. Firecrawl makes web content LLM-friendly without the parser maintenance burden. Playwright enables reliable browser automation that integrates directly with LLM function calling. ChromaDB provides vector search without operational complexity. Ollama runs it all locally with acceptable performance.

The sweet spot is operational workflows where data sovereignty matters, volumes are moderate (under 10M documents), and latency requirements are human-scale (seconds, not milliseconds). This is not the stack for high-frequency trading or real-time SCADA data processing. It is the stack for automating manual research tasks, monitoring external data sources, and giving operators AI-powered tools that work offline.

We're deploying this pattern across eight utilities now. The reliability has been better than expected—most issues are upstream (vendor websites changing) not in the stack itself. Development velocity is high because components compose cleanly and there's minimal impedance mismatch between tools.

If you're evaluating AI infrastructure for energy operations and your requirements include data sovereignty, air-gapped deployment, and moderate scale, this combination should be on your short list. If you need massive scale or millisecond latency, look at Qdrant instead of ChromaDB and consider GPU-optimized inference servers instead of Ollama. But for 80% of the use cases we see, the simplicity of this stack beats the theoretical performance of more complex alternatives.

Decision Matrix

| Dimension | Firecrawl + Playwright | Selenium + BeautifulSoup | Commercial RPA (UiPath) |
|---|---|---|---|
| Ease of Deployment | 30 min to production ★★★★★ | 2-3 days setup ★★★☆☆ | Weeks + vendor engagement ★★☆☆☆ |
| Web Scraping Power | JS rendering + automation ★★★★★ | Basic scraping only ★★☆☆☆ | Visual automation ★★★★☆ |
| LLM Integration | Native markdown output ★★★★★ | Manual parsing required ★★☆☆☆ | No native LLM support ★☆☆☆☆ |
| Air-Gap Support | Full offline capability ★★★★★ | Works offline ★★★★★ | License server required ★☆☆☆☆ |
| Operational Complexity | Minimal monitoring needed ★★★★☆ | Constant maintenance ★★☆☆☆ | High ops burden ★★☆☆☆ |
| Best For | AI agents that need reliable web data in regulated environments | Legacy automation with existing Selenium expertise | Enterprise with existing RPA infrastructure and budget |
| Verdict | Best all-around stack for autonomous workflows with data sovereignty requirements and moderate scale | Viable for simple static sites but brittle and maintenance-heavy for modern web applications | Expensive, poor LLM integration, and licensing model incompatible with air-gapped OT environments |
