Bare Metal AI runs agents — not just chat — entirely on hardware you control. Point any OpenAI- or Anthropic-compatible agent framework at your own GPUs and let it work over your data: on-prem, air-gapped, nothing leaving your network. It's the fastest inference engine on RTX hardware, and a transport that turns ordinary cards into a cluster so you can run models a single GPU can't hold. Same engine from one workstation to a thousand-GPU fleet.
Tool-using, autonomous models run entirely on hardware you control — reading your documents, hitting your internal APIs, automating real work — with nothing leaving your network. AES-256 encrypted on device, credentials encrypted at rest, TLS in transit, no third-party subprocessors. The air-gapped architecture fits the requirements behind HIPAA and SOC 2.
OpenAI- and Anthropic-compatible /v1 endpoints mean LangChain, the OpenAI and
Anthropic SDKs, or your own agent loop point at your GPUs with no SDK changes and no prompt
migration. The cloud agents you've already built just run locally.
Our tensor-parallel transport runs over ordinary networking — no NVLink, no InfiniBand — so several consumer RTX cards can run a model too large for any one of them, in-house and private. Single-GPU is where raw throughput peaks; multi-GPU is how you reach models that wouldn't otherwise run on your hardware at all.
Every consumer local-AI tool wraps llama.cpp. Bare Metal AI runs NVIDIA TensorRT-LLM with paged KV-cache, in-flight batching, and optimized GEMM — so a single card serves more tokens per second and your agents finish their work sooner. More throughput per GPU is fewer GPUs for the same work. Independently benchmarked by Menlo Research, single-GPU vs llama.cpp:
Per-GPU throughput vs. llama.cpp — the backend behind Ollama, LM Studio, and Jan. Source: Menlo Research — Benchmarking NVIDIA TensorRT-LLM →
Cloud AI charges by the token and the seat — so the more your team uses it, the more it costs, forever, and your data leaves the building to earn that bill. Owning the engine flips it: a flat price per seat, unlimited usage, on hardware you control.
The crossover comes fast: heavy agentic workloads bill the most in the cloud and cost nothing extra in-house. We'll model your break-even against current cloud spend in the demo.
Healthcare, finance, legal, and government teams that need AI behind their own firewall, with no third-party subprocessor in the data path.
Engineering orgs that won't paste source, designs, or trade secrets into someone else's API — run coding and research agents entirely in-house.
Back-office and operations work — document processing, retrieval, internal copilots — run by local agents over systems that never touch the internet.
Teams watching per-token cloud bills climb with usage. Move to a flat per-seat cost with unlimited usage on hardware you may already own, and stop metering your own people.
The agent is the engine; the platform is everything it reaches and everything you control. It ships today with 318 one-click integrations and grows along four lines — private knowledge, governance, reach, and deep enterprise connectors — every one of them running on your own hardware, with your data and credentials never leaving the building.
Build a knowledge base from your documents, contracts, and wikis. The agent answers with citations — and the chunker, the embeddings, the vector index, and the source text all stay on your GPU host. No cloud index, no third-party service in the retrieval path.
Every action is governed today — read-only by default, and every side-effecting tool call pauses for human approval before it runs. The compliance trail stays on-prem, like everything else.
Reach the agent where your team already works. @mention it in Slack or Teams, or message it from your phone — allowlist-gated pairing, no inbound ports to open.
318 live integrations on the open Model Context Protocol, plus deep enterprise connectors on the same data-resident pattern as our Databricks and Snowflake servers — token and data stay on the host.
A selection of the catalog — 318 integrations across 14 categories and 7 industry verticals, all data-resident. Browse the full catalog →
The catalog spans the systems regulated and data-sensitive industries actually run on — every one on the same data-resident pattern: credentials and data stay on your hardware, and every write is approval-gated. A selection by vertical:
Private analysis over trading, risk, and customer data. Snowflake, Databricks, SAP, and market-data connectors — without prompts or positions leaving your network.
PHI-safe agents over clinical and claims systems — FHIR, EHR, and document stores — running on-prem, so protected data never reaches a third party.
Air-gap-capable deployments for controlled and classified environments. No inbound ports, no external subprocessors, and a full audit trail.
Contract, matter, and discovery analysis over privileged documents that never leave the firm. Document-management and e-signature connectors included.
Claims triage, policy analysis, and underwriting support over your core systems — on hardware you control, with every write gated for human review.
SOC agents over your SIEM, identity, and endpoint telemetry — Splunk, CrowdStrike, Okta, Entra — triaging and correlating without shipping logs to a vendor cloud.
Don't see your stack? Any OpenAPI or Model Context Protocol service connects, and you can bring your own remote MCP server. Tell us what you run →
A flat price per seat — no token meter, run it as hard as you like. Self-serve Team is $50/seat/month; Enterprise is volume-priced with SSO, governance, and support. Full tiers, the free Home plan, and self-serve checkout live on the pricing page.
Per-seat list price drops with volume; Enterprise is quoted on your seat count, with SCIM provisioning so the bill tracks your active roster automatically.
Annual support with response-time guarantees, a named team, and a direct line to the engineers who build the engine.
Forward-deployed engineers to stand up the daemon, models, and transport on your fleet and integrate it into your stack.
You pay a flat price per seat, and usage is unlimited — no token meter. Cloud AI charges per seat and per token, so the bill grows the more you use it; ours doesn't move. Each paid seat can also share access with up to 4 teammates (chat), and Enterprise seats provision automatically from your identity provider via SCIM. Everything runs on your hardware — your data never leaves the building.
No. Inference runs entirely on your hardware and the runtime can be fully air-gapped. There are no third-party subprocessors. Chat history is AES-256 encrypted on device, credentials are encrypted at rest, and traffic is TLS in transit.
Agents. Because we expose OpenAI- and Anthropic-compatible /v1 endpoints, agent
frameworks like LangChain or your own tool-calling loop point at your GPUs unchanged. Tool-using,
autonomous models run over your internal data and APIs — entirely on hardware you control.
Yes. We expose OpenAI- and Anthropic-compatible /v1 endpoints, so anything already
pointed at GPT or Claude can point at your own GPUs with no SDK changes or prompt migration.
Any NVIDIA RTX GPU. Single-GPU is where per-card throughput is highest. When a model is too large for one card, our network transport splits it across machines — no NVLink or InfiniBand — so you can run it at all, privately, on hardware you own. Multi-GPU buys you model size and capacity; for raw speed per request, a single capable card leads.
Self-serve: pick a plan and subscribe — seats scale up in your admin console, billed per seat. For enterprise volume, SSO/SCIM, and a support SLA, email us your seat count and we'll send pricing the same day. No demo gauntlet.
Chat was the demo. The arc is fleets of capable open agents — the Hermes-class, tool-using models — running your real work autonomously: reading your documents, operating your internal systems, and answering to no one outside your walls. We're building the engine that makes that fast and affordable on hardware you already own. Today it's the fastest local inference and a transport that runs models a single card can't. Next is the agent platform on top of it.
Most teams self-serve: download the free Home plan or start a Team trial, no sales call. For an enterprise rollout (SSO, SCIM, audit, volume pricing) — or a managed appliance, a box we provision and ship that you own outright — tell us what you need and a founder replies, usually within a day.
Prefer email? [email protected]