Bare Metal Presence · preview · look it in the eye

A face to talk to.
100% local.

A photorealistic avatar that hears you, thinks, and answers out loud — a real face and voice rendered entirely on your own GPU. It lip-syncs to its own speech in real time, and nothing — audio, video, or transcript — ever leaves the box. No cloud avatar service. No per-minute meter.

Live loop: “Hey, can you summarize what we talked about yesterday?”
01
Hear
Whisper transcribes your speech on-device — no audio is uploaded.
02
Think
Your local model composes the reply on your own GPU — with memory of past chats.
03
Speak
On-device text-to-speech voices the answer — no audio leaves your machine.
04
Face
A photoreal face is rendered frame-by-frame on your GPU, lip-synced to the reply as it speaks.
Whisper STT · Your local LLM · On-device TTS · GPU-rendered face · 0 bytes to the cloud
Private by construction
$0
Per minute — your GPU, your model, no meter. Cloud avatar agents bill by the minute.
0
Bytes leave your machine — audio, video and transcript stay on-device
1 GPU
Runs the whole avatar on a single card: speech, model, voice, and face
100%
Offline-capable — pull the network cable and it still talks back
Hear · think · speak · appear — one stack

The whole avatar
runs on your GPU.

Cloud avatar platforms stream a face from their servers — your voice and video ride to the cloud, get rendered there, and come back as a video feed you don’t control. Bare Metal Presence runs the entire loop — speech, model, voice, and the photoreal face — on the same box. The face is rendered frame-by-frame on your own GPU and lip-synced to its own speech, so nothing you say and no frame it draws ever leaves the machine.
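For the curious, the hear, think, speak, face loop can be pictured as four stages chained in one process. The sketch below is illustrative only and is not the product's actual code: `LocalAvatarLoop`, `Turn`, and the `fake_*` stages are hypothetical names, and the stand-in stages do no real transcription, inference, synthesis, or rendering. It only shows the shape of a single turn and how memory of past chats might be passed to the model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    transcript: str      # what the user said
    reply: str           # what the model answered
    audio: bytes         # synthesized speech for the reply
    frames: List[bytes]  # face frames synced to that speech


@dataclass
class LocalAvatarLoop:
    # Each stage is a plain callable, so an on-device engine could be swapped in.
    hear: Callable[[bytes], str]            # speech-to-text
    think: Callable[[str, List[str]], str]  # local model, given past chat memory
    speak: Callable[[str], bytes]           # text-to-speech
    face: Callable[[bytes], List[bytes]]    # lip-synced frames from the audio
    memory: List[str] = field(default_factory=list)

    def run_turn(self, mic_audio: bytes) -> Turn:
        transcript = self.hear(mic_audio)
        reply = self.think(transcript, list(self.memory))
        audio = self.speak(reply)
        frames = self.face(audio)
        # Remember both sides of the exchange for later turns.
        self.memory.extend([f"user: {transcript}", f"avatar: {reply}"])
        return Turn(transcript, reply, audio, frames)


# Hypothetical stand-in stages: placeholders, not real engines.
def fake_hear(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_think(text: str, memory: List[str]) -> str:
    return f"You said: {text} (turns remembered: {len(memory) // 2})"


def fake_speak(text: str) -> bytes:
    return text.encode("utf-8")


def fake_face(audio: bytes) -> List[bytes]:
    # One placeholder "frame" per 8 bytes of audio.
    return [audio[i:i + 8] for i in range(0, len(audio), 8)]


loop = LocalAvatarLoop(fake_hear, fake_think, fake_speak, fake_face)
first = loop.run_turn(b"hello")
second = loop.run_turn(b"what did we talk about?")
print(second.reply)
```

Every stage here is an ordinary function call inside one process, which is the point of the design: there is no network hop between hearing, thinking, speaking, and drawing the face.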

Hear

On-device speech

Whisper runs locally and transcribes as you talk. Your voice is never streamed to a server.

Think

Your local model

Any model in the catalog, running on your own GPU — with memory of your past conversations.

Speak

On-device voice

Natural text-to-speech renders the reply on your own GPU. The whole round-trip stays offline.

Face

Rendered on your GPU

A photoreal face is generated frame-by-frame and lip-synced to the speech — no cloud video feed.

Look it in the eye.
Privately.

Your face, your voice, your model, your machine — not a frame leaves the box.

Get the runtime · Try Voice