Open-weight models that run locally on the NVIDIA TensorRT-LLM stack: downloaded once, then compiled for your exact RTX card at install time. No API key, no per-token fee. Models marked Available ship in the catalog today; Roadmap models run on our engine and are being validated on hardware. The 128 GB Spark tier runs larger models at full FP16 precision, with no quantization, on DGX Spark’s unified memory.
Every model in the catalog runs entirely on your own hardware. Your prompts never leave your machine.