Model catalog

Every model, on your own GPU.

Open-weight models that run locally on the NVIDIA TensorRT-LLM stack: downloaded once, then compiled for your exact RTX card at install time. No API key, no per-token fee. Models marked Available ship in the catalog today; Roadmap models already run on our engine and are still being validated on hardware. The 128 GB Spark tier runs larger models at full FP16 precision, with no quantization, on DGX Spark’s unified memory.
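To see why memory size decides which models can stay at FP16, here is a rough back-of-envelope sketch. It relies on one general fact (FP16 stores each weight in 2 bytes) and counts weights only. The function names and the 80% headroom figure are illustrative assumptions, not part of the catalog or its installer, and real usage also depends on KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int = 16) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # Billions of parameters times bytes per parameter gives gigabytes.
    return params_billions * bytes_per_weight


def fits(params_billions: float, memory_gb: float,
         bits_per_weight: int = 16, headroom: float = 0.8) -> bool:
    """True if the weights fit within a fraction of available memory.

    headroom is a hypothetical safety margin left for KV cache and
    runtime overhead; 0.8 is an assumption, not a measured value.
    """
    return weight_memory_gb(params_billions, bits_per_weight) <= memory_gb * headroom


if __name__ == "__main__":
    for size in (8, 30, 70):
        print(f"{size}B at FP16: ~{weight_memory_gb(size):.0f} GB of weights, "
              f"fits in 128 GB: {fits(size, 128)}")
```

Under these assumptions, a 30-billion-parameter model needs about 60 GB for FP16 weights, which is far beyond a typical consumer card's VRAM but well within a 128 GB unified-memory pool.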

Download once. Run it forever.

Every model in the catalog runs entirely on your own hardware. Your prompts never leave your machine.

Download for Windows
Try it live