Liutong

Fast, affordable LLM inference with a fully OpenAI-compatible API. Powered by a custom Rust engine.

Four model families, one API

Drop-in replacement for the OpenAI API. Just change the base URL.

🔴 crimson-falcon-4

General-purpose chat model. Text generation, conversation, code, and summarization.

🔵 indigo-owl-4

Reasoning model. Multi-step problem solving, math, logic, and code analysis.

🟠 amber-phoenix-4

Media generation. High-quality images and video from text prompts.

🟢 jade-mole-4

Embeddings model. Semantic search, clustering, classification, and RAG.
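Any of these models is reached through the same OpenAI-compatible API. As a minimal sketch using only the Python standard library (the base URL and API key below are placeholders, not real credentials or endpoints), a chat request to crimson-falcon-4 might be assembled like this:

```python
import json
from urllib import request

# Hypothetical Liutong endpoint -- substitute your deployment's base URL.
BASE_URL = "https://api.example.com/v1"

def build_chat_request(model: str, prompt: str,
                       api_key: str = "YOUR_API_KEY") -> request.Request:
    """Build (but do not send) a POST to the OpenAI-style
    /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("crimson-falcon-4", "Summarize RAG in one sentence.")
# request.urlopen(req) would actually send it; omitted here.
print(req.full_url)
```

Because the request body follows the OpenAI wire format, the same payload works unchanged against any of the four model families by swapping the `model` field.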

Rust-powered engine

Custom inference engine built in Rust, inspired by vLLM and SGLang. Optimized for throughput and latency.

OpenAI-compatible

Works with the official OpenAI Python and Node.js SDKs. No custom client libraries needed.
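With the official SDKs, switching over amounts to passing your deployment's `base_url` when constructing the client. As a dependency-free sketch of the same idea, here is how a request to the OpenAI-style /embeddings endpoint for jade-mole-4 could be built; the base URL and API key are hypothetical placeholders:

```python
import json
from urllib import request

# Placeholder endpoint -- with the official OpenAI Python SDK you would
# instead pass base_url="..." to the OpenAI(...) constructor.
BASE_URL = "https://api.example.com/v1"

def build_embed_request(texts: list[str],
                        api_key: str = "YOUR_API_KEY") -> request.Request:
    """Build (but do not send) a POST to the OpenAI-style /embeddings
    endpoint for the jade-mole-4 embeddings model."""
    payload = {"model": "jade-mole-4", "input": texts}
    return request.Request(
        f"{BASE_URL}/embeddings",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_embed_request(["semantic search", "clustering"])
# request.urlopen(req) would return the usual OpenAI-shaped response:
# {"data": [{"embedding": [...]}, ...]}
print(req.full_url)
```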

Affordable inference

Self-hosted infrastructure keeps costs low. Automatic fallback to upstream providers ensures reliability.

Ready to get started?

Explore the docs or jump straight to the quickstart guide.