# Liutong
Liutong is a high-performance inference platform that delivers state-of-the-art language models at a fraction of the cost. Built on a custom Rust-based inference engine inspired by vLLM and SGLang, Liutong offers fast token generation with full OpenAI API compatibility.
## Why Liutong?
- Affordable inference — Self-hosted infrastructure keeps costs dramatically lower than commercial API providers, with OpenAI fallback for guaranteed availability.
- OpenAI-compatible — Drop-in replacement for the OpenAI API. Use the same SDKs, the same code, the same workflows.
- Purpose-built models — Four model families covering chat, reasoning, media generation, and embeddings.
- Rust performance — Our inference engine is written in Rust for maximum throughput and minimal latency.
## Models
| Model | Category | Use Case |
|---|---|---|
| crimson-falcon-4 | Chat | General-purpose text generation and conversation |
| indigo-owl-4 | Reasoning | Complex reasoning, math, and multi-step problem solving |
| amber-phoenix-4 | Media | Image and video generation |
| jade-mole-4 | Embeddings | Text embeddings for search and retrieval |
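Embeddings such as those from jade-mole-4 are typically compared with cosine similarity to rank documents against a query. Here is a minimal sketch of that ranking step in pure Python; the vectors are toy values standing in for real embedding output, not actual model responses:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for jade-mole-4 embeddings.
query = [0.1, 0.9, 0.2]
docs = {"doc_a": [0.1, 0.8, 0.3], "doc_b": [0.9, 0.1, 0.0]}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
```

In a real pipeline you would fetch the vectors from the embeddings endpoint and index them; the ranking logic stays the same.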
## Quick Example
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.liutong.llby.org/v1",
    api_key="lt_your_api_key",
)

response = client.chat.completions.create(
    model="crimson-falcon-4",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)
```
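Since the endpoint is OpenAI-compatible, streaming should work through the same SDK. A sketch, assuming Liutong honors the standard `stream=True` parameter as a fully compatible endpoint would:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.liutong.llby.org/v1",
    api_key="lt_your_api_key",
)

stream = client.chat.completions.create(
    model="crimson-falcon-4",
    messages=[{"role": "user", "content": "Write a haiku about Rust."}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries an incremental piece of the reply.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```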
Ready to get started? Head to the Quickstart guide.