Liutong
Liutong is an LLM inference platform that runs language models at lower cost than commercial API providers. It uses a custom inference engine written in Rust, inspired by vLLM and SGLang, and exposes a fully OpenAI-compatible API.
Why Liutong?
Liutong runs on self-hosted infrastructure, which keeps costs below those of commercial API providers, and falls back to OpenAI when needed so requests still get served. The API matches OpenAI’s, so existing SDKs and code work once you point them at Liutong’s base URL and API key. There are four model families, covering chat, reasoning, media generation, and embeddings. The inference engine is written in Rust.
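To make the compatibility claim concrete, here is a minimal sketch of the same kind of request built without any SDK, using only the Python standard library. The base URL and key placeholder are the ones used in the Quick Example; the `/chat/completions` path is an assumption based on OpenAI's API layout, and `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Optional

# Base URL as shown in the Quick Example.
LIUTONG_BASE_URL = "https://api.liutong.llby.org/v1"


def build_chat_request(
    prompt: str,
    model: str = "crimson-falcon-4",
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request.

    The endpoint path and payload shape follow OpenAI's chat completions
    convention; that Liutong accepts them unchanged is an assumption
    drawn from its stated OpenAI compatibility.
    """
    key = api_key or "lt_your_api_key"  # placeholder key, replace with your own
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{LIUTONG_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it, which needs a valid key and network access. In practice the OpenAI SDK does this work for you; the sketch only shows that nothing beyond a different host and key should be needed.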
Who Liutong is for
Liutong is for teams that want predictable, dedicated compute for their LLM workloads. It fits companies running established workflows that cannot absorb the variability of models changing underneath them. If you have a workflow that already works, and you need stable, reproducible results from dedicated capacity, Liutong is built for that. The trade-off is that you pay for predictability and reliability instead of chasing the newest model or the lowest price.
Who Liutong is not for
Liutong is not the right tool if you are experimenting or building a new product workflow from scratch. It is also not for you if you need the latest frontier models with the highest-quality output, or if your priority is the lowest price per token for large-volume generation. In those cases, a large frontier lab such as OpenAI, Anthropic, or Google will serve you better.
| Good fit | Not a fit |
|---|---|
| A proven, stable workflow | Experimentation and prototyping |
| Predictable, dedicated compute | Lowest price per token at scale |
| Reproducible, consistent output | Latest frontier-model quality |
In short, Liutong’s main value is stability, predictability, and dedicated compute. It is not built to give you frontier quality or the cheapest rates.
Models
| Model | Category | Use Case |
|---|---|---|
| crimson-falcon-4 | Chat | General-purpose text generation and conversation |
| indigo-owl-4 | Reasoning | Complex reasoning, math, and multi-step problem solving |
| amber-phoenix-4 | Media | Image and video generation |
| jade-mole-4 | Embeddings | Text embeddings for search and retrieval |
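When one codebase calls more than one of these models, a small lookup keeps the model IDs in one place. The sketch below copies the IDs from the table; the category keys and the `model_for` helper are hypothetical names chosen for illustration, not part of Liutong's API.

```python
# Model IDs copied from the table above. The lowercase category keys are
# an illustrative convention, not something Liutong defines.
MODELS = {
    "chat": "crimson-falcon-4",
    "reasoning": "indigo-owl-4",
    "media": "amber-phoenix-4",
    "embeddings": "jade-mole-4",
}


def model_for(category: str) -> str:
    """Return the model ID for a category, ignoring case and outer whitespace."""
    key = category.strip().lower()
    if key not in MODELS:
        raise ValueError(
            f"unknown category {category!r}; expected one of {sorted(MODELS)}"
        )
    return MODELS[key]
```

The result can be passed directly as the `model` argument of a request, for example `model=model_for("reasoning")`.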
Quick Example
```python
from openai import OpenAI

# Point the standard OpenAI client at Liutong instead of api.openai.com.
client = OpenAI(
    base_url="https://api.liutong.llby.org/v1",
    api_key="lt_your_api_key",
)

response = client.chat.completions.create(
    model="crimson-falcon-4",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)
```
Ready to get started? Head to the Quickstart guide.