Liutong

Liutong is an LLM inference platform that runs language models at lower cost than commercial API providers. It uses a custom inference engine written in Rust, inspired by vLLM and SGLang, and exposes a fully OpenAI-compatible API.

Why Liutong?

Liutong runs on self-hosted infrastructure, which keeps its costs below those of commercial API providers, and it falls back to OpenAI when needed so that requests are still served. The API matches OpenAI’s, so existing SDKs and code work without changes. Four model families cover chat, reasoning, media generation, and embeddings. The inference engine is written in Rust.

Who Liutong is for

Liutong is for teams that want predictable, dedicated compute for their LLM workloads. It fits companies running established workflows that cannot absorb the variability of models changing underneath them. If you have a workflow that already works, and you need stable, reproducible results from dedicated capacity, Liutong is built for that. The trade you are making is paying for predictability and reliability instead of chasing the newest model or the lowest price.

Who Liutong is not for

Liutong is not the right tool if you are experimenting or building a new product workflow from scratch. It is also not for you if you need the latest frontier models with the highest-quality output, or if your priority is the lowest price per token for large-volume generation. In those cases, a large frontier lab such as OpenAI, Anthropic, or Google will serve you better.

Good fit	Not a fit
A proven, stable workflow	Experimentation and prototyping
Predictable, dedicated compute	Lowest price per token at scale
Reproducible, consistent output	Latest frontier-model quality

In short, Liutong’s main value is stability, predictability, and dedicated compute. It is not built to give you frontier quality or the cheapest rates.

Models

Model	Category	Use Case
`crimson-falcon-4`	Chat	General-purpose text generation and conversation
`indigo-owl-4`	Reasoning	Complex reasoning, math, and multi-step problem solving
`amber-phoenix-4`	Media	Image and video generation
`jade-mole-4`	Embeddings	Text embeddings for search and retrieval
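
Because the API follows OpenAI's, the embeddings model should be reachable through the SDK's standard embeddings call. The sketch below assumes that `jade-mole-4` is served on the usual `/v1/embeddings` route, which this page implies but does not state outright. The helper name `embed_texts` is illustrative and not part of Liutong.

```python
EMBEDDING_MODEL = "jade-mole-4"  # embeddings model from the table above


def embed_texts(client, texts, model=EMBEDDING_MODEL):
    """Return one embedding vector (a list of floats) per input text.

    `client` is expected to be an openai.OpenAI instance whose base_url
    points at Liutong. Assumption: the embeddings endpoint behaves like
    OpenAI's, as the compatibility claim suggests.
    """
    response = client.embeddings.create(model=model, input=list(texts))
    # Sort by index so the vectors line up with the order of the inputs.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

With the `client` from the Quick Example, `embed_texts(client, ["first document", "second document"])` would return two vectors, one per string.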

Quick Example

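```python
from openai import OpenAI

# Point the standard OpenAI SDK at Liutong instead of api.openai.com.
client = OpenAI(
    base_url="https://api.liutong.llby.org/v1",
    api_key="lt_your_api_key",  # replace with your own key
)

# Request a chat completion from the general-purpose chat model.
response = client.chat.completions.create(
    model="crimson-falcon-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```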
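
Outside of a quick test, you will probably want to keep the key out of source code. The following is a minimal sketch that builds the client settings from an environment variable. The variable name `LIUTONG_API_KEY` and the helper `liutong_client_kwargs` are hypothetical choices for this example, not names defined by Liutong.

```python
import os

LIUTONG_BASE_URL = "https://api.liutong.llby.org/v1"  # base URL from the example above


def liutong_client_kwargs(env=None):
    """Return keyword arguments for OpenAI(...), reading the key from the environment."""
    env = os.environ if env is None else env
    api_key = env.get("LIUTONG_API_KEY")  # hypothetical variable name
    if not api_key:
        raise RuntimeError("Set LIUTONG_API_KEY before creating the client")
    return {"base_url": LIUTONG_BASE_URL, "api_key": api_key}
```

The client is then created with `OpenAI(**liutong_client_kwargs())`, and the rest of the example stays the same.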

Ready to get started? Head to the Quickstart guide.
