Liutong

Liutong is an LLM inference platform that runs language models at lower cost than commercial API providers. It uses a custom inference engine written in Rust, inspired by vLLM and SGLang, and exposes a fully OpenAI-compatible API.

Why Liutong?

Liutong runs on self-hosted infrastructure, which keeps costs below those of commercial API providers, and it falls back to OpenAI when needed so that requests are still served. The API matches OpenAI’s, so existing SDKs and code work once they point at Liutong’s base URL and use a Liutong API key. There are four model families, covering chat, reasoning, media generation, and embeddings. The inference engine is written in Rust.

Who Liutong is for

Liutong is for teams that want predictable, dedicated compute for their LLM workloads. It fits companies running established workflows that cannot absorb the variability of models changing underneath them. If you have a workflow that already works and you need stable, reproducible results from dedicated capacity, Liutong is built for that. The trade-off is that you pay for predictability and reliability rather than chasing the newest model or the lowest price.

Who Liutong is not for

Liutong is not the right tool if you are experimenting or building a new product workflow from scratch. It is also not for you if you need the latest frontier models with the highest-quality output, or if your priority is the lowest price per token for large-volume generation. In those cases, a large frontier lab such as OpenAI, Anthropic, or Google will serve you better.

| Good fit | Not a fit |
| --- | --- |
| A proven, stable workflow | Experimentation and prototyping |
| Predictable, dedicated compute | Lowest price per token at scale |
| Reproducible, consistent output | Latest frontier-model quality |

In short, Liutong’s main value is stability, predictability, and dedicated compute. It is not built to give you frontier quality or the cheapest rates.

Models

| Model | Category | Use Case |
| --- | --- | --- |
| crimson-falcon-4 | Chat | General-purpose text generation and conversation |
| indigo-owl-4 | Reasoning | Complex reasoning, math, and multi-step problem solving |
| amber-phoenix-4 | Media | Image and video generation |
| jade-mole-4 | Embeddings | Text embeddings for search and retrieval |
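
As an illustration of the search and retrieval use case, the sketch below ranks a few documents against a query using jade-mole-4. It assumes that Liutong serves the OpenAI-style embeddings endpoint through the same SDK client as the chat example, which this page implies but does not show. The helper names `embed`, `rank`, and `cosine_similarity` are hypothetical and exist only for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embed(texts, model="jade-mole-4"):
    """Return one embedding vector per input text.

    Assumption: the embeddings endpoint behaves like OpenAI's.
    """
    # Imported here so the similarity helper works without the SDK installed.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.liutong.llby.org/v1",
        api_key="lt_your_api_key",
    )
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]


def rank(query, documents):
    """Sort documents by similarity to the query, most similar first."""
    vectors = embed([query] + documents)
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = [(doc, cosine_similarity(query_vec, vec))
              for doc, vec in zip(documents, doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank("reset my password", ["Billing FAQ", "Account recovery steps"])` would send a single embeddings request and return the documents paired with their scores. For larger collections, document vectors would normally be computed once and stored rather than re-embedded per query.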

Quick Example

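from openai import OpenAI

# Point the standard OpenAI SDK at Liutong instead of api.openai.com.
client = OpenAI(
    base_url="https://api.liutong.llby.org/v1",
    api_key="lt_your_api_key",  # replace with your own key
)

# Send a single-turn chat request to the general-purpose chat model.
response = client.chat.completions.create(
    model="crimson-falcon-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Print the text of the first (and here only) completion.
print(response.choices[0].message.content)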
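
For clients that do not use the SDK, the same call can be sketched at the HTTP level. The example below builds, but does not send, the request that OpenAI's chat completions wire format prescribes: a JSON body posted to `/chat/completions` with a bearer token. That Liutong accepts exactly this shape is an assumption based on its stated OpenAI compatibility, and `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.liutong.llby.org/v1"


def build_chat_request(api_key, model, messages):
    """Build an OpenAI-style chat completions request without sending it."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    api_key="lt_your_api_key",
    model="crimson-falcon-4",
    messages=[{"role": "user", "content": "Hello!"}],
)

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(request) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.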

Ready to get started? Head to the Quickstart guide.