A lightweight, high-performance API server for Rockchip NPUs (RKLLM), providing drop-in compatibility with the OpenAI, Claude, and Ollama API formats. This allows you to seamlessly integrate locally hosted large language models on Rockchip hardware with existing AI tools, frontends, and frameworks.
- 🚀 Hardware Optimized: Leverages Rockchip's NPU for fast inference.
- 🔄 Triple API Compatibility: Supports the standard OpenAI, Claude, and Ollama API endpoints.
- 🌊 Real-time Streaming: Full support for Server-Sent Events (SSE) streaming token output.
- 🐳 Docker Ready: Minimal footprint containerization for easy deployment.
- 🛠️ No External Tokenizers: Operates independently without needing Hugging Face `transformers` or `AutoTokenizer`.
- Hardware: RK3588 Series, RK3576 Series
- RKNPU Driver Version:
v0.9.8 (recommended)
Note: Check your RKNPU driver version before proceeding:

```shell
cat /sys/kernel/debug/rknpu/version
```

If this command returns no output, your Linux kernel does not currently support the RKNPU.
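If you want to gate a setup script on the reported driver version, here is a minimal sketch. The exact format of the version string depends on your kernel build, so the parsing regex below is an assumption; adjust it to what your device actually reports.

```python
import re


def parse_rknpu_version(text: str) -> tuple[int, ...]:
    """Extract a dotted version like '0.9.8' from the driver version string.

    Assumes the string contains something like 'v0.9.8'; the surrounding
    text varies between kernel builds.
    """
    match = re.search(r"v?(\d+(?:\.\d+)+)", text)
    if not match:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def driver_is_recommended(text: str, minimum: tuple[int, ...] = (0, 9, 8)) -> bool:
    """True if the reported driver version is at least the recommended v0.9.8."""
    return parse_rknpu_version(text) >= minimum


# Example: read the same file the `cat` command above inspects.
# with open("/sys/kernel/debug/rknpu/version") as f:
#     print(driver_is_recommended(f.read()))
```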
The easiest way to run the server is via Docker.
```shell
docker run -d \
  --name rkllm-server \
  --restart unless-stopped \
  --privileged \
  -p 8080:8080 \
  -v /dev:/dev \
  -v /YOUR/PATH/TO/MODELS:/rkllm_server/models \
  -e TARGET_PLATFORM=rk3588 \
  -e RKLLM_MODEL_PATH=YOUR_MODEL_FILE_NAME.rkllm \
  -e PORT=8080 \
  dukihiroi/rkllm-server:latest
```
Create a `docker-compose.yml` file:
```yaml
services:
  rkllm-server:
    image: dukihiroi/rkllm-server:latest
    container_name: rkllm-server
    restart: unless-stopped
    privileged: true
    ports:
      - "8080:8080"
    volumes:
      - /dev:/dev
      - ./models:/rkllm_server/models
    environment:
      - TARGET_PLATFORM=rk3588
      - RKLLM_MODEL_PATH=qwen3-vl-2b-instruct_w8a8_rk3588.rkllm
      - PORT=8080
```
Then start the server:
```shell
mkdir models   # Place your .rkllm files here
docker compose up -d
```
Test the deployment:
```shell
curl http://localhost:8080/health
```
If you prefer to run the server directly on the host OS without Docker:
1. Clone the repository:
   ```shell
   git clone https://github.com/anand34577/rkllm_openai.git
   cd rkllm_openai
   ```
2. Install RKLLM Dynamic Libraries:
   ```shell
   sudo cp lib/*.so /usr/lib
   sudo ldconfig
   ```
3. Install uv (Fast Python Package Installer):
   ```shell
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```
4. Sync Dependencies:
   ```shell
   uv sync
   ```
5. Run the Server:
   ```shell
   uv run server.py \
     --rkllm_model_path=models/qwen3-vl-2b-instruct_w8a8_rk3588.rkllm \
     --target_platform=rk3588 \
     --port=8080
   ```
Once running, the server listens on the configured port (default 8080).
| API Type | Endpoint | Description |
|---|---|---|
| Server | `GET /health` | Check server status and NPU availability. |
| OpenAI | `POST /v1/chat/completions` | Standard chat completion (supports `stream: true`). |
| OpenAI | `GET /v1/models` | Returns the currently loaded RKLLM model ID. |
| Claude | `POST /v1/messages` | Anthropic-compatible message completion (supports `stream: true`). |
| Ollama | `POST /api/chat` | Ollama-compatible chat completion. |
| Ollama | `GET /api/tags` | Ollama-compatible model listing. |
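As a sketch of the OpenAI-style request and response shape behind `POST /v1/chat/completions` (the model name below is a placeholder; query `GET /v1/models` for the ID your server actually reports):

```python
import json


def build_chat_request(model: str, user_prompt: str, stream: bool = False) -> dict:
    """Build the JSON body for POST /v1/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "stream": stream,
    }


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of a non-streaming chat completion response."""
    return response["choices"][0]["message"]["content"]


# POST this body to http://localhost:8080/v1/chat/completions with
# Content-Type: application/json, using urllib.request, requests, etc.
body = build_chat_request("YOUR_MODEL_ID", "Hello!")
print(json.dumps(body))
```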
You can test the OpenAI streaming implementation using the included Python client:
```shell
uv run client.py --host http://localhost:8080 --prompt "Explain quantum mechanics briefly." --stream
```
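If you write your own streaming client instead, each SSE event arrives as a `data: …` line carrying a JSON chunk, with `data: [DONE]` marking the end of the stream. A minimal parser sketch, assuming the chunks follow the standard OpenAI streaming layout (verify against your server's actual output):

```python
import json


def iter_stream_content(lines):
    """Yield text deltas from OpenAI-style SSE lines until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]


# Example over canned SSE lines:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_content(sample)))  # -> Hello
```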
**Hardware Concurrency Limit:** Because the NPU handles one inference task at a time, the server can only process one conversation at a time.

- Do not use this server for heavy background tasks (such as bulk title/tag generation) if you also want it to remain responsive for interactive chat.
- If a new request arrives while the NPU is busy, the server will briefly wait. If the NPU does not free up in time, it returns an HTTP `503 Service Unavailable` error rather than crashing.
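On the client side, a 503 from a busy NPU is safely retryable. A hedged sketch of retry-with-backoff, where `send` is a placeholder for whatever HTTP call your client makes (it just needs to return a status code and body):

```python
import time


def post_with_retry(send, retries: int = 5, backoff: float = 0.5):
    """Call send() until it returns a status other than 503.

    `send` is any zero-argument callable returning (status_code, body).
    Sleeps backoff, 2*backoff, ... between attempts, and raises if the
    NPU stays busy for every attempt.
    """
    delay = backoff
    for attempt in range(retries):
        status, body = send()
        if status != 503:
            return status, body
        if attempt < retries - 1:
            time.sleep(delay)
            delay *= 2  # exponential backoff while the NPU is busy
    raise RuntimeError("server still busy (HTTP 503) after retries")
```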
To download pre-converted .rkllm models, please refer to the official Rockchip rknn-llm Model Zoo.