Context Prewarm Extension for SillyTavern

A SillyTavern extension that automatically prewarms your local LLM's KV cache after each AI response, so your next message starts generating much sooner.

The Problem

When using local LLMs (via llama.cpp, koboldcpp, text-generation-webui, etc.), there's often a delay before the AI starts generating a response. This is because the LLM has to process the entire conversation context (tokenize it and compute the attention keys and values for every token) before it can generate new tokens.

If you've ever noticed that sending an empty message, canceling it, and then sending your real message results in a much faster response - that's because the KV cache is already "warm" with the conversation context.
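A simplified way to see why a warm cache helps: assume the backend reuses its cached keys/values for the longest token prefix that the new prompt shares with the previously evaluated one (roughly how llama.cpp-style prompt caching behaves) and only evaluates the remaining tokens. The token IDs below are made up, and `sharedPrefixLength` / `tokensToProcess` are illustrative helpers, not part of any backend's API.

```typescript
// Illustrative model of prefix reuse; real backends tokenize and compare prompts internally.
function sharedPrefixLength(cached: readonly number[], prompt: readonly number[]): number {
  const limit = Math.min(cached.length, prompt.length);
  let i = 0;
  // Walk forward until the first token that differs.
  while (i < limit && cached[i] === prompt[i]) i++;
  return i;
}

// Tokens the backend still has to evaluate before it can start generating.
function tokensToProcess(cached: readonly number[], prompt: readonly number[]): number {
  return prompt.length - sharedPrefixLength(cached, prompt);
}

// Made-up token IDs: 1..5 = conversation so far, 9 = temporary prewarm message,
// 6..7 = the real message the user sends next.
const warmCache = [1, 2, 3, 4, 5, 9];
const nextPrompt = [1, 2, 3, 4, 5, 6, 7];

console.log(tokensToProcess(warmCache, nextPrompt)); // 2
console.log(tokensToProcess([], nextPrompt));        // 7 (cold cache)
```

In the warm case only the two tokens of the new message are evaluated, even though the cache also holds the placeholder token that the prewarm added: matching simply stops at the first differing token.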

The Solution

This extension automates that process. After each AI response completes, it automatically:

Injects a temporary user message into the context
Sends it to your LLM endpoint (warming the KV cache)
Immediately cancels the generation
Cleans up the temporary message

When you send your next real message, the LLM only needs to process the new tokens instead of re-processing the entire conversation.
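The inject/send/clean-up cycle can be sketched as follows. This is a minimal sketch, assuming a plain array of hypothetical `ChatMessage` objects and a caller-supplied `warm` function (also hypothetical) that sends the context and cancels the request; the real extension works with SillyTavern's own chat structures. The important part is the `try`/`finally`: the temporary message is removed even if the request fails or is aborted.

```typescript
// Hypothetical message shape; SillyTavern's real chat entries carry more fields.
interface ChatMessage {
  role: "user" | "assistant";
  text: string;
}

// Hypothetical: sends the given context to the LLM and cancels shortly afterwards.
type WarmFn = (messages: readonly ChatMessage[]) => Promise<void>;

async function prewarmOnce(chat: ChatMessage[], placeholder: string, warm: WarmFn): Promise<void> {
  // 1. Inject the temporary user message at the end of the context.
  const temp: ChatMessage = { role: "user", text: placeholder };
  chat.push(temp);
  try {
    // 2-3. Send the context (warming the KV cache); `warm` is responsible for cancelling.
    await warm(chat);
  } finally {
    // 4. Remove exactly the object that was added, even if `warm` threw.
    const i = chat.indexOf(temp);
    if (i !== -1) chat.splice(i, 1);
  }
}

// Example: the backend sees three messages, and the chat ends up with two again.
const chat: ChatMessage[] = [
  { role: "user", text: "Hi" },
  { role: "assistant", text: "Hello!" },
];
prewarmOnce(chat, ".", async (messages) => {
  console.log(`sending ${messages.length} messages`); // sending 3 messages
}).then(() => console.log(chat.length)); // 2
```

Removing the message by object identity rather than by "last index" avoids deleting the wrong entry if something else appended to the chat while the request was running.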

Installation

Using SillyTavern's Extension Installer (Recommended)

Open SillyTavern
Click the Extensions button (puzzle piece icon) in the top menu
Click Install extension
Paste this URL: https://github.com/tomt610/sillytavern-prewarm
Click Save

The extension will be automatically downloaded and activated.

Manual Installation

Download this repository as a ZIP
Extract to SillyTavern/data/<user-handle>/extensions/prewarm/
Restart SillyTavern

Configuration

After installation, find Context Prewarm in the Extensions panel (puzzle piece icon).

Settings

Setting	Description	Default
Enable Context Prewarm	Toggle the feature on/off	Off
Prewarm Mode	How to prewarm the cache (see below)	User Message
Prewarm Message	The temporary message to send	`.`
Delay before cancel (ms)	How long to wait before canceling	500ms
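One way to represent these settings in code, using the defaults from the table. The field names (`enabled`, `mode`, `message`, `delayMs`), the mode values (`"user"`, `"quiet"`) and `loadSettings` are hypothetical; the extension's actual storage keys may differ. Merging saved values over the defaults means a user whose saved settings lack a field still gets the default for it.

```typescript
// Hypothetical names mirroring the settings table.
type PrewarmMode = "user" | "quiet";

interface PrewarmSettings {
  enabled: boolean;  // Enable Context Prewarm
  mode: PrewarmMode; // Prewarm Mode
  message: string;   // Prewarm Message
  delayMs: number;   // Delay before cancel, in milliseconds
}

const DEFAULT_SETTINGS: PrewarmSettings = {
  enabled: false,
  mode: "user",
  message: ".",
  delayMs: 500,
};

// Merge saved values over the defaults and fall back to the default for anything invalid.
function loadSettings(saved?: Partial<PrewarmSettings>): PrewarmSettings {
  const s: PrewarmSettings = { ...DEFAULT_SETTINGS, ...saved };
  if (typeof s.enabled !== "boolean") s.enabled = DEFAULT_SETTINGS.enabled;
  if (s.mode !== "user" && s.mode !== "quiet") s.mode = DEFAULT_SETTINGS.mode;
  if (typeof s.message !== "string") s.message = DEFAULT_SETTINGS.message;
  if (!Number.isFinite(s.delayMs) || s.delayMs < 0) s.delayMs = DEFAULT_SETTINGS.delayMs;
  return s;
}

const settings = loadSettings({ enabled: true, delayMs: 1200 });
console.log(settings.mode, settings.delayMs); // user 1200
```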

Prewarm Modes

User Message (recommended): Simulates sending a user message. This is what you'd do manually and works with most setups.
Quiet/Continue: Uses a background (quiet) generation, similar to SillyTavern's Continue action. Try this if User Message mode doesn't work for your setup.

Tuning the Delay

The delay setting controls how long to wait before canceling the prewarm request. This needs to be long enough for the context to be sent to your LLM endpoint.

Too short: the context won't be fully processed, so the prewarm won't be effective
Too long: wastes time and compute

Start with 500ms and adjust based on your setup. If you have a slow connection to your LLM or a very long context, you may need to increase this.
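A minimal sketch of "send, then cancel after a delay" using the standard `AbortController` and `setTimeout`. The `SendFn` type, `sendThenCancel` and the `fakeSend` stand-in are hypothetical; in practice the signal would be passed to whatever issues the request (for example as the `signal` option of `fetch`), and the extension itself may cancel through SillyTavern's own mechanisms instead.

```typescript
// Hypothetical: issues the generation request and rejects once `signal` is aborted.
type SendFn = (signal: AbortSignal) => Promise<void>;

async function sendThenCancel(send: SendFn, delayMs: number): Promise<"completed" | "cancelled"> {
  const controller = new AbortController();
  // Abort the request once the configured delay has elapsed.
  const timer = setTimeout(() => controller.abort(), delayMs);
  try {
    await send(controller.signal);
    return "completed";
  } catch (err) {
    // A rejection caused by our own abort is the expected outcome, not an error.
    if (controller.signal.aborted) return "cancelled";
    throw err;
  } finally {
    // Don't leave a stray timer behind if the request finished first.
    clearTimeout(timer);
  }
}

// Stand-in for a generation that would take 2 s if it were allowed to finish.
const fakeSend: SendFn = (signal) =>
  new Promise<void>((resolve, reject) => {
    const done = setTimeout(resolve, 2000);
    signal.addEventListener("abort", () => {
      clearTimeout(done);
      reject(new Error("aborted"));
    }, { once: true });
  });

sendThenCancel(fakeSend, 500).then((result) => console.log(result)); // cancelled
```

Errors that are not caused by the timed abort (a refused connection, for example) still propagate, so they can show up in the console instead of being silently treated as a successful prewarm.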

How It Works

Listens for the GENERATION_ENDED event (when AI finishes responding)
Skips prewarm for swipes, regenerates, and continues (only triggers after normal messages)
Temporarily adds a user message to the chat context
Starts a quiet generation request to the LLM
Cancels after the configured delay
Removes the temporary message and saves the chat

The prewarm happens invisibly - you won't see any flashing messages in the UI.
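The trigger logic (react when a generation ends, but only after a normal message) might look roughly like this. `GenerationType` and `PrewarmTrigger` are hypothetical; wiring `onGenerationEnded` to SillyTavern's GENERATION_ENDED event, and determining what kind of generation just finished, is left out because those details depend on SillyTavern's extension API. The `inFlight` guard is an added assumption: because the prewarm request is itself a generation, the guard keeps an ending prewarm (or a duplicate event) from starting another one.

```typescript
// Hypothetical labels for the kind of generation that just finished.
type GenerationType = "normal" | "swipe" | "regenerate" | "continue";

class PrewarmTrigger {
  private inFlight = false;
  private readonly isEnabled: () => boolean;
  private readonly runPrewarm: () => Promise<void>;

  constructor(isEnabled: () => boolean, runPrewarm: () => Promise<void>) {
    this.isEnabled = isEnabled;
    this.runPrewarm = runPrewarm;
  }

  // Returns true if a prewarm was run for this event.
  async onGenerationEnded(type: GenerationType): Promise<boolean> {
    // Only normal replies trigger a prewarm; swipes, regenerates and continues are skipped.
    if (!this.isEnabled() || this.inFlight || type !== "normal") return false;
    this.inFlight = true;
    try {
      await this.runPrewarm();
      return true;
    } finally {
      this.inFlight = false;
    }
  }
}

const trigger = new PrewarmTrigger(() => true, async () => console.log("prewarming"));
void (async () => {
  console.log(await trigger.onGenerationEnded("swipe"));  // false
  console.log(await trigger.onGenerationEnded("normal")); // logs "prewarming", then true
})();
```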

Compatibility

Works with any local LLM backend that supports KV caching (llama.cpp, koboldcpp, text-generation-webui, etc.)
May not provide any benefit with cloud APIs (OpenAI, Claude, etc.), since they manage their own caching

Troubleshooting

Prewarm doesn't seem to help:

Increase the delay setting
Make sure your LLM backend actually supports KV caching
Check the browser console for [Prewarm] log messages

Swipe buttons disappear:

This was fixed in later versions. Make sure you have the latest version.

Prewarm message gets stuck in chat:

Delete the stuck message manually
Make sure you have the latest version which properly cleans up

License

MIT License. Feel free to use, modify, and distribute.

Credits

Created for the SillyTavern community.
