A command-line interface wrapper for VOICEPEAK text-to-speech software with preset management and automatic audio playback.
This wrapper enhances the original VOICEPEAK CLI with several powerful features:
- Auto-play with mpv - Automatically plays generated audio when no output file is specified
- Voice presets - Save and reuse combinations of narrator, emotions, and pitch settings
- Long text support - Automatically splits texts longer than 140 characters and merges audio chunks
- Advanced playback modes - Choose between batch (generate all → merge → play) or sequential (generate → play one by one)
- Pipe input support - Accept text from stdin: echo "text" | vp
- Background execution - Run with --bg to return shell control immediately while audio generates and plays
- Clean output - Suppresses technical output by default (use --verbose to see debug info)
- Configuration file - Store your preferred settings in ~/.config/vp/config.toml
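
The long text support listed above can be pictured with a small sketch. This is not the wrapper's actual implementation: the function name `split_text`, the preference for sentence boundaries, and counting length with Python's `len` are all assumptions made for illustration. Only the 140-character limit comes from this README.

```python
import re

MAX_CHARS = 140  # limit stated in this README; how VOICEPEAK counts characters is an assumption


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters (hypothetical helper).

    Sentences are kept together where possible; a single sentence longer
    than the limit is cut at the limit.
    """
    # Split after sentence terminators so each terminator stays with its sentence.
    sentences = [s for s in re.split(r"(?<=[。！？.!?\n])", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut a sentence that cannot fit into one chunk on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Start a new chunk when adding this sentence would exceed the limit.
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be synthesized separately and the resulting audio merged or played in order.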
- Enhanced Workflow: No need to manually save and play audio files - just run and listen
- Batch Processing: Handle long documents without worrying about character limits
- Flexible Input: Works with direct text, files, or piped input from other commands
- Personalization: Save your favorite voice configurations for consistent results
- Professional Output: Clean interface with optional verbose mode for debugging
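
The difference between the batch and sequential playback modes mentioned in the feature list is mostly a matter of ordering. The sketch below shows that ordering only. The callables `generate`, `merge`, and `play` are hypothetical stand-ins for the VOICEPEAK, ffmpeg, and mpv steps; the real commands the wrapper runs are not shown here.

```python
from typing import Callable, Sequence


def play_batch(
    chunks: Sequence[str],
    generate: Callable[[str], str],
    merge: Callable[[Sequence[str]], str],
    play: Callable[[str], None],
) -> None:
    """Batch mode: generate every chunk, merge the files, then play once."""
    files = [generate(chunk) for chunk in chunks]
    play(merge(files))


def play_sequential(
    chunks: Sequence[str],
    generate: Callable[[str], str],
    play: Callable[[str], None],
) -> None:
    """Sequential mode: generate one chunk, play it, then move to the next."""
    for chunk in chunks:
        play(generate(chunk))
```

In this picture, sequential mode never calls `merge`, which matches the note that it can be used without ffmpeg.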
- macOS
- VOICEPEAK installed at /Applications/voicepeak.app/
- mpv for audio playback (install via Homebrew: brew install mpv)
- ffmpeg for batch mode and multi-chunk file output (install via Homebrew: brew install ffmpeg)
cargo install voicepeak-cli
- Clone this repository
- Build and install:
cargo install --path .
# Simple text-to-speech (requires preset or --narrator)
vp "こんにちは、世界！"
# With explicit narrator
vp "こんにちは、世界！" --narrator "夏色花梨"
# Save to file instead of auto-play
vp "こんにちは、世界！" --narrator "夏色花梨" -o output.wav
# Read from file
vp -t input.txt --narrator "夏色花梨"
# Pipe input
echo "こんにちは、世界！" | vp --narrator "夏色花梨"
cat document.txt | vp -p karin-happy

# List available presets
vp --list-presets
# Use a preset
vp "こんにちは、世界！" -p karin-happy
# Override preset settings
vp "こんにちは、世界！" -p karin-normal --emotion "happy=50"

# Control speech parameters
vp "こんにちは、世界！" --narrator "夏色花梨" --speed 120 --pitch 50
# List available narrators
vp --list-narrator
# List emotions for a specific narrator
vp --list-emotion "夏色花梨"

# Allow automatic text splitting (default)
vp "very long text..."
# Strict mode: reject texts longer than 140 characters
vp "text" --strict-length

# Run in background (returns shell control immediately)
vp --bg "こんにちは、世界！"
# Combine with other options
vp --bg -p karin-happy "こんにちは、世界！"
vp --bg -o output.wav "こんにちは、世界！"
echo "こんにちは" | vp --bg

# Batch mode: generate all chunks first, merge, then play (default)
vp "long text" --playback-mode batch
# Sequential mode: generate and play chunks one by one
vp "long text" --playback-mode sequential
# Long text file output (uses ffmpeg to merge chunks)
vp "very long text" -o output.wav
# For sequential playback without ffmpeg
vp "long text" --playback-mode sequential

Configuration is stored in ~/.config/vp/config.toml. The file is automatically created on first run.
default_preset = "karin-custom"
[[presets]]
name = "karin-custom"
narrator = "夏色花梨"
emotions = [
{ name = "hightension", value = 10 },
{ name = "sasayaki", value = 20 },
]
pitch = 30
speed = 120
[[presets]]
name = "karin-normal"
narrator = "夏色花梨"
emotions = []
[[presets]]
name = "karin-happy"
narrator = "夏色花梨"
emotions = [{ name = "hightension", value = 50 }]

- default_preset: Optional. Preset to use when no -p option is specified
- presets: Array of voice presets
  - name: Unique preset identifier
  - narrator: Voice narrator name
  - emotions: Array of emotion parameters with name and value
  - pitch: Optional pitch adjustment (-300 to 300)
  - speed: Optional speed adjustment (50 to 200)
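
As a quick way to sanity-check a preset against the field descriptions above, something like the following could be used. It is only a sketch: `validate_preset` is a hypothetical helper, it works on a preset already loaded into a plain dictionary, and treating `name` and `narrator` as required is an assumption based on them not being marked optional. The pitch and speed ranges are the ones listed in this README.

```python
def validate_preset(preset: dict) -> list[str]:
    """Return a list of problems found in one preset entry (empty if none)."""
    errors: list[str] = []

    # Assumption: name and narrator are required because they are not marked optional.
    for key in ("name", "narrator"):
        if not preset.get(key):
            errors.append(f"{key} is required")

    # Each emotion entry is expected to carry both a name and a value.
    for emotion in preset.get("emotions", []):
        if "name" not in emotion or "value" not in emotion:
            errors.append(f"emotion entry needs name and value: {emotion!r}")

    # pitch and speed are optional; check the documented ranges only when present.
    pitch = preset.get("pitch")
    if pitch is not None and not -300 <= pitch <= 300:
        errors.append(f"pitch {pitch} is outside -300..300")

    speed = preset.get("speed")
    if speed is not None and not 50 <= speed <= 200:
        errors.append(f"speed {speed} is outside 50..200")

    return errors
```

For the `karin-custom` preset shown earlier (pitch 30, speed 120, two emotions with name and value), this check reports no problems.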
Usage: vp [OPTIONS] [TEXT]

Arguments:
  [TEXT]  Text to say (or pipe from stdin)

Options:
  -t, --text <FILE>              Text file to say
  -o, --out <FILE>               Path of output file (optional - will play with mpv if not specified)
  -n, --narrator <NAME>          Name of voice
  -e, --emotion <EXPR>           Emotion expression (e.g., happy=50,sad=50)
  -p, --preset <NAME>            Use voice preset
      --list-narrator            Print voice list
      --list-emotion <NARRATOR>  Print emotion list for given voice
      --list-presets             Print available presets
      --speed <VALUE>            Speed (50 - 200)
      --pitch <VALUE>            Pitch (-300 - 300)
      --strict-length            Reject input longer than 140 characters (default: false, allows splitting)
      --playback-mode <MODE>     Playback mode: sequential or batch (default: batch)
      --bg                       Run in background (return immediately)
  -v, --verbose                  Enable verbose output (show VOICEPEAK debug messages)
  -h, --help                     Print help
  -V, --version                  Print version
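
The emotion expression accepted by `--emotion` is a comma-separated list of `name=value` pairs, as in the `happy=50,sad=50` example in the help text. A minimal sketch of parsing that format is shown below. The function `parse_emotion` is hypothetical, and the choices to tolerate surrounding whitespace and to require integer values are assumptions, not documented behavior of the wrapper.

```python
def parse_emotion(expr: str) -> dict[str, int]:
    """Parse an expression such as "happy=50,sad=50" into {"happy": 50, "sad": 50}."""
    emotions: dict[str, int] = {}
    for part in expr.split(","):
        part = part.strip()
        if not part:
            continue  # ignore empty segments such as a trailing comma
        name, sep, value = part.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"expected name=value, got {part!r}")
        # int() raises ValueError for a missing or non-numeric value.
        emotions[name.strip()] = int(value)
    return emotions
```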
When multiple sources specify the same parameter, the priority order is:
- Command-line options (highest priority)
- Preset values
- Default values / none (lowest priority)
For example:
- vp "text" -p my-preset --pitch 100 uses pitch=100 (CLI override)
- vp "text" -p my-preset uses the preset's pitch value
- vp "text" --narrator "voice" uses no pitch adjustment
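
The priority order above amounts to taking the first value that was actually supplied. The sketch below illustrates this with a hypothetical `resolve` helper; it is not the wrapper's own code, and using `None` to mean "not specified" is an assumption.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def resolve(cli_value: Optional[T], preset_value: Optional[T], default: Optional[T] = None) -> Optional[T]:
    """Pick a parameter value: command line first, then preset, then default."""
    for candidate in (cli_value, preset_value):
        # Compare against None so that an explicit 0 on the command line still wins.
        if candidate is not None:
            return candidate
    return default


# The three cases from the examples above, with a preset pitch of 30:
print(resolve(100, 30))     # CLI override
print(resolve(None, 30))    # preset value
print(resolve(None, None))  # no pitch adjustment
```

Checking `is not None` rather than truthiness is a deliberate choice here, since a value such as `--pitch 0` would otherwise be silently replaced by the preset's pitch.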
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please see CONTRIBUTING.md for detailed guidelines on how to contribute to this project.