
feat: add --diarize_device flag to allow separate device for diarization #1373

Open
tioback wants to merge 1 commit into m-bain:main from tioback:feat/diarize-device


tioback commented Mar 12, 2026

Problem

ctranslate2 (faster-whisper) does not support MPS, which forces --device cpu for the entire pipeline on Apple Silicon. However, pyannote.audio (diarization) is pure PyTorch and does support MPS.

The result is that diarization runs on CPU unnecessarily, making it the dominant bottleneck. This affects all Apple Silicon users and is the root cause of the performance issues reported in #109 and #1283.

More generally, the transcription and diarization backends have different device support matrices — there is no reason to force them onto the same device.
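To illustrate the mismatch in device support, here is a minimal sketch of how a caller might choose a diarization device when transcription is pinned to CPU. The helper `pick_diarize_device` is hypothetical and is not part of this PR; it only relies on `torch.backends.mps.is_available()` and degrades gracefully if PyTorch is missing.

```python
def pick_diarize_device(transcribe_device: str) -> str:
    """Suggest a diarization device given the transcription device (hypothetical helper)."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return transcribe_device

    mps = getattr(torch.backends, "mps", None)  # attribute is absent on old PyTorch builds
    if transcribe_device == "cpu" and mps is not None and mps.is_available():
        # Transcription was forced onto CPU, but the PyTorch side can still use MPS.
        return "mps"
    return transcribe_device


print(pick_diarize_device("cpu"))
```

On a machine without MPS this simply returns the transcription device, which matches the default behavior of the flag.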

Benchmark on Apple Silicon M-series, 5-min audio:

| Step | CPU | MPS |
|---|---|---|
| Diarization | ~300s | ~21s |
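For reference, the arithmetic behind those figures can be worked out as below. This only uses the approximate numbers from the table and the 5-minute clip length, so the results are rough as well.

```python
cpu_seconds = 300.0      # approximate diarization time on CPU, from the table
mps_seconds = 21.0       # approximate diarization time on MPS, from the table
audio_seconds = 5 * 60   # length of the benchmark clip

speedup = cpu_seconds / mps_seconds
print(f"speedup: ~{speedup:.1f}x")
print(f"real-time factor on CPU: ~{cpu_seconds / audio_seconds:.2f}")
print(f"real-time factor on MPS: ~{mps_seconds / audio_seconds:.2f}")
```

That works out to roughly a 14x speedup: diarization on CPU takes about as long as the audio itself, while on MPS it takes well under a tenth of it.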

Solution

Add a --diarize_device flag that, when specified, overrides --device only for DiarizationPipeline. Defaults to None, which falls back to --device — fully backward compatible.
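The fallback can be sketched roughly as follows. This is a simplified illustration, not the actual diff: `build_parser` and `resolve_diarize_device` are hypothetical names, only the two device flags are modeled, and the `--device` default of `"cpu"` is an assumption made for the example.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Only the two device flags are modeled here; the real CLI has many more.
    parser = argparse.ArgumentParser()
    parser.add_argument("--device", default="cpu", type=str)  # default assumed for the sketch
    parser.add_argument("--diarize_device", default=None, type=str)
    return parser


def resolve_diarize_device(device: str, diarize_device: Optional[str]) -> str:
    # None means the flag was not given, so diarization follows --device.
    return device if diarize_device is None else diarize_device


args = build_parser().parse_args(["--device", "cpu", "--diarize_device", "mps"])
print(resolve_diarize_device(args.device, args.diarize_device))  # prints "mps"
```

Keeping `None` as the default is what makes the change backward compatible: existing invocations never set the flag, so they resolve to `--device` exactly as before.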

Changes

  • whisperx/__main__.py — new --diarize_device argument in the diarization params group
  • whisperx/transcribe.py — pop diarize_device and pass it to DiarizationPipeline
  • tests/test_diarize_device.py — 8 tests covering CLI parsing and device routing (all mocked, no model downloads required)
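The mocked device-routing tests follow roughly this shape. This is a sketch under assumptions, not the test file itself: `run_diarization` is a hypothetical stand-in for the routing step in `transcribe_task`, and the pipeline class is replaced by a `MagicMock` so no model is downloaded.

```python
from unittest.mock import MagicMock


def run_diarization(args: dict, pipeline_cls):
    # Hypothetical stand-in for the routing step: pop both device settings
    # and fall back to the main device when no override is given.
    device = args.pop("device")
    diarize_device = args.pop("diarize_device", None) or device
    return pipeline_cls(device=diarize_device)


def test_diarize_device_overrides_device():
    pipeline_cls = MagicMock()
    run_diarization({"device": "cpu", "diarize_device": "mps"}, pipeline_cls)
    pipeline_cls.assert_called_once_with(device="mps")


def test_diarize_device_defaults_to_device():
    pipeline_cls = MagicMock()
    run_diarization({"device": "cuda", "diarize_device": None}, pipeline_cls)
    pipeline_cls.assert_called_once_with(device="cuda")


# Run directly here; under pytest the test_* functions are collected automatically.
test_diarize_device_overrides_device()
test_diarize_device_defaults_to_device()
```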

Usage

# Apple Silicon: transcription on CPU (required by ctranslate2), diarization on MPS
whisperx audio.m4a --device cpu --diarize_device mps --diarize

# Multi-GPU: transcription on GPU 0, diarization on GPU 1
whisperx audio.m4a --device cuda --diarize_device cuda:1 --diarize

# Default behavior unchanged
whisperx audio.m4a --device cuda --diarize

Closes #109


tioback commented Mar 17, 2026

@m-bain - can you take a look at this PR, please?

Barabazs self-requested a review March 17, 2026 06:53

Barabazs left a comment


Great contribution, thanks!

Please drop the tests from this PR. They're not needed for this type of change.

Comment thread whisperx/transcribe.py
@@ -215,7 +216,7 @@ def transcribe_task(args: dict, parser: argparse.ArgumentParser):
logger.info("Performing diarization...")
logger.info(f"Using model: {diarize_model_name}")

Suggested change
- logger.info(f"Using model: {diarize_model_name}")
+ logger.info(f"Using model: {diarize_model_name}")
+ logger.info(f"Diarization device: {diarize_device}")

Adding a log statement would be helpful.

Comment thread whisperx/__main__.py
parser.add_argument("--min_speakers", default=None, type=int, help="Minimum number of speakers to in audio file")
parser.add_argument("--max_speakers", default=None, type=int, help="Maximum number of speakers to in audio file")
parser.add_argument("--diarize_model", default="pyannote/speaker-diarization-community-1", type=str, help="Name of the speaker diarization model to use")
parser.add_argument("--diarize_device", default=None, type=str, help="Device to use for diarization, overriding --device (e.g. 'mps' on Apple Silicon). Useful because ctranslate2 (transcription) does not support MPS but pyannote (diarization) does. Defaults to --device if not specified.")

Suggested change
- parser.add_argument("--diarize_device", default=None, type=str, help="Device to use for diarization, overriding --device (e.g. 'mps' on Apple Silicon). Useful because ctranslate2 (transcription) does not support MPS but pyannote (diarization) does. Defaults to --device if not specified.")
+ parser.add_argument("--diarize_device", default=None, type=str, help="Device to use for diarization (e.g. cuda, mps, ...)")

Please keep the help string concise.

Development

Successfully merging this pull request may close these issues.

[Feature Request] Support M1 Mac's GPU

2 participants