feat: add --diarize_device flag to allow separate device for diarization #1373
Open
tioback wants to merge 1 commit into m-bain:main from
Author
@m-bain - can you take a look at this PR, please?
Barabazs
requested changes
Mar 17, 2026
Collaborator
Great contribution, thanks!
Please drop the tests from this PR. They're not needed for this type of change.
@@ -215,7 +216,7 @@ def transcribe_task(args: dict, parser: argparse.ArgumentParser):
logger.info("Performing diarization...")
logger.info(f"Using model: {diarize_model_name}")
Collaborator
Suggested change
- logger.info(f"Using model: {diarize_model_name}")
+ logger.info(f"Using model: {diarize_model_name}")
+ logger.info(f"Diarization device: {diarize_device}")
Adding a log statement would be helpful.
parser.add_argument("--min_speakers", default=None, type=int, help="Minimum number of speakers to in audio file")
parser.add_argument("--max_speakers", default=None, type=int, help="Maximum number of speakers to in audio file")
parser.add_argument("--diarize_model", default="pyannote/speaker-diarization-community-1", type=str, help="Name of the speaker diarization model to use")
+ parser.add_argument("--diarize_device", default=None, type=str, help="Device to use for diarization, overriding --device (e.g. 'mps' on Apple Silicon). Useful because ctranslate2 (transcription) does not support MPS but pyannote (diarization) does. Defaults to --device if not specified.")
Collaborator
Suggested change
- parser.add_argument("--diarize_device", default=None, type=str, help="Device to use for diarization, overriding --device (e.g. 'mps' on Apple Silicon). Useful because ctranslate2 (transcription) does not support MPS but pyannote (diarization) does. Defaults to --device if not specified.")
+ parser.add_argument("--diarize_device", default=None, type=str, help="Device to use for diarization (e.g. cuda, mps, ...)")
Please keep the help string concise.
Problem
ctranslate2 (faster-whisper) does not support MPS, which forces --device cpu for the entire pipeline on Apple Silicon. However, pyannote.audio (diarization) is pure PyTorch and does support MPS.
The result is that diarization runs on CPU unnecessarily, making it the dominant bottleneck. This affects all Apple Silicon users and is the root cause of the performance issues reported in #109 and #1283.
More generally, the transcription and diarization backends have different device support matrices — there is no reason to force them onto the same device.
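To make the "different device support matrices" point concrete, here is a small illustrative sketch. The `SUPPORTED_DEVICES` table and the `pick_device` helper are hypothetical and not part of WhisperX; the table entries simply restate the claims made above rather than querying either library.

```python
# Illustrative support matrix. The entries are assumptions taken from the
# description above, not values queried from ctranslate2 or pyannote.audio.
SUPPORTED_DEVICES = {
    "transcription": {"cpu", "cuda"},       # ctranslate2 / faster-whisper: no MPS
    "diarization": {"cpu", "cuda", "mps"},  # pyannote.audio on PyTorch
}


def pick_device(stage: str, preferred: list[str]) -> str:
    """Return the first preferred device that the given stage supports."""
    for device in preferred:
        if device in SUPPORTED_DEVICES[stage]:
            return device
    return "cpu"  # CPU is supported by both stages, so it is a safe default


# On Apple Silicon the natural preference order is MPS first, then CPU.
apple_silicon = ["mps", "cpu"]
transcribe_device = pick_device("transcription", apple_silicon)
diarize_device = pick_device("diarization", apple_silicon)
print(transcribe_device, diarize_device)
```

With a single shared device setting, both stages would have to use the device that every stage supports, which on Apple Silicon is the CPU.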
Benchmark on Apple Silicon M-series, 5-min audio:
Solution
Add a --diarize_device flag that, when specified, overrides --device only for DiarizationPipeline. It defaults to None, which falls back to --device, so the change is fully backward compatible.
Changes
whisperx/__main__.py: new --diarize_device argument in the diarization params group
whisperx/transcribe.py: pop diarize_device and pass it to DiarizationPipeline
tests/test_diarize_device.py: 8 tests covering CLI parsing and device routing (all mocked, no model downloads required)
Usage
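A minimal sketch of the fallback behaviour, using only argparse. It is not the code from this PR: the reduced parser, the `"cpu"` default for `--device`, and the `resolve_diarize_device` helper are assumptions made for illustration, and the real `DiarizationPipeline` is not called here.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Reduced, hypothetical parser with only the two device flags discussed here.
    parser = argparse.ArgumentParser(prog="whisperx-sketch")
    # The "cpu" default is an assumption for this sketch.
    parser.add_argument("--device", default="cpu", type=str,
                        help="Device to use for the pipeline")
    parser.add_argument("--diarize_device", default=None, type=str,
                        help="Device to use for diarization (e.g. cuda, mps, ...)")
    return parser


def resolve_diarize_device(device: str, diarize_device: Optional[str]) -> str:
    """Fall back to --device when --diarize_device is not given."""
    return diarize_device if diarize_device is not None else device


# Simulate: --device cpu --diarize_device mps
args = vars(build_parser().parse_args(["--device", "cpu", "--diarize_device", "mps"]))
diarize_device = resolve_diarize_device(args["device"], args.pop("diarize_device"))
print(f"Diarization device: {diarize_device}")
```

Omitting `--diarize_device` leaves it as `None`, so the helper returns whatever `--device` was set to and existing invocations behave as before.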
Closes #109