Ultra-Low Latency Fast-Path option for Cloud APIs (Direct Frontend Fetch) #1639

@DavidGP

Description

Feature Description

Hey Braden, absolutely love Whispering! Been using it 10 hours a day for what feels like years.

In the meantime, I’ve been building a voice-to-text component for my own application and hit a massive performance breakthrough I wanted to share, hoping it might be useful as an optional toggle for your cloud users.

The Latency Benchmark: I noticed that Whispering (on my Dell XPS 17) currently has a ~0.5–1 second delay before recording actually starts, and another ~1 second delay after stopping before the transcription appears. In my custom implementation, recording starts with zero perceptible latency, and after releasing the hotkey the final text appears in the input field within a few hundred milliseconds.
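
If anyone wants to reproduce these numbers, here is a minimal instrumentation sketch I use (the names are my own, not anything from Whispering) that captures both gaps with performance.now():

```javascript
// Minimal latency instrumentation (illustrative names, not Whispering code).
// Captures the two gaps described above: hotkey-down -> recording started,
// and hotkey-up -> transcript ready.
const marks = new Map();

function mark(name) {
  marks.set(name, performance.now());
}

function elapsedMs(from, to) {
  return Math.round(marks.get(to) - marks.get(from));
}

// In the browser recorder you would call:
//   mark("hotkeyDown");  on keydown, just before recorder.start()
//   recorder.onstart = () => mark("recStarted");
//   mark("hotkeyUp");    on keyup, just before recorder.stop()
//   mark("textReady");   once the transcript string is available
// then log elapsedMs("hotkeyDown", "recStarted")
// and elapsedMs("hotkeyUp", "textReady").
```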

Here are the two architectural tricks I used to achieve this "mind-reading" speed with the Groq API:

1. Direct Frontend Fetch (Bypassing Backend IPC) Instead of passing the audio blob to a backend server (or through the Tauri/Rust bridge), my JavaScript executes a direct fetch to api.groq.com right inside the mediaRecorder.onstop event. This completely eliminates the IPC serialization overhead and backend routing delays.

2. Skipping FFmpeg (Browser WebM is already compressed) I saw the "Compress audio before transcription" toggle. If users are on the browser recording backend, the native MediaRecorder already produces a highly compressed WebM (Opus) blob in RAM. Sending this blob directly to Groq skips all local disk I/O and FFmpeg processing time. For short dictations, running FFmpeg locally actually takes longer than simply uploading the WebM blob!
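
For context, here is roughly how the two tricks fit together in my recorder setup. This is a sketch, not Whispering code — `uploadToGroq` is a hypothetical stand-in for the fetch call shown in the snippet further down, and the mimeType fallback is my own choice:

```javascript
// Pure helper so the codec choice is easy to reason about:
// prefer Opus-in-WebM (already heavily compressed -- no FFmpeg pass needed).
function pickMimeType(isTypeSupported) {
  return isTypeSupported("audio/webm;codecs=opus")
    ? "audio/webm;codecs=opus"
    : "audio/webm";
}

// Browser-side wiring (sketch). uploadToGroq is a placeholder for the
// direct frontend fetch shown in the snippet below.
async function startDictation(onTranscript) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickMimeType((t) => MediaRecorder.isTypeSupported(t));
  const recorder = new MediaRecorder(stream, { mimeType });
  const chunks = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = async () => {
    // The blob is already compressed in RAM -- no disk I/O, no FFmpeg.
    const audioBlob = new Blob(chunks, { type: mimeType });
    onTranscript(await uploadToGroq(audioBlob));
  };
  recorder.start(); // starts with no perceptible delay
  return recorder;  // caller invokes recorder.stop() on hotkey release
}
```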

Here is the exact vanilla JavaScript logic I use in my app to achieve this. It’s incredibly simple:

// Inside your mediaRecorder.onstop callback
const formData = new FormData();
// The browser's native WebM blob is already highly compressed and Groq-compatible
formData.append("file", audioBlob, "recording.webm"); 
formData.append("model", "whisper-large-v3-turbo");
formData.append("response_format", "verbose_json");
formData.append("temperature", "0.0"); // For deterministic results

// Direct fetch from the frontend to Groq, bypassing the backend entirely
const response = await fetch("https://api.groq.com/openai/v1/audio/transcriptions", {
    method: "POST",
    headers: { "Authorization": `Bearer ${YOUR_GROQ_KEY}` },
    body: formData
});

if (!response.ok) {
    throw new Error(`Groq transcription failed: ${response.status}`);
}
const data = await response.json();
// data.text is ready to be pasted in ~200-300ms!
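
Once data.text is back, splicing it into the focused field is trivial. Here's a hypothetical helper from my app (insertAtCursor is my own name and not Whispering's actual output path):

```javascript
// Hypothetical paste helper (my own code; Whispering's real output mechanism
// differs). Splices the transcript into an <input>/<textarea> at the caret.
function insertAtCursor(field, text) {
  const start = field.selectionStart ?? field.value.length;
  const end = field.selectionEnd ?? start;
  field.value = field.value.slice(0, start) + text + field.value.slice(end);
  field.selectionStart = field.selectionEnd = start + text.length;
  return field.value;
}

// e.g. insertAtCursor(document.activeElement, data.text);
```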

I completely understand that your current transcribe-rs architecture is beautifully designed to support local offline models and advanced features like silence trimming. However, this "Direct Frontend Fetch" could perhaps be added as an opt-in "Fast-Path" setting for users of cloud APIs like Groq who want absolute minimum latency.

Hope this insight is helpful for the project!

Relevant Platforms

All Platforms

How important is this feature to you?

Critical for my use case

Willing to Contribute?

No, but I can test it

Discord Link

No response

Checklist

  • I have searched existing issues and this feature hasn't been requested
