Ultra-Low Latency Fast-Path option for Cloud APIs (Direct Frontend Fetch) #1639

@DavidGP

Description

Feature Description

Hey Braden, absolutely love Whispering! Been using it 10 hours a day for what feels like years.

In the meantime, I’ve been building a voice-to-text component for my own application and hit a massive performance breakthrough I wanted to share, hoping it might be useful as an optional toggle for your cloud users.

The Latency Benchmark: I noticed that Whispering (on my Dell XPS 17) currently has a ~0.5–1 second delay before recording actually starts, and another ~1 second delay after stopping before the transcription appears. In my custom implementation, recording starts with zero perceptible latency, and after releasing the hotkey the final text appears in the input field within a few hundred milliseconds.
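
If anyone wants to reproduce these numbers, here is a minimal instrumentation sketch I use (the names are my own, not anything from Whispering) that captures both gaps with performance.now():

```javascript
// Minimal latency instrumentation (illustrative names, not Whispering code).
// Captures the two gaps described above: hotkey-down -> recording started,
// and hotkey-up -> transcript ready.
const marks = new Map();

function mark(name) {
  marks.set(name, performance.now());
}

function elapsedMs(from, to) {
  return Math.round(marks.get(to) - marks.get(from));
}

// In the browser recorder you would call:
//   mark("hotkeyDown");  on keydown, just before recorder.start()
//   recorder.onstart = () => mark("recStarted");
//   mark("hotkeyUp");    on keyup, just before recorder.stop()
//   mark("textReady");   once the transcript string is available
// then log elapsedMs("hotkeyDown", "recStarted")
// and elapsedMs("hotkeyUp", "textReady").
```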

Here are the two architectural tricks I used to achieve this "mind-reading" speed with the Groq API:

1. Direct Frontend Fetch (Bypassing Backend IPC) Instead of passing the audio blob to a backend server (or through the Tauri/Rust bridge), my JavaScript executes a direct fetch to api.groq.com right inside the mediaRecorder.onstop event. This completely eliminates the IPC serialization overhead and backend routing delays.

2. Skipping FFmpeg (Browser WebM is already compressed) I saw the "Compress audio before transcription" toggle. If users are on the browser recording backend, the native MediaRecorder already produces a highly compressed WebM (Opus) blob in RAM. Sending this blob directly to Groq skips all local disk I/O and FFmpeg processing time. For short dictations, running FFmpeg locally actually takes longer than simply uploading the WebM blob!
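
For context, here is roughly how the two tricks fit together in my recorder setup. This is a sketch, not Whispering code — `uploadToGroq` is a hypothetical stand-in for the fetch call shown in the snippet further down, and the mimeType fallback is my own choice:

```javascript
// Pure helper so the codec choice is easy to reason about:
// prefer Opus-in-WebM (already heavily compressed -- no FFmpeg pass needed).
function pickMimeType(isTypeSupported) {
  return isTypeSupported("audio/webm;codecs=opus")
    ? "audio/webm;codecs=opus"
    : "audio/webm";
}

// Browser-side wiring (sketch). uploadToGroq is a placeholder for the
// direct frontend fetch shown in the snippet below.
async function startDictation(onTranscript) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickMimeType((t) => MediaRecorder.isTypeSupported(t));
  const recorder = new MediaRecorder(stream, { mimeType });
  const chunks = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = async () => {
    // The blob is already compressed in RAM -- no disk I/O, no FFmpeg.
    const audioBlob = new Blob(chunks, { type: mimeType });
    onTranscript(await uploadToGroq(audioBlob));
  };
  recorder.start(); // starts with no perceptible delay
  return recorder;  // caller invokes recorder.stop() on hotkey release
}
```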

Here is the exact vanilla JavaScript logic I use in my app to achieve this. It’s incredibly simple:

// Inside your mediaRecorder.onstop callback
const formData = new FormData();
// The browser's native WebM blob is already highly compressed and Groq-compatible
formData.append("file", audioBlob, "recording.webm"); 
formData.append("model", "whisper-large-v3-turbo");
formData.append("response_format", "verbose_json");
formData.append("temperature", "0.0"); // For deterministic results

// Direct fetch from the frontend to Groq, bypassing the backend entirely
const response = await fetch("https://api.groq.com/openai/v1/audio/transcriptions", {
    method: "POST",
    headers: { "Authorization": `Bearer ${YOUR_GROQ_KEY}` },
    body: formData
});

if (!response.ok) {
    throw new Error(`Groq transcription failed: ${response.status}`);
}
const data = await response.json();
// data.text is ready to be pasted in ~200-300ms!
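
Once data.text is back, splicing it into the focused field is trivial. Here's a hypothetical helper from my app (insertAtCursor is my own name and not Whispering's actual output path):

```javascript
// Hypothetical paste helper (my own code; Whispering's real output mechanism
// differs). Splices the transcript into an <input>/<textarea> at the caret.
function insertAtCursor(field, text) {
  const start = field.selectionStart ?? field.value.length;
  const end = field.selectionEnd ?? start;
  field.value = field.value.slice(0, start) + text + field.value.slice(end);
  field.selectionStart = field.selectionEnd = start + text.length;
  return field.value;
}

// e.g. insertAtCursor(document.activeElement, data.text);
```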

I completely understand that your current transcribe-rs architecture is beautifully designed to support local offline models and advanced features like silence trimming. However, this "Direct Frontend Fetch" could perhaps be added as an opt-in "Fast-Path" setting for users of cloud APIs like Groq who want absolute minimum latency.

Hope this insight is helpful for the project!

Relevant Platforms

All Platforms

How important is this feature to you?

Critical for my use case

Willing to Contribute?

No, but I can test it

Discord Link

No response

Checklist

  • I have searched existing issues and this feature hasn't been requested
