I have implemented a new service with the required logic in the project that I am maintaining. It works well, but there are some nuances in its usage due to the specifics of how custom sentences operate.

Below is a visualization with realistic value ratios. If you use similar automations, you can recalculate the delay time (red block) using your own token generation speed and response size; this will help you decide whether you need it.

```mermaid
%%{init: { "theme": "base", "themeVariables": { "primaryTextColor": "#000000", "taskTextLight": "#000000", "taskTextDark": "#000000", "textColor": "#000000" } } }%%
gantt
    title timeline
    dateFormat mm:ss.SSS
    axisFormat %Ss
    section 💬 Standard voice automation
    User Request (~2s)                           :a1, 00:00.000, 1s
    Data Fetch                                   :a2, after a1, 500ms
    conversation.process (10s / ~15tps)          :crit, a3, after a2, 5s
    TTS Synthesis (16s / RTF 0.5)                :a4, after a3, 8s
    🔊 Audio (32s)                               :done, a5, 00:06.800, 16s
    section 🚀 My voice automation
    User Request (~2s)                           :b1, 00:00.000, 1s
    Data Fetch                                   :b2, after b1, 500ms
    script.turn_on for stream_response           :crit, b3, after b2, 50ms
    Empty response in automation                 :b4, after b3, 50ms
    section 📜 script.stream_response
    LLM token generation (10s / ~15tps)          :active, c1, 00:01.550, 5s
    TTS Synthesis (16s + token waiting / RTF 0.5) :c2, 00:01.700, 10s
    🔊 Audio (32s)                               :done, c3, 00:01.900, 16s
```
It would be great if something similar could be implemented in a system component, for example in assist_satellite.
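For reference, the automation side of this pattern can be sketched roughly as follows. This is not the author's exact configuration: the sentence trigger and entity IDs are illustrative, and the custom streaming action inside the script (`my_integration.stream_response` here) is a hypothetical stand-in for the new service described above. Only `script.turn_on` and `set_conversation_response` are standard Home Assistant actions.

```yaml
# Hypothetical sketch of the pattern described above, not the author's
# actual configuration. The automation answers immediately with an empty
# response and hands the slow work to a script running in the background.
automation:
  - alias: "Voice query via streaming script"
    trigger:
      - platform: conversation
        command: "what is the news today"
    action:
      # script.turn_on returns right away (~50 ms in the chart), unlike
      # calling the script directly, which would block until it finishes.
      - service: script.turn_on
        target:
          entity_id: script.stream_response
      # Empty response so the satellite does not wait for text here.
      - set_conversation_response: ""

script:
  stream_response:
    sequence:
      # Hypothetical custom action: generates LLM tokens and feeds them
      # to the TTS engine and voice satellite as they arrive.
      - service: my_integration.stream_response
        data:
          satellite: assist_satellite.living_room   # illustrative entity
          prompt: "Summarize today's news"
```

The key design point is using `script.turn_on` rather than invoking the script inline, so the conversation pipeline is released immediately while synthesis streams in parallel.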
Describe the feature
It would be a valuable improvement to introduce a new option for the conversation.process action that allows the token stream to be sent directly to the speech synthesizer and then to the voice satellite, similar to how it is implemented in LLM interactions via Assist.
This change would eliminate the need for an intermediate variable and the set_conversation_response action in simple automations. Most importantly, it would enhance the user experience by reducing response latency.
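For contrast, a typical non-streaming automation today has to buffer the full response in a variable before anything is spoken. A rough sketch follows; the agent entity ID is illustrative, and the `stream` option in the second snippet is purely hypothetical notation for the proposed feature, not an existing parameter.

```yaml
# Today: conversation.process blocks until the full LLM response exists,
# then the text is passed back through an intermediate variable.
- service: conversation.process
  data:
    agent_id: conversation.my_llm_agent   # illustrative agent entity
    text: "Summarize today's news"
  response_variable: llm_result
- set_conversation_response: "{{ llm_result.response.speech.plain.speech }}"

# Proposed (hypothetical syntax): stream tokens straight to TTS and the
# voice satellite, with no intermediate variable or explicit response step.
- service: conversation.process
  data:
    agent_id: conversation.my_llm_agent
    text: "Summarize today's news"
    stream: true   # hypothetical parameter for the requested option
```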
Use cases
This can be applied to search queries, processing news data, obtaining information about photos from cameras, and so on.
Anything else?
No response