I have implemented a new service with the required logic in the project that I am maintaining. It works well, but there are some nuances in its usage due to the specifics of how custom sentences operate.

Below is a visualization with realistic value ratios. If you use similar automations, you can recalculate the delay time (red block) using your own token generation speed and response size; this will help you decide whether you need it.

```mermaid
%%{init: { "theme": "base", "themeVariables": { "primaryTextColor": "#000000", "taskTextLight": "#000000", "taskTextDark": "#000000", "textColor": "#000000" } } }%%
gantt
    title timeline
    dateFormat mm:ss.SSS
    axisFormat %Ss
    section 💬 Standard voice automation
    User Request (~2s)                           :a1, 00:00.000, 1s
    Data Fetch                                   :a2, after a1, 500ms
    conversation.process (10s / ~15tps)          :crit, a3, after a2, 5s
    TTS Synthesis (16s / RTF 0.5)                :a4, after a3, 8s
    🔊 Audio (32s)                               :done, a5, 00:06.800, 16s
    section 🚀 My voice automation
    User Request (~2s)                           :b1, 00:00.000, 1s
    Data Fetch                                   :b2, after b1, 500ms
    script.turn_on for stream_response           :crit, b3, after b2, 50ms
    Empty response in automation                 :b4, after b3, 50ms
    section 📜 script.stream_response
    LLM token generation (10s / ~15tps)          :active, c1, 00:01.550, 5s
    TTS Synthesis (16s + token waiting / RTF 0.5) :c2, 00:01.700, 10s
    🔊 Audio (32s)                               :done, c3, 00:01.900, 16s
```
It would be great if something similar could be implemented in a system component, for example in assist_satellite.
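For reference, the automation side of this pattern can be sketched roughly as follows. This is not the author's exact configuration: the sentence trigger and entity IDs are illustrative, and the custom streaming action inside the script (`my_integration.stream_response` here) is a hypothetical stand-in for the new service described above. Only `script.turn_on` and `set_conversation_response` are standard Home Assistant actions.

```yaml
# Hypothetical sketch of the pattern described above, not the author's
# actual configuration. The automation answers immediately with an empty
# response and hands the slow work to a script running in the background.
automation:
  - alias: "Voice query via streaming script"
    trigger:
      - platform: conversation
        command: "what is the news today"
    action:
      # script.turn_on returns right away (~50 ms in the chart), unlike
      # calling the script directly, which would block until it finishes.
      - service: script.turn_on
        target:
          entity_id: script.stream_response
      # Empty response so the satellite does not wait for text here.
      - set_conversation_response: ""

script:
  stream_response:
    sequence:
      # Hypothetical custom action: generates LLM tokens and feeds them
      # to the TTS engine and voice satellite as they arrive.
      - service: my_integration.stream_response
        data:
          satellite: assist_satellite.living_room   # illustrative entity
          prompt: "Summarize today's news"
```

The key design point is using `script.turn_on` rather than invoking the script inline, so the conversation pipeline is released immediately while synthesis streams in parallel.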
Describe the feature
It would be a valuable improvement to introduce a new option for the conversation.process action that allows the token stream to be sent directly to the speech synthesizer and then to the voice satellite, similar to how it is implemented in LLM interactions via Assist.
This change would eliminate the need for an intermediate variable and the set_conversation_response action in simple automations. Most importantly, it would enhance the user experience by reducing response latency.
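For contrast, a typical non-streaming automation today has to buffer the full response in a variable before anything is spoken. A rough sketch follows; the agent entity ID is illustrative, and the `stream` option in the second snippet is purely hypothetical notation for the proposed feature, not an existing parameter.

```yaml
# Today: conversation.process blocks until the full LLM response exists,
# then the text is passed back through an intermediate variable.
- service: conversation.process
  data:
    agent_id: conversation.my_llm_agent   # illustrative agent entity
    text: "Summarize today's news"
  response_variable: llm_result
- set_conversation_response: "{{ llm_result.response.speech.plain.speech }}"

# Proposed (hypothetical syntax): stream tokens straight to TTS and the
# voice satellite, with no intermediate variable or explicit response step.
- service: conversation.process
  data:
    agent_id: conversation.my_llm_agent
    text: "Summarize today's news"
    stream: true   # hypothetical parameter for the requested option
```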
Use cases
This can be applied to search queries, processing news data, obtaining information about photos from cameras, and so on.
Anything else?
No response