Feature Request
Gemini recently launched Flex Inference (`service_tier: "flex"`), which offers a 50% cost reduction for batch/background workloads where real-time latency isn't critical. This is a perfect match for Hermes's cron jobs and background subagents.

See documentation: https://ai.google.dev/gemini-api/docs/flex-inference
The Problem
Currently, users cannot use the Flex tier in Hermes because:

- `agent/gemini_native_adapter.py` (from feat(providers): route gemini through the native AI Studio API #12674) does not map `service_tier` from `api_kwargs`/`request_options` into the native Google `generateContent` JSON payload.
- The pending routing change restricts `service_tier` exclusively to OpenAI routes. If merged as-is, it will actively strip the parameter from Gemini calls.

Proposed Solution
- `gemini_native_adapter.py`: intercept `service_tier` from `api_kwargs` and inject it into the native request body.
- `hermes_cli/runtime_provider.py` (or wherever route validation happens): recognize `gemini` as a valid provider for `service_tier`.

Use Case
Hermes heavily utilizes cron jobs and background delegation tasks (via `delegate_task`). Halving the LLM cost for these asynchronous, non-interactive workflows using Gemini would be a massive efficiency gain for users.

Thank you!
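For concreteness, the adapter-side interception proposed above could be sketched roughly as follows. This is a minimal illustration, not Hermes code: the function name, the payload field name for the tier, and the shape of `api_kwargs` are all assumptions; only `service_tier: "flex"` and the `generateContent` `contents` field come from the issue itself.

```python
def build_generate_content_payload(contents, api_kwargs):
    """Build a native generateContent JSON body, forwarding service_tier.

    Hypothetical sketch: `api_kwargs` stands in for the pass-through
    kwargs Hermes hands to provider adapters; the real internals may
    differ.
    """
    payload = {"contents": contents}

    # Pop service_tier out of the pass-through kwargs so the native
    # Gemini route forwards it instead of silently dropping it.
    service_tier = api_kwargs.pop("service_tier", None)
    if service_tier is not None:
        # Field name in the request body is assumed here; check the
        # Flex Inference docs for the exact wire format.
        payload["service_tier"] = service_tier

    return payload
```

With this in place, a cron job or `delegate_task` call that passes `service_tier="flex"` in its `api_kwargs` would have the tier carried through to the native request body rather than stripped.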