Feature Request: Support service_tier (e.g. flex) for Gemini provider #12700

@Twanislas

Description

Feature Request

Gemini recently launched Flex Inference (service_tier: "flex"), which offers a 50% cost reduction for batch/background workloads where real-time latency isn't critical. This is a perfect match for Hermes's cron jobs and background subagents.

See documentation: https://ai.google.dev/gemini-api/docs/flex-inference

The Problem

Currently, users cannot use the Flex tier in Hermes because:

  1. The newly merged agent/gemini_native_adapter.py (from feat(providers): route gemini through the native AI Studio API #12674) does not map service_tier from api_kwargs / request_options into the native Google generateContent JSON payload.
  2. The open PR Support OpenAI service_tier across direct API and openai-codex routes #5157 proposes restricting service_tier exclusively to OpenAI routes. If merged as-is, it will actively strip the parameter from Gemini calls.

Proposed Solution

  1. Update gemini_native_adapter.py: Intercept service_tier from api_kwargs and inject it into the native request body:

     ```json
     {
       "contents": [...],
       "service_tier": "flex"
     }
     ```

  2. Exempt Gemini from PR Support OpenAI service_tier across direct API and openai-codex routes #5157: Ensure that hermes_cli/runtime_provider.py (or wherever route validation happens) recognizes gemini as a valid provider for service_tier.
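For concreteness, both changes could look roughly like this. This is a hedged sketch only: the two modules are named in this issue, but their internals are not, so every function name, signature, and the provider allow-set below are assumptions, not the actual Hermes code.

```python
# --- Sketch for agent/gemini_native_adapter.py (names are assumptions) ---

def build_generate_content_body(contents, api_kwargs):
    """Build the native generateContent JSON payload, forwarding service_tier.

    `contents` is the usual Gemini contents list; `api_kwargs` is the
    passthrough dict of extra API parameters assumed by this sketch.
    """
    body = {"contents": contents}
    tier = api_kwargs.get("service_tier")
    if tier is not None:
        # Inject the tier at the top level of the request body, e.g. "flex".
        body["service_tier"] = tier
    return body


# --- Sketch for hermes_cli/runtime_provider.py (names are assumptions) ---

# Instead of hard-coding service_tier as OpenAI-only (as PR #5157 proposes),
# keep a set of providers known to accept the parameter.
SERVICE_TIER_PROVIDERS = {"openai", "gemini"}

def supports_service_tier(provider: str) -> bool:
    """Return True if service_tier should be forwarded for this provider."""
    return provider in SERVICE_TIER_PROVIDERS
```

With this shape, validation stops stripping the parameter for Gemini while still rejecting it for providers that do not support it.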

Use Case

Hermes heavily utilizes cron jobs and background delegation tasks (via delegate_task). Halving the LLM cost for these asynchronous, non-interactive workflows using Gemini would be a massive efficiency gain for users.
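For illustration, a background task could opt into the cheaper tier with something like the following. This is hypothetical: delegate_task exists per this issue, but its configuration shape is not shown here, so the api_kwargs passthrough is an assumption.

```python
# Hypothetical task configuration; the "api_kwargs" passthrough key is an
# assumption about Hermes internals, not documented behavior.
task_config = {
    "provider": "gemini",
    "api_kwargs": {"service_tier": "flex"},  # opt into Flex (batch) pricing
}
```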

Thank you!
