feat(prompt): KG-685 add Google Search grounding support for Gemini #1898
Jai Goyal (goyaljai) wants to merge 13 commits into
59b7b88 to 5933a0a
Thank you again for this PR! I went through it and I think it could be improved - see my comments below.
The main things to address are:
- We don't want a separate LLMCapability for GoogleSearch based grounding
- The API integration would not work in the proposed state (we need a typed googleSearch, currently we have a loose type for google_search)
- This PR tries to integrate with legacy Gemini models we don't support
I'm open to consulting further if you disagree on some points (I might have got something wrong) or need some guidance wrt our codebase.
@Serializable
internal class GoogleTool(
    val functionDeclarations: List<GoogleFunctionDeclaration>? = null,
    @SerialName("google_search")
Are you sure that it's google_search and not googleSearch?
https://ai.google.dev/api/caching#Tool
Also, this should not just be some JsonObject - there's a specific contract for it: https://ai.google.dev/api/caching#GoogleSearch
So we want something like this in the end:
@Serializable
internal class GoogleSearch(
    val timeRangeFilter: Interval? = null,
    val searchTypes: SearchTypes? = null,
)
And in GoogleTool:
val googleSearch: GoogleSearch? = null
Good catch. I thought I had verified this, but I will check again against the API spec and fix the serialization name if needed. I will fix it in ~1hr.
Yes, it does not work like that.
I tested both features using generativelanguage.googleapis.com (this is the Google AI Studio API used by koog, not Vertex AI).
searchTypes.webSearch: The API accepted it, but the results were the same as using just {"googleSearch": {}}. There was no difference in behavior. So the filter did not work.
timeRangeFilter: I tested it with the question “Iran vs US war 2026, what is happening?” and set the date range from Jan 2025 to July 2025. Even with this filter, the model still returned full details about the Feb 2026 conflict, same as without the filter. So the filter did not work.
According to the documentation (https://ai.google.dev/api/caching#GoogleSearch), these features seem to work only with the Vertex AI API (aiplatform.googleapis.com), not the public API we are using.
Since koog uses generativelanguage.googleapis.com, both features are not needed; leaving them out avoids relying on something that doesn’t actually work.
Now, GoogleSearch is just an empty object, and only basic grounding is supported.
> According to the documentation these features seem to work only with the Vertex AI API
I don't think this is a correct statement. Can you show which part of the docs says that?
Afaik these docs cover the exact API we're using.
Moreover, the fact that the API did not give you the result you expected does not mean that we should not comply with the documented request format.
I tested the request shape with a simple curl call that was meant to be invalid (note the missing startTime in timeRangeFilter):
> curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
    -H "X-goog-api-key: ..." \
    -H "Content-Type: application/json" \
    -d '{
      "contents": [{"parts": [{"text": "Who won latest parliament elections in Hungary"}]}],
      "tools": [{
        "googleSearch": {
          "timeRangeFilter": {
            "endTime": "2024-12-31T23:59:59Z"
          }
        }
      }]
    }'
and got 400:
{
  "error": {
    "code": 400,
    "message": "* GenerateContentRequest.tools[0].google_search.time_range_filter: [FIELD_INVALID] Both start time and end time must be given\n",
    "status": "INVALID_ARGUMENT"
  }
}
which proves that the shape IS expected. We must comply with the schema that is defined here: https://ai.google.dev/api/caching#GoogleSearch
The fact that you got the same responses with your calls only proves that inner fields of googleSearch are not required, but we should still allow users to pass them if they want - otherwise we constrain the Gemini API.
Also, check these docs: https://ai.google.dev/gemini-api/docs/google-search
> When a response is successfully grounded, the response includes a groundingMetadata field.
In my case, when I ran your query, I did receive the mentioned field. It's true that the filter did not seem to work as expected, but the API itself did. We don't test models, we test the integration.
It's a fair point — the API accepts it, we just shouldn't claim it guarantees filtered results.
d7151b5 to 8c0be29
All the changes are done. Please review again.
Jakub Amanowicz (@Amaneusz) Please review now. I changed the code to support these.
….0+ models
- Add LLMCapability.Grounding capability
- Add GoogleGroundingConfig sealed class (GoogleSearch / GoogleSearchRetrieval)
- Add groundingConfig field to GoogleParams
- Add Grounding to all Gemini 2.0+ models via fullCapabilities
- Inject google_search / googleSearchRetrieval tool in createGoogleRequest()
- Fix: dynamicRetrievalConfig requires mode=MODE_DYNAMIC for threshold to take effect
- Fix (KG-685): add init{} validation for dynamicThreshold in [0.0, 1.0]
- 7 unit tests, live API verified
Closes https://youtrack.jetbrains.com/issue/KG-685/
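The init{} validation mentioned in the commit message above could look roughly like the following. This is only a minimal sketch: the class is a hypothetical stand-in that borrows the GoogleSearchRetrieval and dynamicThreshold names from the commit message, and it is not the actual class from this PR.

```kotlin
// Hypothetical stand-in for the PR's GoogleSearchRetrieval config (assumption, not the real class).
class GoogleSearchRetrieval(val dynamicThreshold: Double? = null) {
    init {
        // null is accepted (no threshold configured); any other value must lie in [0.0, 1.0].
        // NaN is rejected as well, because it is not contained in the range.
        require(dynamicThreshold == null || dynamicThreshold in 0.0..1.0) {
            "dynamicThreshold must be in [0.0, 1.0], got $dynamicThreshold"
        }
    }
}

fun main() {
    GoogleSearchRetrieval()      // null threshold is accepted
    GoogleSearchRetrieval(0.0)   // lower boundary is accepted
    GoogleSearchRetrieval(1.0)   // upper boundary is accepted

    // Out-of-range values throw IllegalArgumentException from require().
    val rejected = runCatching { GoogleSearchRetrieval(1.5) }.isFailure
    println("1.5 rejected: $rejected")
}
```

Checking in init{} rather than at request-building time means an invalid threshold fails where the config is constructed, which is usually closer to the caller's mistake.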
…nsupported API features
- Replace GoogleGroundingConfig sealed class with simple groundingEnabled: Boolean in GoogleParams
- Remove LLMCapability.Grounding (grounding is provider-specific, not a model trait)
- Remove timeRangeFilter/Interval and searchTypes -- tested on generativelanguage.googleapis.com, neither changed model behavior (timeRangeFilter proved no-op via Iran/US war 2026 date test)
- Replace JsonObject grounding fields with typed GoogleSearch/GoogleTool internal classes
- Add GoogleGroundingLiveTest integration test
- Use shouldNotBeNull() instead of !! in tests
Use @BeforeAll + assumeTrue to skip gracefully in CI when GEMINI_API_TEST_KEY is not set. Tests are ABORTED (not FAILED) when key is absent, so CI passes.
The API validates timeRangeFilter fields (400 on partial input proves it). We should expose what the documented API supports regardless of model behavior.
- Interval(startTime, endTime) added back to GoogleGenerateContent
- groundingStartTime/groundingEndTime added to GoogleParams
- Validation: both must be set together (matches API requirement)
- GoogleLLMClient builds Interval when both times are provided
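The "both must be set together" rule from this commit can be sketched as below. This is an illustration under assumptions: GroundingParams and toInterval() are hypothetical names, Interval is a simplified stand-in, and the timestamps are kept as plain strings. Only groundingStartTime, groundingEndTime and Interval(startTime, endTime) come from the commit message.

```kotlin
// Simplified stand-in for the request-side Interval type (assumption).
data class Interval(val startTime: String, val endTime: String)

// Hypothetical parameter holder; the real GoogleParams has more fields.
class GroundingParams(
    val groundingStartTime: String? = null,
    val groundingEndTime: String? = null,
) {
    init {
        // The API answers 400 for a partial interval, so reject it on the client side:
        // either both bounds are null or both are set.
        require((groundingStartTime == null) == (groundingEndTime == null)) {
            "groundingStartTime and groundingEndTime must be set together"
        }
    }

    // Builds the time filter only when both bounds are present; null means no filter is sent.
    fun toInterval(): Interval? {
        val start = groundingStartTime ?: return null
        val end = groundingEndTime ?: return null
        return Interval(start, end)
    }
}

fun main() {
    val filtered = GroundingParams("2025-01-01T00:00:00Z", "2025-07-31T23:59:59Z")
    println(filtered.toInterval())

    // Only an end time, as in the curl call earlier in the thread: rejected before any request is made.
    val partial = runCatching { GroundingParams(groundingEndTime = "2024-12-31T23:59:59Z") }
    println("partial interval rejected: ${partial.isFailure}")
}
```

Failing locally mirrors the server's FIELD_INVALID response without spending a network round trip on a request that cannot succeed.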
… unsupported, webSearch is default)
c6a61ef to e1c143b
Summary
Implements grounding with Google Search for Gemini models, and fixes a missing input validation bug discovered during implementation.
- LLMCapability.Grounding so callers can check model support with model.supports(LLMCapability.Grounding)
- GoogleGroundingConfig sealed class: GoogleSearch (Gemini 2.0+, native google_search tool) and GoogleSearchRetrieval (Gemini 1.5, googleSearchRetrieval with optional dynamic threshold)
- groundingConfig: GoogleGroundingConfig? field to GoogleParams
- LLMCapability.Grounding via fullCapabilities
- createGoogleRequest() injects the appropriate grounding tool and guards unsupported models with a clear require() error
Usage
Test plan
- createGoogleRequest injects google_search tool when GoogleSearch grounding is set
- createGoogleRequest injects googleSearchRetrieval tool with threshold when set (asserts MODE_DYNAMIC is present)
- createGoogleRequest merges grounding tool with function tools
- createGoogleRequest throws when grounding set on model that does not support it
- GoogleSearchRetrieval rejects dynamicThreshold above 1_0
- GoogleSearchRetrieval rejects negative dynamicThreshold
- GoogleSearchRetrieval accepts null and boundary dynamicThreshold values
- GoogleLLMClientTest tests pass, no regressions
Closes https://youtrack.jetbrains.com/issue/KG-685/