expression: add full SQL auto embedding support for starter mode#69014
expression: add full SQL auto embedding support for starter mode#69014ChangRui-Ryan wants to merge 11 commits into
Conversation
|
@ChangRui-Ryan I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI. |
|
Hi @ChangRui-Ryan. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds hosted embedding configuration, embedding provider/runtime wiring, EMBED_TEXT evaluation, planner and DDL support for auto-embedding, executor population paths, and updated tests/build metadata. ChangesEmbedding capability rollout
Sequence Diagram(s)sequenceDiagram
participant SQL as EMBED_TEXT / vec_embed_*_distance
participant Expression as expression.rewriter
participant Domain as Domain.GetEmbedFn
participant EmbedFn as inference.EmbedFn
participant Provider as embedding backend
SQL->>Expression: build or rewrite expression
Expression->>Domain: resolve embed function
Domain->>EmbedFn: return managed embed function
EmbedFn->>Provider: CreateEmbeddings(model,texts,opts)
Provider-->>EmbedFn: embeddings or error
EmbedFn-->>Expression: vector datum
Expression-->>SQL: rewritten or evaluated result
Estimated code review effort🎯 5 (Critical) | ⏱️ ~90+ minutes Suggested labels
Suggested reviewers
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 19
🧹 Nitpick comments (12)
pkg/planner/core/expression_rewriter.go (1)
3027-3031: ⚡ Quick winImprove error message robustness by handling empty column name.
Line 3029 uses
vecArg.OrigNamein the error message, which may be empty. Consider falling back tovecArg.String()or a placeholder to ensure the error message is always informative.♻️ Proposed improvement
+ colName := vecArg.OrigName + if colName == "" { + colName = vecArg.String() + } vecArg, ok := args[0].(*expression.Column) if !ok || vecArg.GeneratedExprString == "" || !vecArg.RetType.EvalType().IsVectorKind() { - er.err = fmt.Errorf("%s() first argument must be a vector embedding column generated by EMBED_TEXT()", v.FnName.O) + er.err = fmt.Errorf("%s() first argument must be a vector embedding column generated by EMBED_TEXT(), got %s", v.FnName.O, colName) return true }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/planner/core/expression_rewriter.go` around lines 3027 - 3031, The error message for the vector argument in expression_rewriter.go can be empty because vecArg.OrigName may be empty; update the check around the vecArg handling in the block that sets er.err (where vecArg is derived from args[0] and v.FnName.O is used) to build a robust name: if vecArg.OrigName is empty, use vecArg.String() or a constant placeholder like "<unknown column>" when formatting the error text so the message is always informative and never blank.pkg/inference/embedding/base/BUILD.bazel (1)
15-15: 💤 Low valueReconsider the flaky marking for deterministic tests.
The
base_testis marked asflaky = True, but reviewingbase_test.goshows it only performs deterministic base64 decoding and validation tests without concurrency or timing dependencies. Unless there's a known flakiness issue, consider removing this marking to ensure test failures are investigated rather than automatically retried.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/inference/embedding/base/BUILD.bazel` at line 15, The test target "base_test" in the BUILD rule is marked flaky = True even though base_test only runs deterministic base64 decode/validation; remove the flaky = True attribute (or set flaky = False) from the base_test rule so failures surface for investigation, then run the test suite locally to confirm no flakiness before committing.pkg/inference/embedding/batcher/batcher.go (1)
244-260: ⚡ Quick winDocument the batch-level context cancellation semantics.
The batch request context (
reqCtx) only cancels when all individual request contexts are done. This means if requests with different timeouts are batched together, a request with a short timeout may wait longer than expected (bounded bybatchWindow). While this is correct batching behavior, consider adding a comment explaining this trade-off for future maintainers.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/inference/embedding/batcher/batcher.go` around lines 244 - 260, Add a clear comment above the reqCtx/cancelReq creation and goroutine explaining that reqCtx only cancels after all individual request contexts in thisBatch.calls are done (the goroutine loops over call.ctx and only calls cancelReq after every call is done), so batching can cause a request with a short timeout to wait longer (up to batchWindow) before its batch-level context cancels; state this is intentional for batching and note the trade-off for maintainers and any expectations around embedder.CreateEmbeddings using reqCtx.pkg/inference/embedding/huggingface/BUILD.bazel (2)
23-23: ⚡ Quick winInvestigate why tests are marked flaky.
The
flaky = Trueattribute suggests these tests have non-deterministic failures. Consider investigating and fixing the root cause rather than marking tests as flaky, as this can mask real issues and reduce test signal quality.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/inference/embedding/huggingface/BUILD.bazel` at line 23, The BUILD.bazel target currently sets flaky = True, which masks nondeterministic test failures; remove or revert the flaky = True attribute on the target and instead reproduce the intermittent failures by running the corresponding Bazel test target(s) repeatedly (e.g., bazel test --runs_per_test=50), add deterministic fixes (stabilize timers, seed randomness, isolate shared state, add proper setup/teardown) in the test code, and only reintroduce flaky metadata after root causes are identified; look for references to the BUILD target that contains flaky = True to locate the tests to run and stabilize.
18-26: 💤 Low valueInconsistent shard_count across provider tests.
This test uses
shard_count = 10, while the Gemini tests useshard_count = 7. Consider establishing a consistent sharding strategy across similar test suites, or document why different providers need different shard counts.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/inference/embedding/huggingface/BUILD.bazel` around lines 18 - 26, The shard_count for the Huggingface test target "huggingface_test" is set to 10 which is inconsistent with the Gemini provider tests (which use shard_count = 7); update the "huggingface_test" target in BUILD.bazel to use the same shard_count value as other provider tests (e.g., change shard_count to 7) or add a brief comment in the BUILD file next to the "huggingface_test" rule explaining why it must remain different so the discrepancy is documented and intentional.pkg/inference/embedding/gemini/BUILD.bazel (1)
18-26: ⚡ Quick winInvestigate why tests are marked flaky.
The
flaky = Trueattribute suggests these tests have non-deterministic failures. Consider investigating and fixing the root cause rather than marking tests as flaky, as this can mask real issues and reduce test signal quality.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/inference/embedding/gemini/BUILD.bazel` around lines 18 - 26, The test target "gemini_test" in the BUILD.bazel currently sets flaky = True which masks nondeterministic failures; investigate and fix the root causes in gemini_test.go (race conditions, timing, shared state, or reliance on external services) by reproducing failures locally (run the "gemini_test" target without flaky and with shard_count reduced), add deterministic seeds or explicit synchronization in the test code, increase the timeout only if legitimately slow, and then remove the flaky = True attribute from the "gemini_test" rule once the tests are stable.pkg/inference/embedding/jina/jina.go (2)
55-55: ⚖️ Poor tradeoffConfigure HTTP client timeout to prevent indefinite hangs.
The
http.Clientis initialized with default settings (no timeout), which can cause requests to hang indefinitely if the JinaAI API is slow or unresponsive. While the context provides cancellation, the client itself should have a reasonable timeout.⏱️ Suggested fix to add timeout configuration
func NewJinaEmbedder(cfg EmbedderConfig) *Embedder { return &Embedder{ - client: http.Client{}, + client: http.Client{ + Timeout: 30 * time.Second, + }, cfg: cfg, }Consider making the timeout configurable via
EmbedderConfigif different deployments need different values.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/inference/embedding/jina/jina.go` at line 55, The http.Client is created with no timeout (client: http.Client{}), risking indefinite hangs; update the embedder initialization (where the client is constructed in jina.go) to set a sensible Timeout on the http.Client (e.g., from a new EmbedderConfig field) and wire that timeout through the constructor/path that creates the client so the timeout becomes configurable via EmbedderConfig and used when initializing the client.
116-119: 💤 Low valueAPI key may be logged in error responses.
When logging failed requests at line 118, the
bodymay contain echoed authorization information. While theAuthorizationheader isn't logged directly, verify whether JinaAI's error responses could leak sensitive data.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/inference/embedding/jina/jina.go` around lines 116 - 119, The error log in pkg/inference/embedding/jina/jina.go currently logs the full response body in the logutil.BgLogger().Error call (the line calling logutil.BgLogger().Error with zap.String("body", string(body))) which could leak echoed API keys; before calling that logger (or replace that call), sanitize the response body by parsing/masking any sensitive fields (e.g., "authorization", "api_key", "token", "access_token", "credentials") or redact Authorization-like patterns (e.g., Bearer <token>) and then log the sanitized string; update the code around the resp handling in the same function to produce a safeBody string and pass zap.String("body", safeBody) to the logger.pkg/inference/embedding/huggingface/huggingface.go (2)
121-124: 💤 Low valueAPI key may be logged in error responses.
When logging failed requests at line 123, the
bodymay contain echoed authorization information or API keys from the error response. While theAuthorizationheader itself isn't logged here, some APIs echo request details in error responses. Consider whether this is a concern for HuggingFace's API.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/inference/embedding/huggingface/huggingface.go` around lines 121 - 124, The error log at the HuggingFace request failure uses the raw response body (zap.String("body", string(body))) which may contain echoed API keys or authorization data; update the failure handling in the function that makes the request (the code that calls resp and builds the log with logutil.BgLogger()) to avoid logging sensitive contents by either removing the full body from the log or sanitizing it first (e.g., detect and redact common keys like "authorization", "api_key", "token" or apply a safe snippet/length limit and replace sensitive substrings with "[REDACTED]"), and keep the existing metadata (resp.StatusCode) in the log. Ensure you change the zap.String("body", ...) usage to use the sanitized string instead of the raw body.
54-54: ⚖️ Poor tradeoffConfigure HTTP client timeout to prevent indefinite hangs.
The
http.Clientis initialized with default settings, which means no timeout. This can cause requests to hang indefinitely if the HuggingFace API is slow or unresponsive. While the context passed tohttp.NewRequestWithContextprovides cancellation, the client itself should have a reasonable timeout for connection and overall request duration.⏱️ Suggested fix to add timeout configuration
func NewHuggingFaceEmbedder(cfg EmbedderConfig) *Embedder { return &Embedder{ - client: http.Client{}, + client: http.Client{ + Timeout: 30 * time.Second, + }, cfg: cfg, }Alternatively, make the timeout configurable via
EmbedderConfigif different deployments need different values.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/inference/embedding/huggingface/huggingface.go` at line 54, The http.Client is created with zero timeout (client: http.Client{}) which can hang; update the HuggingFace embedder initialization to set a sensible Timeout (e.g., 15–30s) on the http.Client instance used by the embedder, or wire a timeout value from EmbedderConfig into the client construction so the Timeout field is configurable; change the client creation referenced by the client: http.Client{} initializer in the huggingface embedder to use http.Client{Timeout: ...} (or build the client with a config-supplied duration).pkg/inference/embedding/jina/jina_test.go (1)
1-227: ⚡ Quick winConsider adding missing API key test coverage.
The HuggingFace tests include
TestHuggingFaceEmbedder_NoAPIKey,TestHuggingFaceEmbedder_CustomErrorHandling, andTestHuggingFaceEmbedder_CustomUnauthorizedErrorto verify API key validation and custom error handling. The Jina tests lack equivalent coverage for these scenarios. Consider adding similar tests for consistency and completeness.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/inference/embedding/jina/jina_test.go` around lines 1 - 227, Add unit tests to cover missing API key and custom error handling for the Jina embedder: create tests similar to HuggingFace ones that exercise NewJinaEmbedder and CreateEmbeddings for (1) no API key supplied (e.g., TestJinaEmbedder_NoAPIKey) verifying it returns an error and nil embeddings, (2) custom unauthorized response handling (e.g., TestJinaEmbedder_CustomUnauthorizedError) mocking a 401 response and asserting the error contains expected message, and (3) custom error payload handling (e.g., TestJinaEmbedder_CustomErrorHandling) mocking non-2xx responses with body and asserting CreateEmbeddings surfaces the body details; place these tests alongside existing TestJinaEmbedder_* functions and reuse httptest.NewServer, EmbedderConfig GetAPIKey/GetBaseURL, and assertions on returned embeddings/errors to mirror the HuggingFace tests.pkg/inference/embedding/nvidia/protocol.go (1)
17-22: 💤 Low valueRemove unused
Requeststruct or use it in the implementation.The
Requeststruct is defined but never used. Innvidia.golines 89-93, the request body is built using a map literal passed tobase.MarshalJSONWithOptionsinstead of marshalling this struct. This is dead code that adds maintenance burden.🧹 Option 1: Remove the unused struct
-// Request is the model for Nvidia NIM embeddings API request. -type Request struct { - Input []string `json:"input"` - Model string `json:"model"` - EncodingFormat string `json:"encoding_format"` -} - // Response is the model for Nvidia NIM embeddings API response.Alternatively, if the struct is intended for future use or documentation purposes, add a comment explaining why it's not currently used in the implementation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/inference/embedding/nvidia/protocol.go` around lines 17 - 22, The Request struct (type Request) in protocol.go is dead code because the Nvidia embeddings request body is constructed directly via a map passed to base.MarshalJSONWithOptions in nvidia.go; either remove the unused Request type or explicitly use it when marshalling (replace the map literal with an instance of Request and marshal that), or if you keep the type for future/documentation purposes add a clear comment above type Request explaining why it is intentionally unused today and referencing the code path in nvidia.go that builds the payload via base.MarshalJSONWithOptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/executor/insert_common.go`:
- Around line 688-699: The code captures a single shared evalCtx (evalCtx :=
e.Ctx().GetExprCtx().GetEvalCtx()) and reuses it from every goroutine, which can
race statement-scoped state when calling task.expr.Eval; instead create a
distinct EvalCtx per worker before calling task.expr.Eval (e.g., call
e.Ctx().GetExprCtx().GetEvalCtx() or use a provided Clone/New method inside the
eg.Go closure) so each goroutine has its own evalCtx; update the goroutine body
around task.expr.Eval and chunk.MutRowFromDatums(task.row).ToRow() to use that
per-goroutine evalCtx and ensure any cleanup if the EvalCtx type requires it.
- Around line 647-650: The code snapshots a single uint64 rowCntInLoadData from
e.rowCount for the whole batch when
e.Ctx().GetSessionVars().StmtCtx.InLoadDataStmt is true, then passes that same
value into HandleBadNull for every row causing LOAD DATA errors to be reported
against the batch's last row; change the logic to use the per-row LOAD DATA
index instead of the single snapshot: inside the per-row loop that calls
HandleBadNull, compute the correct per-row LOAD DATA index (using the per-row
counter/variable used to iterate the batch or the LOAD DATA line index) and pass
that per-row value to HandleBadNull rather than rowCntInLoadData; update any use
sites referencing rowCntInLoadData (e.g., where HandleBadNull is invoked) to use
the per-row index so failures are reported against the failing row.
In `@pkg/expression/builtin_inference.go`:
- Around line 1-13: Update the license header year in
pkg/expression/builtin_inference.go to match the corresponding test file
(builtin_inference_test.go) so the copyright years are consistent across related
files; open builtin_inference.go, locate the top-of-file license comment block
and change the "Copyright 2025" year to the same year used in
builtin_inference_test.go (2026) or vice versa if you prefer to standardize on
2025—ensure both files have the identical year in their header comment.
In `@pkg/expression/inference_helper.go`:
- Around line 83-85: ExtractAutoEmbedInfoFromAST currently only rejects fewer
than 2 args but silently accepts extra args; change the argument validation to
explicitly allow only the supported arities (e.g., exactly 2 or exactly 3
arguments) and return an error for any other arity. Update the len(fnCall.Args)
check(s) that currently read "if len(fnCall.Args) < 2" to instead validate
membership in the allowed set (len == 2 || len == 3) and return
fmt.Errorf("invalid EMBED_TEXT() usage: unexpected arity") when not allowed;
apply this same change to the second arg-count validation block in the file that
also inspects fnCall.Args (the later branch inside ExtractAutoEmbedInfoFromAST).
In `@pkg/expression/integration_test/integration_test.go`:
- Around line 4507-4513: The test mutates the global sysvar
tidb_exp_embed_jina_ai_api_key and currently only resets it at the end of the
happy path, risking leakage if an assertion fails; register an unconditional
cleanup before the first mutation (e.g., call t.Cleanup(func(){ tk.MustExec("set
@@global.tidb_exp_embed_jina_ai_api_key = ''") })) so the global is always
reset, and then proceed with the existing tk.MustExec/tk.MustQuery assertions
that set and check tidb_exp_embed_jina_ai_api_key.
In `@pkg/inference/embedding/cohere/cohere.go`:
- Around line 52-56: NewCohereEmbedder constructs an http.Client without a
timeout causing potential indefinite hangs; update NewCohereEmbedder to
initialize client: http.Client{Timeout: time.Second * <reasonable_value>}
(choose e.g. 10s or configurable), and add the "time" import; ensure the
Embedder struct's client field still uses that configured http.Client so all
Cohere requests get the client-level timeout as defense-in-depth.
In `@pkg/inference/embedding/gemini/gemini.go`:
- Around line 98-100: The model string is interpolated directly into the path
when building fullURL, which can break URLs or allow unintended endpoints;
before the fmt.Sprintf call that sets fullURL, validate that model is non-empty
and does not contain path traversal or illegal chars (e.g., reject segments with
"/" or "..") or alternatively escape it with a path-encoding function (e.g.,
url.PathEscape) and use the escaped value (escapedModel) in place of model;
update the code around variables fullURL, model and baseURL to perform this
validation/escaping and return an error if validation fails.
- Around line 51-56: The NewGeminiEmbedder function constructs an Embedder with
an http.Client that lacks a timeout; update NewGeminiEmbedder to set a sensible
client timeout (e.g., via http.Client{Timeout: time.Second * <N>}) so requests
cannot hang indefinitely, and add the "time" import; modify the Embedder
construction (function NewGeminiEmbedder, type Embedder and its client field) to
use the configured timeout value.
In `@pkg/inference/embedding/huggingface/protocol.go`:
- Around line 22-24: The comment above the Response type is inconsistent with
the actual type; update the docstring for Response (type Response [][]float32 in
protocol.go) to state that HuggingFace returns a 2D array of float32 values, or
if you confirm the API returns float64 instead, change the type to [][]float64
and update all usages accordingly; ensure the comment and the type (Response)
match exactly.
In `@pkg/inference/embedding/jina/BUILD.bazel`:
- Around line 23-24: The BUILD target currently sets flaky = True and
shard_count = 6; remove these patch-work flags and fix the underlying test
instability instead: locate the failing tests under the
pkg/inference/embedding/jina package, reproduce the flakes, and address root
causes such as race conditions, timing-dependent assertions, shared state, or
unmocked external dependencies by adding proper synchronization, deterministic
fixtures, explicit mocks, or adjusting timeouts; once tests are deterministic,
delete flaky = True and shard_count = 6 from the BUILD.bazel target so CI runs a
single stable shard.
In `@pkg/inference/embedding/mock/BUILD.bazel`:
- Around line 16-17: The BUILD.bazel target in pkg/inference/embedding/mock
currently sets flaky = True and shard_count = 3—remove those attributes and make
the tests deterministic instead of masking failures; locate the mock test target
in BUILD.bazel and (1) remove flaky and shard_count, (2) audit tests in
pkg/inference/embedding/mock for shared global state, non-deterministic
timing/race conditions, or reliance on test ordering, and (3) fix by adding
proper setup/teardown (reset globals, use fresh temp dirs, inject deterministic
mocks/fakes, or serialize accesses with locks) so the tests pass reliably in
parallel without sharding. Ensure the target no longer needs flaky/shard_count
after these fixes and run the test suite to confirm stability.
In `@pkg/inference/embedding/mock/mock.go`:
- Around line 72-84: The delay handling sleeps unconditionally and ignores the
provided context, causing hangs if the context is cancelled; modify the block
that parses opts["delay"] so that instead of time.Sleep(dur) you wait with a
select on time.After(dur) and ctx.Done(), returning early with ctx.Err() when
the context is cancelled. Keep the existing parsing and error messages (the opts
map and delay parsing code) and only replace the unconditional sleep with a
context-aware select that returns (nil, ctx.Err()) on cancellation.
In `@pkg/inference/embedding/nvidia/nvidia.go`:
- Around line 147-161: The current loop assigns embeddings by the sorted
position (row) which can misplace vectors if indices are duplicated or
non-sequential; instead validate that respObj.Data contains exactly one entry
for each index 0..len(texts)-1 and then assign using the actual Index.
Specifically, after sort.Slice(respObj.Data, ...) iterate respObj.Data and for
each item check that 0 <= item.Index < len(texts) and that
embeddings[item.Index] is nil (to detect duplicates); on any invalid or
duplicate index return an error indicating the problematic item.Index; otherwise
decode with base.DecodeFloat32ArrayBytes and set embeddings[item.Index] = e (not
embeddings[row]).
- Around line 54-56: The HTTP client for Embedder is created with zero timeout
(http.Client{}), which can cause indefinite hangs; set a sensible Timeout on the
client (e.g., http.Client{Timeout: ...}) and make that timeout configurable via
EmbedderConfig (add a Timeout field and use it when constructing the Embedder's
client); update the constructor that returns &Embedder{client: http.Client{},
cfg: cfg, ...} to initialize client with the configured Timeout and provide a
reasonable default if cfg.Timeout is unset.
In `@pkg/inference/embedding/openai/openai_test.go`:
- Line 28: Replace the misleading comment "Mock successful response from real
Jina API" in openai_test.go with a correct description for the OpenAI embedder
(e.g., "Mock successful response from real OpenAI API" or "Mock successful
response for OpenAI embedder") so the comment accurately matches the test;
locate the comment by searching for that exact string in the test file and
update it in the test function that sets up the mocked OpenAI response.
In `@pkg/inference/embedding/openai/openai.go`:
- Around line 142-157: After sorting respObj.Data by Index, validate that each
item's Index is within [0, len(texts)-1] and that indices form a unique sequence
(no duplicates) before assigning into embeddings; for each item in respObj.Data
call base.DecodeFloat32ArrayBytes(item.Embedding) as before but place the
decoded vector into embeddings[item.Index] (not embeddings[row]), and return a
clear error if any item.Index is out of range or a duplicate/missing so we never
misassign embeddings; reference respObj.Data, item.Index, embeddings and
base.DecodeFloat32ArrayBytes to locate and implement the checks and assignment
change.
- Around line 54-58: The http.Client in NewOpenAIEmbedder is created with
default settings (no timeout); update NewOpenAIEmbedder to set a sensible
timeout on Embedder.client by using a configured value from EmbedderConfig
(e.g., add a Timeout field to EmbedderConfig) and fall back to a reasonable
default if not provided; ensure the Embedder struct uses http.Client{Timeout:
cfg.Timeout} (or default) so embedding requests cannot hang indefinitely.
In `@pkg/planner/core/expression_rewriter.go`:
- Around line 3038-3042: The error wrapping currently replaces the original
error with a new one via fmt.Errorf in the block that calls
expression.ExtractAutoEmbedInfoFromAST(astExpr), losing the original
stack/context; change the assignment to preserve the original error by using
errors.Annotatef (or the project's equivalent) to annotate err and assign that
to er.err (refer to expression.ExtractAutoEmbedInfoFromAST, er.err, and
fmt.Errorf) so the original error context is preserved while adding the "column
%s cannot be used in %s" message.
- Around line 3057-3060: The inline initializer fnInit currently does an
unchecked type assertion to *expression.BuiltinEmbedTextSig which can panic;
update fnInit (used by er.newFunctionWithInit) to perform a safe type assertion
(e.g., using the "v, ok := sf.Function.(*expression.BuiltinEmbedTextSig)"
pattern or a type switch), set v.IsFromVecSearch = true only when ok, and return
a descriptive error (instead of panicking) when the function is not the expected
BuiltinEmbedTextSig type so callers can handle the failure.
---
Nitpick comments:
In `@pkg/inference/embedding/base/BUILD.bazel`:
- Line 15: The test target "base_test" in the BUILD rule is marked flaky = True
even though base_test only runs deterministic base64 decode/validation; remove
the flaky = True attribute (or set flaky = False) from the base_test rule so
failures surface for investigation, then run the test suite locally to confirm
no flakiness before committing.
In `@pkg/inference/embedding/batcher/batcher.go`:
- Around line 244-260: Add a clear comment above the reqCtx/cancelReq creation
and goroutine explaining that reqCtx only cancels after all individual request
contexts in thisBatch.calls are done (the goroutine loops over call.ctx and only
calls cancelReq after every call is done), so batching can cause a request with
a short timeout to wait longer (up to batchWindow) before its batch-level
context cancels; state this is intentional for batching and note the trade-off
for maintainers and any expectations around embedder.CreateEmbeddings using
reqCtx.
In `@pkg/inference/embedding/gemini/BUILD.bazel`:
- Around line 18-26: The test target "gemini_test" in the BUILD.bazel currently
sets flaky = True which masks nondeterministic failures; investigate and fix the
root causes in gemini_test.go (race conditions, timing, shared state, or
reliance on external services) by reproducing failures locally (run the
"gemini_test" target without flaky and with shard_count reduced), add
deterministic seeds or explicit synchronization in the test code, increase the
timeout only if legitimately slow, and then remove the flaky = True attribute
from the "gemini_test" rule once the tests are stable.
In `@pkg/inference/embedding/huggingface/BUILD.bazel`:
- Line 23: The BUILD.bazel target currently sets flaky = True, which masks
nondeterministic test failures; remove or revert the flaky = True attribute on
the target and instead reproduce the intermittent failures by running the
corresponding Bazel test target(s) repeatedly (e.g., bazel test
--runs_per_test=50), add deterministic fixes (stabilize timers, seed randomness,
isolate shared state, add proper setup/teardown) in the test code, and only
reintroduce flaky metadata after root causes are identified; look for references
to the BUILD target that contains flaky = True to locate the tests to run and
stabilize.
- Around line 18-26: The shard_count for the Huggingface test target
"huggingface_test" is set to 10 which is inconsistent with the Gemini provider
tests (which use shard_count = 7); update the "huggingface_test" target in
BUILD.bazel to use the same shard_count value as other provider tests (e.g.,
change shard_count to 7) or add a brief comment in the BUILD file next to the
"huggingface_test" rule explaining why it must remain different so the
discrepancy is documented and intentional.
In `@pkg/inference/embedding/huggingface/huggingface.go`:
- Around line 121-124: The error log at the HuggingFace request failure uses the
raw response body (zap.String("body", string(body))) which may contain echoed
API keys or authorization data; update the failure handling in the function that
makes the request (the code that calls resp and builds the log with
logutil.BgLogger()) to avoid logging sensitive contents by either removing the
full body from the log or sanitizing it first (e.g., detect and redact common
keys like "authorization", "api_key", "token" or apply a safe snippet/length
limit and replace sensitive substrings with "[REDACTED]"), and keep the existing
metadata (resp.StatusCode) in the log. Ensure you change the zap.String("body",
...) usage to use the sanitized string instead of the raw body.
- Line 54: The http.Client is created with zero timeout (client: http.Client{})
which can hang; update the HuggingFace embedder initialization to set a sensible
Timeout (e.g., 15–30s) on the http.Client instance used by the embedder, or wire
a timeout value from EmbedderConfig into the client construction so the Timeout
field is configurable; change the client creation referenced by the client:
http.Client{} initializer in the huggingface embedder to use
http.Client{Timeout: ...} (or build the client with a config-supplied duration).
In `@pkg/inference/embedding/jina/jina_test.go`:
- Around line 1-227: Add unit tests to cover missing API key and custom error
handling for the Jina embedder: create tests similar to HuggingFace ones that
exercise NewJinaEmbedder and CreateEmbeddings for (1) no API key supplied (e.g.,
TestJinaEmbedder_NoAPIKey) verifying it returns an error and nil embeddings, (2)
custom unauthorized response handling (e.g.,
TestJinaEmbedder_CustomUnauthorizedError) mocking a 401 response and asserting
the error contains expected message, and (3) custom error payload handling
(e.g., TestJinaEmbedder_CustomErrorHandling) mocking non-2xx responses with body
and asserting CreateEmbeddings surfaces the body details; place these tests
alongside existing TestJinaEmbedder_* functions and reuse httptest.NewServer,
EmbedderConfig GetAPIKey/GetBaseURL, and assertions on returned
embeddings/errors to mirror the HuggingFace tests.
In `@pkg/inference/embedding/jina/jina.go`:
- Line 55: The http.Client is created with no timeout (client: http.Client{}),
risking indefinite hangs; update the embedder initialization (where the client
is constructed in jina.go) to set a sensible Timeout on the http.Client (e.g.,
from a new EmbedderConfig field) and wire that timeout through the
constructor/path that creates the client so the timeout becomes configurable via
EmbedderConfig and used when initializing the client.
- Around line 116-119: The error log in pkg/inference/embedding/jina/jina.go
currently logs the full response body in the logutil.BgLogger().Error call (the
line calling logutil.BgLogger().Error with zap.String("body", string(body)))
which could leak echoed API keys; before calling that logger (or replace that
call), sanitize the response body by parsing/masking any sensitive fields (e.g.,
"authorization", "api_key", "token", "access_token", "credentials") or redact
Authorization-like patterns (e.g., Bearer <token>) and then log the sanitized
string; update the code around the resp handling in the same function to produce
a safeBody string and pass zap.String("body", safeBody) to the logger.
In `@pkg/inference/embedding/nvidia/protocol.go`:
- Around line 17-22: The Request struct (type Request) in protocol.go is dead
code because the Nvidia embeddings request body is constructed directly via a
map passed to base.MarshalJSONWithOptions in nvidia.go; either remove the unused
Request type or explicitly use it when marshalling (replace the map literal with
an instance of Request and marshal that), or if you keep the type for
future/documentation purposes add a clear comment above type Request explaining
why it is intentionally unused today and referencing the code path in nvidia.go
that builds the payload via base.MarshalJSONWithOptions.
In `@pkg/planner/core/expression_rewriter.go`:
- Around line 3027-3031: The error message for the vector argument in
expression_rewriter.go can be empty because vecArg.OrigName may be empty; update
the check around the vecArg handling in the block that sets er.err (where vecArg
is derived from args[0] and v.FnName.O is used) to build a robust name: if
vecArg.OrigName is empty, use vecArg.String() or a constant placeholder like
"<unknown column>" when formatting the error text so the message is always
informative and never blank.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6b57fe1c-2b9b-4cb6-9768-bcc6e68c1971
📒 Files selected for processing (75)
pkg/config/config.gopkg/config/config.toml.examplepkg/config/config.toml.nextgen.examplepkg/ddl/add_column.gopkg/ddl/create_table.gopkg/ddl/generated_column.gopkg/domain/BUILD.bazelpkg/domain/domain.gopkg/domain/inference.gopkg/executor/insert_common.gopkg/expression/BUILD.bazelpkg/expression/builtin.gopkg/expression/builtin_inference.gopkg/expression/builtin_inference_test.gopkg/expression/builtin_threadunsafe_generated.gopkg/expression/column.gopkg/expression/expression.gopkg/expression/function_traits.gopkg/expression/function_traits_test.gopkg/expression/generator/builtin_threadsafe.gopkg/expression/inference_helper.gopkg/expression/inference_helper_test.gopkg/expression/integration_test/BUILD.bazelpkg/expression/integration_test/integration_test.gopkg/inference/BUILD.bazelpkg/inference/domainadaptor/BUILD.bazelpkg/inference/domainadaptor/adaptor.gopkg/inference/embedding/base/BUILD.bazelpkg/inference/embedding/base/base.gopkg/inference/embedding/base/base_test.gopkg/inference/embedding/batcher/BUILD.bazelpkg/inference/embedding/batcher/batcher.gopkg/inference/embedding/batcher/batcher_test.gopkg/inference/embedding/cohere/BUILD.bazelpkg/inference/embedding/cohere/cohere.gopkg/inference/embedding/cohere/cohere_test.gopkg/inference/embedding/cohere/protocol.gopkg/inference/embedding/gemini/BUILD.bazelpkg/inference/embedding/gemini/gemini.gopkg/inference/embedding/gemini/gemini_test.gopkg/inference/embedding/gemini/protocol.gopkg/inference/embedding/huggingface/BUILD.bazelpkg/inference/embedding/huggingface/huggingface.gopkg/inference/embedding/huggingface/huggingface_test.gopkg/inference/embedding/huggingface/protocol.gopkg/inference/embedding/jina/BUILD.bazelpkg/inference/embedding/jina/jina.gopkg/inference/embedding/jina/jina_test.gopkg/inference/embedding/jina/protocol.gopkg/inference/embedding/mock/BUILD.bazelpkg/inference/embedding/mock/mock.gopkg/inference/embedding/mock/mock_test.gopkg/inference/embedding/nvidia/BUILD.bazelpkg/inference/embedding/nvidia/nvidia.gopkg/inference/embedding/nvidia/nvidia_test.gopkg/inference/embedding/nvidia/protocol.gopkg/inference/embedding/openai/BUILD.bazelpkg/inference/embedding/openai/openai.gopkg/inference/embedding/openai/openai_test.gopkg/inference/embedding/openai/protocol.gopkg/inference/embedding/tidbcloud/BUILD.bazelpkg/inference/embedding/tidbcloud/protocol.gopkg/inference/embedding/tidbcloud/tidbcloud_free.gopkg/inference/embedding/tidbcloud/tidbcloud_free_test.gopkg/inference/manager_test.gopkg/inference/sqlembed.gopkg/parser/ast/functions.gopkg/planner/core/BUILD.bazelpkg/planner/core/expression_rewriter.gopkg/planner/core/logical_plan_builder.gopkg/sessionctx/vardef/tidb_vars.gopkg/sessionctx/variable/BUILD.bazelpkg/sessionctx/variable/embedding_vars.gopkg/sessionctx/variable/embedding_vars_test.gopkg/sessionctx/variable/sysvar.go
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #69014 +/- ##
================================================
- Coverage 76.3268% 76.0455% -0.2814%
================================================
Files 2041 2078 +37
Lines 562499 583766 +21267
================================================
+ Hits 429338 443928 +14590
- Misses 132250 137130 +4880
- Partials 911 2708 +1797
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/expression/builtin_inference.go (1)
147-158: ⚡ Quick winConsider adding a comment to explain the
@searchsuffix behavior.The
@searchsuffix processing logic is subtle: whenisFromVecSearchis true, options like"key@search": valueare renamed to"key": value; when false, they're dropped entirely. This allows callers to specify search-time-only options that are ignored during direct EMBED_TEXT() calls but honored when the function is rewritten from VEC_EMBED_*_DISTANCE(). A brief comment would help future maintainers understand this intent.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/expression/builtin_inference.go` around lines 147 - 158, Add a short clarifying comment above the block that collects and processes keys with the "`@search`" suffix explaining that entries like "key@search" are treated as search-time-only options: if isFromVecSearch is true they are promoted to "key" (opts[strings.TrimSuffix(k, "`@search`")] = opts[k]) to be honored when the call was rewritten from a VEC_EMBED_*_DISTANCE path, otherwise they are removed (delete(opts, k)) so direct EMBED_TEXT() calls ignore them; reference the variables searchOpts, opts and isFromVecSearch in the comment so future readers understand the intent of this rename-or-drop behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/expression/builtin_inference.go`:
- Around line 147-158: Add a short clarifying comment above the block that
collects and processes keys with the "`@search`" suffix explaining that entries
like "key@search" are treated as search-time-only options: if isFromVecSearch is
true they are promoted to "key" (opts[strings.TrimSuffix(k, "`@search`")] =
opts[k]) to be honored when the call was rewritten from a VEC_EMBED_*_DISTANCE
path, otherwise they are removed (delete(opts, k)) so direct EMBED_TEXT() calls
ignore them; reference the variables searchOpts, opts and isFromVecSearch in the
comment so future readers understand the intent of this rename-or-drop behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 32cbf36e-e183-4934-addb-0aaa6faec878
📒 Files selected for processing (28)
pkg/executor/BUILD.bazelpkg/executor/insert_common.gopkg/expression/builtin_inference.gopkg/expression/inference_helper.gopkg/expression/inference_helper_test.gopkg/expression/integration_test/integration_test.gopkg/inference/embedding/base/BUILD.bazelpkg/inference/embedding/base/base.gopkg/inference/embedding/base/base_test.gopkg/inference/embedding/cohere/cohere.gopkg/inference/embedding/gemini/BUILD.bazelpkg/inference/embedding/gemini/gemini.gopkg/inference/embedding/gemini/gemini_test.gopkg/inference/embedding/huggingface/huggingface.gopkg/inference/embedding/huggingface/protocol.gopkg/inference/embedding/jina/BUILD.bazelpkg/inference/embedding/jina/jina.gopkg/inference/embedding/jina/jina_test.gopkg/inference/embedding/mock/mock.gopkg/inference/embedding/mock/mock_test.gopkg/inference/embedding/nvidia/BUILD.bazelpkg/inference/embedding/nvidia/nvidia.gopkg/inference/embedding/nvidia/nvidia_test.gopkg/inference/embedding/openai/BUILD.bazelpkg/inference/embedding/openai/openai.gopkg/inference/embedding/openai/openai_test.gopkg/inference/embedding/tidbcloud/tidbcloud_free.gopkg/planner/core/expression_rewriter.go
🚧 Files skipped from review as they are similar to previous changes (20)
- pkg/inference/embedding/base/BUILD.bazel
- pkg/inference/embedding/openai/BUILD.bazel
- pkg/inference/embedding/huggingface/protocol.go
- pkg/inference/embedding/mock/mock.go
- pkg/inference/embedding/jina/BUILD.bazel
- pkg/inference/embedding/mock/mock_test.go
- pkg/inference/embedding/cohere/cohere.go
- pkg/inference/embedding/nvidia/nvidia.go
- pkg/inference/embedding/openai/openai.go
- pkg/inference/embedding/gemini/BUILD.bazel
- pkg/inference/embedding/jina/jina_test.go
- pkg/inference/embedding/nvidia/BUILD.bazel
- pkg/inference/embedding/openai/openai_test.go
- pkg/inference/embedding/huggingface/huggingface.go
- pkg/planner/core/expression_rewriter.go
- pkg/expression/inference_helper_test.go
- pkg/inference/embedding/tidbcloud/tidbcloud_free.go
- pkg/inference/embedding/gemini/gemini_test.go
- pkg/expression/integration_test/integration_test.go
- pkg/executor/insert_common.go
|
/retest |
|
@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
| type HostedEmbedding struct { | ||
| // Enabled indicates whether the hosted embedding service is enabled. | ||
| // It only takes effect for nextgen kernels. | ||
| Enabled bool `toml:"enabled" json:"enabled"` |
| [hosted-embedding] | ||
| # Enables the TiDB Cloud hosted embedding provider tidbcloud_free. | ||
| # Effective only for nextgen kernels. | ||
| enabled = false | ||
| api-endpoint = "" | ||
| api-key-path = "" | ||
|
|
There was a problem hiding this comment.
remove it, not supported in classic kernel
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
|
@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
D3Hunter
left a comment
There was a problem hiding this comment.
Summary
- Total findings: 11
- Inline comments: 10
- Summary-only findings (no inline anchor): 1
Findings (highest risk first)
🚨 [Blocker] (1)
- Auto-embedding table metadata is introduced without an N-1 compatibility gate (
pkg/ddl/create_table.go:558, pkg/executor/insert_common.go:634, pkg/expression/expression.go:1114, pkg/ddl/generated_column.go:448)
⚠️ [Major] (7)
- LOAD DATA never materializes stored EMBED_TEXT generated columns (
LOAD DATA row path in pkg/executor/load_data.go lines 560, 581, and 663, together with the auto-embedding skip in pkg/executor/insert_common.go line 899) - ALTER or MODIFY can still create generated-column chains on top of auto-embedding columns (
pkg/ddl/create_table.go:602, pkg/ddl/generated_column.go:255, pkg/executor/insert_common.go:899) - DDL accepts auto-embedding generated columns with invalid JSON options (
pkg/ddl/generated_column.go:448, pkg/expression/inference_helper.go:115, pkg/expression/builtin_inference.go:138) - EMBED_TEXT drops statement cancellation and timeout propagation once the remote call starts (
pkg/inference/sqlembed.go:162; pkg/executor/insert_common.go:700-740) - Hosted embedding API-key file is read once at startup and never refreshed (
pkg/inference/sqlembed.go:106-126; pkg/domain/inference.go:19-31) - domainadaptor.GetEmbedFn hides a global-default fallback behind a domain-scoped name (
pkg/inference/domainadaptor/adaptor.go:15, pkg/inference/domainadaptor/adaptor.go:32) - INSERT-side auto-embedding reimplements the EMBED_TEXT runtime contract (
pkg/executor/insert_common.go:634, pkg/expression/builtin_inference.go:95)
🟡 [Minor] (3)
- GeneratedExprString comment describes an AST contract even though the field stores SQL text (
pkg/expression/column.go:297) - Auto-embedding insert helper names imply batching and singular datum handling that the code does not do (
pkg/executor/insert_common.go:57, pkg/executor/insert_common.go:634) - Domain public API was expanded for a single local adaptor path (
pkg/domain/inference.go:19, pkg/inference/domainadaptor/adaptor.go:24)
Unanchored findings
⚠️ [Major] (1)
- LOAD DATA never materializes stored EMBED_TEXT generated columns
- Scope:
LOAD DATA row path in pkg/executor/load_data.go lines 560, 581, and 663, together with the auto-embedding skip in pkg/executor/insert_common.go line 899 - Request: Add the same auto-embedding materialization step to the LOAD DATA row pipeline and cover it with an integration test for
LOAD DATAon a table with a storedEMBED_TEXT()generated column.
- Scope:
| if err := checkIllegalFn4Generated(colDef.Name.Name.L, typeColumn, option.Expr); err != nil { | ||
| return errors.Trace(err) | ||
| } | ||
| if err := checkGeneratedColForAutoEmbedding(colDef.Name.Name.L, option.Expr, option.Stored); err != nil { |
There was a problem hiding this comment.
🚨 [Blocker] Auto-embedding table metadata is introduced without an N-1 compatibility gate
Why
CREATE TABLE now allows persisted stored generated columns backed by EMBED_TEXT(), but the resulting metadata and DML/planner behavior depend on new code paths that are only present in this version. In a rolling upgrade or rollback window, older TiDB nodes can still observe the same table metadata even though this feature is not guarded for mixed-version use.
Scope
pkg/ddl/create_table.go:558, pkg/executor/insert_common.go:634, pkg/expression/expression.go:1114, pkg/ddl/generated_column.go:448
Risk if unchanged
Creating one of these tables before every TiDB node is upgraded can break inserts/updates or vector-search planning on N-1 nodes, which makes the feature unsafe for normal rolling upgrade and downgrade procedures.
Evidence
checkGeneratedColForAutoEmbedding() only validates SQL shape before CREATE TABLE succeeds; there is no cluster-version gate or mixed-version rejection in the DDL path. The new metadata is then consumed by auto-embedding DML evaluation in fillAutoEmbeddingDatum() and by planner schema loading through GeneratedExprString. The added tests cover only single-version starter-mode flows (pkg/expression/integration_test/integration_test.go:4539 through :4668) and do not exercise N/N-1 upgrade or rollback behavior.
Change request
Gate creation and use of EMBED_TEXT()-generated columns until the cluster version guarantees every TiDB node understands the new metadata, or return an explicit mixed-version unsupported error and add upgrade/downgrade coverage for that path.
There was a problem hiding this comment.
Addressed. checkGeneratedColForAutoEmbedding now rejects creating auto-embedding generated columns during mixed-version rolling upgrades.
| } | ||
| } | ||
|
|
||
| autoEmbeddingCols := make(map[string]struct{}) |
There was a problem hiding this comment.
⚠️ [Major] ALTER or MODIFY can still create generated-column chains on top of auto-embedding columns
Why
CREATE TABLE now blocks generated columns that depend on an auto-embedding column, but the modify-column validation path does not enforce the same dependency rule. That leaves an ALTER path where a generated column can be changed to depend on an EMBED_TEXT() column even though the executor deliberately skips evaluating EMBED_TEXT() during the normal generated-column pass.
Scope
pkg/ddl/create_table.go:602, pkg/ddl/generated_column.go:255, pkg/executor/insert_common.go:899
Risk if unchanged
A user can create a legal table and later ALTER TABLE ... MODIFY COLUMN a generated column to reference an auto-embedding column. Subsequent inserts then evaluate the dependent generated column before the embedding column is filled, so the dependent value is computed from NULL or dummy state and stored incorrectly.
Evidence
create_table.go:602-617 adds a dedicated scan that rejects dependencies on auto-embedding generated columns. checkModifyGeneratedColumn in generated_column.go:255-279 validates the new expression plus auto-increment and index rules, but never checks whether dependColNames includes an existing auto-embedding generated column. That omission matters because fillRow skips EMBED_TEXT() columns at insert_common.go:899-900 and does not revisit downstream generated columns afterward.
Change request
Apply the same dependency check in checkModifyGeneratedColumn and any other ALTER path that can rewrite generated expressions, and add coverage for altering a generated column to reference an existing auto-embedding column.
There was a problem hiding this comment.
Addressed. The MODIFY generated-column path now applies the same dependency check and rejects generated columns built on top of auto-embedding columns.
| } | ||
|
|
||
| // Embed generates embeddings for the given text. It handles with cache and batching. | ||
| func (e *EmbedFn) Embed(shouldCancel func() bool, modelWithProvider string, text string, opts map[string]any) ([]float32, error) { |
There was a problem hiding this comment.
⚠️ [Major] EMBED_TEXT drops statement cancellation and timeout propagation once the remote call starts
Why
The new embedding path replaces the statement context with context.Background() and only polls SQLKiller, so request deadlines, executor cancellation, and parent-context shutdown no longer abort in-flight provider calls.
Scope
pkg/inference/sqlembed.go:162; pkg/executor/insert_common.go:700-740
Risk if unchanged
Canceled inserts and queries can keep waiting on remote embedding requests until the provider or HTTP client times out, tying up executor goroutines and slowing overload recovery under user cancels or statement timeouts.
Evidence
In Embed, the provider context is created with context.WithCancel(context.Background()) at line 163, and the only cancellation bridge is the 1-second shouldCancel poll at lines 168-182. The insert path does create egCtx := errgroup.WithContext(ctx), but the actual remote call ignores that context and calls embedFn.Embed(...) with only a SQLKiller callback at lines 723-740.
Change request
Use the caller context as the root through Embed and the provider path so EMBED_TEXT stops promptly on ctx.Done(), keeping the kill-signal poll only as a fallback when no real context is available.
There was a problem hiding this comment.
Addressed. The embedding path now uses EmbedWithContext; direct evaluation and INSERT auto-embedding pass the caller/executor context through to the provider call, with SQLKiller kept as an additional fallback.
| ErrMissingAPIKey: fmt.Errorf(errMissingAPI, "Gemini", strings.ToUpper(vardef.TiDBExpEmbedGeminiAPIKey)), | ||
| // ErrUnauthorized is not provided in gemini. The error message provided by Gemini API is sufficient enough. | ||
| })) | ||
| if isHostedEmbeddingEnabled() { |
There was a problem hiding this comment.
⚠️ [Major] Hosted embedding API-key file is read once at startup and never refreshed
Why
The hosted-embedding provider captures the file contents into a local apiKey variable during NewEmbedFn, so missing credentials at boot or later key rotation cannot recover without rebuilding the whole EmbedFn.
Scope
pkg/inference/sqlembed.go:106-126; pkg/domain/inference.go:19-31
Risk if unchanged
A node that starts before the secret is mounted, or a cluster that rotates the bearer token in place, will keep sending an empty or stale credential and permanently fail EMBED_TEXT requests until restart.
Evidence
Lines 107-116 read HostedEmbedding.APIKeyPath once and store the result in a local string. The registered provider then uses GetAPIKey: func() string { return apiKey } at line 125, while the domain constructs the EmbedFn once in initInferenceProviders() and only closes it on domain shutdown.
Change request
Fetch the API key on demand instead of capturing a startup snapshot, or at least re-read the configured file behind a small refresh cache so secret-mount races and key rotation do not require restart.
There was a problem hiding this comment.
Addressed. Hosted embedding now reads the API key on demand via getHostedEmbeddingAPIKey instead of capturing it once during NewEmbedFn, so secret updates can be picked up without rebuilding EmbedFn.
| // See the License for the specific language governing permissions and | ||
| // limitations under the License. | ||
|
|
||
| // Package domainadaptor provides access to Domain-owned inference resources |
There was a problem hiding this comment.
⚠️ [Major] domainadaptor.GetEmbedFn hides a global-default fallback behind a domain-scoped name
Why
The new adaptor API reads like a pure domain lookup, but the implementation also returns the process-wide default embed function when the session is nil, lacks GetDomain, or exposes a nil domain embedder. That gives the same name two meanings and makes the lifecycle and scope contract hard to read at call sites.
Scope
pkg/inference/domainadaptor/adaptor.go:15, pkg/inference/domainadaptor/adaptor.go:32
Risk if unchanged
Future callers can reasonably assume they are always getting a domain-owned resource and then accidentally couple tests or sessions through the global singleton fallback.
Evidence
The package header says this package provides access to domain-owned inference resources, and GetEmbedFn says it returns the embed bridge associated with the session's domain. The body instead returns inference.DefaultEmbedFn() on three non-domain paths at lines 35, 39, and 45.
Change request
Expose the fallback semantics in the name or comment, for example ResolveEmbedFn or GetOrDefaultEmbedFn, or add a short why-comment that this helper falls back to the process-global default when no domain-owned embedder is available.
There was a problem hiding this comment.
Thanks. I kept the small Domain.GetEmbedFn() accessor for now. The bridge is intentionally isolated in domainadaptor to avoid an import cycle and keep expression/inference code from importing domain. Moving this into a different session/domain capability interface would be a broader API change; for this PR I kept the minimal accessor and documented the fallback semantics in domainadaptor.GetEmbedFn.
| return defaultVal, nil | ||
| } | ||
|
|
||
| func (e *InsertValues) fillAutoEmbeddingDatum(ctx context.Context, rows [][]types.Datum) ([][]types.Datum, error) { |
There was a problem hiding this comment.
⚠️ [Major] INSERT-side auto-embedding reimplements the EMBED_TEXT runtime contract
Why
The new insert path now owns starter-mode gating, provider lookup, SQL-kill cancellation, remote embed invocation, and vector conversion in a second place instead of reusing one canonical evaluator.
Scope
pkg/executor/insert_common.go:634, pkg/expression/builtin_inference.go:95
Risk if unchanged
Future changes to EMBED_TEXT semantics can land in only one path, so direct expression evaluation and generated-column INSERT evaluation can drift apart on cancellation, provider selection, null handling, or vector materialization behavior.
Evidence
fillAutoEmbeddingDatum batches work, but the core EMBED_TEXT execution block at pkg/executor/insert_common.go:695-750 repeats the same runtime steps already implemented in BuiltinEmbedTextSig.evalVectorFloat32 at pkg/expression/builtin_inference.go:101-121; both paths call EvalEmbedTextArgs, enforce starter mode, fetch domainadaptor.GetEmbedFn(...), invoke .Embed(...), and convert the returned slice with types.CreateVectorFloat32.
Change request
Keep one canonical definition for EMBED_TEXT execution and let the executor layer handle only batching and orchestration. Extract a shared helper in the expression or inference layer for evaluating prepared embed args to a vector datum, and have both direct expression evaluation and generated-column INSERT reuse it.
| // VirtualExpr is used to save expression for virtual column | ||
| VirtualExpr Expression | ||
|
|
||
| // GeneratedExprString stores the generated expression for planner rewrites that need the original AST. |
There was a problem hiding this comment.
🟡 [Minor] GeneratedExprString comment describes an AST contract even though the field stores SQL text
Why
The comment says the field keeps the original AST, but the type is string and downstream code reparses it before rewrites. That is a direct comment-to-code mismatch on a planner-facing field.
Scope
pkg/expression/column.go:297
Risk if unchanged
Readers may treat this field as an AST-preserving representation and make incorrect assumptions about fidelity, mutability, or reparsing costs in later planner work.
Evidence
The added field is GeneratedExprString string, while nearby consumers call generatedexpr.ParseExpression(col.GeneratedExprString) in pkg/expression/expression.go:1132 and pkg/planner/core/expression_rewriter.go:3040.
Change request
Update the comment to match the real contract, for example that this stores generated-expression text for later parse-and-rewrite flows.
| ) | ||
|
|
||
| const ( | ||
| autoEmbeddingEvalBatchSizeHint = 16 |
There was a problem hiding this comment.
🟡 [Minor] Auto-embedding insert helper names imply batching and singular datum handling that the code does not do
Why
The helper processes many generated-column cells across rows, yet it is named fillAutoEmbeddingDatum, and the concurrency limit is expressed as autoEmbeddingEvalBatchSizeHint * autoEmbeddingEvalMaxConcurrentBatches even though the implementation launches one goroutine per task rather than batching work.
Scope
pkg/executor/insert_common.go:57, pkg/executor/insert_common.go:634
Risk if unchanged
Tuning or extending this path becomes harder because maintainers have to reverse-engineer whether the knobs are about row batches, task concurrency, or generated-column fanout.
Evidence
The function signature is fillAutoEmbeddingDatum(ctx context.Context, rows [][]types.Datum), it builds a tasks slice over every auto-embedding generated column in every row, and line 701 applies the limit directly to errgroup task concurrency rather than to explicit batches.
Change request
Prefer names that match the behavior, such as fillAutoEmbeddingDatums or populateAutoEmbeddingGeneratedColumns, and rename the limit constants to reflect task concurrency unless real batching is introduced.
d3f1ba8 to
1d43e61
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/inference/embedding/cohere/cohere_test.go (1)
123-145: ⚡ Quick winUnused mock server in this test.
When
GetAPIKey()returns an empty string,CreateEmbeddingsreturns early (before constructing the HTTP request) per the logic incohere.go. The mock server at lines 127-132 is never called. Consider removing the server setup to clarify that no HTTP request is made in this case.♻️ Simplified test without unused mock server
func TestCohereEmbedder_NoAPIKey(t *testing.T) { - // Mock no API key response from real Cohere API - mockResponse := `{"id":"b6a8a658-261e-4fc2-b44d-0ee3fefa145e","message":"no api key supplied"}` - - server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - w.Header().Set("Content-Type", "application/json") - w.WriteHeader(http.StatusUnauthorized) - w.Write([]byte(mockResponse)) - })) - defer server.Close() - embedder := NewCohereEmbedder(EmbedderConfig{ GetAPIKey: func() string { return "" }, - GetBaseURL: func() string { return server.URL }, }) texts := []string{"hello world"}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/inference/embedding/cohere/cohere_test.go` around lines 123 - 145, The test TestCohereEmbedder_NoAPIKey sets up an httptest.Server that is never used because NewCohereEmbedder's CreateEmbeddings returns early when GetAPIKey() is empty; remove the unused mock server setup (the httptest.NewServer handler and its defer server.Close()) from the test to make it clear no HTTP call is made, keeping the rest of the test that constructs the embedder via NewCohereEmbedder(EmbedderConfig{GetAPIKey: func() string { return "" }, GetBaseURL: func() string { return server.URL },}) and asserts embeddings is nil and an error containing "API key is not configured for cohere".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/ddl/generated_column.go`:
- Around line 455-469: The checkGeneratedColForAutoEmbedding function currently
validates AST/versions but doesn't enforce the deployment-mode gate used at
runtime; call the same gate (the CheckEmbedTextAllowed function in
pkg/expression/builtin_inference.go) from checkGeneratedColForAutoEmbedding and
return its error when it disallows EMBED_TEXT, so CREATE/ALTER fails during DDL
on non-Starter deployments; place this call early (before accepting the column)
so the unsupported schema isn't persisted.
In `@pkg/executor/insert_common.go`:
- Around line 705-721: The loop that schedules auto-embedding tasks iterates all
entries in rows including nil sentinels from LOAD DATA, causing skipped records
to be re-evaluated; update the outer loop in the insert/common code (the for
rowIdx, row := range rows loop) to skip nil rows immediately (e.g., if row ==
nil { continue }) before computing gIdx or appending autoEmbeddingEvalTask so
that no tasks are enqueued for sentinel nil rows; keep the existing gIdx logic
and e.GenExprs usage unchanged.
---
Nitpick comments:
In `@pkg/inference/embedding/cohere/cohere_test.go`:
- Around line 123-145: The test TestCohereEmbedder_NoAPIKey sets up an
httptest.Server that is never used because NewCohereEmbedder's CreateEmbeddings
returns early when GetAPIKey() is empty; remove the unused mock server setup
(the httptest.NewServer handler and its defer server.Close()) from the test to
make it clear no HTTP call is made, keeping the rest of the test that constructs
the embedder via NewCohereEmbedder(EmbedderConfig{GetAPIKey: func() string {
return "" }, GetBaseURL: func() string { return server.URL },}) and asserts
embeddings is nil and an error containing "API key is not configured for
cohere".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a77e591f-3554-42bd-9b9c-c40eccc1d188
📒 Files selected for processing (80)
pkg/config/config.gopkg/config/config.toml.nextgen.examplepkg/config/config_test.gopkg/ddl/BUILD.bazelpkg/ddl/add_column.gopkg/ddl/create_table.gopkg/ddl/generated_column.gopkg/ddl/generated_column_internal_test.gopkg/domain/BUILD.bazelpkg/domain/domain.gopkg/domain/inference.gopkg/executor/insert_common.gopkg/executor/load_data.gopkg/expression/BUILD.bazelpkg/expression/builtin.gopkg/expression/builtin_inference.gopkg/expression/builtin_inference_test.gopkg/expression/builtin_threadunsafe_generated.gopkg/expression/column.gopkg/expression/expression.gopkg/expression/function_traits.gopkg/expression/function_traits_test.gopkg/expression/generator/builtin_threadsafe.gopkg/expression/inference_helper.gopkg/expression/inference_helper_test.gopkg/expression/integration_test/BUILD.bazelpkg/expression/integration_test/integration_test.gopkg/inference/BUILD.bazelpkg/inference/domainadaptor/BUILD.bazelpkg/inference/domainadaptor/adaptor.gopkg/inference/embedding/base/BUILD.bazelpkg/inference/embedding/base/base.gopkg/inference/embedding/base/base_test.gopkg/inference/embedding/batcher/BUILD.bazelpkg/inference/embedding/batcher/batcher.gopkg/inference/embedding/batcher/batcher_test.gopkg/inference/embedding/cohere/BUILD.bazelpkg/inference/embedding/cohere/cohere.gopkg/inference/embedding/cohere/cohere_test.gopkg/inference/embedding/cohere/protocol.gopkg/inference/embedding/gemini/BUILD.bazelpkg/inference/embedding/gemini/gemini.gopkg/inference/embedding/gemini/gemini_test.gopkg/inference/embedding/gemini/protocol.gopkg/inference/embedding/huggingface/BUILD.bazelpkg/inference/embedding/huggingface/huggingface.gopkg/inference/embedding/huggingface/huggingface_test.gopkg/inference/embedding/huggingface/protocol.gopkg/inference/embedding/jina/BUILD.bazelpkg/inference/embedding/jina/jina.gopkg/inference/embedding/jina/jina_test.gopkg/inference/embedding/jina/protocol.gopkg/inference/embedding/mock/BUILD.bazelpkg/inference/embedding/mock/mock.gopkg/inference/embedding/mock/mock_test.gopkg/inference/embedding/nvidia/BUILD.bazelpkg/inference/embedding/nvidia/nvidia.gopkg/inference/embedding/nvidia/nvidia_test.gopkg/inference/embedding/nvidia/protocol.gopkg/inference/embedding/openai/BUILD.bazelpkg/inference/embedding/openai/openai.gopkg/inference/embedding/openai/openai_test.gopkg/inference/embedding/openai/protocol.gopkg/inference/embedding/tidbcloud/BUILD.bazelpkg/inference/embedding/tidbcloud/protocol.gopkg/inference/embedding/tidbcloud/tidbcloud_free.gopkg/inference/embedding/tidbcloud/tidbcloud_free_test.gopkg/inference/manager_test.gopkg/inference/sqlembed.gopkg/parser/ast/functions.gopkg/planner/core/BUILD.bazelpkg/planner/core/expression_rewriter.gopkg/planner/core/logical_plan_builder.gopkg/planner/util/null_misc_test.gopkg/sessionctx/vardef/tidb_vars.gopkg/sessionctx/variable/BUILD.bazelpkg/sessionctx/variable/embedding_vars.gopkg/sessionctx/variable/embedding_vars_test.gopkg/sessionctx/variable/sysvar.gotests/integrationtest/r/executor/show.result
✅ Files skipped from review due to trivial changes (9)
- pkg/inference/embedding/cohere/protocol.go
- pkg/planner/util/null_misc_test.go
- pkg/parser/ast/functions.go
- tests/integrationtest/r/executor/show.result
- pkg/inference/embedding/tidbcloud/protocol.go
- pkg/inference/embedding/openai/BUILD.bazel
- pkg/sessionctx/variable/BUILD.bazel
- pkg/inference/embedding/huggingface/huggingface.go
- pkg/inference/embedding/gemini/BUILD.bazel
🚧 Files skipped from review as they are similar to previous changes (63)
- pkg/inference/embedding/huggingface/protocol.go
- pkg/expression/builtin_threadunsafe_generated.go
- pkg/planner/core/BUILD.bazel
- pkg/config/config.toml.nextgen.example
- pkg/domain/BUILD.bazel
- pkg/inference/embedding/batcher/BUILD.bazel
- pkg/inference/embedding/openai/protocol.go
- pkg/inference/domainadaptor/BUILD.bazel
- pkg/inference/embedding/jina/BUILD.bazel
- pkg/inference/embedding/base/BUILD.bazel
- pkg/expression/builtin.go
- pkg/inference/embedding/nvidia/BUILD.bazel
- pkg/inference/embedding/mock/mock.go
- pkg/domain/inference.go
- pkg/expression/function_traits_test.go
- pkg/inference/embedding/huggingface/BUILD.bazel
- pkg/inference/embedding/jina/protocol.go
- pkg/expression/expression.go
- pkg/inference/manager_test.go
- pkg/inference/domainadaptor/adaptor.go
- pkg/inference/embedding/gemini/protocol.go
- pkg/inference/embedding/tidbcloud/BUILD.bazel
- pkg/expression/BUILD.bazel
- pkg/inference/BUILD.bazel
- pkg/domain/domain.go
- pkg/sessionctx/variable/embedding_vars_test.go
- pkg/sessionctx/variable/embedding_vars.go
- pkg/expression/function_traits.go
- pkg/expression/generator/builtin_threadsafe.go
- pkg/planner/core/logical_plan_builder.go
- pkg/expression/integration_test/BUILD.bazel
- pkg/inference/embedding/base/base_test.go
- pkg/sessionctx/vardef/tidb_vars.go
- pkg/ddl/add_column.go
- pkg/inference/embedding/mock/BUILD.bazel
- pkg/inference/embedding/nvidia/nvidia_test.go
- pkg/sessionctx/variable/sysvar.go
- pkg/inference/embedding/openai/openai.go
- pkg/inference/embedding/tidbcloud/tidbcloud_free.go
- pkg/expression/builtin_inference_test.go
- pkg/inference/embedding/nvidia/protocol.go
- pkg/inference/embedding/gemini/gemini.go
- pkg/expression/column.go
- pkg/inference/embedding/cohere/cohere.go
- pkg/expression/inference_helper_test.go
- pkg/inference/embedding/mock/mock_test.go
- pkg/inference/embedding/tidbcloud/tidbcloud_free_test.go
- pkg/inference/embedding/base/base.go
- pkg/inference/embedding/jina/jina_test.go
- pkg/inference/embedding/openai/openai_test.go
- pkg/config/config.go
- pkg/inference/embedding/huggingface/huggingface_test.go
- pkg/expression/inference_helper.go
- pkg/inference/embedding/batcher/batcher.go
- pkg/expression/integration_test/integration_test.go
- pkg/inference/embedding/cohere/BUILD.bazel
- pkg/planner/core/expression_rewriter.go
- pkg/inference/embedding/nvidia/nvidia.go
- pkg/inference/embedding/gemini/gemini_test.go
- pkg/inference/embedding/batcher/batcher_test.go
- pkg/inference/sqlembed.go
- pkg/config/config_test.go
- pkg/inference/embedding/jina/jina.go
|
/retest |
|
@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
| return checkAutoEmbeddingGeneratedColumnServerVersionsCompatible(versions) | ||
| } | ||
|
|
||
| func checkAutoEmbeddingGeneratedColumnServerVersionsCompatible(versions []string) error { |
There was a problem hiding this comment.
no need to check this as its the first version to introduce it in open source tidb? seems AI is over considering the compatibility issue
There was a problem hiding this comment.
Agreed, the name was misleading. I renamed manager_test.go to sqlembed_test.go since these tests cover the SQL embedding helper / EmbedFn behavior rather than a manager component.
|
/retest |
|
@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
| break | ||
| } | ||
| } | ||
| if !hasAutoEmbeddingExpr { |
There was a problem hiding this comment.
It looks to me that hasAutoEmbeddingExpr can be decided during compile time or open(), so there is no need to re-check it for every call of fillAutoEmbeddingDatumsWithRowCount
| continue | ||
| } | ||
| gIdx++ | ||
| if !expression.IsAutoEmbedFnCallAST(gCol.GeneratedExpr.Internal()) { |
There was a problem hiding this comment.
The same as above, it looks to me we can pre-handle these information so during this row-wise loop, we can just construct the autoEmbeddingEvalTask
| results[i].err = fmt.Errorf("auto-embedding generated column expects EMBED_TEXT()") | ||
| continue | ||
| } | ||
| embedArgs, isNull, err := expression.EvalEmbedTextArgs(e.Ctx().GetExprCtx().GetEvalCtx(), chunk.MutRowFromDatums(task.row).ToRow(), sf.GetArgs(), embedSig.IsFromVecSearch) |
There was a problem hiding this comment.
chunk.MutRowFromDatums is very low-efficient, I think we should implement some caching to optimize this logic
| // NOTE: please make sure there are test cases for all functions here. | ||
| } | ||
| specialUnsafeFuncs = map[string]struct{}{ | ||
| "BuiltinEmbedTextSig": {}, |
There was a problem hiding this comment.
Why the name is BuiltinEmbedTextSig, not builtinEmbedTextSig?
|
/retest |
What problem does this PR solve?
Issue Number: ref #67765
Problem Summary:
TiDB Cloud Starter needs SQL Auto Embedding support in the upstream TiDB kernel so that Starter can use an upstream-built TiDB binary while still providing the documented auto-embedding experience.
Before this PR, upstream TiDB did not have a complete SQL-side embedding path. Users could not call
EMBED_TEXT()from SQL to obtain vector embeddings, could not define stored generated vector columns backed by embedding models, and could not use text-query vector distance helper functions such asVEC_EMBED_L2_DISTANCE()to automatically embed query text during vector search.The feature also needs to be deployment-aware. Auto Embedding is intended to be available only in Starter deployment mode. Non-Starter deployments should not accidentally invoke external embedding services or hosted TiDB Cloud embedding infrastructure through
EMBED_TEXT().What changed and how does it work?
This PR adds full SQL Auto Embedding support to the upstream TiDB kernel.
At the SQL layer, it introduces the
EMBED_TEXT(model, text[, options])builtin, which returns aVECTOR<FLOAT32>embedding. The function supports provider-qualified model names and JSON options, and it is treated as a non-foldable, mutable-effect function because it may call external inference services and depends on runtime configuration.At the inference layer, this PR adds a shared embedding framework under
pkg/inference, including provider implementations for OpenAI-compatible endpoints, Jina AI, Cohere, Gemini, Hugging Face, NVIDIA, and TiDB Cloud hosted embedding. It also adds batching, request cancellation, per-session/global configuration integration, API key masking, endpoint validation, and test mock providers.For auto-embedding generated columns, this PR allows
EMBED_TEXT()to be used as the top-level expression of a stored generated vector column. The DDL validation enforces the supported shape: the generated column must be stored, the model/options arguments must be constants, and dependent generated columns on auto-embedding columns are rejected. During DML, TiDB evaluates the embedding expression and materializes the generated vector value on insert/update.For vector search, this PR adds planner rewrite support for
VEC_EMBED_*_DISTANCE()helper functions. When the vector column is backed by an auto-embedding generated column, TiDB rewrites text-query distance functions into the corresponding vector distance function with anEMBED_TEXT()call on the query text. This lets users query by text without manually embedding the query first.The embedding function is managed through the TiDB domain so all sessions can share the same embedding manager and provider configuration. A small domain adaptor is added to avoid package dependency cycles between expression evaluation and domain initialization.
Finally, this PR gates runtime use of
EMBED_TEXT()by deployment mode.EMBED_TEXT()now checksdeploymode.IsStarter()before evaluating arguments or calling any provider. If the current TiDB instance is not running in Starter deployment mode, the function returns an error instead of invoking an embedding provider.Check List
Tests
Side effects
Documentation
Local E2E validation
I also verified the SQL auto-embedding path with a local TiUP playground cluster.
Environment
Validated SQL flow
EMBED_TEXT()execution against the real remote endpoint:The query successfully returned a vector embedding from the real Aliyun endpoint, confirming that the SQL function can call an external OpenAI-compatible provider end-to-end.
The generated vector column was populated during write, confirming that the auto-embedding path works for stored generated columns.
The query executed successfully, confirming that
VEC_EMBED_*_DISTANCE()can reuse theEMBED_TEXT()metadata from the generated vector column and rewrite to the corresponding vector distance function.In a non-starter/classic local cluster, this is rejected with the expected deployment-mode error. In starter mode, the same query works when the embedding endpoint and API key are configured.
Result
EMBED_TEXT()works against a real Aliyun DashScope OpenAI-compatible endpoint.EMBED_TEXT()are populated successfully on insert.VEC_EMBED_*_DISTANCE()works with auto-embedding generated vector columns.Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
EMBED_TEXT()function for generating vector embeddings from text.VEC_EMBED_L1_DISTANCE,VEC_EMBED_L2_DISTANCE,VEC_EMBED_COSINE_DISTANCE,VEC_EMBED_NEGATIVE_INNER_PRODUCT) for similarity search.LOAD DATAoperations.