Skip to content

expression: add full SQL auto embedding support for starter mode#69014

Open
ChangRui-Ryan wants to merge 11 commits into
pingcap:masterfrom
ChangRui-Ryan:changrui_cse_embed-text
Open

expression: add full SQL auto embedding support for starter mode#69014
ChangRui-Ryan wants to merge 11 commits into
pingcap:masterfrom
ChangRui-Ryan:changrui_cse_embed-text

Conversation

@ChangRui-Ryan

@ChangRui-Ryan ChangRui-Ryan commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: ref #67765

Problem Summary:
TiDB Cloud Starter needs SQL Auto Embedding support in the upstream TiDB kernel so that Starter can use an upstream-built TiDB binary while still providing the documented auto-embedding experience.

Before this PR, upstream TiDB did not have a complete SQL-side embedding path. Users could not call EMBED_TEXT() from SQL to obtain vector embeddings, could not define stored generated vector columns backed by embedding models, and could not use text-query vector distance helper functions such as VEC_EMBED_L2_DISTANCE() to automatically embed query text during vector search.

The feature also needs to be deployment-aware. Auto Embedding is intended to be available only in Starter deployment mode. Non-Starter deployments should not accidentally invoke external embedding services or hosted TiDB Cloud embedding infrastructure through EMBED_TEXT().

What changed and how does it work?

This PR adds full SQL Auto Embedding support to the upstream TiDB kernel.

At the SQL layer, it introduces the EMBED_TEXT(model, text[, options]) builtin, which returns a VECTOR<FLOAT32> embedding. The function supports provider-qualified model names and JSON options, and it is treated as a non-foldable, mutable-effect function because it may call external inference services and depends on runtime configuration.

At the inference layer, this PR adds a shared embedding framework under pkg/inference, including provider implementations for OpenAI-compatible endpoints, Jina AI, Cohere, Gemini, Hugging Face, NVIDIA, and TiDB Cloud hosted embedding. It also adds batching, request cancellation, per-session/global configuration integration, API key masking, endpoint validation, and test mock providers.

For auto-embedding generated columns, this PR allows EMBED_TEXT() to be used as the top-level expression of a stored generated vector column. The DDL validation enforces the supported shape: the generated column must be stored, the model/options arguments must be constants, and dependent generated columns on auto-embedding columns are rejected. During DML, TiDB evaluates the embedding expression and materializes the generated vector value on insert/update.

For vector search, this PR adds planner rewrite support for VEC_EMBED_*_DISTANCE() helper functions. When the vector column is backed by an auto-embedding generated column, TiDB rewrites text-query distance functions into the corresponding vector distance function with an EMBED_TEXT() call on the query text. This lets users query by text without manually embedding the query first.

The embedding function is managed through the TiDB domain so all sessions can share the same embedding manager and provider configuration. A small domain adaptor is added to avoid package dependency cycles between expression evaluation and domain initialization.

Finally, this PR gates runtime use of EMBED_TEXT() by deployment mode. EMBED_TEXT() now checks deploymode.IsStarter() before evaluating arguments or calling any provider. If the current TiDB instance is not running in Starter deployment mode, the function returns an error instead of invoking an embedding provider.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)Local E2E validation
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Local E2E validation

I also verified the SQL auto-embedding path with a local TiUP playground cluster.

Environment

  • Built the TiDB server binary from this PR branch.
  • Started a local starter-mode / next-gen TiUP playground cluster with 1 TiDB, 1 PD, 1 TiKV/CSE node, and TiFlash compute enabled.
  • Used a real OpenAI-compatible Aliyun DashScope embedding endpoint with a real API key. The API key is intentionally omitted from the PR.

Validated SQL flow

  1. Configured the OpenAI-compatible embedding endpoint and API key:
SET GLOBAL tidb_embedding_api_base_url = 'https://dashscope.aliyuncs.com/compatible-mode/v1';
SET GLOBAL tidb_embedding_api_key = '<redacted>';
  1. Verified direct EMBED_TEXT() execution against the real remote endpoint:
SELECT VEC_DIMS(EMBED_TEXT('openai/text-embedding-v4', 'hello world'));
SELECT VEC_AS_TEXT(EMBED_TEXT('openai/text-embedding-v4', 'hello world'));

The query successfully returned a vector embedding from the real Aliyun endpoint, confirming that the SQL function can call an external OpenAI-compatible provider end-to-end.

  1. Verified auto-embedding generated column behavior:
CREATE TABLE t_auto_embed (
  id BIGINT PRIMARY KEY,
  text_content TEXT,
  embedding VECTOR GENERATED ALWAYS AS
    (EMBED_TEXT('openai/text-embedding-v4', text_content)) STORED
);

INSERT INTO t_auto_embed VALUES
  (1, 'hello world'),
  (2, 'TiDB vector search');

SELECT id, VEC_DIMS(embedding) FROM t_auto_embed ORDER BY id;

The generated vector column was populated during write, confirming that the auto-embedding path works for stored generated columns.

  1. Verified vector search helper rewrite:
SELECT id
FROM t_auto_embed
ORDER BY VEC_EMBED_COSINE_DISTANCE(embedding, 'hello world')
LIMIT 1;

The query executed successfully, confirming that VEC_EMBED_*_DISTANCE() can reuse the EMBED_TEXT() metadata from the generated vector column and rewrite to the corresponding vector distance function.

  1. Verified deployment-mode gating:
SELECT EMBED_TEXT('openai/text-embedding-v4', 'hello world');

In a non-starter/classic local cluster, this is rejected with the expected deployment-mode error. In starter mode, the same query works when the embedding endpoint and API key are configured.

Result

  • Direct EMBED_TEXT() works against a real Aliyun DashScope OpenAI-compatible endpoint.
  • Stored generated columns using EMBED_TEXT() are populated successfully on insert.
  • VEC_EMBED_*_DISTANCE() works with auto-embedding generated vector columns.
  • The feature is correctly gated to starter deployment mode.

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • New Features
    • Added EMBED_TEXT() function for generating vector embeddings from text.
    • Added vector embedding distance functions (VEC_EMBED_L1_DISTANCE, VEC_EMBED_L2_DISTANCE, VEC_EMBED_COSINE_DISTANCE, VEC_EMBED_NEGATIVE_INNER_PRODUCT) for similarity search.
    • Added support for auto-embedding generated vector columns in tables.
    • Integrated multiple embedding providers: OpenAI, Cohere, Gemini, HuggingFace, Jina, NVIDIA, and TiDB Cloud.
    • Added embedding support in LOAD DATA operations.
    • Hosted embedding configuration available in starter deployment mode.

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. labels Jun 8, 2026
@pantheon-ai

pantheon-ai Bot commented Jun 8, 2026

Copy link
Copy Markdown

@ChangRui-Ryan I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed do-not-merge/needs-tests-checked labels Jun 8, 2026
@tiprow

tiprow Bot commented Jun 8, 2026

Copy link
Copy Markdown

Hi @ChangRui-Ryan. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds hosted embedding configuration, embedding provider/runtime wiring, EMBED_TEXT evaluation, planner and DDL support for auto-embedding, executor population paths, and updated tests/build metadata.

Changes

Embedding capability rollout

Layer / File(s) Summary
Hosted embedding config gate
pkg/config/config.go, pkg/config/config.toml.nextgen.example, pkg/config/config_test.go
Adds the hosted-embedding config section, default values, deploy-mode validation, and example/test coverage.
Inference runtime and sysvars
pkg/domain/..., pkg/inference/..., pkg/sessionctx/vardef/tidb_vars.go, pkg/sessionctx/variable/...
Wires domain-owned embed functions, provider discovery, embedding sysvars, and OpenAI base-URL normalization/masking.
EMBED_TEXT builtin and AST helpers
pkg/expression/..., pkg/parser/ast/functions.go
Registers EMBED_TEXT, adds AST extraction helpers, builtin traits, generated-column metadata propagation, and tests.
Planner rewrite and generated columns
pkg/ddl/..., pkg/planner/core/...
Adds generated-column validation for auto-embedding and rewrites vector-distance calls using stored generated expression metadata.
Executor auto-embedding
pkg/executor/..., pkg/expression/integration_test/..., pkg/ddl/generated_column_internal_test.go
Populates auto-embedding generated columns during insert/LOAD DATA and adds integration/unit coverage.
Provider implementations and tests
pkg/inference/embedding/..., pkg/inference/manager_test.go, pkg/inference/sqlembed.go
Adds concrete embedding providers, batching, mock support, and runtime tests across all backends.
Build wiring and expected outputs
pkg/*/BUILD.bazel, pkg/planner/util/null_misc_test.go, tests/integrationtest/r/executor/show.result
Updates build targets, snapshot hash, and expected builtin listings.

Sequence Diagram(s)

sequenceDiagram
  participant SQL as EMBED_TEXT / vec_embed_*_distance
  participant Expression as expression.rewriter
  participant Domain as Domain.GetEmbedFn
  participant EmbedFn as inference.EmbedFn
  participant Provider as embedding backend
  SQL->>Expression: build or rewrite expression
  Expression->>Domain: resolve embed function
  Domain->>EmbedFn: return managed embed function
  EmbedFn->>Provider: CreateEmbeddings(model,texts,opts)
  Provider-->>EmbedFn: embeddings or error
  EmbedFn-->>Expression: vector datum
  Expression-->>SQL: rewritten or evaluated result
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Suggested labels

ok-to-test, lgtm, release-note

Suggested reviewers

  • yudongusa
  • tiancaiamao
  • D3Hunter
  • YuJuncen

Poem

🐇 I hop through embeddings, quick and light,
New vectors bloom in the SQL night.
With configs, rewrites, and caches bright,
The rabbit grins: “This PR feels right!”

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 19

🧹 Nitpick comments (12)
pkg/planner/core/expression_rewriter.go (1)

3027-3031: ⚡ Quick win

Improve error message robustness by handling empty column name.

Line 3029 uses vecArg.OrigName in the error message, which may be empty. Consider falling back to vecArg.String() or a placeholder to ensure the error message is always informative.

♻️ Proposed improvement
+	colName := vecArg.OrigName
+	if colName == "" {
+		colName = vecArg.String()
+	}
 	vecArg, ok := args[0].(*expression.Column)
 	if !ok || vecArg.GeneratedExprString == "" || !vecArg.RetType.EvalType().IsVectorKind() {
-		er.err = fmt.Errorf("%s() first argument must be a vector embedding column generated by EMBED_TEXT()", v.FnName.O)
+		er.err = fmt.Errorf("%s() first argument must be a vector embedding column generated by EMBED_TEXT(), got %s", v.FnName.O, colName)
 		return true
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/planner/core/expression_rewriter.go` around lines 3027 - 3031, The error
message for the vector argument in expression_rewriter.go can be empty because
vecArg.OrigName may be empty; update the check around the vecArg handling in the
block that sets er.err (where vecArg is derived from args[0] and v.FnName.O is
used) to build a robust name: if vecArg.OrigName is empty, use vecArg.String()
or a constant placeholder like "<unknown column>" when formatting the error text
so the message is always informative and never blank.
pkg/inference/embedding/base/BUILD.bazel (1)

15-15: 💤 Low value

Reconsider the flaky marking for deterministic tests.

The base_test is marked as flaky = True, but reviewing base_test.go shows it only performs deterministic base64 decoding and validation tests without concurrency or timing dependencies. Unless there's a known flakiness issue, consider removing this marking to ensure test failures are investigated rather than automatically retried.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/inference/embedding/base/BUILD.bazel` at line 15, The test target
"base_test" in the BUILD rule is marked flaky = True even though base_test only
runs deterministic base64 decode/validation; remove the flaky = True attribute
(or set flaky = False) from the base_test rule so failures surface for
investigation, then run the test suite locally to confirm no flakiness before
committing.
pkg/inference/embedding/batcher/batcher.go (1)

244-260: ⚡ Quick win

Document the batch-level context cancellation semantics.

The batch request context (reqCtx) only cancels when all individual request contexts are done. This means if requests with different timeouts are batched together, a request with a short timeout may wait longer than expected (bounded by batchWindow). While this is correct batching behavior, consider adding a comment explaining this trade-off for future maintainers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/inference/embedding/batcher/batcher.go` around lines 244 - 260, Add a
clear comment above the reqCtx/cancelReq creation and goroutine explaining that
reqCtx only cancels after all individual request contexts in thisBatch.calls are
done (the goroutine loops over call.ctx and only calls cancelReq after every
call is done), so batching can cause a request with a short timeout to wait
longer (up to batchWindow) before its batch-level context cancels; state this is
intentional for batching and note the trade-off for maintainers and any
expectations around embedder.CreateEmbeddings using reqCtx.
pkg/inference/embedding/huggingface/BUILD.bazel (2)

23-23: ⚡ Quick win

Investigate why tests are marked flaky.

The flaky = True attribute suggests these tests have non-deterministic failures. Consider investigating and fixing the root cause rather than marking tests as flaky, as this can mask real issues and reduce test signal quality.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/inference/embedding/huggingface/BUILD.bazel` at line 23, The BUILD.bazel
target currently sets flaky = True, which masks nondeterministic test failures;
remove or revert the flaky = True attribute on the target and instead reproduce
the intermittent failures by running the corresponding Bazel test target(s)
repeatedly (e.g., bazel test --runs_per_test=50), add deterministic fixes
(stabilize timers, seed randomness, isolate shared state, add proper
setup/teardown) in the test code, and only reintroduce flaky metadata after root
causes are identified; look for references to the BUILD target that contains
flaky = True to locate the tests to run and stabilize.

18-26: 💤 Low value

Inconsistent shard_count across provider tests.

This test uses shard_count = 10, while the Gemini tests use shard_count = 7. Consider establishing a consistent sharding strategy across similar test suites, or document why different providers need different shard counts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/inference/embedding/huggingface/BUILD.bazel` around lines 18 - 26, The
shard_count for the Huggingface test target "huggingface_test" is set to 10
which is inconsistent with the Gemini provider tests (which use shard_count =
7); update the "huggingface_test" target in BUILD.bazel to use the same
shard_count value as other provider tests (e.g., change shard_count to 7) or add
a brief comment in the BUILD file next to the "huggingface_test" rule explaining
why it must remain different so the discrepancy is documented and intentional.
pkg/inference/embedding/gemini/BUILD.bazel (1)

18-26: ⚡ Quick win

Investigate why tests are marked flaky.

The flaky = True attribute suggests these tests have non-deterministic failures. Consider investigating and fixing the root cause rather than marking tests as flaky, as this can mask real issues and reduce test signal quality.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/inference/embedding/gemini/BUILD.bazel` around lines 18 - 26, The test
target "gemini_test" in the BUILD.bazel currently sets flaky = True which masks
nondeterministic failures; investigate and fix the root causes in gemini_test.go
(race conditions, timing, shared state, or reliance on external services) by
reproducing failures locally (run the "gemini_test" target without flaky and
with shard_count reduced), add deterministic seeds or explicit synchronization
in the test code, increase the timeout only if legitimately slow, and then
remove the flaky = True attribute from the "gemini_test" rule once the tests are
stable.
pkg/inference/embedding/jina/jina.go (2)

55-55: ⚖️ Poor tradeoff

Configure HTTP client timeout to prevent indefinite hangs.

The http.Client is initialized with default settings (no timeout), which can cause requests to hang indefinitely if the JinaAI API is slow or unresponsive. While the context provides cancellation, the client itself should have a reasonable timeout.

⏱️ Suggested fix to add timeout configuration
 func NewJinaEmbedder(cfg EmbedderConfig) *Embedder {
 	return &Embedder{
-		client: http.Client{},
+		client: http.Client{
+			Timeout: 30 * time.Second,
+		},
 		cfg:    cfg,
 	}

Consider making the timeout configurable via EmbedderConfig if different deployments need different values.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/inference/embedding/jina/jina.go` at line 55, The http.Client is created
with no timeout (client: http.Client{}), risking indefinite hangs; update the
embedder initialization (where the client is constructed in jina.go) to set a
sensible Timeout on the http.Client (e.g., from a new EmbedderConfig field) and
wire that timeout through the constructor/path that creates the client so the
timeout becomes configurable via EmbedderConfig and used when initializing the
client.

116-119: 💤 Low value

API key may be logged in error responses.

When logging failed requests at line 118, the body may contain echoed authorization information. While the Authorization header isn't logged directly, verify whether JinaAI's error responses could leak sensitive data.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/inference/embedding/jina/jina.go` around lines 116 - 119, The error log
in pkg/inference/embedding/jina/jina.go currently logs the full response body in
the logutil.BgLogger().Error call (the line calling logutil.BgLogger().Error
with zap.String("body", string(body))) which could leak echoed API keys; before
calling that logger (or replace that call), sanitize the response body by
parsing/masking any sensitive fields (e.g., "authorization", "api_key", "token",
"access_token", "credentials") or redact Authorization-like patterns (e.g.,
Bearer <token>) and then log the sanitized string; update the code around the
resp handling in the same function to produce a safeBody string and pass
zap.String("body", safeBody) to the logger.
pkg/inference/embedding/huggingface/huggingface.go (2)

121-124: 💤 Low value

API key may be logged in error responses.

When logging failed requests at line 123, the body may contain echoed authorization information or API keys from the error response. While the Authorization header itself isn't logged here, some APIs echo request details in error responses. Consider whether this is a concern for HuggingFace's API.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/inference/embedding/huggingface/huggingface.go` around lines 121 - 124,
The error log at the HuggingFace request failure uses the raw response body
(zap.String("body", string(body))) which may contain echoed API keys or
authorization data; update the failure handling in the function that makes the
request (the code that calls resp and builds the log with logutil.BgLogger()) to
avoid logging sensitive contents by either removing the full body from the log
or sanitizing it first (e.g., detect and redact common keys like
"authorization", "api_key", "token" or apply a safe snippet/length limit and
replace sensitive substrings with "[REDACTED]"), and keep the existing metadata
(resp.StatusCode) in the log. Ensure you change the zap.String("body", ...)
usage to use the sanitized string instead of the raw body.

54-54: ⚖️ Poor tradeoff

Configure HTTP client timeout to prevent indefinite hangs.

The http.Client is initialized with default settings, which means no timeout. This can cause requests to hang indefinitely if the HuggingFace API is slow or unresponsive. While the context passed to http.NewRequestWithContext provides cancellation, the client itself should have a reasonable timeout for connection and overall request duration.

⏱️ Suggested fix to add timeout configuration
 func NewHuggingFaceEmbedder(cfg EmbedderConfig) *Embedder {
 	return &Embedder{
-		client: http.Client{},
+		client: http.Client{
+			Timeout: 30 * time.Second,
+		},
 		cfg:    cfg,
 	}

Alternatively, make the timeout configurable via EmbedderConfig if different deployments need different values.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/inference/embedding/huggingface/huggingface.go` at line 54, The
http.Client is created with zero timeout (client: http.Client{}) which can hang;
update the HuggingFace embedder initialization to set a sensible Timeout (e.g.,
15–30s) on the http.Client instance used by the embedder, or wire a timeout
value from EmbedderConfig into the client construction so the Timeout field is
configurable; change the client creation referenced by the client: http.Client{}
initializer in the huggingface embedder to use http.Client{Timeout: ...} (or
build the client with a config-supplied duration).
pkg/inference/embedding/jina/jina_test.go (1)

1-227: ⚡ Quick win

Consider adding missing API key test coverage.

The HuggingFace tests include TestHuggingFaceEmbedder_NoAPIKey, TestHuggingFaceEmbedder_CustomErrorHandling, and TestHuggingFaceEmbedder_CustomUnauthorizedError to verify API key validation and custom error handling. The Jina tests lack equivalent coverage for these scenarios. Consider adding similar tests for consistency and completeness.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/inference/embedding/jina/jina_test.go` around lines 1 - 227, Add unit
tests to cover missing API key and custom error handling for the Jina embedder:
create tests similar to HuggingFace ones that exercise NewJinaEmbedder and
CreateEmbeddings for (1) no API key supplied (e.g., TestJinaEmbedder_NoAPIKey)
verifying it returns an error and nil embeddings, (2) custom unauthorized
response handling (e.g., TestJinaEmbedder_CustomUnauthorizedError) mocking a 401
response and asserting the error contains expected message, and (3) custom error
payload handling (e.g., TestJinaEmbedder_CustomErrorHandling) mocking non-2xx
responses with body and asserting CreateEmbeddings surfaces the body details;
place these tests alongside existing TestJinaEmbedder_* functions and reuse
httptest.NewServer, EmbedderConfig GetAPIKey/GetBaseURL, and assertions on
returned embeddings/errors to mirror the HuggingFace tests.
pkg/inference/embedding/nvidia/protocol.go (1)

17-22: 💤 Low value

Remove unused Request struct or use it in the implementation.

The Request struct is defined but never used. In nvidia.go lines 89-93, the request body is built using a map literal passed to base.MarshalJSONWithOptions instead of marshalling this struct. This is dead code that adds maintenance burden.

🧹 Option 1: Remove the unused struct
-// Request is the model for Nvidia NIM embeddings API request.
-type Request struct {
-	Input          []string `json:"input"`
-	Model          string   `json:"model"`
-	EncodingFormat string   `json:"encoding_format"`
-}
-
 // Response is the model for Nvidia NIM embeddings API response.

Alternatively, if the struct is intended for future use or documentation purposes, add a comment explaining why it's not currently used in the implementation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/inference/embedding/nvidia/protocol.go` around lines 17 - 22, The Request
struct (type Request) in protocol.go is dead code because the Nvidia embeddings
request body is constructed directly via a map passed to
base.MarshalJSONWithOptions in nvidia.go; either remove the unused Request type
or explicitly use it when marshalling (replace the map literal with an instance
of Request and marshal that), or if you keep the type for future/documentation
purposes add a clear comment above type Request explaining why it is
intentionally unused today and referencing the code path in nvidia.go that
builds the payload via base.MarshalJSONWithOptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/executor/insert_common.go`:
- Around line 688-699: The code captures a single shared evalCtx (evalCtx :=
e.Ctx().GetExprCtx().GetEvalCtx()) and reuses it from every goroutine, which can
race statement-scoped state when calling task.expr.Eval; instead create a
distinct EvalCtx per worker before calling task.expr.Eval (e.g., call
e.Ctx().GetExprCtx().GetEvalCtx() or use a provided Clone/New method inside the
eg.Go closure) so each goroutine has its own evalCtx; update the goroutine body
around task.expr.Eval and chunk.MutRowFromDatums(task.row).ToRow() to use that
per-goroutine evalCtx and ensure any cleanup if the EvalCtx type requires it.
- Around line 647-650: The code snapshots a single uint64 rowCntInLoadData from
e.rowCount for the whole batch when
e.Ctx().GetSessionVars().StmtCtx.InLoadDataStmt is true, then passes that same
value into HandleBadNull for every row causing LOAD DATA errors to be reported
against the batch's last row; change the logic to use the per-row LOAD DATA
index instead of the single snapshot: inside the per-row loop that calls
HandleBadNull, compute the correct per-row LOAD DATA index (using the per-row
counter/variable used to iterate the batch or the LOAD DATA line index) and pass
that per-row value to HandleBadNull rather than rowCntInLoadData; update any use
sites referencing rowCntInLoadData (e.g., where HandleBadNull is invoked) to use
the per-row index so failures are reported against the failing row.

In `@pkg/expression/builtin_inference.go`:
- Around line 1-13: Update the license header year in
pkg/expression/builtin_inference.go to match the corresponding test file
(builtin_inference_test.go) so the copyright years are consistent across related
files; open builtin_inference.go, locate the top-of-file license comment block
and change the "Copyright 2025" year to the same year used in
builtin_inference_test.go (2026) or vice versa if you prefer to standardize on
2025—ensure both files have the identical year in their header comment.

In `@pkg/expression/inference_helper.go`:
- Around line 83-85: ExtractAutoEmbedInfoFromAST currently only rejects fewer
than 2 args but silently accepts extra args; change the argument validation to
explicitly allow only the supported arities (e.g., exactly 2 or exactly 3
arguments) and return an error for any other arity. Update the len(fnCall.Args)
check(s) that currently read "if len(fnCall.Args) < 2" to instead validate
membership in the allowed set (len == 2 || len == 3) and return
fmt.Errorf("invalid EMBED_TEXT() usage: unexpected arity") when not allowed;
apply this same change to the second arg-count validation block in the file that
also inspects fnCall.Args (the later branch inside ExtractAutoEmbedInfoFromAST).

In `@pkg/expression/integration_test/integration_test.go`:
- Around line 4507-4513: The test mutates the global sysvar
tidb_exp_embed_jina_ai_api_key and currently only resets it at the end of the
happy path, risking leakage if an assertion fails; register an unconditional
cleanup before the first mutation (e.g., call t.Cleanup(func(){ tk.MustExec("set
@@global.tidb_exp_embed_jina_ai_api_key = ''") })) so the global is always
reset, and then proceed with the existing tk.MustExec/tk.MustQuery assertions
that set and check tidb_exp_embed_jina_ai_api_key.

In `@pkg/inference/embedding/cohere/cohere.go`:
- Around line 52-56: NewCohereEmbedder constructs an http.Client without a
timeout causing potential indefinite hangs; update NewCohereEmbedder to
initialize client: http.Client{Timeout: time.Second * <reasonable_value>}
(choose e.g. 10s or configurable), and add the "time" import; ensure the
Embedder struct's client field still uses that configured http.Client so all
Cohere requests get the client-level timeout as defense-in-depth.

In `@pkg/inference/embedding/gemini/gemini.go`:
- Around line 98-100: The model string is interpolated directly into the path
when building fullURL, which can break URLs or allow unintended endpoints;
before the fmt.Sprintf call that sets fullURL, validate that model is non-empty
and does not contain path traversal or illegal chars (e.g., reject segments with
"/" or "..") or alternatively escape it with a path-encoding function (e.g.,
url.PathEscape) and use the escaped value (escapedModel) in place of model;
update the code around variables fullURL, model and baseURL to perform this
validation/escaping and return an error if validation fails.
- Around line 51-56: The NewGeminiEmbedder function constructs an Embedder with
an http.Client that lacks a timeout; update NewGeminiEmbedder to set a sensible
client timeout (e.g., via http.Client{Timeout: time.Second * <N>}) so requests
cannot hang indefinitely, and add the "time" import; modify the Embedder
construction (function NewGeminiEmbedder, type Embedder and its client field) to
use the configured timeout value.

In `@pkg/inference/embedding/huggingface/protocol.go`:
- Around line 22-24: The comment above the Response type is inconsistent with
the actual type; update the docstring for Response (type Response [][]float32 in
protocol.go) to state that HuggingFace returns a 2D array of float32 values, or
if you confirm the API returns float64 instead, change the type to [][]float64
and update all usages accordingly; ensure the comment and the type (Response)
match exactly.

In `@pkg/inference/embedding/jina/BUILD.bazel`:
- Around line 23-24: The BUILD target currently sets flaky = True and
shard_count = 6; remove these patch-work flags and fix the underlying test
instability instead: locate the failing tests under the
pkg/inference/embedding/jina package, reproduce the flakes, and address root
causes such as race conditions, timing-dependent assertions, shared state, or
unmocked external dependencies by adding proper synchronization, deterministic
fixtures, explicit mocks, or adjusting timeouts; once tests are deterministic,
delete flaky = True and shard_count = 6 from the BUILD.bazel target so CI runs a
single stable shard.

In `@pkg/inference/embedding/mock/BUILD.bazel`:
- Around line 16-17: The BUILD.bazel target in pkg/inference/embedding/mock
currently sets flaky = True and shard_count = 3—remove those attributes and make
the tests deterministic instead of masking failures; locate the mock test target
in BUILD.bazel and (1) remove flaky and shard_count, (2) audit tests in
pkg/inference/embedding/mock for shared global state, non-deterministic
timing/race conditions, or reliance on test ordering, and (3) fix by adding
proper setup/teardown (reset globals, use fresh temp dirs, inject deterministic
mocks/fakes, or serialize accesses with locks) so the tests pass reliably in
parallel without sharding. Ensure the target no longer needs flaky/shard_count
after these fixes and run the test suite to confirm stability.

In `@pkg/inference/embedding/mock/mock.go`:
- Around line 72-84: The delay handling sleeps unconditionally and ignores the
provided context, causing hangs if the context is cancelled; modify the block
that parses opts["delay"] so that instead of time.Sleep(dur) you wait with a
select on time.After(dur) and ctx.Done(), returning early with ctx.Err() when
the context is cancelled. Keep the existing parsing and error messages (the opts
map and delay parsing code) and only replace the unconditional sleep with a
context-aware select that returns (nil, ctx.Err()) on cancellation.

In `@pkg/inference/embedding/nvidia/nvidia.go`:
- Around line 147-161: The current loop assigns embeddings by the sorted
position (row) which can misplace vectors if indices are duplicated or
non-sequential; instead validate that respObj.Data contains exactly one entry
for each index 0..len(texts)-1 and then assign using the actual Index.
Specifically, after sort.Slice(respObj.Data, ...) iterate respObj.Data and for
each item check that 0 <= item.Index < len(texts) and that
embeddings[item.Index] is nil (to detect duplicates); on any invalid or
duplicate index return an error indicating the problematic item.Index; otherwise
decode with base.DecodeFloat32ArrayBytes and set embeddings[item.Index] = e (not
embeddings[row]).
- Around line 54-56: The HTTP client for Embedder is created with zero timeout
(http.Client{}), which can cause indefinite hangs; set a sensible Timeout on the
client (e.g., http.Client{Timeout: ...}) and make that timeout configurable via
EmbedderConfig (add a Timeout field and use it when constructing the Embedder's
client); update the constructor that returns &Embedder{client: http.Client{},
cfg: cfg, ...} to initialize client with the configured Timeout and provide a
reasonable default if cfg.Timeout is unset.

In `@pkg/inference/embedding/openai/openai_test.go`:
- Line 28: Replace the misleading comment "Mock successful response from real
Jina API" in openai_test.go with a correct description for the OpenAI embedder
(e.g., "Mock successful response from real OpenAI API" or "Mock successful
response for OpenAI embedder") so the comment accurately matches the test;
locate the comment by searching for that exact string in the test file and
update it in the test function that sets up the mocked OpenAI response.

In `@pkg/inference/embedding/openai/openai.go`:
- Around line 142-157: After sorting respObj.Data by Index, validate that each
item's Index is within [0, len(texts)-1] and that indices form a unique sequence
(no duplicates) before assigning into embeddings; for each item in respObj.Data
call base.DecodeFloat32ArrayBytes(item.Embedding) as before but place the
decoded vector into embeddings[item.Index] (not embeddings[row]), and return a
clear error if any item.Index is out of range or a duplicate/missing so we never
misassign embeddings; reference respObj.Data, item.Index, embeddings and
base.DecodeFloat32ArrayBytes to locate and implement the checks and assignment
change.
- Around line 54-58: The http.Client in NewOpenAIEmbedder is created with
default settings (no timeout); update NewOpenAIEmbedder to set a sensible
timeout on Embedder.client by using a configured value from EmbedderConfig
(e.g., add a Timeout field to EmbedderConfig) and fall back to a reasonable
default if not provided; ensure the Embedder struct uses http.Client{Timeout:
cfg.Timeout} (or default) so embedding requests cannot hang indefinitely.

In `@pkg/planner/core/expression_rewriter.go`:
- Around line 3038-3042: The error wrapping currently replaces the original
error with a new one via fmt.Errorf in the block that calls
expression.ExtractAutoEmbedInfoFromAST(astExpr), losing the original
stack/context; change the assignment to preserve the original error by using
errors.Annotatef (or the project's equivalent) to annotate err and assign that
to er.err (refer to expression.ExtractAutoEmbedInfoFromAST, er.err, and
fmt.Errorf) so the original error context is preserved while adding the "column
%s cannot be used in %s" message.
- Around line 3057-3060: The inline initializer fnInit currently does an
unchecked type assertion to *expression.BuiltinEmbedTextSig which can panic;
update fnInit (used by er.newFunctionWithInit) to perform a safe type assertion
(e.g., using the "v, ok := sf.Function.(*expression.BuiltinEmbedTextSig)"
pattern or a type switch), set v.IsFromVecSearch = true only when ok, and return
a descriptive error (instead of panicking) when the function is not the expected
BuiltinEmbedTextSig type so callers can handle the failure.

---

Nitpick comments:
In `@pkg/inference/embedding/base/BUILD.bazel`:
- Line 15: The test target "base_test" in the BUILD rule is marked flaky = True
even though base_test only runs deterministic base64 decode/validation; remove
the flaky = True attribute (or set flaky = False) from the base_test rule so
failures surface for investigation, then run the test suite locally to confirm
no flakiness before committing.

In `@pkg/inference/embedding/batcher/batcher.go`:
- Around line 244-260: Add a clear comment above the reqCtx/cancelReq creation
and goroutine explaining that reqCtx only cancels after all individual request
contexts in thisBatch.calls are done (the goroutine loops over call.ctx and only
calls cancelReq after every call is done), so batching can cause a request with
a short timeout to wait longer (up to batchWindow) before its batch-level
context cancels; state this is intentional for batching and note the trade-off
for maintainers and any expectations around embedder.CreateEmbeddings using
reqCtx.

In `@pkg/inference/embedding/gemini/BUILD.bazel`:
- Around line 18-26: The test target "gemini_test" in the BUILD.bazel currently
sets flaky = True which masks nondeterministic failures; investigate and fix the
root causes in gemini_test.go (race conditions, timing, shared state, or
reliance on external services) by reproducing failures locally (run the
"gemini_test" target without flaky and with shard_count reduced), add
deterministic seeds or explicit synchronization in the test code, increase the
timeout only if legitimately slow, and then remove the flaky = True attribute
from the "gemini_test" rule once the tests are stable.

In `@pkg/inference/embedding/huggingface/BUILD.bazel`:
- Line 23: The BUILD.bazel target currently sets flaky = True, which masks
nondeterministic test failures; remove or revert the flaky = True attribute on
the target and instead reproduce the intermittent failures by running the
corresponding Bazel test target(s) repeatedly (e.g., bazel test
--runs_per_test=50), add deterministic fixes (stabilize timers, seed randomness,
isolate shared state, add proper setup/teardown) in the test code, and only
reintroduce flaky metadata after root causes are identified; look for references
to the BUILD target that contains flaky = True to locate the tests to run and
stabilize.
- Around line 18-26: The shard_count for the Huggingface test target
"huggingface_test" is set to 10 which is inconsistent with the Gemini provider
tests (which use shard_count = 7); update the "huggingface_test" target in
BUILD.bazel to use the same shard_count value as other provider tests (e.g.,
change shard_count to 7) or add a brief comment in the BUILD file next to the
"huggingface_test" rule explaining why it must remain different so the
discrepancy is documented and intentional.

In `@pkg/inference/embedding/huggingface/huggingface.go`:
- Around line 121-124: The error log at the HuggingFace request failure uses the
raw response body (zap.String("body", string(body))) which may contain echoed
API keys or authorization data; update the failure handling in the function that
makes the request (the code that calls resp and builds the log with
logutil.BgLogger()) to avoid logging sensitive contents by either removing the
full body from the log or sanitizing it first (e.g., detect and redact common
keys like "authorization", "api_key", "token" or apply a safe snippet/length
limit and replace sensitive substrings with "[REDACTED]"), and keep the existing
metadata (resp.StatusCode) in the log. Ensure you change the zap.String("body",
...) usage to use the sanitized string instead of the raw body.
- Line 54: The http.Client is created with zero timeout (client: http.Client{})
which can hang; update the HuggingFace embedder initialization to set a sensible
Timeout (e.g., 15–30s) on the http.Client instance used by the embedder, or wire
a timeout value from EmbedderConfig into the client construction so the Timeout
field is configurable; change the client creation referenced by the client:
http.Client{} initializer in the huggingface embedder to use
http.Client{Timeout: ...} (or build the client with a config-supplied duration).

In `@pkg/inference/embedding/jina/jina_test.go`:
- Around line 1-227: Add unit tests to cover missing API key and custom error
handling for the Jina embedder: create tests similar to HuggingFace ones that
exercise NewJinaEmbedder and CreateEmbeddings for (1) no API key supplied (e.g.,
TestJinaEmbedder_NoAPIKey) verifying it returns an error and nil embeddings, (2)
custom unauthorized response handling (e.g.,
TestJinaEmbedder_CustomUnauthorizedError) mocking a 401 response and asserting
the error contains expected message, and (3) custom error payload handling
(e.g., TestJinaEmbedder_CustomErrorHandling) mocking non-2xx responses with body
and asserting CreateEmbeddings surfaces the body details; place these tests
alongside existing TestJinaEmbedder_* functions and reuse httptest.NewServer,
EmbedderConfig GetAPIKey/GetBaseURL, and assertions on returned
embeddings/errors to mirror the HuggingFace tests.

In `@pkg/inference/embedding/jina/jina.go`:
- Line 55: The http.Client is created with no timeout (client: http.Client{}),
risking indefinite hangs; update the embedder initialization (where the client
is constructed in jina.go) to set a sensible Timeout on the http.Client (e.g.,
from a new EmbedderConfig field) and wire that timeout through the
constructor/path that creates the client so the timeout becomes configurable via
EmbedderConfig and used when initializing the client.
- Around line 116-119: The error log in pkg/inference/embedding/jina/jina.go
currently logs the full response body in the logutil.BgLogger().Error call (the
line calling logutil.BgLogger().Error with zap.String("body", string(body)))
which could leak echoed API keys; before calling that logger (or replace that
call), sanitize the response body by parsing/masking any sensitive fields (e.g.,
"authorization", "api_key", "token", "access_token", "credentials") or redact
Authorization-like patterns (e.g., Bearer <token>) and then log the sanitized
string; update the code around the resp handling in the same function to produce
a safeBody string and pass zap.String("body", safeBody) to the logger.

In `@pkg/inference/embedding/nvidia/protocol.go`:
- Around line 17-22: The Request struct (type Request) in protocol.go is dead
code because the Nvidia embeddings request body is constructed directly via a
map passed to base.MarshalJSONWithOptions in nvidia.go; either remove the unused
Request type or explicitly use it when marshalling (replace the map literal with
an instance of Request and marshal that), or if you keep the type for
future/documentation purposes add a clear comment above type Request explaining
why it is intentionally unused today and referencing the code path in nvidia.go
that builds the payload via base.MarshalJSONWithOptions.

In `@pkg/planner/core/expression_rewriter.go`:
- Around line 3027-3031: The error message for the vector argument in
expression_rewriter.go can be empty because vecArg.OrigName may be empty; update
the check around the vecArg handling in the block that sets er.err (where vecArg
is derived from args[0] and v.FnName.O is used) to build a robust name: if
vecArg.OrigName is empty, use vecArg.String() or a constant placeholder like
"<unknown column>" when formatting the error text so the message is always
informative and never blank.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6b57fe1c-2b9b-4cb6-9768-bcc6e68c1971

📥 Commits

Reviewing files that changed from the base of the PR and between b399d61 and b98639b.

📒 Files selected for processing (75)
  • pkg/config/config.go
  • pkg/config/config.toml.example
  • pkg/config/config.toml.nextgen.example
  • pkg/ddl/add_column.go
  • pkg/ddl/create_table.go
  • pkg/ddl/generated_column.go
  • pkg/domain/BUILD.bazel
  • pkg/domain/domain.go
  • pkg/domain/inference.go
  • pkg/executor/insert_common.go
  • pkg/expression/BUILD.bazel
  • pkg/expression/builtin.go
  • pkg/expression/builtin_inference.go
  • pkg/expression/builtin_inference_test.go
  • pkg/expression/builtin_threadunsafe_generated.go
  • pkg/expression/column.go
  • pkg/expression/expression.go
  • pkg/expression/function_traits.go
  • pkg/expression/function_traits_test.go
  • pkg/expression/generator/builtin_threadsafe.go
  • pkg/expression/inference_helper.go
  • pkg/expression/inference_helper_test.go
  • pkg/expression/integration_test/BUILD.bazel
  • pkg/expression/integration_test/integration_test.go
  • pkg/inference/BUILD.bazel
  • pkg/inference/domainadaptor/BUILD.bazel
  • pkg/inference/domainadaptor/adaptor.go
  • pkg/inference/embedding/base/BUILD.bazel
  • pkg/inference/embedding/base/base.go
  • pkg/inference/embedding/base/base_test.go
  • pkg/inference/embedding/batcher/BUILD.bazel
  • pkg/inference/embedding/batcher/batcher.go
  • pkg/inference/embedding/batcher/batcher_test.go
  • pkg/inference/embedding/cohere/BUILD.bazel
  • pkg/inference/embedding/cohere/cohere.go
  • pkg/inference/embedding/cohere/cohere_test.go
  • pkg/inference/embedding/cohere/protocol.go
  • pkg/inference/embedding/gemini/BUILD.bazel
  • pkg/inference/embedding/gemini/gemini.go
  • pkg/inference/embedding/gemini/gemini_test.go
  • pkg/inference/embedding/gemini/protocol.go
  • pkg/inference/embedding/huggingface/BUILD.bazel
  • pkg/inference/embedding/huggingface/huggingface.go
  • pkg/inference/embedding/huggingface/huggingface_test.go
  • pkg/inference/embedding/huggingface/protocol.go
  • pkg/inference/embedding/jina/BUILD.bazel
  • pkg/inference/embedding/jina/jina.go
  • pkg/inference/embedding/jina/jina_test.go
  • pkg/inference/embedding/jina/protocol.go
  • pkg/inference/embedding/mock/BUILD.bazel
  • pkg/inference/embedding/mock/mock.go
  • pkg/inference/embedding/mock/mock_test.go
  • pkg/inference/embedding/nvidia/BUILD.bazel
  • pkg/inference/embedding/nvidia/nvidia.go
  • pkg/inference/embedding/nvidia/nvidia_test.go
  • pkg/inference/embedding/nvidia/protocol.go
  • pkg/inference/embedding/openai/BUILD.bazel
  • pkg/inference/embedding/openai/openai.go
  • pkg/inference/embedding/openai/openai_test.go
  • pkg/inference/embedding/openai/protocol.go
  • pkg/inference/embedding/tidbcloud/BUILD.bazel
  • pkg/inference/embedding/tidbcloud/protocol.go
  • pkg/inference/embedding/tidbcloud/tidbcloud_free.go
  • pkg/inference/embedding/tidbcloud/tidbcloud_free_test.go
  • pkg/inference/manager_test.go
  • pkg/inference/sqlembed.go
  • pkg/parser/ast/functions.go
  • pkg/planner/core/BUILD.bazel
  • pkg/planner/core/expression_rewriter.go
  • pkg/planner/core/logical_plan_builder.go
  • pkg/sessionctx/vardef/tidb_vars.go
  • pkg/sessionctx/variable/BUILD.bazel
  • pkg/sessionctx/variable/embedding_vars.go
  • pkg/sessionctx/variable/embedding_vars_test.go
  • pkg/sessionctx/variable/sysvar.go

Comment thread pkg/executor/insert_common.go Outdated
Comment thread pkg/executor/insert_common.go Outdated
Comment thread pkg/expression/builtin_inference.go Outdated
Comment thread pkg/expression/inference_helper.go Outdated
Comment thread pkg/expression/integration_test/integration_test.go
Comment thread pkg/inference/embedding/openai/openai_test.go Outdated
Comment thread pkg/inference/embedding/openai/openai.go
Comment thread pkg/inference/embedding/openai/openai.go Outdated
Comment thread pkg/planner/core/expression_rewriter.go
Comment thread pkg/planner/core/expression_rewriter.go
@codecov

codecov Bot commented Jun 8, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 13.26841% with 1425 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.0455%. Comparing base (74e231b) to head (78aba2b).
⚠️ Report is 36 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #69014        +/-   ##
================================================
- Coverage   76.3268%   76.0455%   -0.2814%     
================================================
  Files          2041       2078        +37     
  Lines        562499     583766     +21267     
================================================
+ Hits         429338     443928     +14590     
- Misses       132250     137130      +4880     
- Partials        911       2708      +1797     
Flag Coverage Δ
integration 45.9389% <13.2684%> (+6.2191%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 60.4610% <ø> (ø)
parser ∅ <ø> (∅)
br 63.9120% <ø> (+1.1221%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/expression/builtin_inference.go (1)

147-158: ⚡ Quick win

Consider adding a comment to explain the @search suffix behavior.

The @search suffix processing logic is subtle: when isFromVecSearch is true, options like "key@search": value are renamed to "key": value; when false, they're dropped entirely. This allows callers to specify search-time-only options that are ignored during direct EMBED_TEXT() calls but honored when the function is rewritten from VEC_EMBED_*_DISTANCE(). A brief comment would help future maintainers understand this intent.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/expression/builtin_inference.go` around lines 147 - 158, Add a short
clarifying comment above the block that collects and processes keys with the
"`@search`" suffix explaining that entries like "key@search" are treated as
search-time-only options: if isFromVecSearch is true they are promoted to "key"
(opts[strings.TrimSuffix(k, "`@search`")] = opts[k]) to be honored when the call
was rewritten from a VEC_EMBED_*_DISTANCE path, otherwise they are removed
(delete(opts, k)) so direct EMBED_TEXT() calls ignore them; reference the
variables searchOpts, opts and isFromVecSearch in the comment so future readers
understand the intent of this rename-or-drop behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/expression/builtin_inference.go`:
- Around line 147-158: Add a short clarifying comment above the block that
collects and processes keys with the "`@search`" suffix explaining that entries
like "key@search" are treated as search-time-only options: if isFromVecSearch is
true they are promoted to "key" (opts[strings.TrimSuffix(k, "`@search`")] =
opts[k]) to be honored when the call was rewritten from a VEC_EMBED_*_DISTANCE
path, otherwise they are removed (delete(opts, k)) so direct EMBED_TEXT() calls
ignore them; reference the variables searchOpts, opts and isFromVecSearch in the
comment so future readers understand the intent of this rename-or-drop behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 32cbf36e-e183-4934-addb-0aaa6faec878

📥 Commits

Reviewing files that changed from the base of the PR and between b98639b and c3c302f.

📒 Files selected for processing (28)
  • pkg/executor/BUILD.bazel
  • pkg/executor/insert_common.go
  • pkg/expression/builtin_inference.go
  • pkg/expression/inference_helper.go
  • pkg/expression/inference_helper_test.go
  • pkg/expression/integration_test/integration_test.go
  • pkg/inference/embedding/base/BUILD.bazel
  • pkg/inference/embedding/base/base.go
  • pkg/inference/embedding/base/base_test.go
  • pkg/inference/embedding/cohere/cohere.go
  • pkg/inference/embedding/gemini/BUILD.bazel
  • pkg/inference/embedding/gemini/gemini.go
  • pkg/inference/embedding/gemini/gemini_test.go
  • pkg/inference/embedding/huggingface/huggingface.go
  • pkg/inference/embedding/huggingface/protocol.go
  • pkg/inference/embedding/jina/BUILD.bazel
  • pkg/inference/embedding/jina/jina.go
  • pkg/inference/embedding/jina/jina_test.go
  • pkg/inference/embedding/mock/mock.go
  • pkg/inference/embedding/mock/mock_test.go
  • pkg/inference/embedding/nvidia/BUILD.bazel
  • pkg/inference/embedding/nvidia/nvidia.go
  • pkg/inference/embedding/nvidia/nvidia_test.go
  • pkg/inference/embedding/openai/BUILD.bazel
  • pkg/inference/embedding/openai/openai.go
  • pkg/inference/embedding/openai/openai_test.go
  • pkg/inference/embedding/tidbcloud/tidbcloud_free.go
  • pkg/planner/core/expression_rewriter.go
🚧 Files skipped from review as they are similar to previous changes (20)
  • pkg/inference/embedding/base/BUILD.bazel
  • pkg/inference/embedding/openai/BUILD.bazel
  • pkg/inference/embedding/huggingface/protocol.go
  • pkg/inference/embedding/mock/mock.go
  • pkg/inference/embedding/jina/BUILD.bazel
  • pkg/inference/embedding/mock/mock_test.go
  • pkg/inference/embedding/cohere/cohere.go
  • pkg/inference/embedding/nvidia/nvidia.go
  • pkg/inference/embedding/openai/openai.go
  • pkg/inference/embedding/gemini/BUILD.bazel
  • pkg/inference/embedding/jina/jina_test.go
  • pkg/inference/embedding/nvidia/BUILD.bazel
  • pkg/inference/embedding/openai/openai_test.go
  • pkg/inference/embedding/huggingface/huggingface.go
  • pkg/planner/core/expression_rewriter.go
  • pkg/expression/inference_helper_test.go
  • pkg/inference/embedding/tidbcloud/tidbcloud_free.go
  • pkg/inference/embedding/gemini/gemini_test.go
  • pkg/expression/integration_test/integration_test.go
  • pkg/executor/insert_common.go

@ChangRui-Ryan

Copy link
Copy Markdown
Contributor Author

/retest

@tiprow

tiprow Bot commented Jun 9, 2026

Copy link
Copy Markdown

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Comment thread pkg/config/config.go Outdated
type HostedEmbedding struct {
// Enabled indicates whether the hosted embedding service is enabled.
// It only takes effect for nextgen kernels.
Enabled bool `toml:"enabled" json:"enabled"`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add omitempty

Comment thread pkg/config/config.go
Comment thread pkg/config/config.toml.example Outdated
Comment on lines +433 to +439
[hosted-embedding]
# Enables the TiDB Cloud hosted embedding provider tidbcloud_free.
# Effective only for nextgen kernels.
enabled = false
api-endpoint = ""
api-key-path = ""

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove it, not supported in classic kernel

@ti-chi-bot

ti-chi-bot Bot commented Jun 10, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign d3hunter, gengliqi, terry1purcell for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ChangRui-Ryan

Copy link
Copy Markdown
Contributor Author

/retest

@tiprow

tiprow Bot commented Jun 10, 2026

Copy link
Copy Markdown

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ChangRui-Ryan

Copy link
Copy Markdown
Contributor Author

/retest

@tiprow

tiprow Bot commented Jun 10, 2026

Copy link
Copy Markdown

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ChangRui-Ryan

Copy link
Copy Markdown
Contributor Author

/retest

@tiprow

tiprow Bot commented Jun 10, 2026

Copy link
Copy Markdown

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@D3Hunter D3Hunter left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

  • Total findings: 11
  • Inline comments: 10
  • Summary-only findings (no inline anchor): 1
Findings (highest risk first)

🚨 [Blocker] (1)

  1. Auto-embedding table metadata is introduced without an N-1 compatibility gate (pkg/ddl/create_table.go:558, pkg/executor/insert_common.go:634, pkg/expression/expression.go:1114, pkg/ddl/generated_column.go:448)

⚠️ [Major] (7)

  1. LOAD DATA never materializes stored EMBED_TEXT generated columns (LOAD DATA row path in pkg/executor/load_data.go lines 560, 581, and 663, together with the auto-embedding skip in pkg/executor/insert_common.go line 899)
  2. ALTER or MODIFY can still create generated-column chains on top of auto-embedding columns (pkg/ddl/create_table.go:602, pkg/ddl/generated_column.go:255, pkg/executor/insert_common.go:899)
  3. DDL accepts auto-embedding generated columns with invalid JSON options (pkg/ddl/generated_column.go:448, pkg/expression/inference_helper.go:115, pkg/expression/builtin_inference.go:138)
  4. EMBED_TEXT drops statement cancellation and timeout propagation once the remote call starts (pkg/inference/sqlembed.go:162; pkg/executor/insert_common.go:700-740)
  5. Hosted embedding API-key file is read once at startup and never refreshed (pkg/inference/sqlembed.go:106-126; pkg/domain/inference.go:19-31)
  6. domainadaptor.GetEmbedFn hides a global-default fallback behind a domain-scoped name (pkg/inference/domainadaptor/adaptor.go:15, pkg/inference/domainadaptor/adaptor.go:32)
  7. INSERT-side auto-embedding reimplements the EMBED_TEXT runtime contract (pkg/executor/insert_common.go:634, pkg/expression/builtin_inference.go:95)

🟡 [Minor] (3)

  1. GeneratedExprString comment describes an AST contract even though the field stores SQL text (pkg/expression/column.go:297)
  2. Auto-embedding insert helper names imply batching and singular datum handling that the code does not do (pkg/executor/insert_common.go:57, pkg/executor/insert_common.go:634)
  3. Domain public API was expanded for a single local adaptor path (pkg/domain/inference.go:19, pkg/inference/domainadaptor/adaptor.go:24)

Unanchored findings

⚠️ [Major] (1)

  1. LOAD DATA never materializes stored EMBED_TEXT generated columns
    • Scope: LOAD DATA row path in pkg/executor/load_data.go lines 560, 581, and 663, together with the auto-embedding skip in pkg/executor/insert_common.go line 899
    • Request: Add the same auto-embedding materialization step to the LOAD DATA row pipeline and cover it with an integration test for LOAD DATA on a table with a stored EMBED_TEXT() generated column.

Comment thread pkg/ddl/create_table.go
if err := checkIllegalFn4Generated(colDef.Name.Name.L, typeColumn, option.Expr); err != nil {
return errors.Trace(err)
}
if err := checkGeneratedColForAutoEmbedding(colDef.Name.Name.L, option.Expr, option.Stored); err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 [Blocker] Auto-embedding table metadata is introduced without an N-1 compatibility gate

Why
CREATE TABLE now allows persisted stored generated columns backed by EMBED_TEXT(), but the resulting metadata and DML/planner behavior depend on new code paths that are only present in this version. In a rolling upgrade or rollback window, older TiDB nodes can still observe the same table metadata even though this feature is not guarded for mixed-version use.

Scope
pkg/ddl/create_table.go:558, pkg/executor/insert_common.go:634, pkg/expression/expression.go:1114, pkg/ddl/generated_column.go:448

Risk if unchanged
Creating one of these tables before every TiDB node is upgraded can break inserts/updates or vector-search planning on N-1 nodes, which makes the feature unsafe for normal rolling upgrade and downgrade procedures.

Evidence
checkGeneratedColForAutoEmbedding() only validates SQL shape before CREATE TABLE succeeds; there is no cluster-version gate or mixed-version rejection in the DDL path. The new metadata is then consumed by auto-embedding DML evaluation in fillAutoEmbeddingDatum() and by planner schema loading through GeneratedExprString. The added tests cover only single-version starter-mode flows (pkg/expression/integration_test/integration_test.go:4539 through :4668) and do not exercise N/N-1 upgrade or rollback behavior.

Change request
Gate creation and use of EMBED_TEXT()-generated columns until the cluster version guarantees every TiDB node understands the new metadata, or return an explicit mixed-version unsupported error and add upgrade/downgrade coverage for that path.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. checkGeneratedColForAutoEmbedding now rejects creating auto-embedding generated columns during mixed-version rolling upgrades.

Comment thread pkg/ddl/create_table.go
}
}

autoEmbeddingCols := make(map[string]struct{})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [Major] ALTER or MODIFY can still create generated-column chains on top of auto-embedding columns

Why
CREATE TABLE now blocks generated columns that depend on an auto-embedding column, but the modify-column validation path does not enforce the same dependency rule. That leaves an ALTER path where a generated column can be changed to depend on an EMBED_TEXT() column even though the executor deliberately skips evaluating EMBED_TEXT() during the normal generated-column pass.

Scope
pkg/ddl/create_table.go:602, pkg/ddl/generated_column.go:255, pkg/executor/insert_common.go:899

Risk if unchanged
A user can create a legal table and later ALTER TABLE ... MODIFY COLUMN a generated column to reference an auto-embedding column. Subsequent inserts then evaluate the dependent generated column before the embedding column is filled, so the dependent value is computed from NULL or dummy state and stored incorrectly.

Evidence
create_table.go:602-617 adds a dedicated scan that rejects dependencies on auto-embedding generated columns. checkModifyGeneratedColumn in generated_column.go:255-279 validates the new expression plus auto-increment and index rules, but never checks whether dependColNames includes an existing auto-embedding generated column. That omission matters because fillRow skips EMBED_TEXT() columns at insert_common.go:899-900 and does not revisit downstream generated columns afterward.

Change request
Apply the same dependency check in checkModifyGeneratedColumn and any other ALTER path that can rewrite generated expressions, and add coverage for altering a generated column to reference an existing auto-embedding column.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. The MODIFY generated-column path now applies the same dependency check and rejects generated columns built on top of auto-embedding columns.

Comment thread pkg/ddl/generated_column.go
Comment thread pkg/inference/sqlembed.go
}

// Embed generates embeddings for the given text. It handles with cache and batching.
func (e *EmbedFn) Embed(shouldCancel func() bool, modelWithProvider string, text string, opts map[string]any) ([]float32, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [Major] EMBED_TEXT drops statement cancellation and timeout propagation once the remote call starts

Why
The new embedding path replaces the statement context with context.Background() and only polls SQLKiller, so request deadlines, executor cancellation, and parent-context shutdown no longer abort in-flight provider calls.

Scope
pkg/inference/sqlembed.go:162; pkg/executor/insert_common.go:700-740

Risk if unchanged
Canceled inserts and queries can keep waiting on remote embedding requests until the provider or HTTP client times out, tying up executor goroutines and slowing overload recovery under user cancels or statement timeouts.

Evidence
In Embed, the provider context is created with context.WithCancel(context.Background()) at line 163, and the only cancellation bridge is the 1-second shouldCancel poll at lines 168-182. The insert path does create egCtx := errgroup.WithContext(ctx), but the actual remote call ignores that context and calls embedFn.Embed(...) with only a SQLKiller callback at lines 723-740.

Change request
Use the caller context as the root through Embed and the provider path so EMBED_TEXT stops promptly on ctx.Done(), keeping the kill-signal poll only as a fallback when no real context is available.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. The embedding path now uses EmbedWithContext; direct evaluation and INSERT auto-embedding pass the caller/executor context through to the provider call, with SQLKiller kept as an additional fallback.

Comment thread pkg/inference/sqlembed.go
ErrMissingAPIKey: fmt.Errorf(errMissingAPI, "Gemini", strings.ToUpper(vardef.TiDBExpEmbedGeminiAPIKey)),
// ErrUnauthorized is not provided in gemini. The error message provided by Gemini API is sufficient enough.
}))
if isHostedEmbeddingEnabled() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [Major] Hosted embedding API-key file is read once at startup and never refreshed

Why
The hosted-embedding provider captures the file contents into a local apiKey variable during NewEmbedFn, so missing credentials at boot or later key rotation cannot recover without rebuilding the whole EmbedFn.

Scope
pkg/inference/sqlembed.go:106-126; pkg/domain/inference.go:19-31

Risk if unchanged
A node that starts before the secret is mounted, or a cluster that rotates the bearer token in place, will keep sending an empty or stale credential and permanently fail EMBED_TEXT requests until restart.

Evidence
Lines 107-116 read HostedEmbedding.APIKeyPath once and store the result in a local string. The registered provider then uses GetAPIKey: func() string { return apiKey } at line 125, while the domain constructs the EmbedFn once in initInferenceProviders() and only closes it on domain shutdown.

Change request
Fetch the API key on demand instead of capturing a startup snapshot, or at least re-read the configured file behind a small refresh cache so secret-mount races and key rotation do not require restart.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. Hosted embedding now reads the API key on demand via getHostedEmbeddingAPIKey instead of capturing it once during NewEmbedFn, so secret updates can be picked up without rebuilding EmbedFn.

// See the License for the specific language governing permissions and
// limitations under the License.

// Package domainadaptor provides access to Domain-owned inference resources

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [Major] domainadaptor.GetEmbedFn hides a global-default fallback behind a domain-scoped name

Why
The new adaptor API reads like a pure domain lookup, but the implementation also returns the process-wide default embed function when the session is nil, lacks GetDomain, or exposes a nil domain embedder. That gives the same name two meanings and makes the lifecycle and scope contract hard to read at call sites.

Scope
pkg/inference/domainadaptor/adaptor.go:15, pkg/inference/domainadaptor/adaptor.go:32

Risk if unchanged
Future callers can reasonably assume they are always getting a domain-owned resource and then accidentally couple tests or sessions through the global singleton fallback.

Evidence
The package header says this package provides access to domain-owned inference resources, and GetEmbedFn says it returns the embed bridge associated with the session's domain. The body instead returns inference.DefaultEmbedFn() on three non-domain paths at lines 35, 39, and 45.

Change request
Expose the fallback semantics in the name or comment, for example ResolveEmbedFn or GetOrDefaultEmbedFn, or add a short why-comment that this helper falls back to the process-global default when no domain-owned embedder is available.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I kept the small Domain.GetEmbedFn() accessor for now. The bridge is intentionally isolated in domainadaptor to avoid an import cycle and keep expression/inference code from importing domain. Moving this into a different session/domain capability interface would be a broader API change; for this PR I kept the minimal accessor and documented the fallback semantics in domainadaptor.GetEmbedFn.

Comment thread pkg/executor/insert_common.go Outdated
return defaultVal, nil
}

func (e *InsertValues) fillAutoEmbeddingDatum(ctx context.Context, rows [][]types.Datum) ([][]types.Datum, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [Major] INSERT-side auto-embedding reimplements the EMBED_TEXT runtime contract

Why
The new insert path now owns starter-mode gating, provider lookup, SQL-kill cancellation, remote embed invocation, and vector conversion in a second place instead of reusing one canonical evaluator.

Scope
pkg/executor/insert_common.go:634, pkg/expression/builtin_inference.go:95

Risk if unchanged
Future changes to EMBED_TEXT semantics can land in only one path, so direct expression evaluation and generated-column INSERT evaluation can drift apart on cancellation, provider selection, null handling, or vector materialization behavior.

Evidence
fillAutoEmbeddingDatum batches work, but the core EMBED_TEXT execution block at pkg/executor/insert_common.go:695-750 repeats the same runtime steps already implemented in BuiltinEmbedTextSig.evalVectorFloat32 at pkg/expression/builtin_inference.go:101-121; both paths call EvalEmbedTextArgs, enforce starter mode, fetch domainadaptor.GetEmbedFn(...), invoke .Embed(...), and convert the returned slice with types.CreateVectorFloat32.

Change request
Keep one canonical definition for EMBED_TEXT execution and let the executor layer handle only batching and orchestration. Extract a shared helper in the expression or inference layer for evaluating prepared embed args to a vector datum, and have both direct expression evaluation and generated-column INSERT reuse it.

Comment thread pkg/expression/column.go Outdated
// VirtualExpr is used to save expression for virtual column
VirtualExpr Expression

// GeneratedExprString stores the generated expression for planner rewrites that need the original AST.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [Minor] GeneratedExprString comment describes an AST contract even though the field stores SQL text

Why
The comment says the field keeps the original AST, but the type is string and downstream code reparses it before rewrites. That is a direct comment-to-code mismatch on a planner-facing field.

Scope
pkg/expression/column.go:297

Risk if unchanged
Readers may treat this field as an AST-preserving representation and make incorrect assumptions about fidelity, mutability, or reparsing costs in later planner work.

Evidence
The added field is GeneratedExprString string, while nearby consumers call generatedexpr.ParseExpression(col.GeneratedExprString) in pkg/expression/expression.go:1132 and pkg/planner/core/expression_rewriter.go:3040.

Change request
Update the comment to match the real contract, for example that this stores generated-expression text for later parse-and-rewrite flows.

Comment thread pkg/executor/insert_common.go Outdated
)

const (
autoEmbeddingEvalBatchSizeHint = 16

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [Minor] Auto-embedding insert helper names imply batching and singular datum handling that the code does not do

Why
The helper processes many generated-column cells across rows, yet it is named fillAutoEmbeddingDatum, and the concurrency limit is expressed as autoEmbeddingEvalBatchSizeHint * autoEmbeddingEvalMaxConcurrentBatches even though the implementation launches one goroutine per task rather than batching work.

Scope
pkg/executor/insert_common.go:57, pkg/executor/insert_common.go:634

Risk if unchanged
Tuning or extending this path becomes harder because maintainers have to reverse-engineer whether the knobs are about row batches, task concurrency, or generated-column fanout.

Evidence
The function signature is fillAutoEmbeddingDatum(ctx context.Context, rows [][]types.Datum), it builds a tasks slice over every auto-embedding generated column in every row, and line 701 applies the limit directly to errgroup task concurrency rather than to explicit batches.

Change request
Prefer names that match the behavior, such as fillAutoEmbeddingDatums or populateAutoEmbeddingGeneratedColumns, and rename the limit constants to reflect task concurrency unless real batching is introduced.

Comment thread pkg/domain/inference.go
@ChangRui-Ryan ChangRui-Ryan force-pushed the changrui_cse_embed-text branch from d3f1ba8 to 1d43e61 Compare June 13, 2026 08:59

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/inference/embedding/cohere/cohere_test.go (1)

123-145: ⚡ Quick win

Unused mock server in this test.

When GetAPIKey() returns an empty string, CreateEmbeddings returns early (before constructing the HTTP request) per the logic in cohere.go. The mock server at lines 127-132 is never called. Consider removing the server setup to clarify that no HTTP request is made in this case.

♻️ Simplified test without unused mock server
 func TestCohereEmbedder_NoAPIKey(t *testing.T) {
-	// Mock no API key response from real Cohere API
-	mockResponse := `{"id":"b6a8a658-261e-4fc2-b44d-0ee3fefa145e","message":"no api key supplied"}`
-
-	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-		w.Header().Set("Content-Type", "application/json")
-		w.WriteHeader(http.StatusUnauthorized)
-		w.Write([]byte(mockResponse))
-	}))
-	defer server.Close()
-
 	embedder := NewCohereEmbedder(EmbedderConfig{
 		GetAPIKey:  func() string { return "" },
-		GetBaseURL: func() string { return server.URL },
 	})
 
 	texts := []string{"hello world"}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/inference/embedding/cohere/cohere_test.go` around lines 123 - 145, The
test TestCohereEmbedder_NoAPIKey sets up an httptest.Server that is never used
because NewCohereEmbedder's CreateEmbeddings returns early when GetAPIKey() is
empty; remove the unused mock server setup (the httptest.NewServer handler and
its defer server.Close()) from the test to make it clear no HTTP call is made,
keeping the rest of the test that constructs the embedder via
NewCohereEmbedder(EmbedderConfig{GetAPIKey: func() string { return "" },
GetBaseURL: func() string { return server.URL },}) and asserts embeddings is nil
and an error containing "API key is not configured for cohere".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/ddl/generated_column.go`:
- Around line 455-469: The checkGeneratedColForAutoEmbedding function currently
validates AST/versions but doesn't enforce the deployment-mode gate used at
runtime; call the same gate (the CheckEmbedTextAllowed function in
pkg/expression/builtin_inference.go) from checkGeneratedColForAutoEmbedding and
return its error when it disallows EMBED_TEXT, so CREATE/ALTER fails during DDL
on non-Starter deployments; place this call early (before accepting the column)
so the unsupported schema isn't persisted.

In `@pkg/executor/insert_common.go`:
- Around line 705-721: The loop that schedules auto-embedding tasks iterates all
entries in rows including nil sentinels from LOAD DATA, causing skipped records
to be re-evaluated; update the outer loop in the insert/common code (the for
rowIdx, row := range rows loop) to skip nil rows immediately (e.g., if row ==
nil { continue }) before computing gIdx or appending autoEmbeddingEvalTask so
that no tasks are enqueued for sentinel nil rows; keep the existing gIdx logic
and e.GenExprs usage unchanged.

---

Nitpick comments:
In `@pkg/inference/embedding/cohere/cohere_test.go`:
- Around line 123-145: The test TestCohereEmbedder_NoAPIKey sets up an
httptest.Server that is never used because NewCohereEmbedder's CreateEmbeddings
returns early when GetAPIKey() is empty; remove the unused mock server setup
(the httptest.NewServer handler and its defer server.Close()) from the test to
make it clear no HTTP call is made, keeping the rest of the test that constructs
the embedder via NewCohereEmbedder(EmbedderConfig{GetAPIKey: func() string {
return "" }, GetBaseURL: func() string { return server.URL },}) and asserts
embeddings is nil and an error containing "API key is not configured for
cohere".
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a77e591f-3554-42bd-9b9c-c40eccc1d188

📥 Commits

Reviewing files that changed from the base of the PR and between d3f1ba8 and 1d43e61.

📒 Files selected for processing (80)
  • pkg/config/config.go
  • pkg/config/config.toml.nextgen.example
  • pkg/config/config_test.go
  • pkg/ddl/BUILD.bazel
  • pkg/ddl/add_column.go
  • pkg/ddl/create_table.go
  • pkg/ddl/generated_column.go
  • pkg/ddl/generated_column_internal_test.go
  • pkg/domain/BUILD.bazel
  • pkg/domain/domain.go
  • pkg/domain/inference.go
  • pkg/executor/insert_common.go
  • pkg/executor/load_data.go
  • pkg/expression/BUILD.bazel
  • pkg/expression/builtin.go
  • pkg/expression/builtin_inference.go
  • pkg/expression/builtin_inference_test.go
  • pkg/expression/builtin_threadunsafe_generated.go
  • pkg/expression/column.go
  • pkg/expression/expression.go
  • pkg/expression/function_traits.go
  • pkg/expression/function_traits_test.go
  • pkg/expression/generator/builtin_threadsafe.go
  • pkg/expression/inference_helper.go
  • pkg/expression/inference_helper_test.go
  • pkg/expression/integration_test/BUILD.bazel
  • pkg/expression/integration_test/integration_test.go
  • pkg/inference/BUILD.bazel
  • pkg/inference/domainadaptor/BUILD.bazel
  • pkg/inference/domainadaptor/adaptor.go
  • pkg/inference/embedding/base/BUILD.bazel
  • pkg/inference/embedding/base/base.go
  • pkg/inference/embedding/base/base_test.go
  • pkg/inference/embedding/batcher/BUILD.bazel
  • pkg/inference/embedding/batcher/batcher.go
  • pkg/inference/embedding/batcher/batcher_test.go
  • pkg/inference/embedding/cohere/BUILD.bazel
  • pkg/inference/embedding/cohere/cohere.go
  • pkg/inference/embedding/cohere/cohere_test.go
  • pkg/inference/embedding/cohere/protocol.go
  • pkg/inference/embedding/gemini/BUILD.bazel
  • pkg/inference/embedding/gemini/gemini.go
  • pkg/inference/embedding/gemini/gemini_test.go
  • pkg/inference/embedding/gemini/protocol.go
  • pkg/inference/embedding/huggingface/BUILD.bazel
  • pkg/inference/embedding/huggingface/huggingface.go
  • pkg/inference/embedding/huggingface/huggingface_test.go
  • pkg/inference/embedding/huggingface/protocol.go
  • pkg/inference/embedding/jina/BUILD.bazel
  • pkg/inference/embedding/jina/jina.go
  • pkg/inference/embedding/jina/jina_test.go
  • pkg/inference/embedding/jina/protocol.go
  • pkg/inference/embedding/mock/BUILD.bazel
  • pkg/inference/embedding/mock/mock.go
  • pkg/inference/embedding/mock/mock_test.go
  • pkg/inference/embedding/nvidia/BUILD.bazel
  • pkg/inference/embedding/nvidia/nvidia.go
  • pkg/inference/embedding/nvidia/nvidia_test.go
  • pkg/inference/embedding/nvidia/protocol.go
  • pkg/inference/embedding/openai/BUILD.bazel
  • pkg/inference/embedding/openai/openai.go
  • pkg/inference/embedding/openai/openai_test.go
  • pkg/inference/embedding/openai/protocol.go
  • pkg/inference/embedding/tidbcloud/BUILD.bazel
  • pkg/inference/embedding/tidbcloud/protocol.go
  • pkg/inference/embedding/tidbcloud/tidbcloud_free.go
  • pkg/inference/embedding/tidbcloud/tidbcloud_free_test.go
  • pkg/inference/manager_test.go
  • pkg/inference/sqlembed.go
  • pkg/parser/ast/functions.go
  • pkg/planner/core/BUILD.bazel
  • pkg/planner/core/expression_rewriter.go
  • pkg/planner/core/logical_plan_builder.go
  • pkg/planner/util/null_misc_test.go
  • pkg/sessionctx/vardef/tidb_vars.go
  • pkg/sessionctx/variable/BUILD.bazel
  • pkg/sessionctx/variable/embedding_vars.go
  • pkg/sessionctx/variable/embedding_vars_test.go
  • pkg/sessionctx/variable/sysvar.go
  • tests/integrationtest/r/executor/show.result
✅ Files skipped from review due to trivial changes (9)
  • pkg/inference/embedding/cohere/protocol.go
  • pkg/planner/util/null_misc_test.go
  • pkg/parser/ast/functions.go
  • tests/integrationtest/r/executor/show.result
  • pkg/inference/embedding/tidbcloud/protocol.go
  • pkg/inference/embedding/openai/BUILD.bazel
  • pkg/sessionctx/variable/BUILD.bazel
  • pkg/inference/embedding/huggingface/huggingface.go
  • pkg/inference/embedding/gemini/BUILD.bazel
🚧 Files skipped from review as they are similar to previous changes (63)
  • pkg/inference/embedding/huggingface/protocol.go
  • pkg/expression/builtin_threadunsafe_generated.go
  • pkg/planner/core/BUILD.bazel
  • pkg/config/config.toml.nextgen.example
  • pkg/domain/BUILD.bazel
  • pkg/inference/embedding/batcher/BUILD.bazel
  • pkg/inference/embedding/openai/protocol.go
  • pkg/inference/domainadaptor/BUILD.bazel
  • pkg/inference/embedding/jina/BUILD.bazel
  • pkg/inference/embedding/base/BUILD.bazel
  • pkg/expression/builtin.go
  • pkg/inference/embedding/nvidia/BUILD.bazel
  • pkg/inference/embedding/mock/mock.go
  • pkg/domain/inference.go
  • pkg/expression/function_traits_test.go
  • pkg/inference/embedding/huggingface/BUILD.bazel
  • pkg/inference/embedding/jina/protocol.go
  • pkg/expression/expression.go
  • pkg/inference/manager_test.go
  • pkg/inference/domainadaptor/adaptor.go
  • pkg/inference/embedding/gemini/protocol.go
  • pkg/inference/embedding/tidbcloud/BUILD.bazel
  • pkg/expression/BUILD.bazel
  • pkg/inference/BUILD.bazel
  • pkg/domain/domain.go
  • pkg/sessionctx/variable/embedding_vars_test.go
  • pkg/sessionctx/variable/embedding_vars.go
  • pkg/expression/function_traits.go
  • pkg/expression/generator/builtin_threadsafe.go
  • pkg/planner/core/logical_plan_builder.go
  • pkg/expression/integration_test/BUILD.bazel
  • pkg/inference/embedding/base/base_test.go
  • pkg/sessionctx/vardef/tidb_vars.go
  • pkg/ddl/add_column.go
  • pkg/inference/embedding/mock/BUILD.bazel
  • pkg/inference/embedding/nvidia/nvidia_test.go
  • pkg/sessionctx/variable/sysvar.go
  • pkg/inference/embedding/openai/openai.go
  • pkg/inference/embedding/tidbcloud/tidbcloud_free.go
  • pkg/expression/builtin_inference_test.go
  • pkg/inference/embedding/nvidia/protocol.go
  • pkg/inference/embedding/gemini/gemini.go
  • pkg/expression/column.go
  • pkg/inference/embedding/cohere/cohere.go
  • pkg/expression/inference_helper_test.go
  • pkg/inference/embedding/mock/mock_test.go
  • pkg/inference/embedding/tidbcloud/tidbcloud_free_test.go
  • pkg/inference/embedding/base/base.go
  • pkg/inference/embedding/jina/jina_test.go
  • pkg/inference/embedding/openai/openai_test.go
  • pkg/config/config.go
  • pkg/inference/embedding/huggingface/huggingface_test.go
  • pkg/expression/inference_helper.go
  • pkg/inference/embedding/batcher/batcher.go
  • pkg/expression/integration_test/integration_test.go
  • pkg/inference/embedding/cohere/BUILD.bazel
  • pkg/planner/core/expression_rewriter.go
  • pkg/inference/embedding/nvidia/nvidia.go
  • pkg/inference/embedding/gemini/gemini_test.go
  • pkg/inference/embedding/batcher/batcher_test.go
  • pkg/inference/sqlembed.go
  • pkg/config/config_test.go
  • pkg/inference/embedding/jina/jina.go

Comment thread pkg/ddl/generated_column.go
Comment thread pkg/executor/insert_common.go
@ChangRui-Ryan

Copy link
Copy Markdown
Contributor Author

/retest

@tiprow

tiprow Bot commented Jun 15, 2026

Copy link
Copy Markdown

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Comment thread pkg/ddl/generated_column.go Outdated
return checkAutoEmbeddingGeneratedColumnServerVersionsCompatible(versions)
}

func checkAutoEmbeddingGeneratedColumnServerVersionsCompatible(versions []string) error {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to check this as its the first version to introduce it in open source tidb? seems AI is over considering the compatibility issue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's manager

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, the name was misleading. I renamed manager_test.go to sqlembed_test.go since these tests cover the SQL embedding helper / EmbedFn behavior rather than a manager component.

@ChangRui-Ryan

Copy link
Copy Markdown
Contributor Author

/retest

@tiprow

tiprow Bot commented Jun 16, 2026

Copy link
Copy Markdown

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ChangRui-Ryan

Copy link
Copy Markdown
Contributor Author

/retest

@tiprow

tiprow Bot commented Jun 16, 2026

Copy link
Copy Markdown

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Comment thread pkg/executor/insert_common.go Outdated
break
}
}
if !hasAutoEmbeddingExpr {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me that hasAutoEmbeddingExpr can be decided during compile time or open(), so there is no need to re-check it for every call of fillAutoEmbeddingDatumsWithRowCount

Comment thread pkg/executor/insert_common.go Outdated
continue
}
gIdx++
if !expression.IsAutoEmbedFnCallAST(gCol.GeneratedExpr.Internal()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as above, it looks to me we can pre-handle these information so during this row-wise loop, we can just construct the autoEmbeddingEvalTask

Comment thread pkg/executor/insert_common.go Outdated
results[i].err = fmt.Errorf("auto-embedding generated column expects EMBED_TEXT()")
continue
}
embedArgs, isNull, err := expression.EvalEmbedTextArgs(e.Ctx().GetExprCtx().GetEvalCtx(), chunk.MutRowFromDatums(task.row).ToRow(), sf.GetArgs(), embedSig.IsFromVecSearch)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chunk.MutRowFromDatums is very low-efficient, I think we should implement some caching to optimize this logic

// NOTE: please make sure there are test cases for all functions here.
}
specialUnsafeFuncs = map[string]struct{}{
"BuiltinEmbedTextSig": {},

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the name is BuiltinEmbedTextSig, not builtinEmbedTextSig?

@ChangRui-Ryan

Copy link
Copy Markdown
Contributor Author

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants