expression: add full SQL auto embedding support for starter mode by ChangRui-Ryan · Pull Request #69014 · pingcap/tidb

ChangRui-Ryan · 2026-06-08T07:35:02Z

What problem does this PR solve?

Issue Number: ref #67765

Problem Summary:
TiDB Cloud Starter needs SQL Auto Embedding support in the upstream TiDB kernel so that Starter can use an upstream-built TiDB binary while still providing the documented auto-embedding experience.

Before this PR, upstream TiDB did not have a complete SQL-side embedding path. Users could not call EMBED_TEXT() from SQL to obtain vector embeddings, could not define stored generated vector columns backed by embedding models, and could not use text-query vector distance helper functions such as VEC_EMBED_L2_DISTANCE() to automatically embed query text during vector search.

The feature also needs to be deployment-aware. Auto Embedding is intended to be available only in Starter deployment mode. Non-Starter deployments should not accidentally invoke external embedding services or hosted TiDB Cloud embedding infrastructure through EMBED_TEXT().

What changed and how does it work?

This PR adds full SQL Auto Embedding support to the upstream TiDB kernel.

At the SQL layer, it introduces the EMBED_TEXT(model, text[, options]) builtin, which returns a VECTOR<FLOAT32> embedding. The function supports provider-qualified model names and JSON options, and it is treated as a non-foldable, mutable-effect function because it may call external inference services and depends on runtime configuration.

At the inference layer, this PR adds a shared embedding framework under pkg/inference, including provider implementations for OpenAI-compatible endpoints, Jina AI, Cohere, Gemini, Hugging Face, NVIDIA, and TiDB Cloud hosted embedding. It also adds batching, request cancellation, per-session/global configuration integration, API key masking, endpoint validation, and test mock providers.

For auto-embedding generated columns, this PR allows EMBED_TEXT() to be used as the top-level expression of a stored generated vector column. The DDL validation enforces the supported shape: the generated column must be stored, the model/options arguments must be constants, and dependent generated columns on auto-embedding columns are rejected. During DML, TiDB evaluates the embedding expression and materializes the generated vector value on insert/update.

For vector search, this PR adds planner rewrite support for VEC_EMBED_*_DISTANCE() helper functions. When the vector column is backed by an auto-embedding generated column, TiDB rewrites text-query distance functions into the corresponding vector distance function with an EMBED_TEXT() call on the query text. This lets users query by text without manually embedding the query first.

The embedding function is managed through the TiDB domain so all sessions can share the same embedding manager and provider configuration. A small domain adaptor is added to avoid package dependency cycles between expression evaluation and domain initialization.

Finally, this PR gates runtime use of EMBED_TEXT() by deployment mode. EMBED_TEXT() now checks deploymode.IsStarter() before evaluating arguments or calling any provider. If the current TiDB instance is not running in Starter deployment mode, the function returns an error instead of invoking an embedding provider.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)Local E2E validation
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Local E2E validation

I also verified the SQL auto-embedding path with a local TiUP playground cluster.

Environment

Built the TiDB server binary from this PR branch.
Started a local starter-mode / next-gen TiUP playground cluster with 1 TiDB, 1 PD, 1 TiKV/CSE node, and TiFlash compute enabled.
Used a real OpenAI-compatible Aliyun DashScope embedding endpoint with a real API key. The API key is intentionally omitted from the PR.

Validated SQL flow

Configured the OpenAI-compatible embedding endpoint and API key:

SET GLOBAL tidb_embedding_api_base_url = 'https://dashscope.aliyuncs.com/compatible-mode/v1';
SET GLOBAL tidb_embedding_api_key = '<redacted>';

Verified direct EMBED_TEXT() execution against the real remote endpoint:

SELECT VEC_DIMS(EMBED_TEXT('openai/text-embedding-v4', 'hello world'));
SELECT VEC_AS_TEXT(EMBED_TEXT('openai/text-embedding-v4', 'hello world'));

The query successfully returned a vector embedding from the real Aliyun endpoint, confirming that the SQL function can call an external OpenAI-compatible provider end-to-end.

Verified auto-embedding generated column behavior:

CREATE TABLE t_auto_embed (
  id BIGINT PRIMARY KEY,
  text_content TEXT,
  embedding VECTOR GENERATED ALWAYS AS
    (EMBED_TEXT('openai/text-embedding-v4', text_content)) STORED
);

INSERT INTO t_auto_embed VALUES
  (1, 'hello world'),
  (2, 'TiDB vector search');

SELECT id, VEC_DIMS(embedding) FROM t_auto_embed ORDER BY id;

The generated vector column was populated during write, confirming that the auto-embedding path works for stored generated columns.

Verified vector search helper rewrite:

SELECT id
FROM t_auto_embed
ORDER BY VEC_EMBED_COSINE_DISTANCE(embedding, 'hello world')
LIMIT 1;

The query executed successfully, confirming that VEC_EMBED_*_DISTANCE() can reuse the EMBED_TEXT() metadata from the generated vector column and rewrite to the corresponding vector distance function.

Verified deployment-mode gating:

SELECT EMBED_TEXT('openai/text-embedding-v4', 'hello world');

In a non-starter/classic local cluster, this is rejected with the expected deployment-mode error. In starter mode, the same query works when the embedding endpoint and API key are configured.

Result

Direct EMBED_TEXT() works against a real Aliyun DashScope OpenAI-compatible endpoint.
Stored generated columns using EMBED_TEXT() are populated successfully on insert.
VEC_EMBED_*_DISTANCE() works with auto-embedding generated vector columns.
The feature is correctly gated to starter deployment mode.

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

New Features
- Added EMBED_TEXT() function for generating vector embeddings from text.
- Added vector embedding distance functions (VEC_EMBED_L1_DISTANCE, VEC_EMBED_L2_DISTANCE, VEC_EMBED_COSINE_DISTANCE, VEC_EMBED_NEGATIVE_INNER_PRODUCT) for similarity search.
- Added support for auto-embedding generated vector columns in tables.
- Integrated multiple embedding providers: OpenAI, Cohere, Gemini, HuggingFace, Jina, NVIDIA, and TiDB Cloud.
- Added embedding support in LOAD DATA operations.
- Hosted embedding configuration available in starter deployment mode.

pantheon-ai · 2026-06-08T07:35:08Z

@ChangRui-Ryan I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

_{ℹ️ Learn more details on Pantheon AI.}

tiprow · 2026-06-08T07:35:22Z

Hi @ChangRui-Ryan. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

coderabbitai · 2026-06-08T07:35:26Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds hosted embedding configuration, embedding provider/runtime wiring, EMBED_TEXT evaluation, planner and DDL support for auto-embedding, executor population paths, and updated tests/build metadata.

Changes

Embedding capability rollout

Layer / File(s)	Summary
Hosted embedding config gate `pkg/config/config.go`, `pkg/config/config.toml.nextgen.example`, `pkg/config/config_test.go`	Adds the `hosted-embedding` config section, default values, deploy-mode validation, and example/test coverage.
Inference runtime and sysvars `pkg/domain/...`, `pkg/inference/...`, `pkg/sessionctx/vardef/tidb_vars.go`, `pkg/sessionctx/variable/...`	Wires domain-owned embed functions, provider discovery, embedding sysvars, and OpenAI base-URL normalization/masking.
EMBED_TEXT builtin and AST helpers `pkg/expression/...`, `pkg/parser/ast/functions.go`	Registers `EMBED_TEXT`, adds AST extraction helpers, builtin traits, generated-column metadata propagation, and tests.
Planner rewrite and generated columns `pkg/ddl/...`, `pkg/planner/core/...`	Adds generated-column validation for auto-embedding and rewrites vector-distance calls using stored generated expression metadata.
Executor auto-embedding `pkg/executor/...`, `pkg/expression/integration_test/...`, `pkg/ddl/generated_column_internal_test.go`	Populates auto-embedding generated columns during insert/LOAD DATA and adds integration/unit coverage.
Provider implementations and tests `pkg/inference/embedding/...`, `pkg/inference/manager_test.go`, `pkg/inference/sqlembed.go`	Adds concrete embedding providers, batching, mock support, and runtime tests across all backends.
Build wiring and expected outputs `pkg/*/BUILD.bazel`, `pkg/planner/util/null_misc_test.go`, `tests/integrationtest/r/executor/show.result`	Updates build targets, snapshot hash, and expected builtin listings.

Sequence Diagram(s)

sequenceDiagram
  participant SQL as EMBED_TEXT / vec_embed_*_distance
  participant Expression as expression.rewriter
  participant Domain as Domain.GetEmbedFn
  participant EmbedFn as inference.EmbedFn
  participant Provider as embedding backend
  SQL->>Expression: build or rewrite expression
  Expression->>Domain: resolve embed function
  Domain->>EmbedFn: return managed embed function
  EmbedFn->>Provider: CreateEmbeddings(model,texts,opts)
  Provider-->>EmbedFn: embeddings or error
  EmbedFn-->>Expression: vector datum
  Expression-->>SQL: rewritten or evaluated result

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Suggested labels

ok-to-test, lgtm, release-note

Suggested reviewers

yudongusa
tiancaiamao
D3Hunter
YuJuncen

Poem

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 19

🧹 Nitpick comments (12)

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/executor/insert_common.go`:
- Around line 688-699: The code captures a single shared evalCtx (evalCtx :=
e.Ctx().GetExprCtx().GetEvalCtx()) and reuses it from every goroutine, which can
race statement-scoped state when calling task.expr.Eval; instead create a
distinct EvalCtx per worker before calling task.expr.Eval (e.g., call
e.Ctx().GetExprCtx().GetEvalCtx() or use a provided Clone/New method inside the
eg.Go closure) so each goroutine has its own evalCtx; update the goroutine body
around task.expr.Eval and chunk.MutRowFromDatums(task.row).ToRow() to use that
per-goroutine evalCtx and ensure any cleanup if the EvalCtx type requires it.
- Around line 647-650: The code snapshots a single uint64 rowCntInLoadData from
e.rowCount for the whole batch when
e.Ctx().GetSessionVars().StmtCtx.InLoadDataStmt is true, then passes that same
value into HandleBadNull for every row causing LOAD DATA errors to be reported
against the batch's last row; change the logic to use the per-row LOAD DATA
index instead of the single snapshot: inside the per-row loop that calls
HandleBadNull, compute the correct per-row LOAD DATA index (using the per-row
counter/variable used to iterate the batch or the LOAD DATA line index) and pass
that per-row value to HandleBadNull rather than rowCntInLoadData; update any use
sites referencing rowCntInLoadData (e.g., where HandleBadNull is invoked) to use
the per-row index so failures are reported against the failing row.

In `@pkg/expression/builtin_inference.go`:
- Around line 1-13: Update the license header year in
pkg/expression/builtin_inference.go to match the corresponding test file
(builtin_inference_test.go) so the copyright years are consistent across related
files; open builtin_inference.go, locate the top-of-file license comment block
and change the "Copyright 2025" year to the same year used in
builtin_inference_test.go (2026) or vice versa if you prefer to standardize on
2025—ensure both files have the identical year in their header comment.

In `@pkg/expression/inference_helper.go`:
- Around line 83-85: ExtractAutoEmbedInfoFromAST currently only rejects fewer
than 2 args but silently accepts extra args; change the argument validation to
explicitly allow only the supported arities (e.g., exactly 2 or exactly 3
arguments) and return an error for any other arity. Update the len(fnCall.Args)
check(s) that currently read "if len(fnCall.Args) < 2" to instead validate
membership in the allowed set (len == 2 || len == 3) and return
fmt.Errorf("invalid EMBED_TEXT() usage: unexpected arity") when not allowed;
apply this same change to the second arg-count validation block in the file that
also inspects fnCall.Args (the later branch inside ExtractAutoEmbedInfoFromAST).

In `@pkg/expression/integration_test/integration_test.go`:
- Around line 4507-4513: The test mutates the global sysvar
tidb_exp_embed_jina_ai_api_key and currently only resets it at the end of the
happy path, risking leakage if an assertion fails; register an unconditional
cleanup before the first mutation (e.g., call t.Cleanup(func(){ tk.MustExec("set
@@global.tidb_exp_embed_jina_ai_api_key = ''") })) so the global is always
reset, and then proceed with the existing tk.MustExec/tk.MustQuery assertions
that set and check tidb_exp_embed_jina_ai_api_key.

In `@pkg/inference/embedding/cohere/cohere.go`:
- Around line 52-56: NewCohereEmbedder constructs an http.Client without a
timeout causing potential indefinite hangs; update NewCohereEmbedder to
initialize client: http.Client{Timeout: time.Second * <reasonable_value>}
(choose e.g. 10s or configurable), and add the "time" import; ensure the
Embedder struct's client field still uses that configured http.Client so all
Cohere requests get the client-level timeout as defense-in-depth.

In `@pkg/inference/embedding/gemini/gemini.go`:
- Around line 98-100: The model string is interpolated directly into the path
when building fullURL, which can break URLs or allow unintended endpoints;
before the fmt.Sprintf call that sets fullURL, validate that model is non-empty
and does not contain path traversal or illegal chars (e.g., reject segments with
"/" or "..") or alternatively escape it with a path-encoding function (e.g.,
url.PathEscape) and use the escaped value (escapedModel) in place of model;
update the code around variables fullURL, model and baseURL to perform this
validation/escaping and return an error if validation fails.
- Around line 51-56: The NewGeminiEmbedder function constructs an Embedder with
an http.Client that lacks a timeout; update NewGeminiEmbedder to set a sensible
client timeout (e.g., via http.Client{Timeout: time.Second * <N>}) so requests
cannot hang indefinitely, and add the "time" import; modify the Embedder
construction (function NewGeminiEmbedder, type Embedder and its client field) to
use the configured timeout value.

In `@pkg/inference/embedding/huggingface/protocol.go`:
- Around line 22-24: The comment above the Response type is inconsistent with
the actual type; update the docstring for Response (type Response [][]float32 in
protocol.go) to state that HuggingFace returns a 2D array of float32 values, or
if you confirm the API returns float64 instead, change the type to [][]float64
and update all usages accordingly; ensure the comment and the type (Response)
match exactly.

In `@pkg/inference/embedding/jina/BUILD.bazel`:
- Around line 23-24: The BUILD target currently sets flaky = True and
shard_count = 6; remove these patch-work flags and fix the underlying test
instability instead: locate the failing tests under the
pkg/inference/embedding/jina package, reproduce the flakes, and address root
causes such as race conditions, timing-dependent assertions, shared state, or
unmocked external dependencies by adding proper synchronization, deterministic
fixtures, explicit mocks, or adjusting timeouts; once tests are deterministic,
delete flaky = True and shard_count = 6 from the BUILD.bazel target so CI runs a
single stable shard.

In `@pkg/inference/embedding/mock/BUILD.bazel`:
- Around line 16-17: The BUILD.bazel target in pkg/inference/embedding/mock
currently sets flaky = True and shard_count = 3—remove those attributes and make
the tests deterministic instead of masking failures; locate the mock test target
in BUILD.bazel and (1) remove flaky and shard_count, (2) audit tests in
pkg/inference/embedding/mock for shared global state, non-deterministic
timing/race conditions, or reliance on test ordering, and (3) fix by adding
proper setup/teardown (reset globals, use fresh temp dirs, inject deterministic
mocks/fakes, or serialize accesses with locks) so the tests pass reliably in
parallel without sharding. Ensure the target no longer needs flaky/shard_count
after these fixes and run the test suite to confirm stability.

In `@pkg/inference/embedding/mock/mock.go`:
- Around line 72-84: The delay handling sleeps unconditionally and ignores the
provided context, causing hangs if the context is cancelled; modify the block
that parses opts["delay"] so that instead of time.Sleep(dur) you wait with a
select on time.After(dur) and ctx.Done(), returning early with ctx.Err() when
the context is cancelled. Keep the existing parsing and error messages (the opts
map and delay parsing code) and only replace the unconditional sleep with a
context-aware select that returns (nil, ctx.Err()) on cancellation.

In `@pkg/inference/embedding/nvidia/nvidia.go`:
- Around line 147-161: The current loop assigns embeddings by the sorted
position (row) which can misplace vectors if indices are duplicated or
non-sequential; instead validate that respObj.Data contains exactly one entry
for each index 0..len(texts)-1 and then assign using the actual Index.
Specifically, after sort.Slice(respObj.Data, ...) iterate respObj.Data and for
each item check that 0 <= item.Index < len(texts) and that
embeddings[item.Index] is nil (to detect duplicates); on any invalid or
duplicate index return an error indicating the problematic item.Index; otherwise
decode with base.DecodeFloat32ArrayBytes and set embeddings[item.Index] = e (not
embeddings[row]).
- Around line 54-56: The HTTP client for Embedder is created with zero timeout
(http.Client{}), which can cause indefinite hangs; set a sensible Timeout on the
client (e.g., http.Client{Timeout: ...}) and make that timeout configurable via
EmbedderConfig (add a Timeout field and use it when constructing the Embedder's
client); update the constructor that returns &Embedder{client: http.Client{},
cfg: cfg, ...} to initialize client with the configured Timeout and provide a
reasonable default if cfg.Timeout is unset.

In `@pkg/inference/embedding/openai/openai_test.go`:
- Line 28: Replace the misleading comment "Mock successful response from real
Jina API" in openai_test.go with a correct description for the OpenAI embedder
(e.g., "Mock successful response from real OpenAI API" or "Mock successful
response for OpenAI embedder") so the comment accurately matches the test;
locate the comment by searching for that exact string in the test file and
update it in the test function that sets up the mocked OpenAI response.

In `@pkg/inference/embedding/openai/openai.go`:
- Around line 142-157: After sorting respObj.Data by Index, validate that each
item's Index is within [0, len(texts)-1] and that indices form a unique sequence
(no duplicates) before assigning into embeddings; for each item in respObj.Data
call base.DecodeFloat32ArrayBytes(item.Embedding) as before but place the
decoded vector into embeddings[item.Index] (not embeddings[row]), and return a
clear error if any item.Index is out of range or a duplicate/missing so we never
misassign embeddings; reference respObj.Data, item.Index, embeddings and
base.DecodeFloat32ArrayBytes to locate and implement the checks and assignment
change.
- Around line 54-58: The http.Client in NewOpenAIEmbedder is created with
default settings (no timeout); update NewOpenAIEmbedder to set a sensible
timeout on Embedder.client by using a configured value from EmbedderConfig
(e.g., add a Timeout field to EmbedderConfig) and fall back to a reasonable
default if not provided; ensure the Embedder struct uses http.Client{Timeout:
cfg.Timeout} (or default) so embedding requests cannot hang indefinitely.

In `@pkg/planner/core/expression_rewriter.go`:
- Around line 3038-3042: The error wrapping currently replaces the original
error with a new one via fmt.Errorf in the block that calls
expression.ExtractAutoEmbedInfoFromAST(astExpr), losing the original
stack/context; change the assignment to preserve the original error by using
errors.Annotatef (or the project's equivalent) to annotate err and assign that
to er.err (refer to expression.ExtractAutoEmbedInfoFromAST, er.err, and
fmt.Errorf) so the original error context is preserved while adding the "column
%s cannot be used in %s" message.
- Around line 3057-3060: The inline initializer fnInit currently does an
unchecked type assertion to *expression.BuiltinEmbedTextSig which can panic;
update fnInit (used by er.newFunctionWithInit) to perform a safe type assertion
(e.g., using the "v, ok := sf.Function.(*expression.BuiltinEmbedTextSig)"
pattern or a type switch), set v.IsFromVecSearch = true only when ok, and return
a descriptive error (instead of panicking) when the function is not the expected
BuiltinEmbedTextSig type so callers can handle the failure.

---

Nitpick comments:
In `@pkg/inference/embedding/base/BUILD.bazel`:
- Line 15: The test target "base_test" in the BUILD rule is marked flaky = True
even though base_test only runs deterministic base64 decode/validation; remove
the flaky = True attribute (or set flaky = False) from the base_test rule so
failures surface for investigation, then run the test suite locally to confirm
no flakiness before committing.

In `@pkg/inference/embedding/batcher/batcher.go`:
- Around line 244-260: Add a clear comment above the reqCtx/cancelReq creation
and goroutine explaining that reqCtx only cancels after all individual request
contexts in thisBatch.calls are done (the goroutine loops over call.ctx and only
calls cancelReq after every call is done), so batching can cause a request with
a short timeout to wait longer (up to batchWindow) before its batch-level
context cancels; state this is intentional for batching and note the trade-off
for maintainers and any expectations around embedder.CreateEmbeddings using
reqCtx.

In `@pkg/inference/embedding/gemini/BUILD.bazel`:
- Around line 18-26: The test target "gemini_test" in the BUILD.bazel currently
sets flaky = True which masks nondeterministic failures; investigate and fix the
root causes in gemini_test.go (race conditions, timing, shared state, or
reliance on external services) by reproducing failures locally (run the
"gemini_test" target without flaky and with shard_count reduced), add
deterministic seeds or explicit synchronization in the test code, increase the
timeout only if legitimately slow, and then remove the flaky = True attribute
from the "gemini_test" rule once the tests are stable.

In `@pkg/inference/embedding/huggingface/BUILD.bazel`:
- Line 23: The BUILD.bazel target currently sets flaky = True, which masks
nondeterministic test failures; remove or revert the flaky = True attribute on
the target and instead reproduce the intermittent failures by running the
corresponding Bazel test target(s) repeatedly (e.g., bazel test
--runs_per_test=50), add deterministic fixes (stabilize timers, seed randomness,
isolate shared state, add proper setup/teardown) in the test code, and only
reintroduce flaky metadata after root causes are identified; look for references
to the BUILD target that contains flaky = True to locate the tests to run and
stabilize.
- Around line 18-26: The shard_count for the Huggingface test target
"huggingface_test" is set to 10 which is inconsistent with the Gemini provider
tests (which use shard_count = 7); update the "huggingface_test" target in
BUILD.bazel to use the same shard_count value as other provider tests (e.g.,
change shard_count to 7) or add a brief comment in the BUILD file next to the
"huggingface_test" rule explaining why it must remain different so the
discrepancy is documented and intentional.

In `@pkg/inference/embedding/huggingface/huggingface.go`:
- Around line 121-124: The error log at the HuggingFace request failure uses the
raw response body (zap.String("body", string(body))) which may contain echoed
API keys or authorization data; update the failure handling in the function that
makes the request (the code that calls resp and builds the log with
logutil.BgLogger()) to avoid logging sensitive contents by either removing the
full body from the log or sanitizing it first (e.g., detect and redact common
keys like "authorization", "api_key", "token" or apply a safe snippet/length
limit and replace sensitive substrings with "[REDACTED]"), and keep the existing
metadata (resp.StatusCode) in the log. Ensure you change the zap.String("body",
...) usage to use the sanitized string instead of the raw body.
- Line 54: The http.Client is created with zero timeout (client: http.Client{})
which can hang; update the HuggingFace embedder initialization to set a sensible
Timeout (e.g., 15–30s) on the http.Client instance used by the embedder, or wire
a timeout value from EmbedderConfig into the client construction so the Timeout
field is configurable; change the client creation referenced by the client:
http.Client{} initializer in the huggingface embedder to use
http.Client{Timeout: ...} (or build the client with a config-supplied duration).

In `@pkg/inference/embedding/jina/jina_test.go`:
- Around line 1-227: Add unit tests to cover missing API key and custom error
handling for the Jina embedder: create tests similar to HuggingFace ones that
exercise NewJinaEmbedder and CreateEmbeddings for (1) no API key supplied (e.g.,
TestJinaEmbedder_NoAPIKey) verifying it returns an error and nil embeddings, (2)
custom unauthorized response handling (e.g.,
TestJinaEmbedder_CustomUnauthorizedError) mocking a 401 response and asserting
the error contains expected message, and (3) custom error payload handling
(e.g., TestJinaEmbedder_CustomErrorHandling) mocking non-2xx responses with body
and asserting CreateEmbeddings surfaces the body details; place these tests
alongside existing TestJinaEmbedder_* functions and reuse httptest.NewServer,
EmbedderConfig GetAPIKey/GetBaseURL, and assertions on returned
embeddings/errors to mirror the HuggingFace tests.

In `@pkg/inference/embedding/jina/jina.go`:
- Line 55: The http.Client is created with no timeout (client: http.Client{}),
risking indefinite hangs; update the embedder initialization (where the client
is constructed in jina.go) to set a sensible Timeout on the http.Client (e.g.,
from a new EmbedderConfig field) and wire that timeout through the
constructor/path that creates the client so the timeout becomes configurable via
EmbedderConfig and used when initializing the client.
- Around line 116-119: The error log in pkg/inference/embedding/jina/jina.go
currently logs the full response body in the logutil.BgLogger().Error call (the
line calling logutil.BgLogger().Error with zap.String("body", string(body)))
which could leak echoed API keys; before calling that logger (or replace that
call), sanitize the response body by parsing/masking any sensitive fields (e.g.,
"authorization", "api_key", "token", "access_token", "credentials") or redact
Authorization-like patterns (e.g., Bearer <token>) and then log the sanitized
string; update the code around the resp handling in the same function to produce
a safeBody string and pass zap.String("body", safeBody) to the logger.

In `@pkg/inference/embedding/nvidia/protocol.go`:
- Around line 17-22: The Request struct (type Request) in protocol.go is dead
code because the Nvidia embeddings request body is constructed directly via a
map passed to base.MarshalJSONWithOptions in nvidia.go; either remove the unused
Request type or explicitly use it when marshalling (replace the map literal with
an instance of Request and marshal that), or if you keep the type for
future/documentation purposes add a clear comment above type Request explaining
why it is intentionally unused today and referencing the code path in nvidia.go
that builds the payload via base.MarshalJSONWithOptions.

In `@pkg/planner/core/expression_rewriter.go`:
- Around line 3027-3031: The error message for the vector argument in
expression_rewriter.go can be empty because vecArg.OrigName may be empty; update
the check around the vecArg handling in the block that sets er.err (where vecArg
is derived from args[0] and v.FnName.O is used) to build a robust name: if
vecArg.OrigName is empty, use vecArg.String() or a constant placeholder like
"<unknown column>" when formatting the error text so the message is always
informative and never blank.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6b57fe1c-2b9b-4cb6-9768-bcc6e68c1971

📥 Commits

Reviewing files that changed from the base of the PR and between b399d61 and b98639b.

📒 Files selected for processing (75)

pkg/config/config.go
pkg/config/config.toml.example
pkg/config/config.toml.nextgen.example
pkg/ddl/add_column.go
pkg/ddl/create_table.go
pkg/ddl/generated_column.go
pkg/domain/BUILD.bazel
pkg/domain/domain.go
pkg/domain/inference.go
pkg/executor/insert_common.go
pkg/expression/BUILD.bazel
pkg/expression/builtin.go
pkg/expression/builtin_inference.go
pkg/expression/builtin_inference_test.go
pkg/expression/builtin_threadunsafe_generated.go
pkg/expression/column.go
pkg/expression/expression.go
pkg/expression/function_traits.go
pkg/expression/function_traits_test.go
pkg/expression/generator/builtin_threadsafe.go
pkg/expression/inference_helper.go
pkg/expression/inference_helper_test.go
pkg/expression/integration_test/BUILD.bazel
pkg/expression/integration_test/integration_test.go
pkg/inference/BUILD.bazel
pkg/inference/domainadaptor/BUILD.bazel
pkg/inference/domainadaptor/adaptor.go
pkg/inference/embedding/base/BUILD.bazel
pkg/inference/embedding/base/base.go
pkg/inference/embedding/base/base_test.go
pkg/inference/embedding/batcher/BUILD.bazel
pkg/inference/embedding/batcher/batcher.go
pkg/inference/embedding/batcher/batcher_test.go
pkg/inference/embedding/cohere/BUILD.bazel
pkg/inference/embedding/cohere/cohere.go
pkg/inference/embedding/cohere/cohere_test.go
pkg/inference/embedding/cohere/protocol.go
pkg/inference/embedding/gemini/BUILD.bazel
pkg/inference/embedding/gemini/gemini.go
pkg/inference/embedding/gemini/gemini_test.go
pkg/inference/embedding/gemini/protocol.go
pkg/inference/embedding/huggingface/BUILD.bazel
pkg/inference/embedding/huggingface/huggingface.go
pkg/inference/embedding/huggingface/huggingface_test.go
pkg/inference/embedding/huggingface/protocol.go
pkg/inference/embedding/jina/BUILD.bazel
pkg/inference/embedding/jina/jina.go
pkg/inference/embedding/jina/jina_test.go
pkg/inference/embedding/jina/protocol.go
pkg/inference/embedding/mock/BUILD.bazel
pkg/inference/embedding/mock/mock.go
pkg/inference/embedding/mock/mock_test.go
pkg/inference/embedding/nvidia/BUILD.bazel
pkg/inference/embedding/nvidia/nvidia.go
pkg/inference/embedding/nvidia/nvidia_test.go
pkg/inference/embedding/nvidia/protocol.go
pkg/inference/embedding/openai/BUILD.bazel
pkg/inference/embedding/openai/openai.go
pkg/inference/embedding/openai/openai_test.go
pkg/inference/embedding/openai/protocol.go
pkg/inference/embedding/tidbcloud/BUILD.bazel
pkg/inference/embedding/tidbcloud/protocol.go
pkg/inference/embedding/tidbcloud/tidbcloud_free.go
pkg/inference/embedding/tidbcloud/tidbcloud_free_test.go
pkg/inference/manager_test.go
pkg/inference/sqlembed.go
pkg/parser/ast/functions.go
pkg/planner/core/BUILD.bazel
pkg/planner/core/expression_rewriter.go
pkg/planner/core/logical_plan_builder.go
pkg/sessionctx/vardef/tidb_vars.go
pkg/sessionctx/variable/BUILD.bazel
pkg/sessionctx/variable/embedding_vars.go
pkg/sessionctx/variable/embedding_vars_test.go
pkg/sessionctx/variable/sysvar.go

codecov · 2026-06-08T07:53:01Z

Codecov Report

❌ Patch coverage is 13.26841% with 1425 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.0455%. Comparing base (74e231b) to head (78aba2b).
⚠️ Report is 36 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #69014        +/-   ##
================================================
- Coverage   76.3268%   76.0455%   -0.2814%     
================================================
  Files          2041       2078        +37     
  Lines        562499     583766     +21267     
================================================
+ Hits         429338     443928     +14590     
- Misses       132250     137130      +4880     
- Partials        911       2708      +1797

Flag	Coverage Δ
integration	`45.9389% <13.2684%> (+6.2191%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
dumpling	`60.4610% <ø> (ø)`
parser	`∅ <ø> (∅)`
br	`63.9120% <ø> (+1.1221%)`	⬆️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

coderabbitai

🧹 Nitpick comments (1)

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/expression/builtin_inference.go`:
- Around line 147-158: Add a short clarifying comment above the block that
collects and processes keys with the "`@search`" suffix explaining that entries
like "key@search" are treated as search-time-only options: if isFromVecSearch is
true they are promoted to "key" (opts[strings.TrimSuffix(k, "`@search`")] =
opts[k]) to be honored when the call was rewritten from a VEC_EMBED_*_DISTANCE
path, otherwise they are removed (delete(opts, k)) so direct EMBED_TEXT() calls
ignore them; reference the variables searchOpts, opts and isFromVecSearch in the
comment so future readers understand the intent of this rename-or-drop behavior.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 32cbf36e-e183-4934-addb-0aaa6faec878

📥 Commits

Reviewing files that changed from the base of the PR and between b98639b and c3c302f.

📒 Files selected for processing (28)

pkg/executor/BUILD.bazel
pkg/executor/insert_common.go
pkg/expression/builtin_inference.go
pkg/expression/inference_helper.go
pkg/expression/inference_helper_test.go
pkg/expression/integration_test/integration_test.go
pkg/inference/embedding/base/BUILD.bazel
pkg/inference/embedding/base/base.go
pkg/inference/embedding/base/base_test.go
pkg/inference/embedding/cohere/cohere.go
pkg/inference/embedding/gemini/BUILD.bazel
pkg/inference/embedding/gemini/gemini.go
pkg/inference/embedding/gemini/gemini_test.go
pkg/inference/embedding/huggingface/huggingface.go
pkg/inference/embedding/huggingface/protocol.go
pkg/inference/embedding/jina/BUILD.bazel
pkg/inference/embedding/jina/jina.go
pkg/inference/embedding/jina/jina_test.go
pkg/inference/embedding/mock/mock.go
pkg/inference/embedding/mock/mock_test.go
pkg/inference/embedding/nvidia/BUILD.bazel
pkg/inference/embedding/nvidia/nvidia.go
pkg/inference/embedding/nvidia/nvidia_test.go
pkg/inference/embedding/openai/BUILD.bazel
pkg/inference/embedding/openai/openai.go
pkg/inference/embedding/openai/openai_test.go
pkg/inference/embedding/tidbcloud/tidbcloud_free.go
pkg/planner/core/expression_rewriter.go

🚧 Files skipped from review as they are similar to previous changes (20)

pkg/inference/embedding/base/BUILD.bazel
pkg/inference/embedding/openai/BUILD.bazel
pkg/inference/embedding/huggingface/protocol.go
pkg/inference/embedding/mock/mock.go
pkg/inference/embedding/jina/BUILD.bazel
pkg/inference/embedding/mock/mock_test.go
pkg/inference/embedding/cohere/cohere.go
pkg/inference/embedding/nvidia/nvidia.go
pkg/inference/embedding/openai/openai.go
pkg/inference/embedding/gemini/BUILD.bazel
pkg/inference/embedding/jina/jina_test.go
pkg/inference/embedding/nvidia/BUILD.bazel
pkg/inference/embedding/openai/openai_test.go
pkg/inference/embedding/huggingface/huggingface.go
pkg/planner/core/expression_rewriter.go
pkg/expression/inference_helper_test.go
pkg/inference/embedding/tidbcloud/tidbcloud_free.go
pkg/inference/embedding/gemini/gemini_test.go
pkg/expression/integration_test/integration_test.go
pkg/executor/insert_common.go

ChangRui-Ryan · 2026-06-09T02:48:36Z

/retest

tiprow · 2026-06-09T02:49:00Z

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

D3Hunter · 2026-06-09T10:16:41Z

+type HostedEmbedding struct {
+	// Enabled indicates whether the hosted embedding service is enabled.
+	// It only takes effect for nextgen kernels.
+	Enabled bool `toml:"enabled" json:"enabled"`


please add omitempty

D3Hunter · 2026-06-09T10:19:06Z

+[hosted-embedding]
+# Enables the TiDB Cloud hosted embedding provider tidbcloud_free.
+# Effective only for nextgen kernels.
+enabled = false
+api-endpoint = ""
+api-key-path = ""
+


remove it, not supported in classic kernel

ti-chi-bot · 2026-06-10T06:41:53Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign d3hunter, gengliqi, terry1purcell for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ChangRui-Ryan · 2026-06-10T07:13:37Z

/retest

tiprow · 2026-06-10T07:14:04Z

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ChangRui-Ryan · 2026-06-10T08:01:57Z

/retest

tiprow · 2026-06-10T08:02:27Z

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ChangRui-Ryan · 2026-06-10T08:30:53Z

/retest

tiprow · 2026-06-10T08:31:19Z

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

D3Hunter

Summary

Total findings: 11
Inline comments: 10
Summary-only findings (no inline anchor): 1

Findings (highest risk first)

🚨 [Blocker] (1)

Auto-embedding table metadata is introduced without an N-1 compatibility gate (pkg/ddl/create_table.go:558, pkg/executor/insert_common.go:634, pkg/expression/expression.go:1114, pkg/ddl/generated_column.go:448)

⚠️ [Major] (7)

LOAD DATA never materializes stored EMBED_TEXT generated columns (LOAD DATA row path in pkg/executor/load_data.go lines 560, 581, and 663, together with the auto-embedding skip in pkg/executor/insert_common.go line 899)
ALTER or MODIFY can still create generated-column chains on top of auto-embedding columns (pkg/ddl/create_table.go:602, pkg/ddl/generated_column.go:255, pkg/executor/insert_common.go:899)
DDL accepts auto-embedding generated columns with invalid JSON options (pkg/ddl/generated_column.go:448, pkg/expression/inference_helper.go:115, pkg/expression/builtin_inference.go:138)
EMBED_TEXT drops statement cancellation and timeout propagation once the remote call starts (pkg/inference/sqlembed.go:162; pkg/executor/insert_common.go:700-740)
Hosted embedding API-key file is read once at startup and never refreshed (pkg/inference/sqlembed.go:106-126; pkg/domain/inference.go:19-31)
domainadaptor.GetEmbedFn hides a global-default fallback behind a domain-scoped name (pkg/inference/domainadaptor/adaptor.go:15, pkg/inference/domainadaptor/adaptor.go:32)
INSERT-side auto-embedding reimplements the EMBED_TEXT runtime contract (pkg/executor/insert_common.go:634, pkg/expression/builtin_inference.go:95)

🟡 [Minor] (3)

GeneratedExprString comment describes an AST contract even though the field stores SQL text (pkg/expression/column.go:297)
Auto-embedding insert helper names imply batching and singular datum handling that the code does not do (pkg/executor/insert_common.go:57, pkg/executor/insert_common.go:634)
Domain public API was expanded for a single local adaptor path (pkg/domain/inference.go:19, pkg/inference/domainadaptor/adaptor.go:24)

Unanchored findings

⚠️ [Major] (1)

LOAD DATA never materializes stored EMBED_TEXT generated columns
- Scope: LOAD DATA row path in pkg/executor/load_data.go lines 560, 581, and 663, together with the auto-embedding skip in pkg/executor/insert_common.go line 899
- Request: Add the same auto-embedding materialization step to the LOAD DATA row pipeline and cover it with an integration test for LOAD DATA on a table with a stored EMBED_TEXT() generated column.

D3Hunter · 2026-06-12T07:47:56Z

 				if err := checkIllegalFn4Generated(colDef.Name.Name.L, typeColumn, option.Expr); err != nil {
 					return errors.Trace(err)
 				}
+				if err := checkGeneratedColForAutoEmbedding(colDef.Name.Name.L, option.Expr, option.Stored); err != nil {


🚨 [Blocker] Auto-embedding table metadata is introduced without an N-1 compatibility gate

Why
CREATE TABLE now allows persisted stored generated columns backed by EMBED_TEXT(), but the resulting metadata and DML/planner behavior depend on new code paths that are only present in this version. In a rolling upgrade or rollback window, older TiDB nodes can still observe the same table metadata even though this feature is not guarded for mixed-version use.

Scope
pkg/ddl/create_table.go:558, pkg/executor/insert_common.go:634, pkg/expression/expression.go:1114, pkg/ddl/generated_column.go:448

Risk if unchanged
Creating one of these tables before every TiDB node is upgraded can break inserts/updates or vector-search planning on N-1 nodes, which makes the feature unsafe for normal rolling upgrade and downgrade procedures.

Evidence
checkGeneratedColForAutoEmbedding() only validates SQL shape before CREATE TABLE succeeds; there is no cluster-version gate or mixed-version rejection in the DDL path. The new metadata is then consumed by auto-embedding DML evaluation in fillAutoEmbeddingDatum() and by planner schema loading through GeneratedExprString. The added tests cover only single-version starter-mode flows (pkg/expression/integration_test/integration_test.go:4539 through :4668) and do not exercise N/N-1 upgrade or rollback behavior.

Change request
Gate creation and use of EMBED_TEXT()-generated columns until the cluster version guarantees every TiDB node understands the new metadata, or return an explicit mixed-version unsupported error and add upgrade/downgrade coverage for that path.

Addressed. checkGeneratedColForAutoEmbedding now rejects creating auto-embedding generated columns during mixed-version rolling upgrades.

D3Hunter · 2026-06-12T07:47:56Z

 		}
 	}
+
+	autoEmbeddingCols := make(map[string]struct{})


⚠️ [Major] ALTER or MODIFY can still create generated-column chains on top of auto-embedding columns

Why
CREATE TABLE now blocks generated columns that depend on an auto-embedding column, but the modify-column validation path does not enforce the same dependency rule. That leaves an ALTER path where a generated column can be changed to depend on an EMBED_TEXT() column even though the executor deliberately skips evaluating EMBED_TEXT() during the normal generated-column pass.

Scope
pkg/ddl/create_table.go:602, pkg/ddl/generated_column.go:255, pkg/executor/insert_common.go:899

Risk if unchanged
A user can create a legal table and later ALTER TABLE ... MODIFY COLUMN a generated column to reference an auto-embedding column. Subsequent inserts then evaluate the dependent generated column before the embedding column is filled, so the dependent value is computed from NULL or dummy state and stored incorrectly.

Evidence
create_table.go:602-617 adds a dedicated scan that rejects dependencies on auto-embedding generated columns. checkModifyGeneratedColumn in generated_column.go:255-279 validates the new expression plus auto-increment and index rules, but never checks whether dependColNames includes an existing auto-embedding generated column. That omission matters because fillRow skips EMBED_TEXT() columns at insert_common.go:899-900 and does not revisit downstream generated columns afterward.

Change request
Apply the same dependency check in checkModifyGeneratedColumn and any other ALTER path that can rewrite generated expressions, and add coverage for altering a generated column to reference an existing auto-embedding column.

Addressed. The MODIFY generated-column path now applies the same dependency check and rejects generated columns built on top of auto-embedding columns.

D3Hunter · 2026-06-12T07:47:56Z

+}
+
+// Embed generates embeddings for the given text. It handles with cache and batching.
+func (e *EmbedFn) Embed(shouldCancel func() bool, modelWithProvider string, text string, opts map[string]any) ([]float32, error) {


⚠️ [Major] EMBED_TEXT drops statement cancellation and timeout propagation once the remote call starts

Why
The new embedding path replaces the statement context with context.Background() and only polls SQLKiller, so request deadlines, executor cancellation, and parent-context shutdown no longer abort in-flight provider calls.

Scope
pkg/inference/sqlembed.go:162; pkg/executor/insert_common.go:700-740

Risk if unchanged
Canceled inserts and queries can keep waiting on remote embedding requests until the provider or HTTP client times out, tying up executor goroutines and slowing overload recovery under user cancels or statement timeouts.

Evidence
In Embed, the provider context is created with context.WithCancel(context.Background()) at line 163, and the only cancellation bridge is the 1-second shouldCancel poll at lines 168-182. The insert path does create egCtx := errgroup.WithContext(ctx), but the actual remote call ignores that context and calls embedFn.Embed(...) with only a SQLKiller callback at lines 723-740.

Change request
Use the caller context as the root through Embed and the provider path so EMBED_TEXT stops promptly on ctx.Done(), keeping the kill-signal poll only as a fallback when no real context is available.

Addressed. The embedding path now uses EmbedWithContext; direct evaluation and INSERT auto-embedding pass the caller/executor context through to the provider call, with SQLKiller kept as an additional fallback.

D3Hunter · 2026-06-12T07:47:56Z

+		ErrMissingAPIKey: fmt.Errorf(errMissingAPI, "Gemini", strings.ToUpper(vardef.TiDBExpEmbedGeminiAPIKey)),
+		// ErrUnauthorized is not provided in gemini. The error message provided by Gemini API is sufficient enough.
+	}))
+	if isHostedEmbeddingEnabled() {


⚠️ [Major] Hosted embedding API-key file is read once at startup and never refreshed

Why
The hosted-embedding provider captures the file contents into a local apiKey variable during NewEmbedFn, so missing credentials at boot or later key rotation cannot recover without rebuilding the whole EmbedFn.

Scope
pkg/inference/sqlembed.go:106-126; pkg/domain/inference.go:19-31

Risk if unchanged
A node that starts before the secret is mounted, or a cluster that rotates the bearer token in place, will keep sending an empty or stale credential and permanently fail EMBED_TEXT requests until restart.

Evidence
Lines 107-116 read HostedEmbedding.APIKeyPath once and store the result in a local string. The registered provider then uses GetAPIKey: func() string { return apiKey } at line 125, while the domain constructs the EmbedFn once in initInferenceProviders() and only closes it on domain shutdown.

Change request
Fetch the API key on demand instead of capturing a startup snapshot, or at least re-read the configured file behind a small refresh cache so secret-mount races and key rotation do not require restart.

Addressed. Hosted embedding now reads the API key on demand via getHostedEmbeddingAPIKey instead of capturing it once during NewEmbedFn, so secret updates can be picked up without rebuilding EmbedFn.

D3Hunter · 2026-06-12T07:47:57Z

+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// Package domainadaptor provides access to Domain-owned inference resources


⚠️ [Major] domainadaptor.GetEmbedFn hides a global-default fallback behind a domain-scoped name

Why
The new adaptor API reads like a pure domain lookup, but the implementation also returns the process-wide default embed function when the session is nil, lacks GetDomain, or exposes a nil domain embedder. That gives the same name two meanings and makes the lifecycle and scope contract hard to read at call sites.

Scope
pkg/inference/domainadaptor/adaptor.go:15, pkg/inference/domainadaptor/adaptor.go:32

Risk if unchanged
Future callers can reasonably assume they are always getting a domain-owned resource and then accidentally couple tests or sessions through the global singleton fallback.

Evidence
The package header says this package provides access to domain-owned inference resources, and GetEmbedFn says it returns the embed bridge associated with the session's domain. The body instead returns inference.DefaultEmbedFn() on three non-domain paths at lines 35, 39, and 45.

Change request
Expose the fallback semantics in the name or comment, for example ResolveEmbedFn or GetOrDefaultEmbedFn, or add a short why-comment that this helper falls back to the process-global default when no domain-owned embedder is available.

Thanks. I kept the small Domain.GetEmbedFn() accessor for now. The bridge is intentionally isolated in domainadaptor to avoid an import cycle and keep expression/inference code from importing domain. Moving this into a different session/domain capability interface would be a broader API change; for this PR I kept the minimal accessor and documented the fallback semantics in domainadaptor.GetEmbedFn.

D3Hunter · 2026-06-12T07:47:57Z

 	return defaultVal, nil
 }

+func (e *InsertValues) fillAutoEmbeddingDatum(ctx context.Context, rows [][]types.Datum) ([][]types.Datum, error) {


⚠️ [Major] INSERT-side auto-embedding reimplements the EMBED_TEXT runtime contract

Why
The new insert path now owns starter-mode gating, provider lookup, SQL-kill cancellation, remote embed invocation, and vector conversion in a second place instead of reusing one canonical evaluator.

Scope
pkg/executor/insert_common.go:634, pkg/expression/builtin_inference.go:95

Risk if unchanged
Future changes to EMBED_TEXT semantics can land in only one path, so direct expression evaluation and generated-column INSERT evaluation can drift apart on cancellation, provider selection, null handling, or vector materialization behavior.

Evidence
fillAutoEmbeddingDatum batches work, but the core EMBED_TEXT execution block at pkg/executor/insert_common.go:695-750 repeats the same runtime steps already implemented in BuiltinEmbedTextSig.evalVectorFloat32 at pkg/expression/builtin_inference.go:101-121; both paths call EvalEmbedTextArgs, enforce starter mode, fetch domainadaptor.GetEmbedFn(...), invoke .Embed(...), and convert the returned slice with types.CreateVectorFloat32.

Change request
Keep one canonical definition for EMBED_TEXT execution and let the executor layer handle only batching and orchestration. Extract a shared helper in the expression or inference layer for evaluating prepared embed args to a vector datum, and have both direct expression evaluation and generated-column INSERT reuse it.

D3Hunter · 2026-06-12T07:47:57Z

 	// VirtualExpr is used to save expression for virtual column
 	VirtualExpr Expression

+	// GeneratedExprString stores the generated expression for planner rewrites that need the original AST.


🟡 [Minor] GeneratedExprString comment describes an AST contract even though the field stores SQL text

Why
The comment says the field keeps the original AST, but the type is string and downstream code reparses it before rewrites. That is a direct comment-to-code mismatch on a planner-facing field.

Scope
pkg/expression/column.go:297

Risk if unchanged
Readers may treat this field as an AST-preserving representation and make incorrect assumptions about fidelity, mutability, or reparsing costs in later planner work.

Evidence
The added field is GeneratedExprString string, while nearby consumers call generatedexpr.ParseExpression(col.GeneratedExprString) in pkg/expression/expression.go:1132 and pkg/planner/core/expression_rewriter.go:3040.

Change request
Update the comment to match the real contract, for example that this stores generated-expression text for later parse-and-rewrite flows.

D3Hunter · 2026-06-12T07:47:57Z

+)
+
+const (
+	autoEmbeddingEvalBatchSizeHint        = 16


🟡 [Minor] Auto-embedding insert helper names imply batching and singular datum handling that the code does not do

Why
The helper processes many generated-column cells across rows, yet it is named fillAutoEmbeddingDatum, and the concurrency limit is expressed as autoEmbeddingEvalBatchSizeHint * autoEmbeddingEvalMaxConcurrentBatches even though the implementation launches one goroutine per task rather than batching work.

Scope
pkg/executor/insert_common.go:57, pkg/executor/insert_common.go:634

Risk if unchanged
Tuning or extending this path becomes harder because maintainers have to reverse-engineer whether the knobs are about row batches, task concurrency, or generated-column fanout.

Evidence
The function signature is fillAutoEmbeddingDatum(ctx context.Context, rows [][]types.Datum), it builds a tasks slice over every auto-embedding generated column in every row, and line 701 applies the limit directly to errgroup task concurrency rather than to explicit batches.

Change request
Prefer names that match the behavior, such as fillAutoEmbeddingDatums or populateAutoEmbeddingGeneratedColumns, and rename the limit constants to reflect task concurrency unless real batching is introduced.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

pkg/inference/embedding/cohere/cohere_test.go (1)

123-145: ⚡ Quick win

Unused mock server in this test.

When GetAPIKey() returns an empty string, CreateEmbeddings returns early (before constructing the HTTP request) per the logic in cohere.go. The mock server at lines 127-132 is never called. Consider removing the server setup to clarify that no HTTP request is made in this case.

♻️ Simplified test without unused mock server

 func TestCohereEmbedder_NoAPIKey(t *testing.T) {
-	// Mock no API key response from real Cohere API
-	mockResponse := `{"id":"b6a8a658-261e-4fc2-b44d-0ee3fefa145e","message":"no api key supplied"}`
-
-	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-		w.Header().Set("Content-Type", "application/json")
-		w.WriteHeader(http.StatusUnauthorized)
-		w.Write([]byte(mockResponse))
-	}))
-	defer server.Close()
-
 	embedder := NewCohereEmbedder(EmbedderConfig{
 		GetAPIKey:  func() string { return "" },
-		GetBaseURL: func() string { return server.URL },
 	})
 
 	texts := []string{"hello world"}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/inference/embedding/cohere/cohere_test.go` around lines 123 - 145, The
test TestCohereEmbedder_NoAPIKey sets up an httptest.Server that is never used
because NewCohereEmbedder's CreateEmbeddings returns early when GetAPIKey() is
empty; remove the unused mock server setup (the httptest.NewServer handler and
its defer server.Close()) from the test to make it clear no HTTP call is made,
keeping the rest of the test that constructs the embedder via
NewCohereEmbedder(EmbedderConfig{GetAPIKey: func() string { return "" },
GetBaseURL: func() string { return server.URL },}) and asserts embeddings is nil
and an error containing "API key is not configured for cohere".

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/ddl/generated_column.go`:
- Around line 455-469: The checkGeneratedColForAutoEmbedding function currently
validates AST/versions but doesn't enforce the deployment-mode gate used at
runtime; call the same gate (the CheckEmbedTextAllowed function in
pkg/expression/builtin_inference.go) from checkGeneratedColForAutoEmbedding and
return its error when it disallows EMBED_TEXT, so CREATE/ALTER fails during DDL
on non-Starter deployments; place this call early (before accepting the column)
so the unsupported schema isn't persisted.

In `@pkg/executor/insert_common.go`:
- Around line 705-721: The loop that schedules auto-embedding tasks iterates all
entries in rows including nil sentinels from LOAD DATA, causing skipped records
to be re-evaluated; update the outer loop in the insert/common code (the for
rowIdx, row := range rows loop) to skip nil rows immediately (e.g., if row ==
nil { continue }) before computing gIdx or appending autoEmbeddingEvalTask so
that no tasks are enqueued for sentinel nil rows; keep the existing gIdx logic
and e.GenExprs usage unchanged.

---

Nitpick comments:
In `@pkg/inference/embedding/cohere/cohere_test.go`:
- Around line 123-145: The test TestCohereEmbedder_NoAPIKey sets up an
httptest.Server that is never used because NewCohereEmbedder's CreateEmbeddings
returns early when GetAPIKey() is empty; remove the unused mock server setup
(the httptest.NewServer handler and its defer server.Close()) from the test to
make it clear no HTTP call is made, keeping the rest of the test that constructs
the embedder via NewCohereEmbedder(EmbedderConfig{GetAPIKey: func() string {
return "" }, GetBaseURL: func() string { return server.URL },}) and asserts
embeddings is nil and an error containing "API key is not configured for
cohere".

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a77e591f-3554-42bd-9b9c-c40eccc1d188

📥 Commits

Reviewing files that changed from the base of the PR and between d3f1ba8 and 1d43e61.

📒 Files selected for processing (80)

pkg/config/config.go
pkg/config/config.toml.nextgen.example
pkg/config/config_test.go
pkg/ddl/BUILD.bazel
pkg/ddl/add_column.go
pkg/ddl/create_table.go
pkg/ddl/generated_column.go
pkg/ddl/generated_column_internal_test.go
pkg/domain/BUILD.bazel
pkg/domain/domain.go
pkg/domain/inference.go
pkg/executor/insert_common.go
pkg/executor/load_data.go
pkg/expression/BUILD.bazel
pkg/expression/builtin.go
pkg/expression/builtin_inference.go
pkg/expression/builtin_inference_test.go
pkg/expression/builtin_threadunsafe_generated.go
pkg/expression/column.go
pkg/expression/expression.go
pkg/expression/function_traits.go
pkg/expression/function_traits_test.go
pkg/expression/generator/builtin_threadsafe.go
pkg/expression/inference_helper.go
pkg/expression/inference_helper_test.go
pkg/expression/integration_test/BUILD.bazel
pkg/expression/integration_test/integration_test.go
pkg/inference/BUILD.bazel
pkg/inference/domainadaptor/BUILD.bazel
pkg/inference/domainadaptor/adaptor.go
pkg/inference/embedding/base/BUILD.bazel
pkg/inference/embedding/base/base.go
pkg/inference/embedding/base/base_test.go
pkg/inference/embedding/batcher/BUILD.bazel
pkg/inference/embedding/batcher/batcher.go
pkg/inference/embedding/batcher/batcher_test.go
pkg/inference/embedding/cohere/BUILD.bazel
pkg/inference/embedding/cohere/cohere.go
pkg/inference/embedding/cohere/cohere_test.go
pkg/inference/embedding/cohere/protocol.go
pkg/inference/embedding/gemini/BUILD.bazel
pkg/inference/embedding/gemini/gemini.go
pkg/inference/embedding/gemini/gemini_test.go
pkg/inference/embedding/gemini/protocol.go
pkg/inference/embedding/huggingface/BUILD.bazel
pkg/inference/embedding/huggingface/huggingface.go
pkg/inference/embedding/huggingface/huggingface_test.go
pkg/inference/embedding/huggingface/protocol.go
pkg/inference/embedding/jina/BUILD.bazel
pkg/inference/embedding/jina/jina.go
pkg/inference/embedding/jina/jina_test.go
pkg/inference/embedding/jina/protocol.go
pkg/inference/embedding/mock/BUILD.bazel
pkg/inference/embedding/mock/mock.go
pkg/inference/embedding/mock/mock_test.go
pkg/inference/embedding/nvidia/BUILD.bazel
pkg/inference/embedding/nvidia/nvidia.go
pkg/inference/embedding/nvidia/nvidia_test.go
pkg/inference/embedding/nvidia/protocol.go
pkg/inference/embedding/openai/BUILD.bazel
pkg/inference/embedding/openai/openai.go
pkg/inference/embedding/openai/openai_test.go
pkg/inference/embedding/openai/protocol.go
pkg/inference/embedding/tidbcloud/BUILD.bazel
pkg/inference/embedding/tidbcloud/protocol.go
pkg/inference/embedding/tidbcloud/tidbcloud_free.go
pkg/inference/embedding/tidbcloud/tidbcloud_free_test.go
pkg/inference/manager_test.go
pkg/inference/sqlembed.go
pkg/parser/ast/functions.go
pkg/planner/core/BUILD.bazel
pkg/planner/core/expression_rewriter.go
pkg/planner/core/logical_plan_builder.go
pkg/planner/util/null_misc_test.go
pkg/sessionctx/vardef/tidb_vars.go
pkg/sessionctx/variable/BUILD.bazel
pkg/sessionctx/variable/embedding_vars.go
pkg/sessionctx/variable/embedding_vars_test.go
pkg/sessionctx/variable/sysvar.go
tests/integrationtest/r/executor/show.result

✅ Files skipped from review due to trivial changes (9)

pkg/inference/embedding/cohere/protocol.go
pkg/planner/util/null_misc_test.go
pkg/parser/ast/functions.go
tests/integrationtest/r/executor/show.result
pkg/inference/embedding/tidbcloud/protocol.go
pkg/inference/embedding/openai/BUILD.bazel
pkg/sessionctx/variable/BUILD.bazel
pkg/inference/embedding/huggingface/huggingface.go
pkg/inference/embedding/gemini/BUILD.bazel

🚧 Files skipped from review as they are similar to previous changes (63)

pkg/inference/embedding/huggingface/protocol.go
pkg/expression/builtin_threadunsafe_generated.go
pkg/planner/core/BUILD.bazel
pkg/config/config.toml.nextgen.example
pkg/domain/BUILD.bazel
pkg/inference/embedding/batcher/BUILD.bazel
pkg/inference/embedding/openai/protocol.go
pkg/inference/domainadaptor/BUILD.bazel
pkg/inference/embedding/jina/BUILD.bazel
pkg/inference/embedding/base/BUILD.bazel
pkg/expression/builtin.go
pkg/inference/embedding/nvidia/BUILD.bazel
pkg/inference/embedding/mock/mock.go
pkg/domain/inference.go
pkg/expression/function_traits_test.go
pkg/inference/embedding/huggingface/BUILD.bazel
pkg/inference/embedding/jina/protocol.go
pkg/expression/expression.go
pkg/inference/manager_test.go
pkg/inference/domainadaptor/adaptor.go
pkg/inference/embedding/gemini/protocol.go
pkg/inference/embedding/tidbcloud/BUILD.bazel
pkg/expression/BUILD.bazel
pkg/inference/BUILD.bazel
pkg/domain/domain.go
pkg/sessionctx/variable/embedding_vars_test.go
pkg/sessionctx/variable/embedding_vars.go
pkg/expression/function_traits.go
pkg/expression/generator/builtin_threadsafe.go
pkg/planner/core/logical_plan_builder.go
pkg/expression/integration_test/BUILD.bazel
pkg/inference/embedding/base/base_test.go
pkg/sessionctx/vardef/tidb_vars.go
pkg/ddl/add_column.go
pkg/inference/embedding/mock/BUILD.bazel
pkg/inference/embedding/nvidia/nvidia_test.go
pkg/sessionctx/variable/sysvar.go
pkg/inference/embedding/openai/openai.go
pkg/inference/embedding/tidbcloud/tidbcloud_free.go
pkg/expression/builtin_inference_test.go
pkg/inference/embedding/nvidia/protocol.go
pkg/inference/embedding/gemini/gemini.go
pkg/expression/column.go
pkg/inference/embedding/cohere/cohere.go
pkg/expression/inference_helper_test.go
pkg/inference/embedding/mock/mock_test.go
pkg/inference/embedding/tidbcloud/tidbcloud_free_test.go
pkg/inference/embedding/base/base.go
pkg/inference/embedding/jina/jina_test.go
pkg/inference/embedding/openai/openai_test.go
pkg/config/config.go
pkg/inference/embedding/huggingface/huggingface_test.go
pkg/expression/inference_helper.go
pkg/inference/embedding/batcher/batcher.go
pkg/expression/integration_test/integration_test.go
pkg/inference/embedding/cohere/BUILD.bazel
pkg/planner/core/expression_rewriter.go
pkg/inference/embedding/nvidia/nvidia.go
pkg/inference/embedding/gemini/gemini_test.go
pkg/inference/embedding/batcher/batcher_test.go
pkg/inference/sqlembed.go
pkg/config/config_test.go
pkg/inference/embedding/jina/jina.go

ChangRui-Ryan · 2026-06-15T06:47:01Z

/retest

tiprow · 2026-06-15T06:47:26Z

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

D3Hunter · 2026-06-15T08:52:29Z

+	return checkAutoEmbeddingGeneratedColumnServerVersionsCompatible(versions)
+}
+
+func checkAutoEmbeddingGeneratedColumnServerVersionsCompatible(versions []string) error {


no need to check this as its the first version to introduce it in open source tidb? seems AI is over considering the compatibility issue

D3Hunter · 2026-06-15T09:04:02Z

what's manager

Agreed, the name was misleading. I renamed manager_test.go to sqlembed_test.go since these tests cover the SQL embedding helper / EmbedFn behavior rather than a manager component.

ChangRui-Ryan · 2026-06-16T02:43:36Z

/retest

tiprow · 2026-06-16T02:44:00Z

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ChangRui-Ryan · 2026-06-16T04:42:28Z

/retest

tiprow · 2026-06-16T04:42:53Z

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

windtalker · 2026-06-15T09:04:34Z

+			break
+		}
+	}
+	if !hasAutoEmbeddingExpr {


It looks to me that hasAutoEmbeddingExpr can be decided during compile time or open(), so there is no need to re-check it for every call of fillAutoEmbeddingDatumsWithRowCount

windtalker · 2026-06-15T09:10:55Z

+				continue
+			}
+			gIdx++
+			if !expression.IsAutoEmbedFnCallAST(gCol.GeneratedExpr.Internal()) {


The same as above, it looks to me we can pre-handle these information so during this row-wise loop, we can just construct the autoEmbeddingEvalTask

windtalker · 2026-06-15T09:16:11Z

+			results[i].err = fmt.Errorf("auto-embedding generated column expects EMBED_TEXT()")
+			continue
+		}
+		embedArgs, isNull, err := expression.EvalEmbedTextArgs(e.Ctx().GetExprCtx().GetEvalCtx(), chunk.MutRowFromDatums(task.row).ToRow(), sf.GetArgs(), embedSig.IsFromVecSearch)


chunk.MutRowFromDatums is very low-efficient, I think we should implement some caching to optimize this logic

windtalker · 2026-06-18T07:23:55Z

 		// NOTE: please make sure there are test cases for all functions here.
 	}
+	specialUnsafeFuncs = map[string]struct{}{
+		"BuiltinEmbedTextSig": {},


Why the name is BuiltinEmbedTextSig, not builtinEmbedTextSig?

ChangRui-Ryan · 2026-06-18T10:51:53Z

/retest

ti-chi-bot Bot added do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. labels Jun 8, 2026

ti-chi-bot Bot added sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed do-not-merge/needs-tests-checked labels Jun 8, 2026

coderabbitai Bot reviewed Jun 8, 2026

View reviewed changes

D3Hunter reviewed Jun 9, 2026

View reviewed changes

D3Hunter reviewed Jun 12, 2026

View reviewed changes

ChangRui-Ryan added 6 commits June 12, 2026 17:11

expression: add full SQL auto embedding support

1c6f324

expression: restrict EMBED_TEXT to starter mode

eefa4c3

expression: address auto embedding review issues

9200ed4

test: update auto embedding builtin snapshots

90f7193

config: restrict hosted embedding to starter

49bc880

expression: improve auto embedding robustness

1d43e61

ChangRui-Ryan force-pushed the changrui_cse_embed-text branch from d3f1ba8 to 1d43e61 Compare June 13, 2026 08:59

coderabbitai Bot reviewed Jun 13, 2026

View reviewed changes

Comment thread pkg/ddl/generated_column.go

Comment thread pkg/executor/insert_common.go

ChangRui-Ryan added 3 commits June 13, 2026 18:26

executor: fix loop variable lint

6da71b2

expression: fix integration test lint

bc5a7f1

expression: address auto-embedding review comments

50d8f83

D3Hunter reviewed Jun 15, 2026

View reviewed changes

expression: address auto-embedding review feedback

40cbafc

windtalker reviewed Jun 18, 2026

View reviewed changes

expression: refine auto embedding execution

78aba2b

Conversation

ChangRui-Ryan commented Jun 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

What changed and how does it work?

Check List

Local E2E validation

Release note

Summary by CodeRabbit

Uh oh!

pantheon-ai Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tiprow Bot commented Jun 8, 2026

Uh oh!

coderabbitai Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

ChangRui-Ryan commented Jun 9, 2026

Uh oh!

tiprow Bot commented Jun 9, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ti-chi-bot Bot commented Jun 10, 2026

Uh oh!

ChangRui-Ryan commented Jun 10, 2026

Uh oh!

tiprow Bot commented Jun 10, 2026

Uh oh!

ChangRui-Ryan commented Jun 10, 2026

Uh oh!

tiprow Bot commented Jun 10, 2026

Uh oh!

ChangRui-Ryan commented Jun 10, 2026

Uh oh!

tiprow Bot commented Jun 10, 2026

Uh oh!

D3Hunter left a comment

Choose a reason for hiding this comment

Summary

🚨 [Blocker] (1)

⚠️ [Major] (7)

🟡 [Minor] (3)

Unanchored findings

⚠️ [Major] (1)

Uh oh!

Choose a reason for hiding this comment

🚨 [Blocker] Auto-embedding table metadata is introduced without an N-1 compatibility gate

ChangRui-Ryan commented Jun 8, 2026 •

edited by coderabbitai Bot

Loading

pantheon-ai Bot commented Jun 8, 2026 •

edited

Loading

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading

codecov Bot commented Jun 8, 2026 •

edited

Loading