GODRIVER-3810 Update WithTransaction to raise timeout error. #2344

Open
qingyang-hu wants to merge 3 commits into mongodb:master from qingyang-hu:godriver3810

Conversation

Contributor

@qingyang-hu qingyang-hu commented Mar 19, 2026

GODRIVER-3810, GODRIVER-3853

NOTE: GODRIVER-3853 supersedes some of the changes introduced by GODRIVER-3810.

Summary

Create a TimeoutError type to be used for backoff timeouts in WithTransaction().

Background & Motivation

@mongodb-drivers-pr-bot
Contributor

API Change Report

./v2/mongo

compatible changes

TimeoutError: added

Contributor

mongodb-drivers-pr-bot bot commented Mar 19, 2026

🧪 Performance Results

Commit SHA: f440be0

The following benchmark tests for version 69dd6d1e2b23cb00078ede23 had statistically significant changes (i.e., |z-score| > 1.96):

| Benchmark | Measurement | % Change | Patch Value | Stable Region | H-Score | Z-Score |
| --- | --- | --- | --- | --- | --- | --- |
| BenchmarkMultiFindMany | ops_per_second_min | -89.4025 | 4673.0501 | Avg: 44095.9342; Med: 45057.2227; Stdev: 10053.6625 | 0.8613 | -3.9212 |
| BenchmarkBSONFlatDocumentEncoding | ops_per_second_min | -45.8028 | 2011.7891 | Avg: 3711.9770; Med: 3840.0541; Stdev: 745.5166 | 0.8332 | -2.2805 |
| BenchmarkSingleFindOneByID | ops_per_second_min | -35.6396 | 804.6755 | Avg: 1250.2639; Med: 1243.3079; Stdev: 171.5049 | 0.7945 | -2.5981 |
| BenchmarkMultiFindMany | ops_per_second_max | -3.1391 | 4098360.6557 | Avg: 4231181.9272; Med: 4237288.1356; Stdev: 55172.0294 | 0.7843 | -2.4074 |

For a comprehensive view of all microbenchmark results for this PR's commit, please check out the Evergreen perf task for this patch.

@qingyang-hu qingyang-hu marked this pull request as ready for review March 20, 2026 12:16
Copilot AI review requested due to automatic review settings March 20, 2026 12:16
@qingyang-hu qingyang-hu requested a review from a team as a code owner March 20, 2026 12:16
@qingyang-hu qingyang-hu requested a review from tadjik1 March 20, 2026 12:16
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a new TimeoutError type to better surface WithTransaction() timeouts (with proper timeout labeling), and updates the unified test runner plumbing so spec tests can assert on the WithTransaction() error value rather than treating it as a harness failure.

Changes:

  • Add mongo.TimeoutError (wrapping an underlying cause and reporting ExceededTimeLimitError via HasErrorLabel).
  • Update Session.WithTransaction() to return TimeoutError{Wrapped: err} for timeout-driven retry/backoff exits.
  • Update unified spec runner operation execution to propagate operation results (including Err) and unskip a previously skipped CSOT convenient-transactions spec test.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| mongo/session.go | Wraps certain overall-timeout exit paths in TimeoutError when retry/backoff would exceed the transaction timeout. |
| mongo/errors.go | Adds the exported TimeoutError type with Unwrap() and HasErrorLabel() support. |
| internal/spectest/skip.go | Removes a skip entry for the convenient-transactions timeout surfacing test. |
| internal/integration/unified/unified_spec_runner.go | Updates caller to handle operation.execute() returning a result + error. |
| internal/integration/unified/testrunner_operation.go | Updates thread/loop execution to match new operation.execute() signature. |
| internal/integration/unified/session_operation_execution.go | Changes withTransaction operation to return the WithTransaction() error in operationResult.Err for verification. |
| internal/integration/unified/operation.go | Changes operation.execute() signature to return *operationResult alongside an execution error. |

Comment thread mongo/errors.go
Comment on lines +830 to +853
// TimeoutError represents an error that occurred due to a timeout.
type TimeoutError struct {
	Wrapped error
}

// Error implements the error interface.
func (e TimeoutError) Error() string {
	return e.Wrapped.Error()
}

// Unwrap returns the underlying error.
func (e TimeoutError) Unwrap() error {
	return e.Wrapped
}

// HasErrorLabel returns true if the error contains the specified label.
func (e TimeoutError) HasErrorLabel(label string) bool {
	if label == "ExceededTimeLimitError" {
		return true
	} else if le := LabeledError(nil); errors.As(e.Wrapped, &le) {
		return le.HasErrorLabel(label)
	}
	return false
}
Copilot AI Mar 20, 2026

Consider adding a unit test for TimeoutError to ensure it integrates with existing timeout detection and labeling (e.g., IsTimeout should return true via the ExceededTimeLimitError label, and HasErrorLabel should delegate to the wrapped error for other labels). There are already table-driven tests for IsTimeout in mongo/errors_test.go that can be extended.

Comment on lines +262 to +263
_, err = threadOp.execute(ctx, loopDone)
return err
Copilot AI Mar 20, 2026

runOnThread: the added closure assigns to the outer-scoped err variable (_, err = threadOp.execute(...)). Because the task runs asynchronously, this captures and mutates a variable from the parent stack frame, which is a data race and can also clobber the error used earlier in the case block. Use a new local variable inside the closure (e.g., _, execErr := ...; return execErr) to avoid capturing/mutating outer state.

Suggested change
-	_, err = threadOp.execute(ctx, loopDone)
-	return err
+	_, execErr := threadOp.execute(ctx, loopDone)
+	return execErr
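
To make the point about closure capture concrete, here is a small sketch in which each asynchronous task keeps its error in a variable local to the closure, so the outer `err` is never written from another goroutine. The `execute` and `newTask` functions are hypothetical stand-ins, not the runner's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// execute is a hypothetical stand-in for threadOp.execute.
func execute(fail bool) (string, error) {
	if fail {
		return "", errors.New("operation failed")
	}
	return "ok", nil
}

// newTask returns a closure that declares its own error variable
// instead of assigning to one captured from the caller.
func newTask(fail bool) func() error {
	return func() error {
		_, execErr := execute(fail)
		return execErr
	}
}

func main() {
	var err error // outer error; tasks never assign to it
	done := make(chan error, 2)

	for _, fail := range []bool{false, true} {
		task := newTask(fail)
		go func() { done <- task() }()
	}

	for i := 0; i < 2; i++ {
		if taskErr := <-done; taskErr != nil {
			fmt.Println("task error:", taskErr)
		}
	}
	fmt.Println("outer err untouched:", err == nil)
}
```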

Comment on lines +108 to 118
+	var cbErr error
 	for idx, oper := range operations {
-		if err := oper.execute(ctx, loopDone); err != nil {
-			return nil, fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, err)
+		res, execErr := oper.execute(ctx, loopDone)
+		if execErr != nil {
+			err = fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, execErr)
+			return nil, nil
+		}
+		if cbErr == nil && res != nil {
+			cbErr = res.Err
+		}
 	}
Copilot AI Mar 20, 2026

executeWithTransaction callback: once an operation inside the callback produces a non-nil operation error (res.Err), the callback should return that error immediately so subsequent operations are not executed. The current logic records the first res.Err in cbErr but continues executing later operations, which can change transactional behavior compared to a real application callback and can cause extra side effects or different errors.

Comment on lines +112 to +113
err = fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, execErr)
return nil, nil
Copilot AI Mar 20, 2026

executeWithTransaction callback: on execErr, the callback currently returns (nil, nil), which causes WithTransaction to commit even though the callback did not successfully execute all operations, and relies on mutating the outer err to fail later. It would be safer to return the wrapped execution error from the callback so WithTransaction aborts/cleans up consistently and avoids committing partial work.

Suggested change
-	err = fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, execErr)
-	return nil, nil
+	return nil, fmt.Errorf("error executing operation %q at index %d: %w", oper.Name, idx, execErr)

Comment thread mongo/session.go
Comment on lines 179 to 183
 select {
 case <-timeout.C:
-	return nil, err
+	return nil, TimeoutError{Wrapped: err}
 default:
 }
Copilot AI Mar 20, 2026

WithTransaction now wraps some timeout paths with TimeoutError, but other uses of the same overall timeout timer still return the raw err (e.g., the commit retry loop). This can lead to inconsistent timeout signaling/labeling for callers. If the goal is to surface a timeout error for WithTransaction timeouts, consider wrapping all returns caused by the overall timeout timer consistently.

@qingyang-hu qingyang-hu marked this pull request as draft March 20, 2026 12:38
@qingyang-hu qingyang-hu marked this pull request as ready for review March 20, 2026 20:03
Member

@tadjik1 tadjik1 left a comment

Hi @qingyang-hu, thanks for your work, it looks good!

I left a few comments in the code on places I think should be improved; please let me know what you think.

Comment thread mongo/errors.go Outdated
if e.Wrapped == nil {
	return "operation timed out"
}
return e.Wrapped.Error()
Member

I would suggest adding a string prefix so this error is distinguishable (similar to what we do for other errors).

Suggested change
-	return e.Wrapped.Error()
+	return "operation timed out: " + e.Wrapped.Error()

-	return nil, fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, err)
+res, execErr := oper.execute(ctx, loopDone)
+if execErr != nil {
+	err = fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, execErr)
Member

Just to clarify: we capture err here because we want a "3rd type of result" from this function, that is, errors that are not related to WithTransaction logic, right? So we do want to get the error, but we don't want the driver to handle it as if it were a transaction error.

Contributor Author

Correct, we need to capture the error while still completing all operations in the transaction.

Comment thread mongo/session.go
 case <-timeout.C:
 	sleep.Stop()
-	return nil, err
+	return nil, TimeoutError{Wrapped: err}
Member

The spec says we have to distinguish between CSOT and non-CSOT errors:
Note 1: When the TIMEOUT_MS (calculated in step [1.3]) is reached we MUST report a timeout error wrapping the last error that was encountered which triggered the retry behavior. If timeoutMS is set, then timeout error is a special type which is defined in CSOT. If timeoutMS is not set, then propagate it as timeout error if the language allows to expose the underlying error as a cause of a timeout error. If timeout error is thrown then it SHOULD expose error label(s) from the transient error.
https://github.com/mongodb/specifications/blob/master/source/transactions-convenient-api/transactions-convenient-api.md#sequence-of-actions

In the current approach we always return TimeoutErrors which are indistinguishable.

Member

@tadjik1 The non-CSOT path in WithTransaction uses a manual wall-clock check against the 120-second limit, not a context deadline. AFAIK context.DeadlineExceeded is always CSOT in the Go Driver.

Contributor Author

Also, since the original error is wrapped in the new type, the user can always trace back and distinguish the errors.

Contributor

@matthewdale matthewdale Apr 17, 2026

This is probably beyond the scope of this PR, but moving forward we should avoid picking which error to return and instead return all errors joined using errutil.Join.

E.g.

var errs []error
// ...

select {
case <-timeout.C:
	errs = append(errs, errors.New("default WithTransaction timeout reached"))
	return nil, errutil.Join(errs...)
default:
}
// ...

@matthewdale matthewdale self-requested a review April 6, 2026 20:25
@qingyang-hu qingyang-hu marked this pull request as draft April 10, 2026 14:56
@qingyang-hu qingyang-hu marked this pull request as ready for review April 13, 2026 22:57
Comment thread mongo/errors.go
Comment on lines +831 to +832
type TimeoutError struct {
Wrapped error
Contributor

Does TimeoutError need to be exported? If not, we shouldn't export it.

Additionally, we need to update IsTimeout to return true if the error is a timeoutError, including a test to confirm IsTimeout works with the new error type.

Comment thread mongo/session.go
Comment on lines +223 to +226
select {
case <-timeout.C:
return res, TimeoutError{Wrapped: err}
default:
Contributor

Is the intent of this change to only respect the default WithTransaction timeout if the error is a CommandError with label "UnknownTransactionCommitResult"?
