fix(cluster): complete Issue #905 step 2 correctness for Rand and Rou…#915
fix(cluster): complete Issue #905 step 2 correctness for Rand and Rou…#915mochengqian wants to merge 5 commits intoapache:developfrom
Conversation
|
Issue #905 benchmark baseline (Step 1 + Step 2) Environment:
Command: go test ./pkg/server -run '^$' -bench '^BenchmarkCluster' -benchmem -count=5Each Raw output was captured locally for reference. PickEndpoint / Lookup
LB Hot Path / Healthy Filter
CAS / Consistent Hash
Notes
|
There was a problem hiding this comment.
Pull request overview
Implements Issue #905 Step 1 (tests/benchmarks) and Step 2 (correctness) for cluster endpoint picking, focusing on fixing Rand / RoundRobin behavior with unhealthy endpoints and improving concurrency safety for round-robin picks.
Changes:
- Fix
RandandRoundRobinto pick only from healthy endpoints and returnnilwhen none are healthy. - Make round-robin cursor updates concurrency-safe via atomic operations.
- Add regression tests and a benchmark suite covering pick-path behavior and related hot paths.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/server/cluster_manager_test.go | Adds/extends manager-path regression tests (missing cluster, unhealthy endpoints) and a concurrent pick race-style test. |
| pkg/server/cluster_manager_bench_test.go | Introduces a benchmark suite for PickEndpoint, lookup cost, LB hot path, health filtering cost, CAS workload, and consistent-hash resolve. |
| pkg/model/cluster_test.go | Adds unit tests for GetEndpoint health filtering, consistent-hash initialization, and Endpoint.GetHost. |
| pkg/model/cluster.go | Changes PrePickEndpointIndex type to uint32 to support atomic increments. |
| pkg/cluster/loadbalancer/roundrobin/round_robin_test.go | Adds deterministic tests for all-unhealthy behavior and ordering/cursor handling. |
| pkg/cluster/loadbalancer/roundrobin/round_robin.go | Updates round-robin handler to return nil when no healthy endpoints and uses atomic cursor increments. |
| pkg/cluster/loadbalancer/rand/load_balancer_rand_test.go | Adds deterministic tests for Rand correctness and all-unhealthy behavior without panics. |
| pkg/cluster/loadbalancer/rand/load_balancer_rand.go | Fixes Rand to use healthy slice length, return nil on empty, and adds a test hook for deterministic randomness. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #915 +/- ##
===========================================
+ Coverage 22.05% 22.36% +0.30%
===========================================
Files 270 270
Lines 20069 20083 +14
===========================================
+ Hits 4426 4491 +65
+ Misses 15193 15140 -53
- Partials 450 452 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Code reviewFound 1 issue:
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
|
good catch!我确实没从并发内存模型层面考虑,现已修复问题. |



What this PR does:
This PR delivers Step 1 (tests + benchmarks) and Step 2 (correctness) for Issue #905, and is intended to be the first mergeable slice with a clean boundary.
It:
Randto choose only from the healthy endpoint slice and returnnilwhen no healthy endpoint existsRoundRobinto returnnilwhen no healthy endpoint existsRand/RoundRobincorrectness issuesPickEndpointserial baselinePickEndpointparallel baselineCompareAndSetStoreworkloadThis PR does not start the Step 3+ runtime-consistency / snapshot refactor yet.
Which issue(s) this PR fixes:
Fixes #905
Special notes for your reviewer:
Issue #905 step progress:
Add tests and benchmarks
Fix current correctness issues
Tighten runtime consistency
Switch cluster lookup to O(1)
Introduce healthy endpoint snapshots
Optimize simple LB hot path
Optimize consistent-hash LB last
This PR intentionally covers Step 1 (tests + benchmarks) and Step 2 (correctness) only.
Runtime consistency for
UpdateCluster/CompareAndSetStore, O(1) cluster lookup, healthy endpoint snapshots, simple LB hot-path optimization, and consistent-hash optimization will follow in later PRs.Detailed benchmark baseline numbers are posted in a separate PR comment so later steps can compare against the same baseline.
Local validation completed:
go test ./pkg/model ./pkg/server ./pkg/cluster/loadbalancer/rand ./pkg/cluster/loadbalancer/roundrobin -count=1go test ./pkg/server ./pkg/cluster/loadbalancer/rand ./pkg/cluster/loadbalancer/roundrobin -race -gcflags=-l -count=1go vet ./pkg/server ./pkg/cluster/loadbalancer/... ./pkg/modelgo test ./pkg/server -run '^$' -bench '^BenchmarkCluster' -benchmem -count=5Does this PR introduce a user-facing change?: