fix: stop memory leak from orphaned CR reflector goroutines on repeated CRD discovery #2920
bhope wants to merge 1 commit into kubernetes:main
Pull request overview
Fixes elevated/unbounded memory growth and goroutine leaks when custom resource state config is enabled by making CRD discovery idempotent and ensuring custom-resource reflectors stop on both CRD removal and context cancellation.
Changes:
- Make `CRDiscoverer.AppendToMap` idempotent (no duplicate kinds; don’t replace existing stop channels).
- Ensure custom-resource reflectors stop when either the GVK stop channel fires or the builder context is cancelled.
- Add/extend tests covering idempotency, channel cleanup, and leak simulations.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| internal/store/builder.go | Updates CR reflector stop behavior to also honor builder context cancellation. |
| internal/store/builder_test.go | Adds unit tests around the combined stop channel behavior for CR reflectors. |
| internal/discovery/types.go | Prevents duplicate kind entries and stop-channel replacement in repeated discovery updates. |
| internal/discovery/types_test.go | Adds deterministic unit tests for Append/Remove idempotency and channel closure. |
| internal/discovery/memleak_test.go | Adds simulation-style tests intended to demonstrate pre/post fix memory & goroutine behavior. |
Hi @mrueg, I've addressed the Copilot suggestions and CI is now green. Ready for review when you get a chance.
…discovery fix gofmt error Co-authored-by: Oleg Zaytsev <1511481+colega@users.noreply.github.com>
Any idea when this will be released?
@jullianow This will be included in the upcoming release; we are working towards it. Please stay tuned. Thanks.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
```go
	deadline := time.After(2 * time.Second)
	for i, ch := range stopChs {
		select {
		case <-ch:
		case <-deadline:
```
`deadline := time.After(2 * time.Second)` is created once and reused across the loop. That means later iterations may get less than 2s (or even time out immediately) depending on scheduling, making this test more brittle and the error message misleading. Consider using a per-iteration timeout (create `time.After` inside the loop) or use a single overall deadline but compare against `time.Now()`/`time.Until()` and adjust the message accordingly.
```diff
-	deadline := time.After(2 * time.Second)
-	for i, ch := range stopChs {
-		select {
-		case <-ch:
-		case <-deadline:
+	for i, ch := range stopChs {
+		select {
+		case <-ch:
+		case <-time.After(2 * time.Second):
```
Fixes elevated and unbounded memory growth, introduced in v2.18.0, when custom resource state config is in use.
Root Causes
Fix
Also added tests covering idempotency and cleanup in the discovery package. They verify that repeated `AppendToMap` calls produce no duplicate kinds or channel replacement, and that `RemoveFromMap` closes channels so reflectors stop cleanly.
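
To make the idempotency property concrete, here is a minimal sketch using a simplified stand-in type. The `registry` type, its fields, and the `add`/`remove` methods are assumptions made for illustration only; the real `CRDiscoverer` in `internal/discovery/types.go` has its own structure and locking, which this sketch does not reproduce.

```go
package main

import "fmt"

// registry is a hypothetical, non-concurrency-safe stand-in for the
// discoverer's bookkeeping. It is not the real CRDiscoverer.
type registry struct {
	kinds  map[string][]string      // group/version -> known kinds
	stopCh map[string]chan struct{} // group/version/kind -> stop channel
}

func newRegistry() *registry {
	return &registry{
		kinds:  make(map[string][]string),
		stopCh: make(map[string]chan struct{}),
	}
}

// add registers a kind only once. On repeated calls it leaves the existing
// stop channel in place, so whatever is waiting on it can still be stopped.
func (r *registry) add(groupVersion, kind string) bool {
	key := groupVersion + "/" + kind
	if _, known := r.stopCh[key]; known {
		return false
	}
	r.kinds[groupVersion] = append(r.kinds[groupVersion], kind)
	r.stopCh[key] = make(chan struct{})
	return true
}

// remove closes the kind's stop channel and forgets the kind.
func (r *registry) remove(groupVersion, kind string) bool {
	key := groupVersion + "/" + kind
	ch, known := r.stopCh[key]
	if !known {
		return false
	}
	close(ch)
	delete(r.stopCh, key)
	kinds := r.kinds[groupVersion]
	for i, k := range kinds {
		if k == kind {
			r.kinds[groupVersion] = append(kinds[:i], kinds[i+1:]...)
			break
		}
	}
	return true
}

func main() {
	r := newRegistry()
	// Simulate repeated discovery polls reporting the same kind.
	for cycle := 0; cycle < 500; cycle++ {
		r.add("example.com/v1", "Widget")
	}
	fmt.Println("kinds tracked:", len(r.kinds["example.com/v1"]))
	fmt.Println("stop channels:", len(r.stopCh))

	ch := r.stopCh["example.com/v1/Widget"]
	r.remove("example.com/v1", "Widget")
	_, open := <-ch
	fmt.Println("channel still open after remove:", open)
}
```

The design point is that a repeated `add` must neither grow the kind list nor swap out the channel: if the channel were replaced, anything still waiting on the old one would never be signalled.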
Test Results:
- `TestMemoryLeakSimulation`: 5 GVKs × 500 poll cycles
- `TestGoroutineLeakSimulation`: 5 GVKs × 20 store rebuilds

Fixes #2867