fix(cronjob): replace panic on timezone load failure with graceful error skip#2907
fix(cronjob): replace panic on timezone load failure with graceful error skip#2907aryanputta wants to merge 2 commits intokubernetes:mainfrom
Conversation
…graceful skip When getNextScheduledTime fails because cron.ParseStandard cannot resolve a named timezone (e.g. Asia/Singapore without tzdata), the error was passed directly to panic(), crashing the process and taking down all cluster metrics collection. Replace panic(err) with klog.Errorf so the failure is logged at error level and the metric is skipped for that CronJob. All other CronJobs and resource types continue serving metrics normally. Also add "k8s.io/klog/v2" to the import block, which was the missing dependency for structured error logging in this file.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: aryanputta The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
This issue is currently awaiting triage. If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the The DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Welcome @aryanputta! |
|
|
@aryanputta can you ensure the CLA is signed? Thanks! |
|
I think I did it, sorry for thr trouble. |
There was a problem hiding this comment.
Pull request overview
Prevents kube-state-metrics from crashing when CronJob.spec.timeZone refers to a named timezone that can’t be resolved (e.g., missing tzdata), by logging the error and skipping kube_cronjob_next_schedule_time for the affected CronJob.
Changes:
- Replaced
panic(err)on schedule parsing failure withklogerror logging and early return (skip metric for that CronJob). - Added
k8s.io/klog/v2import to support the new logging.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
/check-cla |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
What this PR does / why we need it:
When a CronJob has
spec.timeZoneset to a named timezone (e.g.Asia/Singapore) and the kube-state-metrics container image does not includetzdata,cron.ParseStandardcannot resolve the location. The error was passed directly topanic(err)in thekube_cronjob_next_schedule_timegenerator, crashing the process and taking down all cluster metrics collection.This change replaces
panic(err)withklog.Errorfso the failure is logged at error level and thekube_cronjob_next_schedule_timemetric is silently skipped for the affected CronJob. All other CronJobs and resource types continue serving metrics normally. Also adds"k8s.io/klog/v2"to the import block, which was previously absent from this file.How does this change affect the cardinality of KSM: Does not change cardinality. The affected metric is omitted for CronJobs whose timezone cannot be resolved; it was previously not emittable at all because the process crashed.
Which issue(s) this PR fixes: Fixes #2898
Special notes for your reviewer: The fix is two lines in
internal/store/cronjob.go. No new tests needed beyond running existing tests with a CronJob that has a named timezone -- the process no longer panics.