feat: introduce deletion timestamp metric for daemonset, statefulset, deployment, service and pdb #2678
63191f9 to d5bb362
All commits were squashed into one.
Hi, could you share more about the use cases for these metrics? Are they meant for monitoring Kubernetes resources that are stuck in a terminating state?
@CatherineF-dev Hi, yes. If the resource deletion process is stuck for some reason or blocked by a finalizer, the deletion timestamp metric can help detect the case and raise an alert for investigation.
/assign
@IgorIgnatevBolt How will we know which resource should be deleted? |
Maybe I misunderstood the question, but this PR is exactly about detecting resources that were nominated by the controller manager for deletion but were not deleted for some reason, e.g. blocked by finalizers.
Hi @CatherineF-dev, do you need any more information about the PR or anything else that can help you move forward?
/assign @CatherineF-dev
| kube_deployment_labels | Gauge | Kubernetes labels converted to Prometheus labels controlled via [--metric-labels-allowlist](../../developer/cli-arguments.md) | `deployment`=<deployment-name> <br> `namespace`=<deployment-namespace> <br> `label_DEPLOYMENT_LABEL`=<DEPLOYMENT_LABEL> | STABLE |
| kube_deployment_created | Gauge | | `deployment`=<deployment-name> <br> `namespace`=<deployment-namespace> | STABLE |
| kube_deployment_deletion_timestamp | Gauge | Unix deletion timestamp | `deployment`=<deployment-name> <br> `namespace`=<deployment-namespace> | EXPERIMENTAL |
Should we use kube_deployment_deleted to align with kube_deployment_created?
I'd like to keep the pattern the same as for other resources like kube_node_deletion_timestamp or kube_pod_deletion_timestamp
Ah, so it's kube_deployment_created that's not following the _timestamp pattern :/
Looks like we consciously switched to _timestamp in the past, so maybe kube_deployment_deletion_timestamp is the way to go.
Additionally, does it make sense to rename kube_deployment_created to follow the same pattern? How disruptive would that change be?
I guess renaming any existing metric would be a breaking change and require a longer release process. I'd like to keep the scope of this PR as it is, limited to new metrics only.
d5bb362 to b188b21
Rebased, conflicts resolved.
This commit adds the kube_*_deletion_timestamp metric for several Kubernetes resources:
- Deployments
- DaemonSets
- StatefulSets
- Services
- PodDisruptionBudgets

The deletion timestamp metric reports the Unix timestamp when a resource was marked for deletion. This helps with monitoring resource lifecycle and cleanup processes.

All metrics follow the same pattern:
- Help text: 'Unix deletion timestamp'
- Type: gauge
- Value: Unix timestamp in seconds when DeletionTimestamp is set, otherwise the metric is not emitted

Updated documentation and tests are included for all affected resources.
b188b21 to 53ec1de
Fixed a typo in the deployment test.
/hold for @CatherineF-dev to further comment.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: IgorIgnatevBolt, mrueg
/hold cancel
What this PR does / why we need it:
Some resources can be blocked from deletion by finalizers. To catch this and expose it as a metric, we can use the deletion timestamp metadata field.

Introduce a deletion_timestamp metric for the following resources:
- kube_deployment_deletion_timestamp
- kube_statefulset_deletion_timestamp
- kube_daemonset_deletion_timestamp
- kube_service_deletion_timestamp
- kube_poddisruptionbudget_deletion_timestamp

Also formats the tables in the docs.
How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #