Fix silent volume restoration failures for encrypted snapshots
Resolves velero-io/velero#3145 and velero-io/velero#9128
Previously, when restoring EBS snapshots with KMS encryption, the plugin
would report success even when volume creation failed due to missing KMS
permissions (kms:Decrypt, kms:ReEncrypt*, kms:CreateGrant). This created
a silent failure scenario where Velero logs showed successful restoration
but the volume was never actually created.
Changes:
- Add volume creation verification with polling in CreateVolumeFromSnapshot
- Wait for volume to reach 'available' state before returning success
- Enhance error handling for KMS permission failures with actionable messages
- Add configurable timeout (volumeCreationTimeout) and poll interval (volumeCreationPollInterval)
- Add test coverage for the new error handling and configuration options
The fix transforms silent failures into clear error messages, helping users
quickly identify and resolve KMS permission issues during volume restoration.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
			return errors.Errorf("timeout waiting for volume %s to become available after %v. Check AWS CloudTrail for detailed error information", volumeID, b.volumeCreationTimeout)
		default:
		}

		volume, err := b.describeVolume(volumeID)
		if err != nil {
			// Check if volume doesn't exist yet (still being created)
			var apiErr smithy.APIError
			if errors.As(err, &apiErr) {
				if apiErr.ErrorCode() == "InvalidVolume.NotFound" {
					b.log.WithField("volumeID", volumeID).Debug("Volume not found yet, continuing to wait")
					time.Sleep(b.volumePollInterval)
					continue
				}
			}

			// For other errors, return immediately with enhanced context
			return b.enhanceVolumeCreationError(err, volumeID)
		}

		state := volume.State
		b.log.WithFields(logrus.Fields{
			"volumeID": volumeID,
			"state":    state,
		}).Debug("Volume status check")

		switch state {
		case types.VolumeStateAvailable:
			b.log.WithField("volumeID", volumeID).Info("Volume successfully created and available")
			return nil
		case types.VolumeStateError:
			return errors.Errorf("volume %s creation failed with state 'error'. This often indicates KMS permission issues for encrypted snapshots. Required KMS permissions: kms:Decrypt, kms:ReEncrypt*, kms:CreateGrant", volumeID)
		case types.VolumeStateCreating:
			// Volume is still being created, continue waiting
			b.log.WithField("volumeID", volumeID).Debug("Volume is still being created")
		default:
			b.log.WithFields(logrus.Fields{
				"volumeID": volumeID,
				"state":    state,
			}).Debug("Volume in intermediate state, continuing to wait")
		}

		time.Sleep(b.volumePollInterval)
	}
}

// enhanceVolumeCreationError provides more detailed error messages for common volume creation failures