feat: add auto-recovery logic for chat completion clients #7567

Closed
Ricardo-M-L wants to merge 1 commit into microsoft:main from Ricardo-M-L:feat/auto-recovery-logic
Conversation

@Ricardo-M-L
Summary

Implements configurable auto-recovery logic for chat completion clients, addressing the need described in #3632.

  • RetryableChatCompletionClient: A wrapper that adds retry logic with exponential backoff and jitter to any ChatCompletionClient
  • RetryConfig: Dataclass for configuring max retries, base/max delay, exponential base, jitter, retryable status codes, and error types
  • Error classification: Correctly identifies transient errors (429 rate limit, 500/502/503/504 server errors, 424 failed dependency, timeouts, connection errors) vs permanent errors (400, 401, 403, 404, auth errors) that should not be retried
  • Retry-After header support: Respects server-provided Retry-After values as minimum delay
  • Jitter: Randomized delay to prevent thundering herd problems
  • Logging: All retry attempts are logged via the trace logger
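The delay and classification rules listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `SketchRetryConfig`, `is_retryable_status`, and `compute_delay` are hypothetical names, the field names other than `max_retries`, `base_delay`, and `max_delay` are guesses, and "full jitter" (scaling the delay by a uniform random factor) is only one of several possible jitter strategies.

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


# Hypothetical stand-in for the PR's RetryConfig. Only max_retries,
# base_delay and max_delay appear in the PR's usage example; the other
# field names and all defaults are assumptions.
@dataclass(frozen=True)
class SketchRetryConfig:
    max_retries: int = 3
    base_delay: float = 1.0
    max_delay: float = 60.0
    exponential_base: float = 2.0
    jitter: bool = True
    retryable_status_codes: FrozenSet[int] = frozenset({424, 429, 500, 502, 503, 504})


def is_retryable_status(status_code: int, config: SketchRetryConfig) -> bool:
    """Return True for transient status codes, False for permanent ones."""
    return status_code in config.retryable_status_codes


def compute_delay(
    attempt: int,
    config: SketchRetryConfig,
    retry_after: Optional[float] = None,
    rng: Callable[[], float] = random.random,
) -> float:
    """Delay in seconds before retry number `attempt` (0-based)."""
    # Exponential growth, capped at max_delay.
    delay = min(config.base_delay * config.exponential_base ** attempt, config.max_delay)
    if config.jitter:
        # Full jitter: pick a point uniformly in [0, delay).
        delay *= rng()
    if retry_after is not None:
        # Treat a server-provided Retry-After value as a lower bound.
        delay = max(delay, retry_after)
    return delay
```

With jitter disabled and `base_delay=1.0`, the first retries wait 1, 2, 4, and 8 seconds until the cap is reached; a `retry_after` of 5 seconds raises any shorter delay to 5.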

Usage

import asyncio

from autogen_core.models import UserMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.models.retry import RetryableChatCompletionClient, RetryConfig


async def main() -> None:
    openai_client = OpenAIChatCompletionClient(model="gpt-4o")
    retryable_client = RetryableChatCompletionClient(
        client=openai_client,
        retry_config=RetryConfig(max_retries=5, base_delay=1.0, max_delay=30.0),
    )

    # Use exactly like any other ChatCompletionClient
    messages = [UserMessage(content="Hello", source="user")]
    result = await retryable_client.create(messages)
    print(result.content)


asyncio.run(main())
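
For readers who want to see the shape of the retry loop such a wrapper might run around each call, here is a minimal self-contained sketch. It does not reproduce the PR's implementation: `TransientError` and `call_with_retries` are hypothetical names, jitter and Retry-After handling are left out for brevity, and the `sleep` parameter exists only so the loop can be exercised without real waiting.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. 429 or 503)."""


async def call_with_retries(
    operation: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await `operation`, retrying on TransientError with capped exponential backoff."""
    attempt = 0
    while True:
        try:
            return await operation()
        except TransientError:
            # Give up once the retry budget is spent; any other exception
            # type is treated as permanent and propagates immediately.
            if attempt >= max_retries:
                raise
            await sleep(min(base_delay * 2 ** attempt, max_delay))
            attempt += 1
```

With `max_retries=0` the operation is attempted exactly once, which corresponds to the "zero retries" configuration mentioned in the test plan.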

Files changed

  • python/packages/autogen-ext/src/autogen_ext/models/retry/__init__.py — Module exports
  • python/packages/autogen-ext/src/autogen_ext/models/retry/_retry_chat_completion_client.py — Core implementation (359 lines)
  • python/packages/autogen-ext/tests/models/test_retry_chat_completion_client.py — 25 test cases (581 lines)

Fixes #3632

Test plan

  • 25 unit tests covering:
    • Successful calls with no retries
    • Retry on rate limit (429), server errors (500/502/503), timeout, connection errors, and 424 failed dependency
    • No retry on permanent errors (400, 401, 403, 404, auth errors)
    • Max retries exhaustion
    • Retry-After header handling
    • Exponential backoff calculation with/without jitter
    • Stream retry behavior
    • Delegation of all ChatCompletionClient methods (close, usage, tokens, model_info)
    • Zero retries configuration

🤖 Generated with Claude Code

…t errors

Implements a wrapper client that adds configurable retry logic with exponential
backoff and jitter to any ChatCompletionClient. Supports rate limit (429),
server errors (500/502/503/504), timeout, and connection errors while correctly
classifying permanent errors (auth, bad request) as non-retryable.

Fixes microsoft#3632

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Ricardo-M-L
Author

Closing — this feature should have been discussed in an issue first before submitting a large PR. Apologies for the noise.

Development

Successfully merging this pull request may close these issues.

Auto Recovery Logic for chat completion clients on different types of server errors