Bedrock invoke_agent streaming starts only after full response generation (delayed first chunk) #4744
Describe the bug
When invoking a Bedrock agent with streaming enabled, the API call blocks until the agent finishes generating the full response. Only after generation completes does the first chunk event arrive, after which the remaining chunks are delivered in a rapid burst.
This defeats the purpose of streaming, because users receive no partial output while the response is being generated.
Environment
SDK: boto3
Service: Amazon Bedrock Agent Runtime
Python: 3.12
boto3: 1.42.59
Region: us-east-1
Minimal Reproducible Code
import boto3
import time

client = boto3.client(
    "bedrock-agent-runtime",
    region_name="us-east-1",
    aws_access_key_id="...",
    aws_secret_access_key="...",
)

start = time.time()
print("Invoking agent...")
response = client.invoke_agent(
    agentId="HBULA1EYN8",
    agentAliasId="92E64FKIFG",
    sessionId="test-session",
    enableTrace=True,
    sessionState={"files": []},
    inputText="Explain quantum computing in simple terms",
    streamingConfigurations={"streamFinalResponse": True},
)
print(f"Time taken to receive first response: {time.time() - start:.2f}s")

for event in response["completion"]:
    print(f"{time.time() - start:.2f}s -> {event.keys()}")
Note: I have added the bedrock:InvokeModelWithResponseStream permission, as called for in the invoke_agent documentation (https://docs.aws.amazon.com/boto3/latest/reference/services/bedrock-agent-runtime/client/invoke_agent.html).
Observed Output
Invoking agent...
Time taken to receive first response: 1.13s
1.33s -> dict_keys(['trace'])
7.18s -> dict_keys(['chunk'])
7.19s -> dict_keys(['chunk'])
7.19s -> dict_keys(['chunk'])
...
Key observations:
- trace events arrive early (~1.3 s)
- chunk events begin only after ~7 s
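To make this observation measurable on any run, a small helper like the one below (hypothetical name, standard library only) records the elapsed time until the first event of each type appears. It works on any iterable of completion events; here it is shown against a simulated stream that mimics the reported behavior:

```python
import time

def first_event_times(completion):
    """Record elapsed seconds until the first event of each type appears."""
    start = time.time()
    first_seen = {}
    for event in completion:
        for key in event:
            # setdefault keeps only the first occurrence of each event type
            first_seen.setdefault(key, time.time() - start)
    return first_seen

def simulated_completion():
    """Fake event stream: an early trace, then chunks after a pause
    (the pause stands in for the ~6 s generation delay)."""
    yield {"trace": {}}
    time.sleep(0.2)
    yield {"chunk": {"bytes": b"partial "}}
    yield {"chunk": {"bytes": b"text"}}

times = first_event_times(simulated_completion())
print(sorted(times))  # ['chunk', 'trace']
```

Run against the real `response["completion"]` stream, this would show the gap between the first trace event and the first chunk event directly.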
Expected Behavior
Streaming should begin as soon as the model starts generating tokens, for example:
0.8s -> dict_keys(['chunk'])
0.9s -> dict_keys(['chunk'])
1.0s -> dict_keys(['chunk'])
Actual Behavior
Agent executes for several seconds
↓
Full response generated internally
↓
Only then streaming of chunks begins
This results in a perceived delay and defeats the purpose of using a streaming API for interactive applications.
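For context on why the burst delivery is so noticeable: each chunk event carries the response text as raw bytes, so an interactive consumer typically decodes and displays fragments as they arrive. A minimal sketch of such a consumer (the function name is hypothetical; the `chunk`/`bytes` event shape matches the invoke_agent response), demonstrated on a hand-built event list:

```python
def stream_text(completion):
    """Yield decoded text fragments as chunk events arrive,
    skipping trace and any other event types."""
    for event in completion:
        chunk = event.get("chunk")
        if chunk and "bytes" in chunk:
            yield chunk["bytes"].decode("utf-8")

# With the current behavior, all of these fragments arrive in one burst
# after generation completes; with true streaming they would trickle in.
events = [
    {"trace": {}},
    {"chunk": {"bytes": b"Quantum computing "}},
    {"chunk": {"bytes": b"uses qubits."}},
]
print("".join(stream_text(events)))  # Quantum computing uses qubits.
```

With the current behavior, a consumer like this still renders nothing until the full response has been generated, which is exactly the problem for interactive use.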
Regression Issue
- Select this option if this issue appears to be a regression.
Expected Behavior
See the expected behavior described above.
Current Behavior
See the actual behavior described above.
Reproduction Steps
Run the minimal reproducible code shown above under "Describe the bug".
Possible Solution
No response
Additional Information/Context
No response
SDK version used
1.42.59
Environment details (OS name and version, etc.)
macOS