docs: Address PR #381 review feedback by rubambiza · Pull Request #454 · llm-d-incubation/llm-d-fast-model-actuation

rubambiza · 2026-04-24T16:12:48Z

Summary

Follow-up to #381, addressing review feedback from Mike and Ansu on the benchmarking scenarios doc.

Rename lukewarm to "Cold Start (with launcher)": the path is cold, not warm. Metric renamed from T_luke_warm to T_cold_launcher.
Split Cold Start into three variants: Cold Start (no FMA), Cold Start (FMA M2, planned), Cold Start (with launcher). M2 benchmarking is flagged as planned, pending a stable M3 harness.
Add constituent duration metrics table: T_launcher_schedule, T_launcher_startup, T_dpc_react, T_instance_ready as planned sub-metrics with observability sources.
Add L2 to Resource Scaling and Stress Test: TTFT cost per requester is low relative to actuation time, and the data is useful at scale.
Drop LPC attribution from Warm Start: DPC does not care whether the pre-existing launcher was created by LPC or by a prior DPC reconciliation.
Fix node-level language: launchers are on Nodes, not GPUs (Warm Start and Hot Start descriptions).
Remove unused L1+L3 legend entry; expand L1+L2+L3 description.
Fix Phase 2 T_launcher scope: applies to both warm and cold start with launcher paths.
Remove obsolete naming note (no longer needed after rename).

Test plan

Verify markdown tables render correctly
Confirm all 5 actuation path columns match between Paths table and Matrix
Confirm metric names are consistent across definitions, L1 legend, and Integration Phases
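The metric-name check in the test plan can be partly automated. Below is a minimal sketch. It assumes the definitions section and the other sections (L1 legend, Integration Phases) have already been pulled out of the doc as plain strings, and that every metric follows the `T_<snake_case>` pattern. `metric_names` and `undefined_metrics` are hypothetical helpers, not part of the repo.

```python
import re

# Assumption: every metric in the doc is spelled T_<snake_case>, e.g. T_cold_launcher.
METRIC_RE = re.compile(r"\bT_[a-z][a-z0-9_]*\b")


def metric_names(text: str) -> set[str]:
    """Return every T_* metric name mentioned in a block of text."""
    return set(METRIC_RE.findall(text))


def undefined_metrics(definitions: str, *other_sections: str) -> set[str]:
    """Metric names used in other sections but never defined in `definitions`."""
    defined = metric_names(definitions)
    used: set[str] = set()
    for section in other_sections:
        used |= metric_names(section)
    return used - defined
```

Running it over the legend and the phases section would, for example, flag a leftover `T_luke_warm` after the rename.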

…th naming

Rename lukewarm start to "Cold Start (with launcher)" per reviewer feedback that the path is cold, not warm. Split Cold Start into three variants: no FMA, FMA M2 (planned), and with launcher. Rename T_luke_warm metric to T_cold_launcher. Add constituent duration metrics table (T_launcher_schedule, T_launcher_startup, T_dpc_react, T_instance_ready) as planned sub-metrics. Remove obsolete naming note.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Gloire Rubambiza <gloire@ibm.com>

Summary of changes:

- Add Cold Start (FMA M2) to Purpose bullet list so it matches the 5-column matrix (was "four" conditions, now "different" conditions)
- Clarify T_actuation: non-FMA and M2 cold starts have no FMA-specific sub-components
- Drop LPC attribution from Warm Start description: DPC does not care whether the pre-existing launcher was created by LPC or by a prior DPC reconciliation
- Replace "on the correct/assigned GPU" with node-level language in Warm Start and Hot Start (launchers are on Nodes, not GPUs)
- Add L2 to Resource Scaling and Stress Test (L1+L3 -> L1+L2+L3) since TTFT cost is low relative to actuation and the data is useful at scale
- Remove unused L1+L3 legend entry from matrix; expand L1+L2+L3 description to show how it builds on L1+L2
- Fix Phase 2 T_launcher scope: applies to both warm and cold start with launcher, not just warm

Assisted-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Gloire Rubambiza <gloire@ibm.com>

aavarghese · 2026-04-24T17:18:12Z

- **Cold start**: creating a new vLLM instance without using a launcher
- **Luke warm start**: DPC creates a new launcher pod, then the launcher creates a new vLLM instance
+- **Cold start without FMA**: creating a new vLLM instance without using a launcher
+- **Cold start (FMA M2)**: DPC creates a standalone server-providing pod directly (planned)


What does (planned) mean here?

Why is it that we even need to mention M2 in these docs? I doubt we will ever deploy FMA M2...

aavarghese · 2026-04-24T17:22:44Z

 **Metric definitions:**

- **T_actuation**: Time from requester pod creation (ReplicaSet scale-up) to requester pod readiness (`/ready` probe passes), which implies the DPC has bound the requester to a server-providing pod and the vLLM instance is serving. Spans different sub-components depending on the actuation path: hot start (T_wake), warm start (T_launcher), or luke warm start (T_luke_warm).
+- **T_actuation**: Time from requester pod creation (ReplicaSet scale-up) to requester pod readiness (`/ready` probe passes), which implies the DPC has bound the requester to a server-providing pod and the vLLM instance is serving. For FMA paths, spans different sub-components depending on the actuation path: hot start (T_wake), warm start (T_launcher), or cold start with launcher (T_cold_launcher). For non-FMA and M2 cold starts, T_actuation is measured directly with no FMA-specific sub-components.


This adds more confusion for a new reader... we could remove the mention of M2 cold starts.
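T_actuation as defined in the hunk above only needs the requester Pod's own status: its `creationTimestamp` and the `lastTransitionTime` of its `Ready` condition. A minimal sketch, assuming the Pod object was fetched as JSON (for example with `kubectl get pod <name> -o json`) and that `Ready` went True once and stayed True. If readiness flaps, `lastTransitionTime` reflects the latest transition and the result is inflated. The function names are hypothetical.

```python
from datetime import datetime


def parse_k8s_time(ts: str) -> datetime:
    """Parse a Kubernetes RFC 3339 timestamp such as '2026-04-24T16:12:48Z'."""
    # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11 on.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def condition_time(pod: dict, cond_type: str) -> datetime:
    """lastTransitionTime of a True condition of the given type (e.g. 'Ready')."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type and cond.get("status") == "True":
            return parse_k8s_time(cond["lastTransitionTime"])
    raise ValueError(f"pod has no True {cond_type} condition")


def t_actuation_seconds(requester_pod: dict) -> float:
    """T_actuation: requester Pod creation to requester Pod Ready, in seconds."""
    created = parse_k8s_time(requester_pod["metadata"]["creationTimestamp"])
    return (condition_time(requester_pod, "Ready") - created).total_seconds()
```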

MikeSpreitzer · 2026-04-24T17:49:53Z

 The goal is to quantify and compare how quickly a model-serving duo (server-requesting
-and server-providing pods) becomes available under four different actuation conditions
+and server-providing pods) becomes available under different actuation conditions
 in order of decreasing latency:


I think that the order in which the cases actually appear is fine, but I suspect that it is not equal to "decreasing latency". The ordering is: first, code complexity, and second, runtime path length (which I expect will correlate with latency).

Drop Cold Start (FMA M2) from the actuation paths table and matrix. M2 is acknowledged as a distinct path via a note under the paths table but excluded from the benchmarking focus. This simplifies the matrix to 4 columns (no FMA, with launcher, warm, hot) and removes all "(planned)" annotations.

Assisted-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Gloire Rubambiza <gloire@ibm.com>

MikeSpreitzer · 2026-04-24T18:03:38Z

 - **Hit_rate**: Fraction of server-requesting Pods that get satisfied by waking a sleeping vLLM instance.
- **T_luke_warm**: Time from the DPC requesting launcher pod creation to the new vLLM instance reporting healthy. Covers the full luke warm start span: launcher pod scheduling, launcher readiness, DPC reconciliation, and vLLM instance creation. Measured end-to-end because the boundary between launcher readiness and instance creation is not directly observable from outside the DPC.
- **T_launcher**: Time from the launcher receiving a create request to the new vLLM instance reporting healthy. Includes the benefit of vLLM module preloading. Applies to the warm start path, where a launcher pod already exists.
+- **T_cold_launcher**: Time from the DPC launcher pod creation to the new vLLM instance reporting healthy. Covers the full cold start (with launcher) span: launcher pod scheduling, launcher startup, and vLLM instance creation.


Is this trying to say that T_cold_launcher is T_actuation but restricted to cold start with launcher scenarios? In other words, is it (a) the time from (1) creation of the server-requesting Pod to (2) requester Pod readiness, but (b) measured only for the cold start with launcher cases?

If so, then the text currently here is misleading: it suggests a bit less of a span to me. It is also confusing because it says "full ... span".

If not, then this is a different kind of refinement than T_wake: this one covers only part of the full path, whereas T_wake covers the full path but is restricted to one actuation case.
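Hit_rate, defined in the hunk above, is a plain ratio over a benchmark run. A minimal sketch, assuming each server-requesting Pod has been tagged after the fact with the actuation path that satisfied it; the label strings and the `hit_rate` helper are hypothetical, not identifiers from the FMA code.

```python
from collections import Counter

# Hypothetical label for "a sleeping vLLM instance was woken" (hot start).
HOT = "hot"


def hit_rate(paths: list[str]) -> float:
    """Fraction of server-requesting Pods satisfied by waking a sleeping instance."""
    if not paths:
        raise ValueError("no server-requesting Pods observed")
    return Counter(paths)[HOT] / len(paths)
```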

MikeSpreitzer · 2026-04-24T18:11:21Z

+| ------ | ---------- | -------------- |
+| **T_launcher_schedule** | Launcher pod `creationTimestamp` to `PodScheduled` condition `lastTransitionTime` | Kube pod status |
+| **T_launcher_startup** | Launcher pod `PodScheduled` to `Ready` condition `lastTransitionTime` | Kube pod status |
+| **T_dpc_react** | Launcher pod `Ready` to DPC issuing `CreateNamedInstance` | DPC logs (V5: "Creating new vLLM instance") |


What is "V5"?
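The two rows sourced from Kube pod status can be computed from a single launcher Pod object. A minimal sketch, assuming the launcher Pod JSON is available (e.g. `kubectl get pod <launcher> -o json`) and each condition transitioned to True only once; `launcher_pod_metrics` and its helpers are hypothetical. T_dpc_react and T_instance_ready need DPC-side data and are not covered here.

```python
from datetime import datetime


def _ts(s: str) -> datetime:
    # Kubernetes timestamps are RFC 3339, e.g. '2026-04-24T18:11:21Z'.
    return datetime.fromisoformat(s.replace("Z", "+00:00"))


def _true_since(pod: dict, cond_type: str) -> datetime:
    """lastTransitionTime of the given condition, which must currently be True."""
    for c in pod["status"]["conditions"]:
        if c["type"] == cond_type and c["status"] == "True":
            return _ts(c["lastTransitionTime"])
    raise ValueError(f"{cond_type} is not True on this pod")


def launcher_pod_metrics(pod: dict) -> dict[str, float]:
    """T_launcher_schedule and T_launcher_startup, in seconds."""
    created = _ts(pod["metadata"]["creationTimestamp"])
    scheduled = _true_since(pod, "PodScheduled")
    ready = _true_since(pod, "Ready")
    return {
        "T_launcher_schedule": (scheduled - created).total_seconds(),
        "T_launcher_startup": (ready - scheduled).total_seconds(),
    }
```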

MikeSpreitzer · 2026-04-24T18:14:38Z


To avoid having to parse the controller log, the controller could produce a Prometheus histogram of this duration. Depending on how much correlation with other measurements is intended, the fact that a histogram is inherently an aggregate may or may not be a problem. If it is a problem, then we could consider supporting distributed tracing.

Same for T_instance_ready
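To make the aggregation trade-off concrete, here is a minimal stand-in for a Prometheus-style histogram using only the standard library. It is not the `prometheus_client` API, and the bucket boundaries in the usage are arbitrary assumptions. It keeps cumulative `le` bucket counts plus a sum and a count, which is all a histogram exports; once a T_dpc_react sample is folded in, it can no longer be matched to a specific requester or launcher, which is the correlation loss that distributed tracing would address.

```python
import bisect


class CumulativeHistogram:
    """Stdlib stand-in for a Prometheus histogram: cumulative buckets, sum, count."""

    def __init__(self, upper_bounds: list[float]):
        self.bounds = sorted(upper_bounds)          # finite 'le' boundaries
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is le="+Inf"
        self.sum = 0.0
        self.count = 0

    def observe(self, seconds: float) -> None:
        # First bucket whose upper bound is >= the sample ('le' means <=).
        self.counts[bisect.bisect_left(self.bounds, seconds)] += 1
        self.sum += seconds
        self.count += 1

    def cumulative(self) -> list[tuple[str, int]]:
        """(le, count) pairs as they would appear in the exposition format."""
        out, running = [], 0
        labels = [str(b) for b in self.bounds] + ["+Inf"]
        for le, n in zip(labels, self.counts):
            running += n
            out.append((le, running))
        return out
```

For example, `h = CumulativeHistogram([0.1, 0.5, 1.0])` followed by `h.observe(...)` for each measured T_dpc_react.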

MikeSpreitzer · 2026-04-24T18:35:29Z

+
+Relationships:
+- T_cold_launcher ≈ T_launcher_schedule + T_launcher_startup + T_dpc_react + T_instance_ready
+- T_launcher ≈ T_dpc_react + T_instance_ready (launcher already Ready)


In the warm start actuation path, the right starting gun is not "Launcher pod Ready". As noted, the launcher Pod is ready before the server-requesting Pod is created.
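The relationships above only telescope when each constituent interval starts where the previous one ends. One way to choose a starting point for T_dpc_react that also works on the warm start path is sketched below. It assumes, as an illustration rather than a settled definition, that the DPC cannot react before both the launcher is Ready and the requester Pod exists; all names are hypothetical.

```python
from datetime import datetime, timedelta


def dpc_react_start(launcher_ready: datetime, requester_created: datetime) -> datetime:
    """Assumed starting gun for T_dpc_react: the later of the two prerequisites.

    Cold start with launcher: the launcher becomes Ready after the requester
    exists, so this is launcher Ready. Warm start: the launcher was Ready long
    before, so the requester's creation is used instead.
    """
    return max(launcher_ready, requester_created)


def t_dpc_react(launcher_ready: datetime, requester_created: datetime,
                create_issued: datetime) -> timedelta:
    """From the starting gun to the DPC issuing CreateNamedInstance."""
    return create_issued - dpc_react_start(launcher_ready, requester_created)


def intervals(marks: list[datetime]) -> list[timedelta]:
    """Consecutive durations between ordered milestones; they sum to last - first."""
    return [b - a for a, b in zip(marks, marks[1:])]
```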

MikeSpreitzer

I left some individual comments.

rubambiza added 2 commits April 23, 2026 18:34

aavarghese reviewed Apr 24, 2026


MikeSpreitzer reviewed Apr 24, 2026

rubambiza wants to merge 3 commits into llm-d-incubation:main from rubambiza:docs/benchmark-review-followup