
Dynamic port allocation with reverse proxy on server-requesting pod #363

Open

delavet wants to merge 3 commits into llm-d-incubation:main from delavet:dynamic-port-allocation

Conversation

@delavet

@delavet delavet commented Mar 18, 2026

This PR makes InferenceServerConfig.spec.modelServerConfig.Port optional, implementing the approach discussed in the following Slack conversation to enable integration with InferencePool. With this change, when no port is specified, the dual-pods controller dynamically allocates a port and records it in the launcher pod's inference.networking.k8s.io/active-ports annotation.

https://llm-d.slack.com/archives/C09TNPEFJUD/p1769183239461429
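
For illustration, recording an allocated port could look roughly like the following. This is a minimal sketch: the recordActivePort helper and the comma-separated value format are assumptions, not the PR's actual code.

    package portalloc

    import (
        "strconv"

        corev1 "k8s.io/api/core/v1"
    )

    // Annotation key taken from the PR description.
    const activePortsAnnotation = "inference.networking.k8s.io/active-ports"

    // recordActivePort appends a dynamically allocated port to the launcher
    // pod's active-ports annotation. The comma-separated encoding is an
    // assumption made for this sketch.
    func recordActivePort(pod *corev1.Pod, port int) {
        if pod.Annotations == nil {
            pod.Annotations = map[string]string{}
        }
        entry := strconv.Itoa(port)
        if existing := pod.Annotations[activePortsAnnotation]; existing != "" {
            entry = existing + "," + entry
        }
        pod.Annotations[activePortsAnnotation] = entry
    }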

One open issue remains: users of InferencePool and maintainers of the dual-pods controller must agree in advance on a predefined list of available ports. Currently, this port range is hardcoded as 8005 through 8005 + N - 1 (for example, with N = 4, ports 8005 through 8008), which adds complexity rather than removing it.

I am also seriously considering the alternative proposed earlier: adding TCP proxy functionality to the requester, so that the vLLM service can be accessed directly through the requester. This would also make all ports uniform, and may turn out to be the better approach.

I would greatly appreciate your comments. @MikeSpreitzer @lionelvillard

@github-actions

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

Comment on lines -308 to -316
type launcherData struct {
// Instances is a map,
// where key is an instance's ID which is the instance's nominal hash,
// and value is the last used time of the instance.
Instances map[string]time.Time

// Accurate indicates whether the set of nominal hash in Instances is accurate.
Accurate bool
}
Collaborator

I expect that it will be (A) easier to review this largish PR and (B) easier to maintain this long-lived branch if it does not make unnecessary code changes, such as moving code that could stay where it is in the file.

@diegocastanibm diegocastanibm left a comment
Collaborator

I do not know why the dynamic port controller capability has been documented in docs/launcher.md; I think a better place would be docs/dual-pods.md.
Also, please, remember to sign all your commits.

@delavet
Author

delavet commented Mar 24, 2026

I do not know why the dynamic port controller capability has been documented in docs/launcher.md; I think a better place would be docs/dual-pods.md. Also, please remember to sign all your commits.

Good point! Thanks! I will reorganize this PR after successfully running it and resolving all issues, while also addressing Mike’s suggestions.

@delavet delavet force-pushed the dynamic-port-allocation branch from 9086ac2 to 2fc8cde on March 31, 2026 02:33
@delavet delavet force-pushed the dynamic-port-allocation branch from 2fc8cde to 6a1f5b8 on March 31, 2026 02:38
@delavet delavet marked this pull request as ready for review March 31, 2026 04:09
@delavet delavet changed the title dynamic port allocation Dynamic port allocation with reverse proxy on server-requesting pod Mar 31, 2026
@delavet
Author

delavet commented Mar 31, 2026

To properly address the integration issue with InferencePool, I finally implemented the following solution:

  • The dual-pods controller dynamically assigns available ports on the launcher for vLLM instances, checking all currently used ports during allocation to prevent conflicts (see the allocation sketch below).
  • A reverse proxy feature has been added to the server-requesting pod, enabling it to forward requests to the bound vLLM instance, so vLLM services can be accessed directly via the server-requesting pod. I incorporated this component into the final implementation after empirical experiments showed that the reverse proxy on the requester introduces virtually no performance overhead, likely because the requester and launcher reside on the same node and the proxy performs no additional processing on incoming requests. (A minimal proxy sketch follows the experiment link below.)
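
The allocation logic can be quite simple: scan a candidate range and skip anything already in use. A minimal sketch, assuming the in-use set is built by reading the active-ports annotations of the launcher pods on the node (function and parameter names here are illustrative, not the PR's actual identifiers):

    package portalloc

    import "fmt"

    // allocatePort returns the first port in [base, base+count) that is not
    // already in use. The inUse set would be assembled from the active-ports
    // annotations of all launcher pods on the node.
    func allocatePort(inUse map[int]bool, base, count int) (int, error) {
        for p := base; p < base+count; p++ {
            if !inUse[p] {
                return p, nil
            }
        }
        return 0, fmt.Errorf("no free port in [%d, %d)", base, base+count)
    }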

For experimental results related to the reverse proxy, please refer to:

https://docs.google.com/document/d/1krI8OOOWpGz2Cbb4iZmLJZc5R29_z9gq2y3MkaYuARE/edit?usp=sharing
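
The requester-side proxy itself can stay very small. Here is a minimal sketch using Go's standard net/http/httputil; the target and listen addresses are illustrative, not the PR's actual configuration:

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        // Target is the bound vLLM instance on the co-located launcher;
        // the address here is illustrative.
        target, err := url.Parse("http://localhost:8005")
        if err != nil {
            log.Fatal(err)
        }
        // NewSingleHostReverseProxy rewrites incoming requests to the target
        // and streams the responses back unchanged.
        proxy := httputil.NewSingleHostReverseProxy(target)
        log.Fatal(http.ListenAndServe(":8082", proxy))
    }

Because the proxy only rewrites the request URL and streams bytes through, a same-node hop like this adds essentially no measurable latency, which is consistent with the negligible overhead observed in the experiments above.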

@delavet delavet requested a review from diegocastanibm March 31, 2026 04:18

// Try initialize server
if instance.initialized.Load() {
http.Error(w, "proxy already intialized", http.StatusConflict)
Collaborator

intialized typo


// Double-check after acquiring write lock
if instance.initialized.Load() {
http.Error(w, "proxy already intialized", http.StatusConflict)
Collaborator

intialized typo

Comment thread: cmd/requester/main.go
if proxyPort == "" {
proxyPort = "8082"
}

Collaborator

We should also add

    - name: proxy
      containerPort: 8082

to mkobjs.sh?

@github-actions

This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the lifecycle/stale label.
