Dynamic port allocation with reverse proxy on server-requesting pod #363
delavet wants to merge 3 commits into llm-d-incubation:main
Unsigned commits detected! Please sign your commits. For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.
type launcherData struct {
	// Instances is a map,
	// where key is an instance's ID, which is the instance's nominal hash,
	// and value is the last used time of the instance.
	Instances map[string]time.Time

	// Accurate indicates whether the set of nominal hashes in Instances is accurate.
	Accurate bool
}
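As a rough illustration of how a map from nominal hash to last-used time can be consumed, the sketch below picks the least recently used instance. The helper names `leastRecentlyUsed` and `exampleData`, the sample hashes, and the choice to refuse a selection when `Accurate` is false are assumptions made for this example; they are not taken from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// launcherData mirrors the struct quoted in the diff.
type launcherData struct {
	Instances map[string]time.Time // instance ID (nominal hash) -> last used time
	Accurate  bool                 // whether the set of hashes is accurate
}

// leastRecentlyUsed is a hypothetical helper: it returns the ID of the
// instance with the oldest last-used time. It reports false when the data is
// not marked accurate or when there are no instances (an assumed policy).
func leastRecentlyUsed(d launcherData) (string, bool) {
	if !d.Accurate || len(d.Instances) == 0 {
		return "", false
	}
	var oldestID string
	var oldest time.Time
	first := true
	for id, lastUsed := range d.Instances {
		if first || lastUsed.Before(oldest) {
			oldestID, oldest, first = id, lastUsed, false
		}
	}
	return oldestID, true
}

// exampleData builds illustrative sample data; the hashes are made up.
func exampleData() launcherData {
	now := time.Now()
	return launcherData{
		Instances: map[string]time.Time{
			"hash-a": now.Add(-10 * time.Minute),
			"hash-b": now.Add(-1 * time.Minute),
		},
		Accurate: true,
	}
}

func main() {
	id, ok := leastRecentlyUsed(exampleData())
	fmt.Println(id, ok)
}
```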
I expect that it will be (A) easier to review this largish PR and (B) easier to maintain this long-lived branch if it avoids unnecessary code changes, such as moving code that could stay where it is in the file.
diegocastanibm left a comment
I do not know why the dynamic port controller capability has been documented in docs/launcher.md. I think a better place would be docs/dual-pods.md.
Also, please remember to sign all your commits.
Good point! Thanks! I will reorganize this PR after successfully running it and resolving all issues, while also addressing Mike’s suggestions.
9086ac2 to 2fc8cde
2fc8cde to 6a1f5b8
To properly address the integration issue with InferencePool, I finally implemented the following solution:
For experimental results related to the reverse proxy, please refer to: https://docs.google.com/document/d/1krI8OOOWpGz2Cbb4iZmLJZc5R29_z9gq2y3MkaYuARE/edit?usp=sharing

// Try initialize server
if instance.initialized.Load() {
	http.Error(w, "proxy already initialized", http.StatusConflict)

// Double-check after acquiring write lock
if instance.initialized.Load() {
	http.Error(w, "proxy already initialized", http.StatusConflict)
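
The two hunks above look like a check / lock / re-check sequence around an atomic flag. The following is only a sketch of that pattern under assumptions: the type `proxyInstance`, its `target` field, the `tryInit` method, and the `target` query parameter are hypothetical names for illustration, and the PR's actual handler may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"sync"
	"sync/atomic"
)

var errAlreadyInitialized = errors.New("proxy already initialized")

// proxyInstance is a hypothetical stand-in for the PR's instance type.
type proxyInstance struct {
	mu          sync.RWMutex
	initialized atomic.Bool
	target      string // hypothetical: backend address the proxy forwards to
}

// tryInit sets the target exactly once using double-checked locking.
func (p *proxyInstance) tryInit(target string) error {
	// Fast path: reject without taking the lock if already initialized.
	if p.initialized.Load() {
		return errAlreadyInitialized
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	// Double-check after acquiring the write lock: another request may have
	// completed initialization between the first check and the lock.
	if p.initialized.Load() {
		return errAlreadyInitialized
	}
	p.target = target
	p.initialized.Store(true)
	return nil
}

// handleInit maps a repeated initialization attempt to 409 Conflict.
func (p *proxyInstance) handleInit(w http.ResponseWriter, r *http.Request) {
	if err := p.tryInit(r.URL.Query().Get("target")); err != nil {
		http.Error(w, err.Error(), http.StatusConflict)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	var p proxyInstance
	fmt.Println(p.tryInit("127.0.0.1:8005")) // <nil>
	fmt.Println(p.tryInit("127.0.0.1:8006")) // proxy already initialized
}
```

The first check keeps the common "already initialized" case cheap, while the second check inside the lock is what guarantees that only one request wins.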

if proxyPort == "" {
	proxyPort = "8082"
}
We should also add
- name: proxy
  containerPort: 8082
to mkobjs.sh?
This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the
This PR makes InferenceServerConfig.spec.modelServerConfig.Port optional, implementing the approach discussed in the following Slack conversation to enable integration with InferencePool. Currently, when no port is specified, the dual-pods controller can dynamically allocate a port and record it in the launcher pod’s inference.networking.k8s.io/active-ports annotation.
https://llm-d.slack.com/archives/C09TNPEFJUD/p1769183239461429
One issue remains: users of InferencePool and maintainers of the dual-pods controller must mutually agree upon a predefined list of available ports. Currently, this port range is hardcoded (from 8005 to 8005 + N - 1), which ironically increases complexity.
I am seriously considering the alternative option proposed earlier: adding TCP proxy functionality to the requester, so that the vLLM service can be reached directly through the requester. This should also align all ports, and it may be the better approach.
I would greatly appreciate your comments. @MikeSpreitzer @lionelvillard
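
To make the hardcoded range concrete, here is a minimal sketch of allocating a port from 8005 to 8005 + N - 1. The function name `allocatePort`, the `inUse` set, and the lowest-free-port policy are assumptions for illustration; the description above only states the range, not how the controller picks a port within it.

```go
package main

import (
	"errors"
	"fmt"
)

// basePort is the start of the hardcoded range mentioned in the description.
const basePort = 8005

var errNoFreePort = errors.New("no free port in range")

// allocatePort returns the lowest port in [basePort, basePort+n-1] that is
// not marked in inUse. The lowest-free policy is an assumption.
func allocatePort(n int, inUse map[int]bool) (int, error) {
	for p := basePort; p < basePort+n; p++ {
		if !inUse[p] {
			return p, nil
		}
	}
	return 0, errNoFreePort
}

func main() {
	inUse := map[int]bool{8005: true, 8006: true}
	port, err := allocatePort(4, inUse)
	fmt.Println(port, err) // 8007 <nil>
}
```

Both sides have to agree on `basePort` and N for this to work, which is the coupling the description calls out.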