fix(api): replace Literal type with str for SchedulingSpec.ray_placement_strategy
The Literal type annotation breaks omegaconf config loading because omegaconf
2.4.0.dev2 (and later dev versions) does not support Literal in structured configs.
This caused a ValidationError on every config loading path that touches
SchedulingSpec, including scheduler.type=local, which doesn't use Ray.
Changes:
- Change type from Literal["shared", "separate", "deferred"] to str
- Add __post_init__ validation to ensure ray_placement_strategy is valid
- Remove unused Literal import
Fixes #975
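
The validation described in the change list can be sketched roughly as follows. This is a minimal illustration, not the actual patch: it assumes `SchedulingSpec` is a plain dataclass, shows only the one field this commit touches, and uses a hypothetical constant name (`VALID_RAY_PLACEMENT_STRATEGIES`) and error message.

```python
from dataclasses import dataclass

# Hypothetical constant name; the allowed values are taken from the commit message.
VALID_RAY_PLACEMENT_STRATEGIES = ("shared", "separate", "deferred")


@dataclass
class SchedulingSpec:
    # Only the field touched by this commit is shown; the real class has more.
    # Plain `str` replaces Literal[...] so structured-config loading accepts it.
    ray_placement_strategy: str = "shared"

    def __post_init__(self):
        # Re-create at runtime the constraint that Literal used to express.
        if self.ray_placement_strategy not in VALID_RAY_PLACEMENT_STRATEGIES:
            raise ValueError(
                "ray_placement_strategy must be one of "
                f"{VALID_RAY_PLACEMENT_STRATEGIES}, "
                f"got {self.ray_placement_strategy!r}"
            )
```

A check of this kind runs only when the dataclass is actually instantiated, so an invalid value is rejected at object construction rather than by the type annotation.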
| `cpu` | integer | `8` | Number of CPU cores required per GPU |
| `gpu` | integer | `0` | Number of GPU units required. Used only when allocating pods. |
| `mem` | integer | `32` | Amount of memory (GB) required per GPU |
| `port_count` | integer | `2` | Number of ports to expose |
| `image` | string | `"/storage/openpsi/images/areal-latest.sif"` | Docker/Singularity container image to use. Currently only used by Slurm. May be used by Kubernetes in the future. |
| `task_type` | string | `"worker"` | Task type (e.g., worker, engine) **Choices:** `worker`, `engine` |
| `env_vars` | `dict` | **Required** | Environment variables for the container |
| `cmd` | string \| None | `None` | Command to execute inside the container. Defaults to AReaL's RPC server. |
| `srun_additional_args` | string | `"--unbuffered --mpi=pmi2 -K --chdir $PWD"` | Additional arguments to pass to the srun command. Only used by Slurm. |
| `additional_bash_cmds` | list of string \| None | `None` | Additional bash commands to set up the container before running the torchrun command. Only used by Slurm. |
| `container_type` | string | `"apptainer"` | Type of container used in Slurm **Choices:** `apptainer`, `none` |
| `mount` | string | `"/storage:/storage"` | Mount path for Slurm. |
| `ray_placement_strategy` | string | `"shared"` | Which placement strategy to use for Ray scheduling. Shared will produce 1 placement group for all workers in the role (training). Separate will produce 1 placement group per worker (rollout). Deferred will do the same as separate but defers accelerator scheduling (multinode rollout). **Choices:** `shared`, `separate`, `deferred` |