inclusionAI
diff --git a/.claude/agents/fsdp-engine-expert.md b/.claude/agents/fsdp-engine-expert.md
Lines changed: 15 additions & 15 deletions
diff --git a/.claude/agents/launcher-scheduler-expert.md b/.claude/agents/launcher-scheduler-expert.md
Lines changed: 5 additions & 5 deletions
diff --git a/.claude/agents/megatron-engine-expert.md b/.claude/agents/megatron-engine-expert.md
Lines changed: 7 additions & 6 deletions
diff --git a/.claude/data/pr-review-change-types.md b/.claude/data/pr-review-change-types.md
Lines changed: 26 additions & 26 deletions
diff --git a/.claude/hooks/check-expert-update.sh b/.claude/hooks/check-expert-update.sh
Lines changed: 1 addition & 1 deletion
diff --git a/.claude/rules/distributed.md b/.claude/rules/distributed.md
Lines changed: 1 addition & 1 deletion
diff --git a/.claude/skills/debug-distributed/SKILL.md b/.claude/skills/debug-distributed/SKILL.md
Lines changed: 1 addition & 1 deletion
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 9 additions & 2 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 7 additions & 1 deletion
diff --git a/areal/utils/distributed.py b/areal/engine/core/distributed.py
areal/utils/distributed.py renamed to areal/engine/core/distributed.py
@@ -164,13 +164,13 @@ algorithm-specific subclasses
 
 - `areal/api/alloc_mode.py` - `FSDPParallelStrategy` (inherits from `ParallelStrategy`)
   for FSDP-specific parallel dimensions
-- `areal/utils/fsdp/parallel.py` - `ParallelHelper` class for mesh construction and
-  dimension validation
+- `areal/engine/fsdp_utils/parallel.py` - `ParallelHelper` class for mesh construction
+  and dimension validation
 
 **Model Parallelism Implementations**:
 
-- **Tensor Parallelism**: `areal/utils/fsdp/parallel.py` - `apply_non_moe_tp()` and
-  `parallelize_model()` for TP integration
+- **Tensor Parallelism**: `areal/engine/fsdp_utils/parallel.py` - `apply_non_moe_tp()`
+  and `parallelize_model()` for TP integration
 - **Sequence Parallelism (Ulysses)**: `areal/models/fsdp/ulysses.py` - Ulysses SP
   communication primitives and input preparation
 - **Context Parallelism**: Integrated via Ulysses sequence parallel groups
@@ -183,9 +183,9 @@ algorithm-specific subclasses
 
 **FSDP2 Wrapping and Sharding**:
 
-- `areal/utils/fsdp/__init__.py` - `apply_fsdp2()` for FSDP2 module wrapping with mixed
-  precision and offload policies
-- `areal/utils/fsdp/parallel.py` - `parallelize_model()` orchestrates TP + FSDP2
+- `areal/engine/fsdp_utils/__init__.py` - `apply_fsdp2()` for FSDP2 module wrapping with
+  mixed precision and offload policies
+- `areal/engine/fsdp_utils/parallel.py` - `parallelize_model()` orchestrates TP + FSDP2
   application
 
 **Model-Specific Components**:
@@ -198,15 +198,15 @@ algorithm-specific subclasses
 
 **Utilities**:
 
-- **Checkpointing**: `areal/utils/fsdp/checkpoint.py` - `DCPState` wrapper for
+- **Checkpointing**: `areal/engine/fsdp_utils/checkpoint.py` - `DCPState` wrapper for
   distributed checkpoint (DCP) integration
-- **Gradient Handling**: `areal/utils/fsdp/grad.py` - `fsdp2_clip_grad_norm()` with
-  TP/DP/PP-aware gradient norm computation
-- **Optimizer**: `areal/utils/fsdp/optimizer.py` - `AnyPrecisionAdamW` for
+- **Gradient Handling**: `areal/engine/fsdp_utils/grad.py` - `fsdp2_clip_grad_norm()`
+  with TP/DP/PP-aware gradient norm computation
+- **Optimizer**: `areal/engine/fsdp_utils/optimizer.py` - `AnyPrecisionAdamW` for
   mixed-precision training with Kahan summation
-- **Multi-Tensor Operations**: `areal/utils/fsdp/multi_tensor_apply.py` - Fallback
-  implementations when Transformer Engine/Apex unavailable
-- **State Dict Loading**: `areal/utils/fsdp/__init__.py` -
+- **Multi-Tensor Operations**: `areal/engine/fsdp_utils/multi_tensor_apply.py` -
+  Fallback implementations when Transformer Engine/Apex unavailable
+- **State Dict Loading**: `areal/engine/fsdp_utils/__init__.py` -
   `fsdp2_load_full_state_dict()` for broadcast loading from rank 0
 
 **Algorithm-Specific Subclasses** (in `areal/engine/fsdp_engine.py`):
@@ -249,7 +249,7 @@ algorithm-specific subclasses
 
 - **Main implementation**: `areal/engine/fsdp_engine.py`
 - **Configuration**: `areal/api/cli_args.py` (`TrainEngineConfig`)
-- **Utilities**: `areal/utils/fsdp/` directory for checkpointing, gradients,
+- **Utilities**: `areal/engine/fsdp_utils/` directory for checkpointing, gradients,
   optimization, and parallel helpers
 - **Model components**: `areal/models/fsdp/` for Ulysses sequence parallelism
 - **Examples**: YAML configuration files in `examples/` directory
 
@@ -67,7 +67,7 @@ Located in `areal/api/cli_args.py`:
 ClusterSpecConfig -> Launcher -> BASE_ENVIRONS + thread vars -> Worker processes
 ```
 
-Critical utilities in `areal/utils/launcher.py`:
+Critical utilities in `areal/infra/utils/launcher.py`:
 
 - `BASE_ENVIRONS`: Essential runtime variables (PyTorch cache, Triton, tokenizers)
 - `get_thread_env_vars()`: CPU thread control based on allocated cores
@@ -102,8 +102,8 @@ Critical utilities in `areal/utils/launcher.py`:
   IP/hostname assumptions
 - Raise specific exceptions from `areal.infra.scheduler.exceptions` -> not generic
   exception types
-- Use `areal.utils.proc.kill_process_tree()` for process termination -> not leaving
-  zombie processes
+- Use `areal.infra.utils.proc.kill_process_tree()` for process termination -> not
+  leaving zombie processes
 - Propagate all `BASE_ENVIRONS` variables and thread control variables -> not missing
   environment variable propagation
 - Use `areal.utils.network.find_free_ports()` for port allocation -> not static port
@@ -143,7 +143,7 @@ Critical utilities in `areal/utils/launcher.py`:
 | `areal/infra/scheduler/local.py`        | Local worker scheduling            | GPU round-robin, port allocation, health monitoring       |
 | `areal/infra/scheduler/slurm.py`        | Slurm-integrated scheduling        | Job array coordination, resource reservation              |
 | `areal/infra/scheduler/ray.py`          | Ray cluster scheduling             | Ray placement groups, actor-based worker management       |
-| `areal/utils/launcher.py`               | Shared utilities                   | Environment variable management, configuration validation |
+| `areal/infra/utils/launcher.py`         | Shared utilities                   | Environment variable management, configuration validation |
 
 ______________________________________________________________________
 
@@ -175,7 +175,7 @@ Activation: Manual (when requested) for launcher/scheduler topics
 3. Document any new environment variables or configuration requirements
 
 ### When Utility Functions Change
-1. Update references to `areal/utils/launcher.py` functions
+1. Update references to `areal/infra/utils/launcher.py` functions
 2. Adjust "Environment Variable Propagation Pattern" if BASE_ENVIRONS changes
 3. Update diagnostic steps that rely on specific utility functions
 
 
@@ -47,7 +47,7 @@ Key architectural principles:
   implementing distributed training coordination
 - **`ParallelStrategy`** (`areal/api/alloc_mode.py`): Configuration dataclass for
   parallel dimensions
-- **`MegatronCheckpointer`** (`areal/utils/megatron_checkpointer.py`): Checkpoint
+- **`MegatronCheckpointer`** (`areal/engine/megatron_utils/checkpointer.py`): Checkpoint
   handling for distributed state
 
 ### Key Methods
@@ -210,12 +210,13 @@ distributed training coordination
 
 - `areal/api/alloc_mode.py` - `MegatronParallelStrategy` (inherits from
   `ParallelStrategy`) for Megatron-specific parallel dimensions
-- `areal/utils/megatron.py` - Core Megatron utilities and helper functions
+- `areal/engine/megatron_utils/megatron.py` - Core Megatron utilities and helper
+  functions
 
 **Checkpointing and State Management**:
 
-- `areal/utils/megatron_checkpointer.py` - `MegatronCheckpointer` class for distributed
-  checkpoint handling
+- `areal/engine/megatron_utils/checkpointer.py` - `MegatronCheckpointer` class for
+  distributed checkpoint handling
 - Integrated with Megatron Core checkpoint system for pipeline-parallel models
 
 **Model Parallelism Implementations**:
@@ -257,8 +258,8 @@ distributed training coordination
 - **Main implementation**: `areal/engine/megatron_engine.py`
 - **Configuration**: `areal/api/cli_args.py` (`TrainEngineConfig` with
   `MegatronEngineConfig`)
-- **Checkpointing**: `areal/utils/megatron_checkpointer.py`
-- **Utilities**: `areal/utils/megatron.py` for core Megatron utilities
+- **Checkpointing**: `areal/engine/megatron_utils/checkpointer.py`
+- **Utilities**: `areal/engine/megatron_utils/megatron.py` for core Megatron utilities
 - **Examples**: YAML configuration files in `examples/` and
   `areal/tests/sft/config_megatron.yaml`
 
 
@@ -7,16 +7,16 @@ ______________________________________________________________________
 
 ## CRITICAL Level (Must use Opus)
 
-| Change Type            | File Path Pattern                                             | Code Pattern                                                |
-| ---------------------- | ------------------------------------------------------------- | ----------------------------------------------------------- |
-| **ARCHON_CORE**        | `areal/experimental/models/archon/`                           | -                                                           |
-| **ARCHON_PARALLEL**    | `parallel_dims.py`                                            | `ArchonParallelDims`, `_build_mesh`, `DeviceMesh`           |
-| **ARCHON_MOE**         | `archon/moe/`                                                 | `router`, `grouped_experts`, `TokenReorderer`, `grouped_mm` |
-| **ARCHON_PARALLELIZE** | `qwen*/infra/parallelize.py`                                  | `apply_moe_ep_tp`, `apply_tp`, `apply_cp`                   |
-| **ARCHON_ENGINE**      | `areal/experimental/engine/archon_engine.py`                  | `ArchonEngine`                                              |
-| **FSDP_CORE**          | `areal/utils/fsdp/`, `areal/engine/fsdp_engine.py`            | `FSDP`, `FullyShardedDataParallel`, `fully_shard`           |
-| **MEGATRON_CORE**      | `areal/engine/megatron_engine.py`, `areal/utils/megatron*.py` | `MegatronEngine`                                            |
-| **DCP_CHECKPOINT**     | -                                                             | `DCP`, `DistributedCheckpoint`, `dcp.save`, `dcp.load`      |
+| Change Type            | File Path Pattern                                                 | Code Pattern                                                |
+| ---------------------- | ----------------------------------------------------------------- | ----------------------------------------------------------- |
+| **ARCHON_CORE**        | `areal/experimental/models/archon/`                               | -                                                           |
+| **ARCHON_PARALLEL**    | `parallel_dims.py`                                                | `ArchonParallelDims`, `_build_mesh`, `DeviceMesh`           |
+| **ARCHON_MOE**         | `archon/moe/`                                                     | `router`, `grouped_experts`, `TokenReorderer`, `grouped_mm` |
+| **ARCHON_PARALLELIZE** | `qwen*/infra/parallelize.py`                                      | `apply_moe_ep_tp`, `apply_tp`, `apply_cp`                   |
+| **ARCHON_ENGINE**      | `areal/experimental/engine/archon_engine.py`                      | `ArchonEngine`                                              |
+| **FSDP_CORE**          | `areal/engine/fsdp_utils/`, `areal/engine/fsdp_engine.py`         | `FSDP`, `FullyShardedDataParallel`, `fully_shard`           |
+| **MEGATRON_CORE**      | `areal/engine/megatron_engine.py`, `areal/engine/megatron_utils/` | `MegatronEngine`                                            |
+| **DCP_CHECKPOINT**     | -                                                                 | `DCP`, `DistributedCheckpoint`, `dcp.save`, `dcp.load`      |
 
 ## HIGH Level (Recommend Opus)
 
@@ -33,19 +33,19 @@ ______________________________________________________________________
 
 ## MEDIUM Level (Use Sonnet)
 
-| Change Type             | File Path Pattern                                                                  | Code Pattern                                                             |
-| ----------------------- | ---------------------------------------------------------------------------------- | ------------------------------------------------------------------------ |
-| **TENSOR_OPS**          | -                                                                                  | `.view(`, `.reshape(`, `dtype=`, `.detach()`, `no_grad`, `.contiguous()` |
-| **NUMERICAL**           | -                                                                                  | `log(`, `softmax`, `cross_entropy`, `eps=`, `.clamp(`, `nan`, `inf`      |
-| **WORKFLOW_ENGINE**     | `areal/workflow/`, `areal/engine/`                                                 | `arun_episode`, `agenerate`, `RolloutWorkflow`                           |
-| **API_CONFIG**          | `areal/api/`                                                                       | `@dataclass`, `__post_init__`, `field(`                                  |
-| **COMPILE**             | -                                                                                  | `torch.compile`, `_dynamo`, `mark_dynamic`, `fullgraph`                  |
-| **ACTIVATION_CKPT**     | `activation_checkpoint.py`                                                         | `activation_checkpoint`, `checkpoint_wrapper`, `selective_checkpoint`    |
-| **CHECKPOINT_RECOVERY** | `areal/utils/saver.py`, `areal/utils/recover.py`, `areal/utils/fsdp/checkpoint.py` | `state_dict`, `load_state_dict`, `checkpoint`                            |
-| **REWARD**              | `areal/reward/`                                                                    | `reward_fn`, `AsyncRewardWrapper`, `MathVerifyWorker`                    |
-| **DATASET**             | `areal/dataset/`                                                                   | `get_*_dataset`, `DataLoader`, `IterableDataset`                         |
-| **LAUNCHER_SCHEDULER**  | `areal/infra/launcher/`, `areal/infra/scheduler/`, `areal/infra/rpc/`              | `LaunchConfig`, `Scheduler`, `RayLauncher`, `SlurmLauncher`              |
-| **ATTENTION**           | `attention/`, `attention/sdpa.py`, `attention/varlen.py`                           | `flash_attn`, `sdpa`, `varlen`, `causal_mask`                            |
+| Change Type             | File Path Pattern                                                                         | Code Pattern                                                             |
+| ----------------------- | ----------------------------------------------------------------------------------------- | ------------------------------------------------------------------------ |
+| **TENSOR_OPS**          | -                                                                                         | `.view(`, `.reshape(`, `dtype=`, `.detach()`, `no_grad`, `.contiguous()` |
+| **NUMERICAL**           | -                                                                                         | `log(`, `softmax`, `cross_entropy`, `eps=`, `.clamp(`, `nan`, `inf`      |
+| **WORKFLOW_ENGINE**     | `areal/workflow/`, `areal/engine/`                                                        | `arun_episode`, `agenerate`, `RolloutWorkflow`                           |
+| **API_CONFIG**          | `areal/api/`                                                                              | `@dataclass`, `__post_init__`, `field(`                                  |
+| **COMPILE**             | -                                                                                         | `torch.compile`, `_dynamo`, `mark_dynamic`, `fullgraph`                  |
+| **ACTIVATION_CKPT**     | `activation_checkpoint.py`                                                                | `activation_checkpoint`, `checkpoint_wrapper`, `selective_checkpoint`    |
+| **CHECKPOINT_RECOVERY** | `areal/utils/saver.py`, `areal/utils/recover.py`, `areal/engine/fsdp_utils/checkpoint.py` | `state_dict`, `load_state_dict`, `checkpoint`                            |
+| **REWARD**              | `areal/reward/`                                                                           | `reward_fn`, `AsyncRewardWrapper`, `MathVerifyWorker`                    |
+| **DATASET**             | `areal/dataset/`                                                                          | `get_*_dataset`, `DataLoader`, `IterableDataset`                         |
+| **LAUNCHER_SCHEDULER**  | `areal/infra/launcher/`, `areal/infra/scheduler/`, `areal/infra/rpc/`                     | `LaunchConfig`, `Scheduler`, `RayLauncher`, `SlurmLauncher`              |
+| **ATTENTION**           | `attention/`, `attention/sdpa.py`, `attention/varlen.py`                                  | `flash_attn`, `sdpa`, `varlen`, `causal_mask`                            |
 
 ## LOW Level (Use Haiku)
 
@@ -131,14 +131,14 @@ ______________________________________________________________________
 
 **FSDP Core**:
 
-- `areal/utils/fsdp/`
+- `areal/engine/fsdp_utils/`
 - `areal/engine/fsdp_engine.py`
 
 **Megatron Core**:
 
 - `areal/engine/megatron_engine.py`
-- `areal/utils/megatron.py`
-- `areal/utils/megatron_checkpointer.py`
+- `areal/engine/megatron_utils/megatron.py`
+- `areal/engine/megatron_utils/checkpointer.py`
 
 **Trainer Core**:
 
 
@@ -32,7 +32,7 @@ check_expert_update() {
 
     # FSDP Engine related
     if [[ "$file" == *"areal/engine/fsdp_engine"* ]] || \
-       [[ "$file" == *"areal/utils/fsdp/"* ]]; then
+       [[ "$file" == *"areal/engine/fsdp_utils/"* ]]; then
         reminder_file="fsdp-engine-expert.md"
         reminder_desc="FSDP"
     fi
 
@@ -2,7 +2,7 @@
 paths:
   - areal/engine/**
   - areal/experimental/**
-  - areal/utils/fsdp/**
+  - areal/engine/fsdp_utils/**
 ---
 
 # Distributed Code Rules
 
@@ -98,7 +98,7 @@ dist.barrier()
 **Timeout Adjustment** (for debugging only):
 
 ```python
-from areal.utils.distributed import patch_dist_group_timeout
+from areal.engine.core.distributed import patch_dist_group_timeout
 from datetime import timedelta
 patch_dist_group_timeout(timedelta(minutes=30))
 ```
 
@@ -40,10 +40,17 @@ When unsure, leave a `TODO(agent)` comment and note the constraint in your respo
   - `areal/infra/` - Core infrastructure including single controller implementation,
     placement and allocation policies, async orchestration primitives, and
     hardware/platform abstractions for CPU/GPU/NPU runtimes.
+    - `areal/infra/utils/` - Infrastructure utilities (launcher, process management,
+      HTTP, concurrency, Slurm/Ray helpers).
   - `areal/dataset/` - Stateful dataset loaders (GSM8K, Geometry3K, CLEVR, HH-RLHF,
     TORL, etc.) and utilities that feed rollout jobs safely.
   - `areal/engine/` - Training backends (FSDP2, Megatron, PPO, SFT, reward modeling) and
     inference adapters (SGLang, vLLM remote engines).
+    - `areal/engine/fsdp_utils/` - FSDP2-specific utilities (checkpoint, gradient,
+      optimizer, parallel helpers).
+    - `areal/engine/megatron_utils/` - Megatron/FP8 utilities (checkpoint, pipeline,
+      quantization).
+    - `areal/engine/core/` - Engine-shared utilities (distributed, model helpers).
   - `areal/experimental/` - Prototype engines/workflows that evolve quickly; expect
     breaking changes.
   - `areal/infra/launcher/` - Launch specs for local, Ray, and Slurm clusters, plus
@@ -56,8 +63,8 @@ When unsure, leave a `TODO(agent)` comment and note the constraint in your respo
     distributed backends).
   - `areal/tools/` - Developer utilities and maintenance scripts tied to the core
     package.
-  - `areal/utils/` - Cross-cutting helpers for logging, tensor ops, stats tracking,
-    checkpoints, and recovery.
+  - `areal/utils/` - Cross-cutting helpers for logging, data processing, stats tracking,
+    checkpoints, recovery, network, and RL functional ops.
   - `areal/workflow/` - Concrete rollout agents implementing `RolloutWorkflow`:
     multi-turn, RLVR, vision RLVR workflows, plus `openai_agent/` for OpenAI Agent-style
     implementations.
 
@@ -12,10 +12,16 @@ learning.
 - `areal/` - Core package
   - `api/` - Config dataclasses, workflow/engine contracts
   - `engine/` - FSDP2, Megatron, SGLang/vLLM adapters
+    - `fsdp_utils/` - FSDP2-specific utilities (checkpoint, grad, optimizer, parallel)
+    - `megatron_utils/` - Megatron/FP8 utilities (checkpoint, pipeline, quantization)
+    - `core/` - Engine-shared utilities (distributed, lock, model, offload)
+  - `infra/` - Infrastructure (launcher, scheduler, RPC)
+    - `utils/` - Infrastructure utilities (launcher, proc, http, concurrent, slurm, ray)
   - `workflow/` - RolloutWorkflow implementations
   - `reward/` - Reward functions
   - `dataset/` - Dataset loaders
-  - `utils/` - Logging, tensor ops, checkpoints
+  - `utils/` - Cross-cutting utilities (logging, data, checkpoints, network, RL
+    functional)
 - `examples/` - Training scripts and configs
 - `docs/` - Jupyter Book source
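
Downstream code that still imports from the old locations will need the same path updates that this diff applies to the docs. The sketch below is one possible way to do that rewrite, not part of this change: the module mapping is inferred from the file paths renamed above, and `MODULE_RENAMES` and `rewrite_imports` are hypothetical names introduced only for illustration. It assumes submodules kept their file names after the move, which the diff shows for the FSDP utilities but does not confirm for every module.

```python
import re

# Old dotted module path -> new dotted module path.
# Inferred from the file paths changed in this diff (an assumption, not an
# authoritative list of every module that moved).
MODULE_RENAMES = {
    "areal.utils.fsdp": "areal.engine.fsdp_utils",
    "areal.utils.megatron_checkpointer": "areal.engine.megatron_utils.checkpointer",
    "areal.utils.megatron": "areal.engine.megatron_utils.megatron",
    "areal.utils.launcher": "areal.infra.utils.launcher",
    "areal.utils.proc": "areal.infra.utils.proc",
    "areal.utils.distributed": "areal.engine.core.distributed",
}

# Try longer names first, and require a non-identifier character on both
# sides so that e.g. "areal.utils.fsdp_foo" is left alone.
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(k) for k in sorted(MODULE_RENAMES, key=len, reverse=True))
    + r")(?!\w)"
)


def rewrite_imports(source: str) -> str:
    """Replace old module paths in `source` with their new locations."""
    return _PATTERN.sub(lambda m: MODULE_RENAMES[m.group(1)], source)


if __name__ == "__main__":
    old = "from areal.utils.distributed import patch_dist_group_timeout"
    print(rewrite_imports(old))
```

Modules that stay in `areal/utils/` according to this diff, such as `areal.utils.network`, are not in the mapping and pass through unchanged.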