bghira
diff --git a/documentation/quickstart/LTXVIDEO2.es.md b/documentation/quickstart/LTXVIDEO2.es.md
Lines changed: 5 additions & 4 deletions
diff --git a/documentation/quickstart/LTXVIDEO2.hi.md b/documentation/quickstart/LTXVIDEO2.hi.md
Lines changed: 5 additions & 4 deletions
diff --git a/documentation/quickstart/LTXVIDEO2.ja.md b/documentation/quickstart/LTXVIDEO2.ja.md
Lines changed: 5 additions & 4 deletions
diff --git a/documentation/quickstart/LTXVIDEO2.md b/documentation/quickstart/LTXVIDEO2.md
Lines changed: 4 additions & 4 deletions
diff --git a/documentation/quickstart/LTXVIDEO2.pt-BR.md b/documentation/quickstart/LTXVIDEO2.pt-BR.md
Lines changed: 5 additions & 4 deletions
diff --git a/documentation/quickstart/LTXVIDEO2.zh.md b/documentation/quickstart/LTXVIDEO2.zh.md
Lines changed: 5 additions & 4 deletions
diff --git a/setup.py b/setup.py
Lines changed: 1 addition & 1 deletion
diff --git a/simpletuner/helpers/models/ltxvideo2/autoencoder.py b/simpletuner/helpers/models/ltxvideo2/autoencoder.py
Lines changed: 20 additions & 3 deletions
@@ -97,8 +97,8 @@ cp config/config.json.example config/config.json
 Ajustes clave para LTX Video 2:
 
 - `model_family`: `ltxvideo2`
-- `model_flavour`: `dev` (predeterminado), `dev-fp4` o `dev-fp8`.
-- `pretrained_model_name_or_path`: `Lightricks/LTX-2` (repositorio con el checkpoint combinado) o un archivo `.safetensors` local.
+- `model_flavour`: `dev` (predeterminado), `dev-fp4`, `dev-fp8`, `2.3-dev` o `2.3-distilled`.
+- `pretrained_model_name_or_path`: `Lightricks/LTX-2`, `dg845/LTX-2.3-Diffusers`, `dg845/LTX-2.3-Distilled-Diffusers` o un archivo `.safetensors` local.
 - `train_batch_size`: `1`. No aumentes esto a menos que tengas una A100/H100.
 - `validation_resolution`:
   - `512x768` es un valor seguro para pruebas.
@@ -109,8 +109,9 @@ Ajustes clave para LTX Video 2:
 - `validation_guidance`: `5.0`.
 - `frame_rate`: Por defecto es 25.
 
-LTX-2 se distribuye como un único checkpoint `.safetensors` que incluye el transformer, el VAE de video,
-el VAE de audio y el vocoder. SimpleTuner carga directamente desde ese archivo combinado según `model_flavour` (dev/dev-fp4/dev-fp8).
+Las variantes LTX-2 2.0 se distribuyen como un único checkpoint `.safetensors` que incluye el transformer, el VAE de video,
+el VAE de audio y el vocoder. Para LTX-2.3, SimpleTuner carga el repositorio Diffusers correspondiente según `model_flavour`
+(`2.3-dev` o `2.3-distilled`).
 
 ### Opcional: optimizaciones de VRAM
 
 
@@ -97,8 +97,8 @@ cp config/config.json.example config/config.json
 LTX Video 2 के लिए key settings:
 
 - `model_family`: `ltxvideo2`
-- `model_flavour`: `dev` (डिफ़ॉल्ट), `dev-fp4` या `dev-fp8`।
-- `pretrained_model_name_or_path`: `Lightricks/LTX-2` (combined checkpoint वाला repo) या local `.safetensors` फ़ाइल।
+- `model_flavour`: `dev` (डिफ़ॉल्ट), `dev-fp4`, `dev-fp8`, `2.3-dev` या `2.3-distilled`।
+- `pretrained_model_name_or_path`: `Lightricks/LTX-2`, `dg845/LTX-2.3-Diffusers`, `dg845/LTX-2.3-Distilled-Diffusers` या local `.safetensors` फ़ाइल।
 - `train_batch_size`: `1`। इसे तब तक न बढ़ाएँ जब तक आपके पास A100/H100 न हो।
 - `validation_resolution`:
   - `512x768` परीक्षण के लिए सुरक्षित डिफ़ॉल्ट है।
@@ -109,8 +109,9 @@ LTX Video 2 के लिए key settings:
 - `validation_guidance`: `5.0`.
 - `frame_rate`: डिफ़ॉल्ट 25 है।
 
-LTX-2 एक `.safetensors` checkpoint के रूप में आता है जिसमें transformer, video VAE, audio VAE, और vocoder शामिल हैं।
-SimpleTuner इसे `model_flavour` (dev/dev-fp4/dev-fp8) के आधार पर इसी combined फ़ाइल से लोड करता है।
+LTX-2 2.0 variants एक `.safetensors` checkpoint के रूप में आते हैं जिनमें transformer, video VAE, audio VAE, और vocoder शामिल हैं।
+LTX-2.3 के लिए, SimpleTuner `model_flavour` के आधार पर संबंधित Diffusers repo लोड करता है
+(`2.3-dev` या `2.3-distilled`)।
 
 ### वैकल्पिक: VRAM ऑप्टिमाइज़ेशन
 
 
@@ -98,8 +98,8 @@ cp config/config.json.example config/config.json
 LTX Video 2 の主要設定:
 
 - `model_family`: `ltxvideo2`
-- `model_flavour`: `dev` (デフォルト)、`dev-fp4`、`dev-fp8`。
-- `pretrained_model_name_or_path`: `Lightricks/LTX-2`（combined checkpoint の repo）またはローカル `.safetensors` ファイル。
+- `model_flavour`: `dev` (デフォルト)、`dev-fp4`、`dev-fp8`、`2.3-dev`、`2.3-distilled`。
+- `pretrained_model_name_or_path`: `Lightricks/LTX-2`、`dg845/LTX-2.3-Diffusers`、`dg845/LTX-2.3-Distilled-Diffusers`、またはローカル `.safetensors` ファイル。
 - `train_batch_size`: `1`。A100/H100 以外では増やさないでください。
 - `validation_resolution`:
   - `512x768` がテスト向けの安全なデフォルト。
@@ -110,8 +110,9 @@ LTX Video 2 の主要設定:
 - `validation_guidance`: `5.0`。
 - `frame_rate`: デフォルトは 25。
 
-LTX-2 は transformer / video VAE / audio VAE / vocoder を含む `.safetensors` 単体チェックポイントで配布されます。
-SimpleTuner は `model_flavour` (dev/dev-fp4/dev-fp8) に合わせてこの combined ファイルから読み込みます。
+LTX-2 2.0 系は transformer / video VAE / audio VAE / vocoder を含む `.safetensors` 単体チェックポイントで配布されます。
+LTX-2.3 では、SimpleTuner は `model_flavour` に応じた Diffusers リポジトリ
+(`2.3-dev` または `2.3-distilled`) を読み込みます。
 
 ### 任意: VRAM 最適化
 
 
@@ -97,8 +97,8 @@ cp config/config.json.example config/config.json
 Key settings for LTX Video 2:
 
 - `model_family`: `ltxvideo2`
-- `model_flavour`: `dev` (default), `dev-fp4`, or `dev-fp8`.
-- `pretrained_model_name_or_path`: `Lightricks/LTX-2` (Hub repo with the combined checkpoint) or a local `.safetensors` file.
+- `model_flavour`: `dev` (default), `dev-fp4`, `dev-fp8`, `2.3-dev`, or `2.3-distilled`.
+- `pretrained_model_name_or_path`: `Lightricks/LTX-2`, `dg845/LTX-2.3-Diffusers`, `dg845/LTX-2.3-Distilled-Diffusers`, or a local `.safetensors` file.
 - `train_batch_size`: `1`. Do not increase this unless you have an A100/H100.
 - `validation_resolution`:
   - `512x768` is a safe default for testing.
@@ -109,8 +109,8 @@ Key settings for LTX Video 2:
 - `validation_guidance`: `5.0`.
 - `frame_rate`: Default is 25.
 
-LTX-2 ships as a single `.safetensors` checkpoint that includes the transformer, video VAE, audio VAE, and vocoder.
-SimpleTuner loads from this combined file directly based on `model_flavour` (dev/dev-fp4/dev-fp8).
+LTX-2 2.0 flavours ship as a single `.safetensors` checkpoint that includes the transformer, video VAE, audio VAE, and vocoder.
+For LTX-2.3, SimpleTuner loads the matching Diffusers repo selected by `model_flavour` (`2.3-dev` or `2.3-distilled`).
 
 ### Optional: VRAM optimizations
 
 
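Note on the quickstart hunks above: a minimal sketch of how the documented keys might be assembled into a `config/config.json` for an LTX-2.3 flavour. The helper `build_ltx2_config` and the `ASSUMED_REPOS` table are hypothetical and not part of SimpleTuner; in particular, the pairing of each flavour with a Hub repo is inferred from the names and is not stated in the docs. Since the docs say SimpleTuner selects the Diffusers repo from `model_flavour`, setting `pretrained_model_name_or_path` explicitly may be redundant.

```python
import json

# Assumption: each 2.3 flavour pairs with the similarly named repo. The docs
# list both sets of names but do not spell out this mapping.
ASSUMED_REPOS = {
    "2.3-dev": "dg845/LTX-2.3-Diffusers",
    "2.3-distilled": "dg845/LTX-2.3-Distilled-Diffusers",
}


def build_ltx2_config(flavour: str) -> dict:
    """Hypothetical helper: build the key settings listed in the quickstart."""
    if flavour not in ASSUMED_REPOS:
        raise ValueError(f"Not an LTX-2.3 flavour: {flavour}")
    return {
        "model_family": "ltxvideo2",
        "model_flavour": flavour,
        "pretrained_model_name_or_path": ASSUMED_REPOS[flavour],
        "train_batch_size": 1,  # the docs advise against raising this without an A100/H100
        "validation_resolution": "512x768",  # documented safe default for testing
        "validation_guidance": 5.0,
        "frame_rate": 25,
    }


# Serialize to the JSON text that would go into config/config.json.
config_text = json.dumps(build_ltx2_config("2.3-dev"), indent=2)
```

Printing `config_text` gives a fragment that could be merged into a copy of `config/config.json.example`; the remaining keys in that file are untouched by this sketch.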
@@ -97,8 +97,8 @@ cp config/config.json.example config/config.json
 Configurações-chave para LTX Video 2:
 
 - `model_family`: `ltxvideo2`
-- `model_flavour`: `dev` (padrão), `dev-fp4` ou `dev-fp8`.
-- `pretrained_model_name_or_path`: `Lightricks/LTX-2` (repositório com o checkpoint combinado) ou um arquivo `.safetensors` local.
+- `model_flavour`: `dev` (padrão), `dev-fp4`, `dev-fp8`, `2.3-dev` ou `2.3-distilled`.
+- `pretrained_model_name_or_path`: `Lightricks/LTX-2`, `dg845/LTX-2.3-Diffusers`, `dg845/LTX-2.3-Distilled-Diffusers` ou um arquivo `.safetensors` local.
 - `train_batch_size`: `1`. Não aumente isso a menos que você tenha um A100/H100.
 - `validation_resolution`:
   - `512x768` é um padrão seguro para testes.
@@ -109,8 +109,9 @@ Configurações-chave para LTX Video 2:
 - `validation_guidance`: `5.0`.
 - `frame_rate`: O padrão é 25.
 
-O LTX-2 é distribuído como um único checkpoint `.safetensors` que inclui o transformer, o VAE de vídeo,
-o VAE de áudio e o vocoder. O SimpleTuner carrega desse arquivo combinado conforme o `model_flavour` (dev/dev-fp4/dev-fp8).
+As variantes LTX-2 2.0 são distribuídas como um único checkpoint `.safetensors` que inclui o transformer, o VAE de vídeo,
+o VAE de áudio e o vocoder. Para o LTX-2.3, o SimpleTuner carrega o repositório Diffusers correspondente ao `model_flavour`
+(`2.3-dev` ou `2.3-distilled`).
 
 ### Opcional: otimizações de VRAM
 
 
@@ -98,8 +98,8 @@ cp config/config.json.example config/config.json
 LTX Video 2 的关键设置：
 
 - `model_family`: `ltxvideo2`
-- `model_flavour`: `dev`（默认）、`dev-fp4` 或 `dev-fp8`。
-- `pretrained_model_name_or_path`: `Lightricks/LTX-2`（包含 combined checkpoint 的仓库）或本地 `.safetensors` 文件。
+- `model_flavour`: `dev`（默认）、`dev-fp4`、`dev-fp8`、`2.3-dev` 或 `2.3-distilled`。
+- `pretrained_model_name_or_path`: `Lightricks/LTX-2`、`dg845/LTX-2.3-Diffusers`、`dg845/LTX-2.3-Distilled-Diffusers` 或本地 `.safetensors` 文件。
 - `train_batch_size`: `1`。除非有 A100/H100，否则不要提高。
 - `validation_resolution`:
   - `512x768` 是安全的测试默认值。
@@ -110,8 +110,9 @@ LTX Video 2 的关键设置：
 - `validation_guidance`: `5.0`。
 - `frame_rate`: 默认 25。
 
-LTX-2 以单个 `.safetensors` checkpoint 形式发布，包含 transformer、视频 VAE、音频 VAE 和 vocoder。
-SimpleTuner 会根据 `model_flavour`（dev/dev-fp4/dev-fp8）从该 combined 文件加载。
+LTX-2 2.0 变体以单个 `.safetensors` checkpoint 形式发布，包含 transformer、视频 VAE、音频 VAE 和 vocoder。
+对于 LTX-2.3，SimpleTuner 会根据 `model_flavour` 加载对应的 Diffusers 仓库
+（`2.3-dev` 或 `2.3-distilled`）。
 
 ### 可选：VRAM 优化
 
 
@@ -324,7 +324,7 @@ def _collect_package_files(*directories: str):
     "peft-singlora>=0.2.0",
     "vector-quantize-pytorch>=1.27.15",
     "cryptography>=41.0.0",
-    "torchcodec>=0.8.1",
+    "torchcodec>=0.10.0",
     "sdnq>=0.1.2",
     "aiosqlite>=0.19.0",
     "httpx>=0.28.0",
 
@@ -352,7 +352,7 @@ def forward(self, hidden_states: torch.Tensor, causal: bool = True) -> torch.Ten
 
 
 # Like LTX 1.0 LTXVideoUpsampler3d, but uses new causal Conv3d
-class LTXVideoUpsampler3d(nn.Module):
+class LTX2VideoUpsampler3d(nn.Module):
     def __init__(
         self,
         in_channels: int,
@@ -647,6 +647,7 @@ def __init__(
         resnet_eps: float = 1e-6,
         resnet_act_fn: str = "swish",
         spatio_temporal_scale: bool = True,
+        upsample_type: str = "spatiotemporal",
         inject_noise: bool = False,
         timestep_conditioning: bool = False,
         upsample_residual: bool = False,
@@ -676,11 +677,19 @@ def __init__(
 
         self.upsamplers = None
         if spatio_temporal_scale:
+            if upsample_type == "spatial":
+                stride = (1, 2, 2)
+            elif upsample_type == "temporal":
+                stride = (2, 1, 1)
+            elif upsample_type == "spatiotemporal":
+                stride = (2, 2, 2)
+            else:
+                raise ValueError(f"Unsupported upsample_type: {upsample_type}")
             self.upsamplers = nn.ModuleList(
                 [
-                    LTXVideoUpsampler3d(
+                    LTX2VideoUpsampler3d(
                         out_channels * upscale_factor,
-                        stride=(2, 2, 2),
+                        stride=stride,
                         residual=upsample_residual,
                         upscale_factor=upscale_factor,
                         spatial_padding_mode=spatial_padding_mode,
@@ -935,6 +944,7 @@ def __init__(
         is_causal: bool = False,
         inject_noise: Tuple[bool, ...] = (False, False, False),
         timestep_conditioning: bool = False,
+        upsample_type: Tuple[str, ...] = ("spatiotemporal", "spatiotemporal", "spatiotemporal"),
         upsample_residual: Tuple[bool, ...] = (True, True, True),
         upsample_factor: Tuple[bool, ...] = (2, 2, 2),
         spatial_padding_mode: str = "reflect",
@@ -950,6 +960,7 @@ def __init__(
         spatio_temporal_scaling = tuple(reversed(spatio_temporal_scaling))
         layers_per_block = tuple(reversed(layers_per_block))
         inject_noise = tuple(reversed(inject_noise))
+        upsample_type = tuple(reversed(upsample_type))
         upsample_residual = tuple(reversed(upsample_residual))
         upsample_factor = tuple(reversed(upsample_factor))
         output_channel = block_out_channels[0]
@@ -984,6 +995,7 @@ def __init__(
                 num_layers=layers_per_block[i + 1],
                 resnet_eps=resnet_norm_eps,
                 spatio_temporal_scale=spatio_temporal_scaling[i],
+                upsample_type=upsample_type[i],
                 inject_noise=inject_noise[i + 1],
                 timestep_conditioning=timestep_conditioning,
                 upsample_residual=upsample_residual[i],
@@ -1067,6 +1079,9 @@ def forward(
         return hidden_states
 
 
+LTXVideoUpsampler3d = LTX2VideoUpsampler3d
+
+
 class AutoencoderKLLTX2Video(ModelMixin, AutoencoderMixin, ConfigMixin, FromOriginalModelMixin):
     r"""
     A VAE model with KL loss for encoding images into latents and decoding latent representations into images. Used in
@@ -1129,6 +1144,7 @@ def __init__(
         decoder_spatio_temporal_scaling: Tuple[bool, ...] = (True, True, True),
         decoder_inject_noise: Tuple[bool, ...] = (False, False, False, False),
         downsample_type: Tuple[str, ...] = ("spatial", "temporal", "spatiotemporal", "spatiotemporal"),
+        upsample_type: Tuple[str, ...] = ("spatiotemporal", "spatiotemporal", "spatiotemporal"),
         upsample_residual: Tuple[bool, ...] = (True, True, True),
         upsample_factor: Tuple[int, ...] = (2, 2, 2),
         timestep_conditioning: bool = False,
@@ -1171,6 +1187,7 @@ def __init__(
             is_causal=decoder_causal,
             timestep_conditioning=timestep_conditioning,
             inject_noise=decoder_inject_noise,
+            upsample_type=upsample_type,
             upsample_residual=upsample_residual,
             upsample_factor=upsample_factor,
             spatial_padding_mode=decoder_spatial_padding_mode,
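
Note on the `upsample_type` change above: the branch added to the up block maps each type string to a Conv3d-style stride, and the decoder reverses the config tuple before indexing it per block. The sketch below reproduces only that selection logic in plain Python so it can be checked without PyTorch. The names `stride_for` and `per_block_strides` are hypothetical and do not appear in the PR, and the reading of the stride tuple as (frames, height, width) is an assumption based on the `spatial` and `temporal` cases.

```python
from typing import List, Sequence, Tuple

# Same three cases as the if/elif chain in the diff.
# Assumed axis order: (frames, height, width).
_STRIDES = {
    "spatial": (1, 2, 2),
    "temporal": (2, 1, 1),
    "spatiotemporal": (2, 2, 2),
}


def stride_for(upsample_type: str) -> Tuple[int, int, int]:
    """Hypothetical helper: return the stride for one upsample_type value."""
    try:
        return _STRIDES[upsample_type]
    except KeyError:
        # Mirrors the error message raised in the diff.
        raise ValueError(f"Unsupported upsample_type: {upsample_type}") from None


def per_block_strides(upsample_type: Sequence[str]) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: strides in the order the decoder builds its up blocks.

    The decoder reverses the config tuple before indexing, so the last
    config entry applies to the first up block.
    """
    return [stride_for(t) for t in reversed(upsample_type)]
```

With the default `("spatiotemporal", "spatiotemporal", "spatiotemporal")` every block gets `(2, 2, 2)`, which matches the stride that was hard-coded before this change.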