Commit c82863f

Merge pull request #2663 from bghira/main
merge
2 parents 0e90874 + d50d84b commit c82863f

23 files changed

Lines changed: 1661 additions & 327 deletions

documentation/quickstart/LTXVIDEO2.es.md

Lines changed: 5 additions & 4 deletions
@@ -97,8 +97,8 @@ cp config/config.json.example config/config.json
 Ajustes clave para LTX Video 2:

 - `model_family`: `ltxvideo2`
-- `model_flavour`: `dev` (predeterminado), `dev-fp4` o `dev-fp8`.
-- `pretrained_model_name_or_path`: `Lightricks/LTX-2` (repositorio con el checkpoint combinado) o un archivo `.safetensors` local.
+- `model_flavour`: `dev` (predeterminado), `dev-fp4`, `dev-fp8`, `2.3-dev` o `2.3-distilled`.
+- `pretrained_model_name_or_path`: `Lightricks/LTX-2`, `dg845/LTX-2.3-Diffusers`, `dg845/LTX-2.3-Distilled-Diffusers` o un archivo `.safetensors` local.
 - `train_batch_size`: `1`. No aumentes esto a menos que tengas una A100/H100.
 - `validation_resolution`:
   - `512x768` es un valor seguro para pruebas.
@@ -109,8 +109,9 @@ Ajustes clave para LTX Video 2:
 - `validation_guidance`: `5.0`.
 - `frame_rate`: Por defecto es 25.

-LTX-2 se distribuye como un único checkpoint `.safetensors` que incluye el transformer, el VAE de video,
-el VAE de audio y el vocoder. SimpleTuner carga directamente desde ese archivo combinado según `model_flavour` (dev/dev-fp4/dev-fp8).
+Las variantes LTX-2 2.0 se distribuyen como un único checkpoint `.safetensors` que incluye el transformer, el VAE de video,
+el VAE de audio y el vocoder. Para LTX-2.3, SimpleTuner carga el repositorio Diffusers correspondiente según `model_flavour`
+(`2.3-dev` o `2.3-distilled`).

 ### Opcional: optimizaciones de VRAM

documentation/quickstart/LTXVIDEO2.hi.md

Lines changed: 5 additions & 4 deletions
@@ -97,8 +97,8 @@ cp config/config.json.example config/config.json
 LTX Video 2 के लिए key settings:

 - `model_family`: `ltxvideo2`
-- `model_flavour`: `dev` (डिफ़ॉल्ट), `dev-fp4` या `dev-fp8`
-- `pretrained_model_name_or_path`: `Lightricks/LTX-2` (combined checkpoint वाला repo) या local `.safetensors` फ़ाइल।
+- `model_flavour`: `dev` (डिफ़ॉल्ट), `dev-fp4`, `dev-fp8`, `2.3-dev` या `2.3-distilled`
+- `pretrained_model_name_or_path`: `Lightricks/LTX-2`, `dg845/LTX-2.3-Diffusers`, `dg845/LTX-2.3-Distilled-Diffusers` या local `.safetensors` फ़ाइल।
 - `train_batch_size`: `1`। इसे तब तक न बढ़ाएँ जब तक आपके पास A100/H100 न हो।
 - `validation_resolution`:
   - `512x768` परीक्षण के लिए सुरक्षित डिफ़ॉल्ट है।
@@ -109,8 +109,9 @@ LTX Video 2 के लिए key settings:
 - `validation_guidance`: `5.0`.
 - `frame_rate`: डिफ़ॉल्ट 25 है।

-LTX-2 एक `.safetensors` checkpoint के रूप में आता है जिसमें transformer, video VAE, audio VAE, और vocoder शामिल हैं।
-SimpleTuner इसे `model_flavour` (dev/dev-fp4/dev-fp8) के आधार पर इसी combined फ़ाइल से लोड करता है।
+LTX-2 2.0 variants एक `.safetensors` checkpoint के रूप में आते हैं जिनमें transformer, video VAE, audio VAE, और vocoder शामिल हैं।
+LTX-2.3 के लिए, SimpleTuner `model_flavour` के आधार पर संबंधित Diffusers repo लोड करता है
+(`2.3-dev` या `2.3-distilled`)।

 ### वैकल्पिक: VRAM ऑप्टिमाइज़ेशन

documentation/quickstart/LTXVIDEO2.ja.md

Lines changed: 5 additions & 4 deletions
@@ -98,8 +98,8 @@ cp config/config.json.example config/config.json
 LTX Video 2 の主要設定:

 - `model_family`: `ltxvideo2`
-- `model_flavour`: `dev` (デフォルト)、`dev-fp4`、`dev-fp8`
-- `pretrained_model_name_or_path`: `Lightricks/LTX-2`(combined checkpoint の repo)またはローカル `.safetensors` ファイル。
+- `model_flavour`: `dev` (デフォルト)、`dev-fp4`、`dev-fp8`、`2.3-dev`、`2.3-distilled`
+- `pretrained_model_name_or_path`: `Lightricks/LTX-2`、`dg845/LTX-2.3-Diffusers`、`dg845/LTX-2.3-Distilled-Diffusers`またはローカル `.safetensors` ファイル。
 - `train_batch_size`: `1`。A100/H100 以外では増やさないでください。
 - `validation_resolution`:
   - `512x768` がテスト向けの安全なデフォルト。
@@ -110,8 +110,9 @@ LTX Video 2 の主要設定:
 - `validation_guidance`: `5.0`
 - `frame_rate`: デフォルトは 25。

-LTX-2 は transformer / video VAE / audio VAE / vocoder を含む `.safetensors` 単体チェックポイントで配布されます。
-SimpleTuner は `model_flavour` (dev/dev-fp4/dev-fp8) に合わせてこの combined ファイルから読み込みます。
+LTX-2 2.0 系は transformer / video VAE / audio VAE / vocoder を含む `.safetensors` 単体チェックポイントで配布されます。
+LTX-2.3 では、SimpleTuner は `model_flavour` に応じた Diffusers リポジトリ
+(`2.3-dev` または `2.3-distilled`) を読み込みます。

 ### 任意: VRAM 最適化

documentation/quickstart/LTXVIDEO2.md

Lines changed: 4 additions & 4 deletions
@@ -97,8 +97,8 @@ cp config/config.json.example config/config.json
 Key settings for LTX Video 2:

 - `model_family`: `ltxvideo2`
-- `model_flavour`: `dev` (default), `dev-fp4`, or `dev-fp8`.
-- `pretrained_model_name_or_path`: `Lightricks/LTX-2` (Hub repo with the combined checkpoint) or a local `.safetensors` file.
+- `model_flavour`: `dev` (default), `dev-fp4`, `dev-fp8`, `2.3-dev`, or `2.3-distilled`.
+- `pretrained_model_name_or_path`: `Lightricks/LTX-2`, `dg845/LTX-2.3-Diffusers`, `dg845/LTX-2.3-Distilled-Diffusers`, or a local `.safetensors` file.
 - `train_batch_size`: `1`. Do not increase this unless you have an A100/H100.
 - `validation_resolution`:
   - `512x768` is a safe default for testing.
@@ -109,8 +109,8 @@ Key settings for LTX Video 2:
 - `validation_guidance`: `5.0`.
 - `frame_rate`: Default is 25.

-LTX-2 ships as a single `.safetensors` checkpoint that includes the transformer, video VAE, audio VAE, and vocoder.
-SimpleTuner loads from this combined file directly based on `model_flavour` (dev/dev-fp4/dev-fp8).
+LTX-2 2.0 flavours ship as a single `.safetensors` checkpoint that includes the transformer, video VAE, audio VAE, and vocoder.
+For LTX-2.3, SimpleTuner loads the matching Diffusers repo selected by `model_flavour` (`2.3-dev` or `2.3-distilled`).

 ### Optional: VRAM optimizations
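The settings listed in this quickstart hunk could be assembled roughly as follows. This is only a sketch under stated assumptions: the pairing of each flavour with a Hub repository is inferred from the repository names (the documentation lists both sets but does not spell out the mapping), the value types are guesses, and `FLAVOUR_TO_REPO` and `build_ltx2_config` are hypothetical names that do not exist in SimpleTuner.

```python
import json

# ASSUMPTION: flavour -> repo pairing inferred from the repo names in the
# docs; the docs list the flavours and repos but not the explicit mapping.
FLAVOUR_TO_REPO = {
    "dev": "Lightricks/LTX-2",
    "dev-fp4": "Lightricks/LTX-2",
    "dev-fp8": "Lightricks/LTX-2",
    "2.3-dev": "dg845/LTX-2.3-Diffusers",
    "2.3-distilled": "dg845/LTX-2.3-Distilled-Diffusers",
}


def build_ltx2_config(model_flavour: str = "dev") -> dict:
    """Return the key LTX Video 2 settings as a dict (hypothetical helper)."""
    if model_flavour not in FLAVOUR_TO_REPO:
        raise ValueError(f"Unknown model_flavour: {model_flavour}")
    return {
        "model_family": "ltxvideo2",
        "model_flavour": model_flavour,
        "pretrained_model_name_or_path": FLAVOUR_TO_REPO[model_flavour],
        # Values below are the documented defaults; their JSON types are assumed.
        "train_batch_size": 1,
        "validation_resolution": "512x768",
        "validation_guidance": 5.0,
        "frame_rate": 25,
    }


if __name__ == "__main__":
    # Print a config fragment for the distilled 2.3 flavour.
    print(json.dumps(build_ltx2_config("2.3-distilled"), indent=2))
```

Keeping the flavour and the repository in one table avoids selecting a 2.3 flavour while still pointing at the 2.0 combined checkpoint.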

documentation/quickstart/LTXVIDEO2.pt-BR.md

Lines changed: 5 additions & 4 deletions
@@ -97,8 +97,8 @@ cp config/config.json.example config/config.json
 Configurações-chave para LTX Video 2:

 - `model_family`: `ltxvideo2`
-- `model_flavour`: `dev` (padrão), `dev-fp4` ou `dev-fp8`.
-- `pretrained_model_name_or_path`: `Lightricks/LTX-2` (repositório com o checkpoint combinado) ou um arquivo `.safetensors` local.
+- `model_flavour`: `dev` (padrão), `dev-fp4`, `dev-fp8`, `2.3-dev` ou `2.3-distilled`.
+- `pretrained_model_name_or_path`: `Lightricks/LTX-2`, `dg845/LTX-2.3-Diffusers`, `dg845/LTX-2.3-Distilled-Diffusers` ou um arquivo `.safetensors` local.
 - `train_batch_size`: `1`. Não aumente isso a menos que você tenha um A100/H100.
 - `validation_resolution`:
   - `512x768` é um padrão seguro para testes.
@@ -109,8 +109,9 @@ Configurações-chave para LTX Video 2:
 - `validation_guidance`: `5.0`.
 - `frame_rate`: O padrão é 25.

-O LTX-2 é distribuído como um único checkpoint `.safetensors` que inclui o transformer, o VAE de vídeo,
-o VAE de áudio e o vocoder. O SimpleTuner carrega desse arquivo combinado conforme o `model_flavour` (dev/dev-fp4/dev-fp8).
+As variantes LTX-2 2.0 são distribuídas como um único checkpoint `.safetensors` que inclui o transformer, o VAE de vídeo,
+o VAE de áudio e o vocoder. Para o LTX-2.3, o SimpleTuner carrega o repositório Diffusers correspondente ao `model_flavour`
+(`2.3-dev` ou `2.3-distilled`).

 ### Opcional: otimizações de VRAM

documentation/quickstart/LTXVIDEO2.zh.md

Lines changed: 5 additions & 4 deletions
@@ -98,8 +98,8 @@ cp config/config.json.example config/config.json
 LTX Video 2 的关键设置:

 - `model_family`: `ltxvideo2`
-- `model_flavour`: `dev`(默认)、`dev-fp4`、`dev-fp8`
-- `pretrained_model_name_or_path`: `Lightricks/LTX-2`(包含 combined checkpoint 的仓库)或本地 `.safetensors` 文件。
+- `model_flavour`: `dev`(默认)、`dev-fp4`、`dev-fp8`、`2.3-dev`、`2.3-distilled`
+- `pretrained_model_name_or_path`: `Lightricks/LTX-2`、`dg845/LTX-2.3-Diffusers`、`dg845/LTX-2.3-Distilled-Diffusers` 或本地 `.safetensors` 文件。
 - `train_batch_size`: `1`。除非有 A100/H100,否则不要提高。
 - `validation_resolution`:
   - `512x768` 是安全的测试默认值。
@@ -110,8 +110,9 @@ LTX Video 2 的关键设置:
 - `validation_guidance`: `5.0`
 - `frame_rate`: 默认 25。

-LTX-2 以单个 `.safetensors` checkpoint 形式发布,包含 transformer、视频 VAE、音频 VAE 和 vocoder。
-SimpleTuner 会根据 `model_flavour`(dev/dev-fp4/dev-fp8)从该 combined 文件加载。
+LTX-2 2.0 变体以单个 `.safetensors` checkpoint 形式发布,包含 transformer、视频 VAE、音频 VAE 和 vocoder。
+对于 LTX-2.3,SimpleTuner 会根据 `model_flavour` 加载对应的 Diffusers 仓库
+(`2.3-dev` 或 `2.3-distilled`)。

 ### 可选:VRAM 优化

setup.py

Lines changed: 1 addition & 1 deletion
@@ -324,7 +324,7 @@ def _collect_package_files(*directories: str):
     "peft-singlora>=0.2.0",
     "vector-quantize-pytorch>=1.27.15",
     "cryptography>=41.0.0",
-    "torchcodec>=0.8.1",
+    "torchcodec>=0.10.0",
     "sdnq>=0.1.2",
     "aiosqlite>=0.19.0",
     "httpx>=0.28.0",
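The bump from `0.8.1` to `0.10.0` is a case where comparing version strings as text gives the wrong answer, because `"0.8.1"` sorts after `"0.10.0"` lexicographically. The sketch below shows one way an environment could be checked against the new floor. It assumes plain dotted numeric releases only; `parse_version`, `meets_minimum`, and `torchcodec_is_new_enough` are hypothetical helpers, not part of SimpleTuner or torchcodec.

```python
from importlib import metadata


def parse_version(text: str, width: int = 3) -> tuple:
    """Parse a plain dotted release such as '0.10.0' into a tuple of ints.

    Pre-release or dev suffixes are not handled and raise ValueError.
    """
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))  # pad so '0.10' compares like '0.10.0'
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, component by component."""
    return parse_version(installed) >= parse_version(minimum)


def torchcodec_is_new_enough(minimum: str = "0.10.0") -> bool:
    """Return True if an installed torchcodec satisfies the new floor."""
    try:
        installed = metadata.version("torchcodec")
    except metadata.PackageNotFoundError:
        return False
    try:
        # Drop a local build suffix such as '+cu121' before parsing.
        return meets_minimum(installed.split("+")[0], minimum)
    except ValueError:
        # Unparseable (e.g. a dev build): treat conservatively as too old.
        return False
```

For anything beyond plain releases, a full version parser such as the one in the `packaging` library is the safer choice.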

simpletuner/helpers/models/ltxvideo2/autoencoder.py

Lines changed: 20 additions & 3 deletions
@@ -352,7 +352,7 @@ def forward(self, hidden_states: torch.Tensor, causal: bool = True) -> torch.Ten


 # Like LTX 1.0 LTXVideoUpsampler3d, but uses new causal Conv3d
-class LTXVideoUpsampler3d(nn.Module):
+class LTX2VideoUpsampler3d(nn.Module):
     def __init__(
         self,
         in_channels: int,
@@ -647,6 +647,7 @@ def __init__(
         resnet_eps: float = 1e-6,
         resnet_act_fn: str = "swish",
         spatio_temporal_scale: bool = True,
+        upsample_type: str = "spatiotemporal",
         inject_noise: bool = False,
         timestep_conditioning: bool = False,
         upsample_residual: bool = False,
@@ -676,11 +677,19 @@ def __init__(

         self.upsamplers = None
         if spatio_temporal_scale:
+            if upsample_type == "spatial":
+                stride = (1, 2, 2)
+            elif upsample_type == "temporal":
+                stride = (2, 1, 1)
+            elif upsample_type == "spatiotemporal":
+                stride = (2, 2, 2)
+            else:
+                raise ValueError(f"Unsupported upsample_type: {upsample_type}")
             self.upsamplers = nn.ModuleList(
                 [
-                    LTXVideoUpsampler3d(
+                    LTX2VideoUpsampler3d(
                         out_channels * upscale_factor,
-                        stride=(2, 2, 2),
+                        stride=stride,
                         residual=upsample_residual,
                         upscale_factor=upscale_factor,
                         spatial_padding_mode=spatial_padding_mode,
@@ -935,6 +944,7 @@ def __init__(
         is_causal: bool = False,
         inject_noise: Tuple[bool, ...] = (False, False, False),
         timestep_conditioning: bool = False,
+        upsample_type: Tuple[str, ...] = ("spatiotemporal", "spatiotemporal", "spatiotemporal"),
         upsample_residual: Tuple[bool, ...] = (True, True, True),
         upsample_factor: Tuple[bool, ...] = (2, 2, 2),
         spatial_padding_mode: str = "reflect",
@@ -950,6 +960,7 @@ def __init__(
         spatio_temporal_scaling = tuple(reversed(spatio_temporal_scaling))
         layers_per_block = tuple(reversed(layers_per_block))
         inject_noise = tuple(reversed(inject_noise))
+        upsample_type = tuple(reversed(upsample_type))
         upsample_residual = tuple(reversed(upsample_residual))
         upsample_factor = tuple(reversed(upsample_factor))
         output_channel = block_out_channels[0]
@@ -984,6 +995,7 @@ def __init__(
                 num_layers=layers_per_block[i + 1],
                 resnet_eps=resnet_norm_eps,
                 spatio_temporal_scale=spatio_temporal_scaling[i],
+                upsample_type=upsample_type[i],
                 inject_noise=inject_noise[i + 1],
                 timestep_conditioning=timestep_conditioning,
                 upsample_residual=upsample_residual[i],
@@ -1067,6 +1079,9 @@ def forward(
         return hidden_states


+LTXVideoUpsampler3d = LTX2VideoUpsampler3d
+
+
 class AutoencoderKLLTX2Video(ModelMixin, AutoencoderMixin, ConfigMixin, FromOriginalModelMixin):
     r"""
     A VAE model with KL loss for encoding images into latents and decoding latent representations into images. Used in
@@ -1129,6 +1144,7 @@ def __init__(
         decoder_spatio_temporal_scaling: Tuple[bool, ...] = (True, True, True),
         decoder_inject_noise: Tuple[bool, ...] = (False, False, False, False),
         downsample_type: Tuple[str, ...] = ("spatial", "temporal", "spatiotemporal", "spatiotemporal"),
+        upsample_type: Tuple[str, ...] = ("spatiotemporal", "spatiotemporal", "spatiotemporal"),
         upsample_residual: Tuple[bool, ...] = (True, True, True),
         upsample_factor: Tuple[int, ...] = (2, 2, 2),
         timestep_conditioning: bool = False,
@@ -1171,6 +1187,7 @@ def __init__(
             is_causal=decoder_causal,
             timestep_conditioning=timestep_conditioning,
             inject_noise=decoder_inject_noise,
+            upsample_type=upsample_type,
             upsample_residual=upsample_residual,
             upsample_factor=upsample_factor,
             spatial_padding_mode=decoder_spatial_padding_mode,
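The new `upsample_type` option boils down to choosing a 3-tuple stride per decoder up block. The torch-free sketch below restates that selection as a lookup table and applies the same tuple reversal the decoder performs before indexing by block. The stride values and the error message come from the diff. Reading the first stride component as the temporal axis is an inference from the `"spatial"` and `"temporal"` values, and `stride_for_upsample_type` and `decoder_block_strides` are hypothetical helper names, not functions in the module.

```python
from typing import List, Tuple

# Stride values copied from the diff. The first component is assumed to be
# the temporal (frame) axis, the other two the spatial axes.
_STRIDES = {
    "spatial": (1, 2, 2),
    "temporal": (2, 1, 1),
    "spatiotemporal": (2, 2, 2),
}


def stride_for_upsample_type(upsample_type: str) -> Tuple[int, int, int]:
    """Map an upsample_type string to the stride passed to the upsampler."""
    try:
        return _STRIDES[upsample_type]
    except KeyError:
        raise ValueError(f"Unsupported upsample_type: {upsample_type}") from None


def decoder_block_strides(upsample_type: Tuple[str, ...]) -> List[Tuple[int, int, int]]:
    """Return strides in the order the decoder builds its up blocks.

    The decoder reverses its per-block tuples first, so block i reads the
    entry at index i of the reversed tuple.
    """
    return [stride_for_upsample_type(kind) for kind in reversed(upsample_type)]


if __name__ == "__main__":
    # The default keeps the previous behaviour: every block uses (2, 2, 2).
    print(decoder_block_strides(("spatiotemporal",) * 3))
```

Because the default is `"spatiotemporal"` for every block, existing configurations that never set `upsample_type` keep the previously hard-coded `(2, 2, 2)` stride.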
