An opinionated, perpetual WIP project aimed at hacking WanVideo 2.1(2)-T2V-(A)14B LoRA training.
It is intended as a playground for experimenting with new ideas and testing various training features, including some that might ultimately turn out to be useless. The configuration file structure may change at any time, and some non-functioning options may still be present. It only supports Wan2.1-T2V-14B and Wan2.2-T2V-A14B training.
This project would not have been possible without musubi-tuner. Although Takenoko has been extensively refactored and reworked (to the point where merging from upstream is no longer possible), the original project provided the foundation on which it was built. By reusing an existing and proven codebase, I was able to focus on experimentation and learning instead of reinventing the wheel. Thanks to kohya-ss for the awesome work.
Since this project is mostly aimed at personal use and is constantly changing (with no guarantee of backwards compatibility), it probably won't have comprehensive documentation in the near future (unless it somehow becomes popular, which I hope it does not). I've tried to provide detailed comments in the config template, but they can't cover everything. As a workaround, I recommend using repomix to pack the entire repository into a single AI-readable XML file (around 1M tokens), then feeding it into the free Grok 4 Fast, which has a 2M-token context window, and asking questions about various aspects of the project.
- Clone the repository.
- Run `install.bat`.
- Create a configuration file (you can copy a sample config from the `configs/examples` folder).
- Place it into the `configs` directory.
- Launch `run_trainer.bat` and follow the instructions.
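
The config-copy steps above can also be scripted. The following is only a minimal sketch, not part of the project: the helper `copy_sample_config`, the file names `sample.toml` and `my_lora.toml`, and the `.toml` extension are all hypothetical placeholders, so substitute whatever actually exists in `configs/examples`.

```python
import shutil
import tempfile
from pathlib import Path


def copy_sample_config(repo_root: Path, sample_name: str, new_name: str) -> Path:
    """Copy a sample config from configs/examples into configs/ without overwriting.

    The helper and the file names passed to it are hypothetical; only the
    two directory names come from the installation steps.
    """
    source = repo_root / "configs" / "examples" / sample_name
    target = repo_root / "configs" / new_name
    if not source.is_file():
        raise FileNotFoundError(f"sample config not found: {source}")
    if target.exists():
        raise FileExistsError(f"refusing to overwrite existing config: {target}")
    shutil.copyfile(source, target)
    return target


if __name__ == "__main__":
    # Demonstrate on a throwaway directory tree so no real checkout is touched.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "configs" / "examples").mkdir(parents=True)
        (root / "configs" / "examples" / "sample.toml").write_text("# placeholder\n")
        created = copy_sample_config(root, "sample.toml", "my_lora.toml")
        print(created.name)
```

Refusing to overwrite an existing file is a deliberate choice here: the configuration structure may change between versions, so silently replacing a working config with a fresh sample could lose local edits.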
This project borrows code from various sources, which use different types of licenses, mostly Apache 2.0, MIT, and AGPLv3. Since AGPLv3 is a strong copyleft license, including any AGPLv3 code likely means the entire project must be released under AGPLv3. This understanding is based on publicly available licensing information.
Takenoko draws inspiration from and incorporates code, ideas, and techniques from various open-source projects and publications. I thank the authors and maintainers for their contributions. Below is a list of all sources and papers (in no particular order). I have tried to reference all sources, but if I happen to miss any (or if more specific credits are warranted), please let me know.
Keep in mind that work on some features is not yet complete due to time and hardware constraints. If a feature is not working or is not implemented exactly as in the original work, all responsibility lies with my implementation, not with the authors of the original code or paper.
| Source | Type | What was borrowed | Author(s) | License | Comment |
|---|---|---|---|---|---|
| musubi-tuner | repo | - Original codebase | kohya-ss | Apache 2.0 | |
| blissful-tuner | repo | - Several optimization techniques | Sarania | Apache 2.0 | |
| diffusion-pipe | repo | - Pre-computed timestep distribution algorithm<br>- AdamW8bitKahan optimizer<br>- Automagic optimizer modifications | tdrussell | MIT | |
| WanTraining | repo | - Control LoRA training<br>- DWT loss | spacepxl | Apache 2.0 | |
| ai-toolkit | repo | - Differential output preservation<br>- Adafactor optimizer<br>- Prodigy 8-bit optimizer<br>- Automagic optimizer<br>- EMA implementation<br>- Concept slider training<br>- Stepped loss | ostris | MIT | |
| musubi-tuner (pr) | repo | - Initial implementation of validation datasets | NSFW-API | Apache 2.0 | |
| Timestep-Attention-and-other-shenanigans | repo | - Clustered MSE Loss<br>- EW loss | Anzhc | AGPL-3.0 | |
| Diffuse and Disperse: Image Generation with Representation Regularization | paper | - Dispersive loss | Runqian Wang, Kaiming He | CC BY 4.0 | |
| DispLoss | repo | - Dispersive loss PyTorch implementation | raywang4 | MIT | |
| sd-scripts | repo | - Regularization datasets<br>- LoRA-GGPO<br>- Validation loss | kohya-ss | Apache 2.0 | |
| wan2.1-dilated-controlnet | repo | - ControlNet training | TheDenk | Apache 2.0 | |
| T-LoRA | repo | - T-LoRA training | ControlGenAI | MIT | see also paper |
| sd-scripts (fork) | repo | - Fourier loss<br>- HinaAdaptive optimizer | hinablue | Apache 2.0 | |
| Muon | repo | - Muon optimizer | KellerJordan | MIT | |
| dion | repo | - DION2-inspired reduced orthonormal optimizer integration | microsoft | MIT | |
| Sana | repo | - CAME 8-bit optimizer | NVlabs | Apache 2.0 | see also paper |
| SimpleTuner | repo | - Routed TREAD<br>- SOAP optimizer<br>- Masked training (spatial-first loss, area interpolation, proper normalization, auto mask generation)<br>- Advanced EMA features<br>- CREPA/LayerSync improvements<br>- Scheduled rollout probability ramping | bghira | AGPL-3.0 | |
| diffusion-pipe (pr) | repo | - Frame-based TREAD | Ada123-a | MIT | |
| Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think | paper | - Representational alignment loss, 3-layer MLP projection head, forward hook-based feature capture | Sihyun Yu, Sangkyung Kwak, Huiwon Jang, Jongheon Jeong, Jonathan Huang, Jinwoo Shin, Saining Xie | CC BY 4.0 | |
| REPA | repo | - Representation Alignment implementation | sihyun-yu | MIT | |
| dino | repo | - VisionTransformer implementation | facebookresearch | MIT | |
| Sophia | repo | - Sophia optimizer | Liuhong99 | MIT | see also paper |
| Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training | repo | - Adaptive timestep sampling | KU-DMLab | MIT | see also paper |
| Temporal Regularization Makes Your Video Generator Stronger | paper | - Temporal regularization via perturbation | Harold Haodong Chen, Haojian Huang, Xianfeng Wu, Yexin Liu, Yajing Bai, Wen-Jie Shu, Harry Yang, Ser-Nam Lim | arXiv 1.0 | |
| AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion | paper | - Frame-oriented Probability Propagation (FoPP) scheduler | Mingzhen Sun, Weining Wang, Gen Li, Jiawei Liu, Jiahui Sun, Wanquan Feng, Shanshan Lao, SiYu Zhou, Qian He, Jing Liu | arXiv 1.0 | |
| Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach | paper | - Vectorized timestep scheduling | Yaofang Liu, Yumeng Ren, Xiaodong Cun, Aitor Artola, Yang Liu, Tieyong Zeng, Raymond H. Chan, Jean-michel Morel | arXiv 1.0 | |
| Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion | paper | - Post-training autoregressive self-rollout method | Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, Eli Shechtman | CC BY-NC-SA 4.0 | |
| Wan2.1-NABLA | repo | - Dynamic sparse attention | gen-ai-team | Apache 2.0 | see also paper |
| VideoX-Fun | repo | - Reward LoRA training | aigc-apps | Apache 2.0 | |
| Fira | repo | - Fira optimizer | xichen-fy | Apache 2.0 | see also paper |
| google-research | repo | - Frechet Video Distance (FVD) implementation | google-research | Apache 2.0 | |
| Mixture of Contexts for Long Video Generation | paper | - Mixture of Contexts (MoC) sparse attention routing | Shengqu Cai, Ceyuan Yang, Lvmin Zhang, Yuwei Guo, Junfei Xiao, Ziyan Yang, Yinghao Xu, Zhenheng Yang, Alan Yuille, Leonidas Guibas, Maneesh Agrawala, Lu Jiang, Gordon Wetzstein | CC BY-SA 4.0 | |
| SPHL-for-stable-diffusion | code | - Pseudo-Huber loss implementation | kabachuha | | see also paper |
| Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval | paper | - Context-as-Memory integration | Jiwen Yu, Jianhong Bai, Yiran Qin, Quande Liu, Xintao Wang, Pengfei Wan, Di Zhang, Xihui Liu | CC BY 4.0 | |
| SingLoRA | repo | - SingLoRA implementation | kyegomez | MIT | see also paper |
| PEFT-SingLoRA | repo | - Enhanced non-square matrix handling | bghira | BSD 2-clause | |
| sd-scripts (pr) | repo | - Latent quality analysis | araleza | Apache 2.0 | |
| Contrastive Flow Matching | paper | - Contrastive loss | George Stoica, Vivek Ramanujan, Xiang Fan, Ali Farhadi, Ranjay Krishna, Judy Hoffman | CC BY 4.0 | |
| DeltaFM | repo | - Contrastive Flow Matching implementation (class-conditioned sampling, unconditional handling) | gstoica27 | MIT | |
| OneTrainer | repo | - Masked training (prior preservation, unmasked weight, random mask removal)<br>- OFTv2 orthogonal finetuning integration reference | Nerogar | AGPL-3.0 | |
| Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion | paper | - Frequency-domain temporal consistency | Jingyuan Chen, Fuchen Long, Jie An, Zhaofan Qiu, Ting Yao, Jiebo Luo, Tao Mei | CC BY-SA 4.0 | |
| mmgp | repo | - Memory-mapped safetensors loading | deepbeepmeep | GNU GPL | |
| attention-map-diffusers | repo | - Cross-attention map visualization | wooyeolbaek | MIT | |
| musubi-tuner (fork) | repo | - Full model fine-tuning<br>- Row-based TREAD | betterftr | Apache 2.0 | |
| stochastic_round_cuda | repo | - Stochastic rounding CUDA implementation | ethansmith2000 | MIT | |
| simplevae | repo | - VAE training enhancements | AiArtLab | | |
| RamTorch | repo | - RamTorch CPU-bouncing linear layers | lodestone-rock | Apache 2.0 | |
| Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference | repo | - SRPO preference optimization | Tencent-Hunyuan | SRPO Non-Commercial License | see also paper |
| SARA: Structural and Adversarial Representation Alignment for Training-efficient Diffusion Models | paper | - Autocorrelation matrix alignment<br>- Adversarial distribution alignment<br>- Multi-level hierarchical representation loss | Hesen Chen, Junyan Wang, Zhiyu Tan, Hao Li | CC BY 4.0 | |
| Scion | repo | - Scion optimizer | LIONS-EPFL | MIT | see also paper |
| EqM | repo | - Equilibrium matching adaptation | raywang4 | MIT | see also paper |
| NorMuon | repo | - Neuron-wise Normalized Muon implementation | CoffeeVampir3 | MIT | |
| TiM | repo | - Transition training objective (paired timesteps, transports, weighting, EMA) | WZDTHU | Apache 2.0 | see also paper |
| rcm | repo | - rCM distillation algorithm reference | NVlabs | Apache 2.0 | see also paper |
| Aozora_SDXL_Training | repo | - Raven optimizer | Hysocs | | |
| Sprint: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers | paper | - Sparse-dense residual fusion with token dropping<br>- Path-drop learning with token regularization<br>- Two-stage training scheduler | Dogyun Park, Moayed Haji-Ali, Yanyu Li, Willi Menapace, Sergey Tulyakov, Hyunwoo J. Kim, Aliaksandr Siarohin, Anil Kag | CC BY 4.0 | |
| AdaMuon | repo | - Adaptive Muon optimizer implementation | Chongjie-Si | Apache 2.0 | see also paper |
| Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion Models | paper | - Cross-frame representation alignment | Sungwon Hwang, Hyojin Jang, Kinam Kim, Minho Park, Jaegul Choo | CC BY 4.0 | |
| LayerSync: Self-aligning Intermediate Layers | repo | - Inter-layer alignment loss | vita-epfl | MIT | see also paper |
| HyperLoRA | repo | - HyperLoRA concept | bytedance | GPL-3.0 | see also paper |
| Qwen-Image-i2L | repo | - Trainable single-pass LoRA weight prediction hypernetwork concept<br>- Residual-conditioned branch with cached auxiliary embeddings<br>- Optional multi-encoder auxiliary embedding fusion | DiffSynth-Studio | Apache 2.0 | see also article |
| iREPA | repo | - Convolutional projector for spatial preservation<br>- Spatial z-score normalization for sharper alignment | End2End-Diffusion | MIT | see also paper |
| SpeedrunDiT | repo | - Dim-aware timestep shift<br>- Cross-batch CFM regularizer<br>- Sprint uncond-only path drop for sampling | SwayStar123 | MIT | |
| Improved Variational Online Newton (IVON) | repo | - IVON implementation | team-approx-bayes | GPL-3.0 | with code from PR by rockerBOO |
| MemFlow | repo | - Memory bank<br>- Sparse memory activation guidance | KlingTeam | Apache 2.0 | see also paper |
| HASTE | repo | - Holistic alignment loss<br>- Semantic anchor feature projections<br>- Attention alignment with teacher offset<br>- Stage-wise termination | NUS-HPC-AI-Lab | Apache 2.0 | see also paper |
| sd-scripts (pr) | repo | - CDC-FM flow matching | rockerBOO | Apache 2.0 | see also paper |
| GaLore | repo | - GaLore optimizer | jiaweizzhao | Apache 2.0 | see also paper |
| REG | repo | - Class-token entanglement<br>- Class-token denoising loss<br>- Alignment loss to encoder features | Martinser | MIT | see also paper |
| Q-GaLore | repo | - Q-GaLore optimizer | VITA-Group | Apache 2.0 | see also paper |
| SemanticGen: Video Generation in Semantic Space | paper | - Semantic token conditioning<br>- Feature-representation cross-alignment loss | Jianhong Bai, Xiaoshi Wu, Xintao Wang, Xiao Fu, Yuanxing Zhang, Qinghe Wang, Xiaoyu Shi, Menghan Xia, Zuozhu Liu, Haoji Hu, Pengfei Wan, Kun Gai | | |
| transformers (pr) | repo | - Implementation of Q-GaLore optimizer | SunMarc | Apache 2.0 | |
| Glance | repo | - Fixed-timestep distillation mode | CSU-JPG | Apache 2.0 | see also paper |
| Stable-Video-Infinity | repo | - Error-recycling fine-tuning<br>- Timestep-grid replay buffers<br>- Buffer replacement strategies<br>- Warmup distributed buffer fill<br>- Probabilistic error injection and modulation<br>- Anchor-conditioned motion replay<br>- Sequence-aware batching for replay continuity | vita-epfl | Apache 2.0 | see also paper |
| EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise | paper | - Temporally consistent noise with flow caching | Chao Liu, Arash Vahdat | CC BY 4.0 | |
| catlvdm | repo | - BCNI/SACN corruption for T5 conditioning<br>- Structured corruption robustness boost<br>- Mask-aware embedding noise injection | chikap421 | MIT | see also paper |
| TPDiff: Temporal Pyramid Video Diffusion Model | paper | - Temporal pyramid bounded sampling<br>- Stage-wise temporal resampling<br>- Stage-specific scheduler-aware gamma/sigma | Lingmin Ran, Mike Zheng Shou | CC BY 4.0 | |
| relora | repo | - ReLoRA pipeline | Guitaricet | Apache 2.0 | see also paper |
| DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models | paper | - DenseDPO training method | Ziyi Wu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Ashkan Mirzaei, Igor Gilitschenski, Sergey Tulyakov, Aliaksandr Siarohin | CC BY 4.0 | |
| Blockwise-Flow-Matching | repo | - Blockwise timestep segment objective<br>- SemFeat alignment conditioning<br>- SemFeat time-embedding injection<br>- FRN loss | mlvlab | | see also paper |
| MuonClip | repo | - MuonClip | kyegomez | Apache 2.0 | see also paper |
| mHC: Manifold-Constrained Hyper-Connections | paper | - Multi-path residual stream with learnable residual mixing matrix<br>- Doubly-stochastic manifold constraint<br>- Identity-mapping preservation across depth<br>- Sinkhorn-Knopp normalization enforcing constraint<br>- Norm-preserving cross-stream residual propagation | Zhenda Xie, Yixuan Wei, Huanqi Cao, Chenggang Zhao, Chengqi Deng, Jiashi Li, Damai Dai, Huazuo Gao, Jiang Chang, Liang Zhao, Shangyan Zhou, Zhean Xu, Zhengyan Zhang, Wangding Zeng, Shengding Hu, Yuqing Wang, Jingyang Yuan, Lean Wang, Wenfeng Liang | arXiv 1.0 | |
| manifolds | repo | - Manifold Muon integration | thinking-machines-lab | MIT | see also blogpost |
| LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters | paper | - Riemannion fixed-rank optimizer<br>- Manifold momentum/transport<br>- Manifold-aware LoRA tangent projection and retraction<br>- One-step gradient locally optimal initialization | Vladimir Bogachev, Vladimir Aletov, Alexander Molozhavenko, Denis Bobkov, Vera Soboleva, Aibek Alanov, Maxim Rakhuba | arXiv 1.0 | |
| pico-relora | repo | - Optimizer reset via random pruning<br>- Jagged cosine scheduler | Yu-val-weiss | Apache 2.0 | see also paper |
| Physics-Guided Motion Loss for Video Generation Model | paper | - Physics-guided motion loss | Bowen Xue, Giuseppe Claudio Guarnera, Shuang Zhao, Zahra Montazeri | arXiv 1.0 | |
| optimizers | repo | - Original implementation of Kron, Conda, VSGD, RangerVA and NvNovoGrad optimizers | NoteDance | Apache 2.0 | |
| clora | repo | - Cross-attention capture<br>- Token-focused attention<br>- Spatial attention masking<br>- Contrastive attention separation | gemlab-vt | MIT | see also paper |
| splus | repo | - SPlus optimizer | kvfrans | | see also paper |
| Internal-Guidance | repo | - Auxiliary supervision on intermediate layers<br>- Internal dynamics guidance<br>- Target shifting mechanism | CVL-UESTC | MIT | see also paper |
| Beyond External Guidance: Unleashing the Semantic Richness Inside Diffusion Transformers for Improved Training | paper | - Two-stage self-guidance<br>- Feature-space CFG semantic enrichment<br>- Frozen internal teacher stabilization<br>- Lightweight projection alignment | Lingchen Sun, Rongyuan Wu, Zhengqiang Zhang, Ruibin Li, Yujing Sun, Shuaizheng Liu, Lei Zhang | CC BY 4.0 | |
| FreeFuse | repo | - Subject-mask training with auxiliary consistency losses | yaoliliu | Apache 2.0 | see also paper |
| Immiscible-Diffusion | repo | - KNN candidate noise selection implementation<br>- Linear assignment noise matching reference | yhli123 | MIT | see also paper |
| MixFlow | repo | - Slowed interpolation mixture objective<br>- Beta-style timestep remapping | fudan-generative-vision | | see also paper |
| Video Consistency Distance: Enhancing Temporal Consistency for Image-to-Video Generation via Reward-Based Fine-Tuning | paper | - VCD temporal-consistency objective<br>- Frequency-domain amplitude/phase consistency distance | Takehiro Aoshima, Yusuke Shinohara, Byeongseon Park | arXiv 1.0 | |
| MoAlign: Motion-Centric Representation Alignment for Video Diffusion Models | paper | - Two-stage motion-centric alignment<br>- Spatial/temporal relational alignment loss with temporal weighting | Aritra Bhowmik, Denis Korzhenkov, Cees G. M. Snoek, Amirhossein Habibian, Mohsen Ghafoorian | CC BY 4.0 | |
| End-to-End Training for Autoregressive Video Diffusion via Self-Resampling | paper | - Self-resampling history corruption<br>- History token routing<br>- Autoregressive rollout with KV-cache acceleration | Yuwei Guo, Ceyuan Yang, Hao He, Yang Zhao, Meng Wei, Zhenheng Yang, Weilin Huang, Dahua Lin | CC BY 4.0 | |
| VideoREPA | repo | - Video teacher integration patterns<br>- TRD objective implementation | aHapBean | Apache 2.0 | see also paper |
| Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation | paper | - Bidirectional teacher-feature fusion for structure-preserving motion distillation<br>- Local Gram Flow (LGF) alignment objective<br>- SFT pipeline with optional SAM2 tracker-memory backend | Yang Fei, George Stoica, Jingyuan Liu, Qifeng Chen, Ranjay Krishna, Xiaojuan Wang, Benlin Liu | CC BY-NC-ND 4.0 | |
| CAMEO | repo | - Attention distillation techniques | cvlab-kaist | | see also paper |
| VAE-REPA: Variational Autoencoder Representation Alignment for Efficient Diffusion Training | paper | - VAE-latent representation alignment objective<br>- Configurable projector depth | Mengmeng Wang, Dengyang Jiang, Liuzhuozheng Li, Yucheng Lin, Guojiang Shen, Xiangjie Kong, Yong Liu, Guang Dai, Jingdong Wang | CC BY 4.0 | |
| DisMo | repo | - Conditional LoRA modulation reference<br>- Stochastic delta-time sampling reference<br>- Motion/appearance disentanglement diagnostics direction | CompVis | MIT | see also paper |
| ReflexFlow: Rethinking Learning Objective for Exposure Bias Alleviation in Flow Matching | paper | - Anti-Drift Rectification (ADR) objective<br>- Frequency Compensation (FC) loss reweighting<br>- Scheduled sampling strategy for biased-input training | Guanbo Huang, Jingjia Mao, Fanding Huang, Fengkai Liu, Xiangyang Luo, Yaoyuan Liang, Jiasheng Lu, Xiaoe Wang, Pei Liu, Ruiliu Fu, Shao-Lun Huang | arXiv 1.0 | |
| StableVelocity | repo | - VA-REPA timestep-aware weighting schedules<br>- StableVM memory-bank target construction<br>- Class-aware bank sampling | linYDTHU | MIT | see also paper |
| LTX-2 | repo | - IC-LoRA trainer/pipeline structure references<br>- Reference-target conditioning flow design<br>- IC-LoRA network module conventions | Lightricks | LTX-2 Community License | |
| In-Context LoRA for Diffusion Transformers | paper | - In-context concatenation objective for condition/target layouts | Lianghua Huang, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Huanzhang Dou, Chen Liang, Yutong Feng, Yu Liu, Jingren Zhou | CC BY 4.0 | |
| In-Context Sync-LoRA for Portrait Video Editing | paper | - Sync-aware paired-video curation concept<br>- Motion-preserving in-context edit objective | Sagi Polaczek, Or Patashnik, Ali Mahdavi-Amiri, Daniel Cohen-Or | arXiv 1.0 | |
| Generative Modeling via Drifting | paper | - Drifting auxiliary objective<br>- Mean-shift drifting field with kernel normalization | Mingyang Deng, He Li, Tianhong Li, Yilun Du, Kaiming He | CC BY 4.0 | |
| DeT | repo | - Motion-transfer enhancement integration<br>- Local temporal-kernel and dense-trajectory supervision objectives | Shi-qingyu | | see also paper |
| Mano-Restriking-Manifold-Optimization-for-LLM-Training | repo | - Mano optimizer implementation<br>- Tangent-space manifold update<br>- Matrix/aux-Adam parameter split | xie-lab-ml | Apache 2.0 | see also paper |
| Euphonium | repo | - SRPO process-reward guidance<br>- Dual outcome/process reward modes<br>- KL-auto scaling<br>- Optional latent SPSA gradients | zerzerzerz | Apache 2.0 | see also paper |
| ShortFT: Diffusion Model Alignment via Shortcut-based Fine-Tuning | paper | - Progressive shortcut backprop schedule for reward LoRA training<br>- Segment/anchor-based denoising-chain backprop control | Xiefan Guo, Miaomiao Cui, Liefeng Bo, Di Huang | arXiv 1.0 | |
| FlexAM | repo | - FlexAM conditioning<br>- Density-guided timestep conditioning concept | IGL-HKUST | Apache 2.0 | see also paper |
| UFO | repo | - Static-clip training<br>- Frame-correlated autoregressive noise sharing<br>- Motion-sub frame-delta loss<br>- Temporal-attention LoRA targeting | Delong-liu-bupt | MIT | see also paper |
| PiSSA | repo | - Principal/residual decomposition | GraphPKU | Apache 2.0 | see also paper |
| sd-scripts (pr) | repo | - PiSSA initialization and integration patterns | rockerBOO | Apache 2.0 | |
| AdaLoRA | repo | - Adaptive rank-budget allocation workflow<br>- Rank-importance scoring and masking flow | QingruZhang | MIT | see also paper |
| MoRA | repo | - High-rank square adapter update<br>- Type-based projection/expansion mapping | kongds | | see also paper |
| VeRA: Vector-based Random Matrix Adaptation | paper | - Shared frozen random projection matrices across adapted layers<br>- Trainable per-layer scaling vectors with minimal parameter overhead | Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano | CC BY 4.0 | |
| S2D: Selective Spectral Decay for Quantization-Friendly Conditioning of Neural Activations | paper | - Selective dominant-spectrum regularization<br>- Amortized spectral updates with thresholded top-component targeting | Arnav Chavan, Nahush Lele, Udbhav Bamba, Sankalp Dayal, Aditi Raghunathan, Deepak Gupta | arXiv 1.0 | |
| LoRWeB | repo | - Dynamic LoRA basis with query-conditioned weight mixing<br>- Query-mode/runtime wiring patterns for visual analogy triplets | NVlabs | NVIDIA License | see also paper |
| Growing with the Generator: Self-paced GRPO for Video Generation | paper | - Self-paced reward progression<br>- Sparsity-aware reward mixing | Rui Li, Yuanzhi Liang, Ziqi Ni, Haibing Huang, Chi Zhang, Xuelong Li | arXiv 1.0 | |
| CDKA | repo | - Reference implementation for CDKA | rainstonee | | see also paper |
| QLoRA | repo | - 4-bit NF4/FP4 quantized base-model loading<br>- Double/nested quantization flow<br>- Paged bitsandbytes optimizer integration<br>- k-bit preparation patterns | artidoro | MIT | see also paper |
| Mode Seeking meets Mean Seeking for Fast Long Video Generation | paper | - Decoupled global/local dual-head auxiliary objective<br>- Sliding-window local teacher-alignment approximation<br>- Reverse-KL local behavior-matching term | Shengqu Cai, Weili Nie, Chao Liu, Julius Berner, Lvmin Zhang, Nanye Ma, Hansheng Chen, Maneesh Agrawala, Leonidas Guibas, Gordon Wetzstein, Arash Vahdat | CC BY-SA 4.0 | |
| VB-LoRA | repo | - Vector-bank LoRA composition<br>- Top-k sparse logits composition and compact checkpoint strategy | leo-yangli | | see also paper |
| A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA | paper | - Rank-stabilized adapter scaling | Damjan Kalajdzievski | arXiv 1.0 | |
| Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis | paper | - Dual-timestep masked noising<br>- EMA teacher feature alignment with cosine objective<br>- Combined training objective | Hila Chefer, Patrick Esser, Dominik Lorenz, Dustin Podell, Vikash Raja, Vinh Tong, Antonio Torralba, Robin Rombach | arXiv 1.0 | |
| Video2LoRA | repo | - LightLoRA auxiliary-factor decomposition<br>- Reference-video-conditioned hypernetwork for runtime LoRA prediction<br>- Iterative latent-token decoder structure and paired-reference training flow<br>- End-to-end diffusion training without pre-trained semantic LoRA supervision | BerserkerVV | | see also paper |
| StelLA | repo | - Three-factor LoRA decomposition<br>- Stiefel-manifold constrained adapter updates<br>- Euclidean-to-Riemannian gradient conversion with retraction | SonyResearch | Apache 2.0 | see also paper |
| Disentangling Task Conflicts in Multi-Task LoRA via Orthogonal Gradient Projection | paper | - Orthogonal gradient projection for shared multi-task LoRA<br>- Separate conflict projection for LoRA low-rank factors | Ziyu Yang, Guibin Chen, Yuxin Yang, Aoxiong Zeng, Xiangquan Yang | CC BY 4.0 | |
| Helios: Real Real-Time Long Video Generation Model | paper | - Frame-aware historical-context corruption<br>- First-frame history anchoring for anti-drift training | Shenghai Yuan, Yuanyang Yin, Zongjian Li, Xinwei Huang, Xiao Yang, Li Yuan | CC BY 4.0 | |
| OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer | paper | - Reference positional-bias direction for IC-LoRA<br>- Task-type conditioning token for SemanticGen-style routing | Pengze Zhang, Yanze Wu, Mengtian Li, Xu Bai, Songtao Zhao, Fulong Ye, Chong Mou, Xinghui Li, Zhuowei Chen, Qian He, Mingyuan Gao | arXiv 1.0 | |
| Demystifing Video Reasoning | paper | - Early-step multi-view consensus<br>- Mid-layer merge-and-continue training<br>- High-noise timestep transfer | Ruisi Wang, Zhongang Cai, Fanyi Pu, Junxiang Xu, Wanqi Yin, Maijunxian Wang, Ran Ji, Chenyang Gu, Bo Li, Ziqi Huang, Hokin Deng, Dahua Lin, Ziwei Liu, Lei Yang | CC BY 4.0 | |
| ViBe: Ultra-High-Resolution Video Synthesis Born from Pure Images | paper | - High-frequency-aware training objective (HFATO)<br>- Downsample-upsample latent degradation | Yunfeng Wu, Hongying Cheng, Zihao He, Songhua Liu | arXiv 1.0 | |
| FLeX: Fourier-based Low-rank EXpansion for multilingual transfer | paper | - Fourier-domain regularization<br>- High-frequency-weighted spectral penalty with optional FFN-focused targeting | Gaurav Narasimhan | CC BY 4.0 | |
| Isokinetic Flow Matching for Pathwise Straightening of Generative Flows | paper | - Jacobian-free lookahead velocity-consistency regularizer for flow matching<br>- Time-weighted, speed-normalized pathwise acceleration penalty | Tauhid Khan | arXiv 1.0 | Train-time only Iso-FM auxiliary loss; inference unchanged |
| URSA | repo | - Split anchor-vs-continuation video loss reduction<br>- Separate anchor and temporal loss telemetry<br>- Spatiotemporal guidance weighting via anchor reconstruction and frame-delta consistency | baaivision | Apache 2.0 | see also paper |
| DeCo | repo | - DCT-based low/high-frequency energy diagnostics<br>- Band-balanced DCT reconstruction auxiliary loss with separate low/high-frequency weights | Zehong-Ma | | see also paper |
| LoRA-drop: Efficient LoRA Parameter Pruning based on Output Evaluation | paper | - Output-magnitude per-layer EMA tracking<br>- Automatic low-importance same-shape adapter sharing | Hongyun Zhou, Xiangyu Lu, Wang Xu, Conghui Zhu, Tiejun Zhao, Muyun Yang | arXiv 1.0 | |
| DyPE | repo | - Dynamic RoPE index scaling for oversized spatial and temporal token grids | guyyariv | MIT | see also paper |
| TwinFlow | repo | - Parallel distillation pipeline<br>- Signed-timestep conditioning for negative-time self-adversarial passes<br>- TwinFlow-controlled beta sigma sampling and enhancement-window gating<br>- Recursive consistency target with optional target enhancement, adversarial, and rectification losses | inclusionAI | Apache 2.0 | see also paper |
| rectified-flow-pytorch | repo | - Self-Flow RMSNorm + GELU projector design | lucidrains | MIT | |
| HY-SOAR | repo | - HY-SOAR auxiliary trajectory-correction objective<br>- Same-noise off-trajectory supervision with detached CFG rollout | Tencent-Hunyuan | Apache 2.0 | see also paper |
| FlowC2S | repo | - Current-succeeding transport supervision<br>- Chunk-pairing training layout<br>- Target-inversion scoping reference | marghovo | | see also paper |
