seruva19/takenoko

Takenoko

An opinionated, perpetual WIP project aimed at hacking WanVideo LoRA training for Wan2.1-T2V-14B and Wan2.2-T2V-A14B.

It is intended as a playground for experimenting with new ideas and testing various training features, including some that might ultimately turn out to be useless. The configuration file structure may change at any time, and some non-functioning options may still be present. It only supports Wan2.1-T2V-14B and Wan2.2-T2V-A14B training.

☄️ Disclaimer

This project would not have been possible without musubi-tuner. Although extensively refactored and reworked (to the point where upstream merge is no longer possible), the original project provided the foundation on which Takenoko was built. By reusing an existing and proven codebase, I was able to focus more on experimentation and learning instead of reinventing the wheel. Thanks to kohya-ss for the awesome work.

☄️ Docs

Since this project is mostly aimed at personal use and is constantly changing (with no guarantee of backwards compatibility), it probably won't have comprehensive documentation in the near future (unless it somehow becomes popular, which I hope it does not). I've tried to provide detailed comments in the config template, but they can't cover everything. As a workaround, I recommend using repomix to pack the entire repository into a single AI-readable XML file (around 1M tokens), then feeding it to the free Grok 4 Fast, which has a 2M-token context window, and asking questions about various aspects of the project.
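Before uploading a packed repository file, it can help to check roughly whether it fits the context window. The sketch below is only an illustration: the 4-characters-per-token ratio is a crude assumption (real tokenizers differ, especially on code and XML), and the function names, the `reserve` parameter, and the file name in the usage comment are hypothetical and not part of Takenoko or repomix.

```python
from pathlib import Path

# Assumption: roughly 4 characters per token. This is only a rule of thumb,
# so treat every result as an order-of-magnitude estimate.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Return a rough token count for the text, rounded up."""
    return -(-len(text) // chars_per_token)  # ceiling division


def fits_context(text: str, context_window: int = 2_000_000,
                 reserve: int = 50_000) -> bool:
    """Check whether the text, plus a reserve for questions and answers,
    fits into a model context window of the given size."""
    return estimate_tokens(text) + reserve <= context_window


def check_packed_file(path: str) -> bool:
    """Read a packed repository file and report whether it likely fits."""
    packed = Path(path).read_text(encoding="utf-8", errors="replace")
    return fits_context(packed)


# Hypothetical usage (the file name is an assumption):
# print(check_packed_file("repomix-output.xml"))
```

If the estimate lands close to the limit, excluding large generated or binary-like files from the packed output is the safer option.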

☄️ Quick Start (Windows)

  1. Clone the repository.
  2. Run install.bat.
  3. Create a configuration file (you can copy a sample config from the configs/examples folder).
  4. Place it into the configs directory.
  5. Launch run_trainer.bat and follow the instructions.
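Steps 3 and 4 amount to copying a sample file from configs/examples into configs. A minimal sketch of that copy is shown below, assuming the script is given the repository root; the helper name `copy_sample_config` and the file names in the usage comment are hypothetical, and this is not a script shipped with the project.

```python
import shutil
from pathlib import Path


def copy_sample_config(repo_root, sample_name, target_name=None):
    """Copy configs/examples/<sample_name> into configs/ and return the new path.

    Refuses to overwrite an existing config so local edits are not lost.
    """
    root = Path(repo_root)
    source = root / "configs" / "examples" / sample_name
    if not source.is_file():
        raise FileNotFoundError(f"Sample config not found: {source}")

    target = root / "configs" / (target_name or sample_name)
    if target.exists():
        raise FileExistsError(f"Config already exists: {target}")

    shutil.copy2(source, target)  # copies contents and file metadata
    return target


# Hypothetical usage (both file names are assumptions):
# copy_sample_config(".", "sample.toml", "my_lora.toml")
```

After the copy, edit the new file in configs and launch run_trainer.bat as described in step 5.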

☄️ License

This project borrows code from various sources, which use different types of licenses, mostly Apache 2.0, MIT, and AGPLv3. Since AGPLv3 is a strong copyleft license, including any AGPLv3 code likely means the entire project must be released under AGPLv3. This understanding is based on publicly available licensing information.

☄️ Acknowledgments

Takenoko draws inspiration from and incorporates code, ideas, and techniques from various open-source projects and publications. I thank the authors and maintainers for their contributions. Below is a list of all sources and papers (in no particular order). I have tried to reference all sources, but if I happen to miss any (or if more specific credits are warranted), please let me know.

Keep in mind that work on some features is not yet complete due to time and hardware constraints. If a feature is not working or is not implemented exactly as in the original work, all responsibility lies with my implementation, not with the authors of the original code or paper.

Source | Type | What was borrowed | Author(s) | License | Comment
musubi-tuner repo - Original codebase kohya-ss Apache 2.0
blissful-tuner repo - Several optimization techniques Sarania Apache 2.0
diffusion-pipe repo - Pre-computed timestep distribution algorithm
- AdamW8bitKahan optimizer
- Automagic optimizer modifications
tdrussell MIT
WanTraining repo - Control LoRA training
- DWT loss
spacepxl Apache 2.0
ai-toolkit repo - Differential output preservation
- Adafactor optimizer
- Prodigy 8-bit optimizer
- Automagic optimizer
- EMA implementation
- Concept slider training
- Stepped loss
ostris MIT
musubi-tuner (pr) repo - Initial implementation of validation datasets NSFW-API Apache 2.0
Timestep-Attention-and-other-shenanigans repo - Clustered MSE Loss
- EW loss
Anzhc AGPL-3.0
Diffuse and Disperse: Image Generation with Representation Regularization paper - Dispersive loss Runqian Wang, Kaiming He CC BY 4.0
DispLoss repo - Dispersive loss PyTorch implementation raywang4 MIT
sd-scripts repo - Regularization datasets
- LoRA-GGPO
- Validation loss
kohya-ss Apache 2.0
wan2.1-dilated-controlnet repo - ControlNet training TheDenk Apache 2.0
T-LoRA repo - T-LoRA training ControlGenAI MIT see also paper
sd-scripts (fork) repo - Fourier loss
- HinaAdaptive optimizer
hinablue Apache 2.0
Muon repo - Muon optimizer KellerJordan MIT
dion repo - DION2-inspired reduced orthonormal optimizer integration microsoft MIT
Sana repo - CAME 8-bit optimizer NVlabs Apache 2.0 see also paper
SimpleTuner repo - Routed TREAD
- SOAP optimizer
- Masked training (spatial-first loss, area interpolation, proper normalization, auto mask generation)
- Advanced EMA features
- CREPA/LayerSync improvements
- Scheduled rollout probability ramping
bghira AGPL-3.0
diffusion-pipe (pr) repo - Frame-based TREAD Ada123-a MIT
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think paper - Representational alignment loss, 3-layer MLP projection head, forward hook-based feature capture Sihyun Yu, Sangkyung Kwak, Huiwon Jang, Jongheon Jeong, Jonathan Huang, Jinwoo Shin, Saining Xie CC BY 4.0
REPA repo - Representation Alignment implementation sihyun-yu MIT
dino repo - VisionTransformer implementation facebookresearch MIT
Sophia repo - Sophia optimizer Liuhong99 MIT see also paper
Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training repo - Adaptive timestep sampling KU-DMLab MIT see also paper
Temporal Regularization Makes Your Video Generator Stronger paper - Temporal regularization via perturbation Harold Haodong Chen, Haojian Huang, Xianfeng Wu, Yexin Liu, Yajing Bai, Wen-Jie Shu, Harry Yang, Ser-Nam Lim arXiv 1.0
AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion paper - Frame-oriented Probability Propagation (FoPP) scheduler Mingzhen Sun, Weining Wang, Gen Li, Jiawei Liu, Jiahui Sun, Wanquan Feng, Shanshan Lao, SiYu Zhou, Qian He, Jing Liu arXiv 1.0
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach paper - Vectorized timestep scheduling Yaofang Liu, Yumeng Ren, Xiaodong Cun, Aitor Artola, Yang Liu, Tieyong Zeng, Raymond H. Chan, Jean-michel Morel arXiv 1.0
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion paper - Post-training autoregressive self-rollout method Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, Eli Shechtman CC BY-NC-SA 4.0
Wan2.1-NABLA repo - Dynamic sparse attention gen-ai-team Apache 2.0 see also paper
VideoX-Fun repo - Reward LoRA training aigc-apps Apache 2.0
Fira repo - Fira optimizer xichen-fy Apache 2.0 see also paper
google-research repo - Frechet Video Distance (FVD) implementation google-research Apache 2.0
Mixture of Contexts for Long Video Generation paper - Mixture of Contexts (MoC) sparse attention routing Shengqu Cai, Ceyuan Yang, Lvmin Zhang, Yuwei Guo, Junfei Xiao, Ziyan Yang, Yinghao Xu, Zhenheng Yang, Alan Yuille, Leonidas Guibas, Maneesh Agrawala, Lu Jiang, Gordon Wetzstein CC BY-SA 4.0
SPHL-for-stable-diffusion code - Pseudo-Huber loss implementation kabachuha see also paper
Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval paper - Context-as-Memory integration Jiwen Yu, Jianhong Bai, Yiran Qin, Quande Liu, Xintao Wang, Pengfei Wan, Di Zhang, Xihui Liu CC BY 4.0
SingLoRA repo - SingLoRA implementation kyegomez MIT see also paper
PEFT-SingLoRA repo - Enhanced non-square matrix handling bghira BSD 2-clause
sd-scripts (pr) repo - Latent quality analysis araleza Apache 2.0
Contrastive Flow Matching paper - Contrastive loss George Stoica, Vivek Ramanujan, Xiang Fan, Ali Farhadi, Ranjay Krishna, Judy Hoffman CC BY 4.0
DeltaFM repo - Contrastive Flow Matching implementation (class-conditioned sampling, unconditional handling) gstoica27 MIT
OneTrainer repo - Masked training (prior preservation, unmasked weight, random mask removal)
- OFTv2 orthogonal finetuning integration reference
Nerogar AGPL-3.0
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion paper - Frequency-domain temporal consistency Jingyuan Chen, Fuchen Long, Jie An, Zhaofan Qiu, Ting Yao, Jiebo Luo, Tao Mei CC BY-SA 4.0
mmgp repo - Memory-mapped safetensors loading deepbeepmeep GNU GPL
attention-map-diffusers repo - Cross-attention map visualization wooyeolbaek MIT
musubi-tuner (fork) repo - Full model fine-tuning
- Row-based TREAD
betterftr Apache 2.0
stochastic_round_cuda repo - Stochastic rounding CUDA implementation ethansmith2000 MIT
simplevae repo - VAE training enhancements AiArtLab
RamTorch repo - RamTorch CPU-bouncing linear layers lodestone-rock Apache 2.0
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference repo - SRPO preference optimization Tencent-Hunyuan SRPO Non-Commercial License see also paper
SARA: Structural and Adversarial Representation Alignment for Training-efficient Diffusion Models paper - Autocorrelation matrix alignment
- Adversarial distribution alignment
- Multi-level hierarchical representation loss
Hesen Chen, Junyan Wang, Zhiyu Tan, Hao Li CC BY 4.0
Scion repo - Scion optimizer LIONS-EPFL MIT see also paper
EqM repo - Equilibrium matching adaptation raywang4 MIT see also paper
NorMuon repo - Neuron-wise Normalized Muon implementation CoffeeVampir3 MIT
TiM repo - Transition training objective (paired timesteps, transports, weighting, EMA) WZDTHU Apache 2.0 see also paper
rcm repo - rCM distillation algorithm reference NVlabs Apache 2.0 see also paper
Aozora_SDXL_Training repo - Raven optimizer Hysocs
Sprint: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers paper - Sparse-dense residual fusion with token dropping
- Path-drop learning with token regularization
- Two-stage training scheduler
Dogyun Park, Moayed Haji-Ali, Yanyu Li, Willi Menapace, Sergey Tulyakov, Hyunwoo J. Kim, Aliaksandr Siarohin, Anil Kag CC BY 4.0
AdaMuon repo - Adaptive Muon optimizer implementation Chongjie-Si Apache 2.0 see also paper
Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion Models paper - Cross-frame representation alignment Sungwon Hwang, Hyojin Jang, Kinam Kim, Minho Park, Jaegul Choo CC BY 4.0
LayerSync: Self-aligning Intermediate Layers repo - Inter-layer alignment loss vita-epfl MIT see also paper
HyperLoRA repo - HyperLoRA concept bytedance GPL-3.0 see also paper
Qwen-Image-i2L repo - Trainable single-pass LoRA weight prediction hypernetwork concept
- Residual-conditioned branch with cached auxiliary embeddings
- Optional multi-encoder auxiliary embedding fusion
DiffSynth-Studio Apache 2.0 see also article
iREPA repo - Convolutional projector for spatial preservation
- Spatial z-score normalization for sharper alignment
End2End-Diffusion MIT see also paper
SpeedrunDiT repo - Dim-aware timestep shift
- Cross-batch CFM regularizer
- Sprint uncond-only path drop for sampling
SwayStar123 MIT
Improved Variational Online Newton (IVON) repo - IVON implementation team-approx-bayes GPL-3.0 with code from PR by rockerBOO
MemFlow repo - Memory bank
- Sparse memory activation guidance
KlingTeam Apache 2.0 see also paper
HASTE repo - Holistic alignment loss
- Semantic anchor feature projections
- Attention alignment with teacher offset
- Stage‑wise termination
NUS-HPC-AI-Lab Apache 2.0 see also paper
sd-scripts (pr) repo - CDC-FM flow matching rockerBOO Apache 2.0 see also paper
GaLore repo - GaLore optimizer jiaweizzhao Apache 2.0 see also paper
REG repo - Class‑token entanglement
- Class‑token denoising loss
- Alignment loss to encoder features
Martinser MIT see also paper
Q-GaLore repo - Q-GaLore optimizer VITA-Group Apache 2.0 see also paper
SemanticGen: Video Generation in Semantic Space paper - Semantic token conditioning
- Feature‑representation cross‑alignment loss
Jianhong Bai, Xiaoshi Wu, Xintao Wang, Xiao Fu, Yuanxing Zhang, Qinghe Wang, Xiaoyu Shi, Menghan Xia, Zuozhu Liu, Haoji Hu, Pengfei Wan, Kun Gai
transformers (pr) repo - Implementation of Q-GaLore optimizer SunMarc Apache 2.0
Glance repo - Fixed-timestep distillation mode CSU-JPG Apache 2.0 see also paper
Stable-Video-Infinity repo - Error‑recycling fine‑tuning
- Timestep‑grid replay buffers
- Buffer replacement strategies
- Warmup distributed buffer fill
- Probabilistic error injection and modulation
- Anchor‑conditioned motion replay
- Sequence‑aware batching for replay continuity
vita-epfl Apache 2.0 see also paper
EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise paper - Temporally consistent noise with flow caching Chao Liu, Arash Vahdat CC BY 4.0
catlvdm repo - BCNI/SACN corruption for T5 conditioning
- Structured corruption robustness boost
- Mask‑aware embedding noise injection
chikap421 MIT see also paper
TPDiff: Temporal Pyramid Video Diffusion Model paper - Temporal pyramid bounded sampling
- Stage‑wise temporal resampling
- Stage‑specific scheduler‑aware gamma/sigma
Lingmin Ran, Mike Zheng Shou CC BY 4.0
relora repo - ReLoRA pipeline Guitaricet Apache 2.0 see also paper
DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models paper - DenseDPO training method Ziyi Wu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Ashkan Mirzaei, Igor Gilitschenski, Sergey Tulyakov, Aliaksandr Siarohin CC BY 4.0
Blockwise-Flow-Matching repo - Blockwise timestep segment objective
- SemFeat alignment conditioning
- SemFeat time-embedding injection
- FRN loss
mlvlab see also paper
MuonClip repo - MuonClip kyegomez Apache 2.0 see also paper
mHC: Manifold-Constrained Hyper-Connections paper - Multi-path residual stream with learnable residual mixing matrix
- Doubly-stochastic manifold constraint
- Identity-mapping preservation across depth
- Sinkhorn-Knopp normalization enforcing constraint
- Norm-preserving cross-stream residual propagation
Zhenda Xie, Yixuan Wei, Huanqi Cao, Chenggang Zhao, Chengqi Deng, Jiashi Li, Damai Dai, Huazuo Gao, Jiang Chang, Liang Zhao, Shangyan Zhou, Zhean Xu, Zhengyan Zhang, Wangding Zeng, Shengding Hu, Yuqing Wang, Jingyang Yuan, Lean Wang, Wenfeng Liang arXiv 1.0
manifolds repo - Manifold Muon integration thinking-machines-lab MIT see also blogpost
LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters paper - Riemannion fixed‑rank optimizer
- Manifold momentum/transport
- Manifold‑aware LoRA tangent projection and retraction
- One‑step gradient locally optimal initialization
Vladimir Bogachev, Vladimir Aletov, Alexander Molozhavenko, Denis Bobkov, Vera Soboleva, Aibek Alanov, Maxim Rakhuba arXiv 1.0
pico-relora repo - Optimizer reset via random pruning
- Jagged cosine scheduler
Yu-val-weiss Apache 2.0 see also paper
Physics-Guided Motion Loss for Video Generation Model paper - Physics-guided motion loss Bowen Xue, Giuseppe Claudio Guarnera, Shuang Zhao, Zahra Montazeri arXiv 1.0
optimizers repo - Original implementation of Kron, Conda, VSGD, RangerVA and NvNovoGrad optimizers NoteDance Apache 2.0
clora repo - Cross-attention capture
- Token-focused attention
- Spatial attention masking
- Contrastive attention separation
gemlab-vt MIT see also paper
splus repo - SPlus optimizer kvfrans see also paper
Internal-Guidance repo - Auxiliary supervision on intermediate layers
- Internal dynamics guidance
- Target shifting mechanism
CVL-UESTC MIT see also paper
Beyond External Guidance: Unleashing the Semantic Richness Inside Diffusion Transformers for Improved Training paper - Two‑stage self‑guidance
- Feature‑space CFG semantic enrichment
- Frozen internal teacher stabilization
- Lightweight projection alignment
Lingchen Sun, Rongyuan Wu, Zhengqiang Zhang, Ruibin Li, Yujing Sun, Shuaizheng Liu, Lei Zhang CC BY 4.0
FreeFuse repo - Subject-mask training with auxiliary consistency losses yaoliliu Apache 2.0 see also paper
Immiscible-Diffusion repo - KNN candidate noise selection implementation
- Linear assignment noise matching reference
yhli123 MIT see also paper
MixFlow repo - Slowed interpolation mixture objective
- Beta-style timestep remapping
fudan-generative-vision see also paper
Video Consistency Distance: Enhancing Temporal Consistency for Image-to-Video Generation via Reward-Based Fine-Tuning paper - VCD temporal-consistency objective
- Frequency-domain amplitude/phase consistency distance
Takehiro Aoshima, Yusuke Shinohara, Byeongseon Park arXiv 1.0
MoAlign: Motion-Centric Representation Alignment for Video Diffusion Models paper - Two-stage motion-centric alignment
- Spatial/temporal relational alignment loss with temporal weighting
Aritra Bhowmik, Denis Korzhenkov, Cees G. M. Snoek, Amirhossein Habibian, Mohsen Ghafoorian CC BY 4.0
End-to-End Training for Autoregressive Video Diffusion via Self-Resampling paper - Self-resampling history corruption
- History token routing
- Autoregressive rollout with KV-cache acceleration
Yuwei Guo, Ceyuan Yang, Hao He, Yang Zhao, Meng Wei, Zhenheng Yang, Weilin Huang, Dahua Lin CC BY 4.0
VideoREPA repo - Video teacher integration patterns
- TRD objective implementation
aHapBean Apache 2.0 see also paper
Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation paper - Bidirectional teacher-feature fusion for structure-preserving motion distillation
- Local Gram Flow (LGF) alignment objective
- SFT pipeline with optional SAM2 tracker-memory backend
Yang Fei, George Stoica, Jingyuan Liu, Qifeng Chen, Ranjay Krishna, Xiaojuan Wang, Benlin Liu CC BY-NC-ND 4.0
CAMEO repo - Attention distillation techniques cvlab-kaist see also paper
VAE-REPA: Variational Autoencoder Representation Alignment for Efficient Diffusion Training paper - VAE-latent representation alignment objective
- Configurable projector depth
Mengmeng Wang, Dengyang Jiang, Liuzhuozheng Li, Yucheng Lin, Guojiang Shen, Xiangjie Kong, Yong Liu, Guang Dai, Jingdong Wang CC BY 4.0
DisMo repo - Conditional LoRA modulation reference
- Stochastic delta-time sampling reference
- Motion/appearance disentanglement diagnostics direction
CompVis MIT see also paper
ReflexFlow: Rethinking Learning Objective for Exposure Bias Alleviation in Flow Matching paper - Anti-Drift Rectification (ADR) objective
- Frequency Compensation (FC) loss reweighting
- Scheduled sampling strategy for biased-input training
Guanbo Huang, Jingjia Mao, Fanding Huang, Fengkai Liu, Xiangyang Luo, Yaoyuan Liang, Jiasheng Lu, Xiaoe Wang, Pei Liu, Ruiliu Fu, Shao-Lun Huang arXiv 1.0
StableVelocity repo - VA-REPA timestep-aware weighting schedules
- StableVM memory-bank target construction
- Class-aware bank sampling
linYDTHU MIT see also paper
LTX-2 repo - IC-LoRA trainer/pipeline structure references
- Reference-target conditioning flow design
- IC-LoRA network module conventions
Lightricks LTX-2 Community License
In-Context LoRA for Diffusion Transformers paper - In-context concatenation objective for condition/target layouts Lianghua Huang, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Huanzhang Dou, Chen Liang, Yutong Feng, Yu Liu, Jingren Zhou CC BY 4.0
In-Context Sync-LoRA for Portrait Video Editing paper - Sync-aware paired-video curation concept
- Motion-preserving in-context edit objective
Sagi Polaczek, Or Patashnik, Ali Mahdavi-Amiri, Daniel Cohen-Or arXiv 1.0
Generative Modeling via Drifting paper - Drifting auxiliary objective
- Mean-shift drifting field with kernel normalization
Mingyang Deng, He Li, Tianhong Li, Yilun Du, Kaiming He CC BY 4.0
DeT repo - Motion-transfer enhancement integration
- Local temporal-kernel and dense-trajectory supervision objectives
Shi-qingyu see also paper
Mano-Restriking-Manifold-Optimization-for-LLM-Training repo - Mano optimizer implementation
- Tangent-space manifold update
- Matrix/aux-Adam parameter split
xie-lab-ml Apache 2.0 see also paper
Euphonium repo - SRPO process-reward guidance
- Dual outcome/process reward modes
- KL-auto scaling
- Optional latent SPSA gradients
zerzerzerz Apache 2.0 see also paper
ShortFT: Diffusion Model Alignment via Shortcut-based Fine-Tuning paper - Progressive shortcut backprop schedule for reward LoRA training
- Segment/anchor-based denoising-chain backprop control
Xiefan Guo, Miaomiao Cui, Liefeng Bo, Di Huang arXiv 1.0
FlexAM repo - FlexAM conditioning
- Density-guided timestep conditioning concept
IGL-HKUST Apache 2.0 see also paper
UFO repo - Static-clip training
- Frame-correlated autoregressive noise sharing
- Motion-sub frame-delta loss
- Temporal-attention LoRA targeting
Delong-liu-bupt MIT see also paper
PiSSA repo - Principal/residual decomposition GraphPKU Apache 2.0 see also paper
sd-scripts (pr) repo - PiSSA initialization and integration patterns rockerBOO Apache 2.0
AdaLoRA repo - Adaptive rank-budget allocation workflow
- Rank-importance scoring and masking flow
QingruZhang MIT see also paper
MoRA repo - High-rank square adapter update
- Type-based projection/expansion mapping
kongds see also paper
VeRA: Vector-based Random Matrix Adaptation paper - Shared frozen random projection matrices across adapted layers
- Trainable per-layer scaling vectors with minimal parameter overhead
Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano CC BY 4.0
S2D: Selective Spectral Decay for Quantization-Friendly Conditioning of Neural Activations paper - Selective dominant-spectrum regularization
- Amortized spectral updates with thresholded top-component targeting
Arnav Chavan, Nahush Lele, Udbhav Bamba, Sankalp Dayal, Aditi Raghunathan, Deepak Gupta arXiv 1.0
LoRWeB repo - Dynamic LoRA basis with query-conditioned weight mixing
- Query-mode/runtime wiring patterns for visual analogy triplets
NVlabs NVIDIA License see also paper
Growing with the Generator: Self-paced GRPO for Video Generation paper - Self-paced reward progression
- Sparsity-aware reward mixing
Rui Li, Yuanzhi Liang, Ziqi Ni, Haibing Huang, Chi Zhang, Xuelong Li arXiv 1.0
CDKA repo - Reference implementation for CDKA rainstonee see also paper
QLoRA repo - 4-bit NF4/FP4 quantized base-model loading
- Double/nested quantization flow
- Paged bitsandbytes optimizer integration
- k-bit preparation patterns
artidoro MIT see also paper
Mode Seeking meets Mean Seeking for Fast Long Video Generation paper - Decoupled global/local dual-head auxiliary objective
- Sliding-window local teacher-alignment approximation
- Reverse-KL local behavior-matching term
Shengqu Cai, Weili Nie, Chao Liu, Julius Berner, Lvmin Zhang, Nanye Ma, Hansheng Chen, Maneesh Agrawala, Leonidas Guibas, Gordon Wetzstein, Arash Vahdat CC BY-SA 4.0
VB-LoRA repo - Vector-bank LoRA composition
- Top-k sparse logits composition and compact checkpoint strategy
leo-yangli see also paper
A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA paper - Rank-stabilized adapter scaling Damjan Kalajdzievski arXiv 1.0
Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis paper - Dual-timestep masked noising
- EMA teacher feature alignment with cosine objective
- Combined training objective
Hila Chefer, Patrick Esser, Dominik Lorenz, Dustin Podell, Vikash Raja, Vinh Tong, Antonio Torralba, Robin Rombach arXiv 1.0
Video2LoRA repo - LightLoRA auxiliary-factor decomposition
- Reference-video-conditioned hypernetwork for runtime LoRA prediction
- Iterative latent-token decoder structure and paired-reference training flow
- End-to-end diffusion training without pre-trained semantic LoRA supervision
BerserkerVV see also paper
StelLA repo - Three-factor LoRA decomposition
- Stiefel-manifold constrained adapter updates
- Euclidean-to-Riemannian gradient conversion with retraction
SonyResearch Apache 2.0 see also paper
Disentangling Task Conflicts in Multi-Task LoRA via Orthogonal Gradient Projection paper - Orthogonal gradient projection for shared multi-task LoRA
- Separate conflict projection for LoRA low-rank factors
Ziyu Yang, Guibin Chen, Yuxin Yang, Aoxiong Zeng, Xiangquan Yang CC BY 4.0
Helios: Real Real-Time Long Video Generation Model paper - Frame-aware historical-context corruption
- First-frame history anchoring for anti-drift training
Shenghai Yuan, Yuanyang Yin, Zongjian Li, Xinwei Huang, Xiao Yang, Li Yuan CC BY 4.0
OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer paper - Reference positional-bias direction for IC-LoRA
- Task-type conditioning token for SemanticGen-style routing
Pengze Zhang, Yanze Wu, Mengtian Li, Xu Bai, Songtao Zhao, Fulong Ye, Chong Mou, Xinghui Li, Zhuowei Chen, Qian He, Mingyuan Gao arXiv 1.0
Demystifing Video Reasoning paper - Early-step multi-view consensus
- Mid-layer merge-and-continue training
- High-noise timestep transfer
Ruisi Wang, Zhongang Cai, Fanyi Pu, Junxiang Xu, Wanqi Yin, Maijunxian Wang, Ran Ji, Chenyang Gu, Bo Li, Ziqi Huang, Hokin Deng, Dahua Lin, Ziwei Liu, Lei Yang CC BY 4.0
ViBe: Ultra-High-Resolution Video Synthesis Born from Pure Images paper - High-frequency-aware training objective (HFATO)
- Downsample-upsample latent degradation
Yunfeng Wu, Hongying Cheng, Zihao He, Songhua Liu arXiv 1.0
FLeX: Fourier-based Low-rank EXpansion for multilingual transfer paper - Fourier-domain regularization
- High-frequency-weighted spectral penalty with optional FFN-focused targeting
Gaurav Narasimhan CC BY 4.0
Isokinetic Flow Matching for Pathwise Straightening of Generative Flows paper - Jacobian-free lookahead velocity-consistency regularizer for flow matching
- Time-weighted, speed-normalized pathwise acceleration penalty
Tauhid Khan arXiv 1.0 Train-time only Iso-FM auxiliary loss; inference unchanged
URSA repo - Split anchor-vs-continuation video loss reduction
- Separate anchor and temporal loss telemetry
- Spatiotemporal guidance weighting via anchor reconstruction and frame-delta consistency
baaivision Apache 2.0 see also paper
DeCo repo - DCT-based low/high-frequency energy diagnostics
- Band-balanced DCT reconstruction auxiliary loss with separate low/high-frequency weights
Zehong-Ma see also paper
LoRA-drop: Efficient LoRA Parameter Pruning based on Output Evaluation paper - Output-magnitude per-layer EMA tracking
- Automatic low-importance same-shape adapter sharing
Hongyun Zhou, Xiangyu Lu, Wang Xu, Conghui Zhu, Tiejun Zhao, Muyun Yang arXiv 1.0
DyPE repo - Dynamic RoPE index scaling for oversized spatial and temporal token grids guyyariv MIT see also paper
TwinFlow repo - Parallel distillation pipeline
- Signed-timestep conditioning for negative-time self-adversarial passes
- TwinFlow-controlled beta sigma sampling and enhancement-window gating
- Recursive consistency target with optional target enhancement, adversarial, and rectification losses
inclusionAI Apache 2.0 see also paper
rectified-flow-pytorch repo - Self-Flow RMSNorm + GELU projector design lucidrains MIT
HY-SOAR repo - HY-SOAR auxiliary trajectory-correction objective
- Same-noise off-trajectory supervision with detached CFG rollout
Tencent-Hunyuan Apache 2.0 see also paper
FlowC2S repo - Current-succeeding transport supervision
- Chunk-pairing training layout
- Target-inversion scoping reference
marghovo see also paper

About

A research project exploring WanVideo LoRA training.
