Pull requests: NVIDIA/cutlass
[CuTeDSL] Update atomic_max_float32 to atomic_fmax in blockscaled GEMM amax example
#3155
opened Apr 8, 2026 by
questa-wang
Contributor
[fix][CuTeDSL] Fix dynamic shape args not passed to JIT kernel
#3148
opened Apr 5, 2026 by
Flink-ddd
[CuTeDSL] Fix incorrect package-data key in pyproject.toml
#3145
opened Apr 3, 2026 by
Johnsonms
Contributor
[Fix] distributed gemm all reduce and reduce scatter examples
#3143
opened Apr 1, 2026 by
Nicyzk
Fix Hopper FMHA performance regression on CUDA < 13.1
#3137
opened Mar 31, 2026 by
arvin-chou
5 of 6 tasks
feat(CuTeDSL): print benchmark time from Blackwell dense_gemm CLI
#3136
opened Mar 30, 2026 by
aidando73
Contributor
[CuTeDSL] Add a render function hook to allow render layout natively
#3135
opened Mar 30, 2026 by
kainzhong
Fix incorrect warp arrangement in PitchLinearWarpStripedThreadMap
#3131
opened Mar 25, 2026 by
HaoYuan-Gao
[CuTeDSL] Add SM103 grouped block-scaled GEMM kernel and tests
#3124
opened Mar 23, 2026 by
Johnsonms
Contributor
Enable strict C++ compiler warnings with -Werror
#3123
opened Mar 22, 2026 by
maxwbuckley
3 of 4 tasks
[bugfix] use acquire to prevent reordering.
#3118
opened Mar 20, 2026 by
shubaoyu2
Contributor
[FMHA] Add SM110 support for Blackwell FMHA example (77_blackwell_fmha)
#3112
opened Mar 18, 2026 by
LiangSu8899
fix(CuTeDSL): correct FP4 tensor K dimension in grouped blockscaled GEMM
inactive-30d
#3102
opened Mar 13, 2026 by
Hale423