
[BUG]: cuda::launch doesn't work correctly with kernels compiled with .blocksareclusters #8327

@davebayer


Is this a duplicate?

Type of Bug

Silent Failure

Component

libcu++

Describe the bug

When the .blocksareclusters directive is specified for a kernel (it can come from __cluster_dims__ and __block_size__) and the kernel is launched through cuda::launch with a hierarchy that contains a cluster level, too many clusters are launched. The launch appears to pass the grid size in blocks, which is then interpreted as a number of clusters.

How to Reproduce

__global__ __cluster_dims__(2) void kernel() {}

int main()
{
  cuda::stream stream{cuda::device_ref{0}};
  
  const auto config = cuda::make_config(cuda::grid_dims<2>(), cuda::cluster_dims<3>(), cuda::block_dims<4>());

  // launches 6 clusters instead of 2
  cuda::launch(stream, config, kernel);
  stream.sync();
}
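
For reference, the 6-versus-2 count in the comment above can be reproduced with plain arithmetic. This is only a sketch: grid_dim_for_launch is a hypothetical helper, not a libcu++ function, and it assumes that a kernel compiled with .blocksareclusters has its launch grid dimension interpreted in units of clusters rather than blocks.

```cpp
// Hypothetical helper for illustration only; it is not part of libcu++.
// Assumption: for a kernel compiled with .blocksareclusters, the grid
// dimension given at launch counts clusters instead of thread blocks.
constexpr unsigned grid_dim_for_launch(unsigned clusters,
                                       unsigned blocks_per_cluster,
                                       bool blocks_are_clusters)
{
  return blocks_are_clusters ? clusters : clusters * blocks_per_cluster;
}

// Regular kernel: 2 clusters of 3 blocks each need a grid of 6 blocks.
static_assert(grid_dim_for_launch(2, 3, false) == 6, "grid counted in blocks");

// .blocksareclusters kernel: the grid dimension should stay at 2. Passing 6
// here would be read as 6 clusters, which matches the behavior reported above.
static_assert(grid_dim_for_launch(2, 3, true) == 2, "grid counted in clusters");
```

If this assumption holds, the fix would amount to skipping the multiplication by the cluster size when the kernel is known to use .blocksareclusters.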

Expected behavior

cuda::launch should launch the number of clusters described by the hierarchy, which is 2 in the reproducer above.
