Is this a duplicate?
Type of Bug
Silent Failure
Component
libcu++
Describe the bug
When the .blocksareclusters directive is specified for a kernel (it can come from __cluster_dims__ and __block_size__) and the kernel is launched by cuda::launch with a hierarchy that contains the cluster description, too many blocks (clusters) are launched.
How to Reproduce
__global__ __cluster_dims__(2) void kernel() {}

int main()
{
  cuda::stream stream{cuda::device_ref{0}};
  const auto config = cuda::make_config(cuda::grid_dims<2>(), cuda::cluster_dims<3>(), cuda::block_dims<4>());
  // launches 6 clusters instead of 2
  cuda::launch(stream, config, kernel);
  stream.sync();
}
Expected behavior
cuda::launch should launch the number of clusters requested by the hierarchy (2 in the example above, not 6).
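The 6-versus-2 mismatch is consistent with the grid dimension being computed in blocks (clusters × blocks per cluster) even though a .blocksareclusters kernel interprets the launch grid dimension as a cluster count. The host-only sketch below illustrates that arithmetic. It is an assumption about the cause, not the actual libcu++ implementation, and launch_grid_dim and check_reported_case are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>

// Hypothetical helper (not a libcu++ API): the grid dimension that would have
// to be passed to the driver for a 1D launch.
// Assumption: without .blocksareclusters the grid is counted in blocks;
// with it, the grid is counted in clusters.
constexpr unsigned launch_grid_dim(unsigned clusters,
                                   unsigned blocks_per_cluster,
                                   bool blocks_are_clusters)
{
  return blocks_are_clusters ? clusters : clusters * blocks_per_cluster;
}

// Values from the reproducer: grid_dims<2>() and cluster_dims<3>().
static_assert(launch_grid_dim(2, 3, false) == 6, "regular kernel: 6 blocks form 2 clusters");
static_assert(launch_grid_dim(2, 3, true) == 2, "blocks-as-clusters kernel: 2 clusters");

// Runtime version of the same check, mirroring the reported numbers.
inline void check_reported_case()
{
  const unsigned wrong = launch_grid_dim(2, 3, false); // value reported as launched
  const unsigned right = launch_grid_dim(2, 3, true);  // value expected by the report
  assert(wrong == 6);
  assert(right == 2);
}
```

If this reading is right, passing the block-based value (6) to a blocks-as-clusters kernel would yield 6 clusters, which matches the count reported above.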