[Fix] Fix get_valid_count flaky test for cuda (#4901)
* get_valid_count accuracy issue fixed for individual tests but not for all tests running together
* minor fix
* initialize valid_count and PrefixSum buffers
* test updated
* update relay test as well
* update document
* fix lint
* address comment
* fix lint
* correct atomicAdd identifier name