* Don't replace reduction init axis with new axis if bound to a thread.
* Linter.
* Reduce bind test case.
* Guard test on CUDA support.
* [CUDA TE TESTS] Add rfactor predicate test, add global bx and tx.
* [CUDA TE TESTS] Add loop partition test for simple rfactor case.