  1. 19 Jan, 2020 1 commit
  2. 18 Jan, 2020 1 commit
    • [CodeGen][CUDA] Improve CUDA vectorizer (#4736) · 2630ffcb
      - Fix issues to enable the fp16 vectorizer. Correct packing and
        unpacking CUDA code is now emitted, and more unit tests are enabled.
      
      - Do not emit code that reads the first lane from an undef variable:
      
        int _3;
        _3 = _3 & ~(0x000000ff << 0) | ...
      
        and emit the following code instead:
      
        _3 = (((0x000000ff & (_1 >> 0))+(0x000000ff & (_2 >> 0))) << 0);
      
        Note that nvcc 10.2 is forgiving and emits the same code for both cases.
        However, the old form does produce a warning in test_codegen_cuda.py.
      
      Signed-off-by: Wei Pan <weip@nvidia.com>
      wpan11nv committed
  3. 10 Dec, 2019 1 commit
  4. 24 Nov, 2019 1 commit
  5. 14 Nov, 2019 1 commit
  6. 10 Nov, 2019 1 commit