Name | Last commit | Last update |
---|---|---|
.. | ||
include/topi | ||
python | ||
recipe | ||
src | ||
tests/python | ||
README.md |
- This makes better use of the memory bandwidth.
- Note that not all cases are vectorized for the fp16 datatype. For instance, when the size is not a multiple of 1024, the inner loop extent may be an expression that cannot be vectorized. In this case, a small inner loop is still beneficial for latency hiding.

Signed-off-by: Wei Pan <weip@nvidia.com>
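The two cases in the commit message can be illustrated with a minimal pure-Python sketch. It is not the TOPI schedule itself: the function name `elementwise_add`, the `lanes=2` vector width (standing in for a two-element fp16 vector), and the use of Python lists are all assumptions made for illustration. Only the 1024 threshold comes from the commit message.

```python
def elementwise_add(a, b, lanes=2, block=1024):
    """Add two equal-length sequences, mimicking the two loop shapes.

    Hypothetical illustration only: `lanes` and this function are assumed
    names, not part of TOPI. Returns (result, vectorized_flag).
    """
    n = len(a)
    out = [0.0] * n
    if n % block == 0:
        # Size is a multiple of the block: the inner extent is a constant,
        # so each step can handle `lanes` adjacent elements at once
        # (a stand-in for a single vector load/add/store).
        for outer in range(n // lanes):
            base = outer * lanes
            out[base:base + lanes] = [
                x + y for x, y in zip(a[base:base + lanes], b[base:base + lanes])
            ]
        return out, True
    # Otherwise the inner extent depends on n and cannot be vectorized.
    # A small scalar inner loop with a bounds guard is kept instead.
    for outer in range((n + lanes - 1) // lanes):
        for inner in range(lanes):
            i = outer * lanes + inner
            if i < n:
                out[i] = a[i] + b[i]
    return out, False


if __name__ == "__main__":
    xs = [float(i) for i in range(2048)]
    _, vectorized = elementwise_add(xs, xs)
    print("2048 elements vectorized:", vectorized)
    _, vectorized = elementwise_add(xs[:1000], xs[:1000])
    print("1000 elements vectorized:", vectorized)
```

Both branches compute the same result. The difference is only in loop shape: the first has a fixed-width inner step that a compiler could map to vector instructions, while the second keeps a short guarded inner loop, which the commit message describes as still helpful for latency hiding.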