[QNN] Requantize operator (#3531)
* [Relay] [Quantization] WIP - Common files for the quantization work.
* [Relay] [Quantization] WIP - Prototyping requantize op.
* Requantize operator implementation.
Requantize converts one quantized tensor representation to another quantized
representation. This PR has the following implementation features:
- Requantize operator defined in qnn namespace - relay.qnn.requantize
- Lowering of requantize to existing Relay operators
- Integer fixed point implementation of requantize
- Two rounding modes - FE_UPWARDS (round towards infinity) and
FE_AWAY_FROM_ZERO (std::round behavior)
- A floating point implementation as well, which can act as a reference or can
be used for devices when FP32 computation is not used.
- Unit test cases
Relevant Issue - https://github.com/dmlc/tvm/issues/2351
Credit to TFLite and GemmLowp for providing reference implementations.
* Typo and lint fixes.
* Doc fix.
* Uncommenting the lint script (fixing mistake).
* Modifying the unit tests.
* Moving C++ files into src/relay/qnn
* Moving python files to python/tvm/relay/qnn. Some minor fixes.
* Moving the attrs.h inside the include directory.
* Pushing files that I forgot earlier. Changing util location.
* Incorporating comments. API change. Lint fixes.
* Modifying the GetFixedPointMultiplierShift API as per comments.
* Forgot the dialect change.
* Changing rewrite to qnn_lower.
* Renaming Quantize to Qnn for clarity.
* Remove use_int_domain.
* Incorporating review comments.
* Adding API doc for QNN dialect.
* Move the qnn_lower pass to transform namespace.
* Moving from expr to module. Adding namespace in C++.
* Minor sentence rewrites. Added qnn namespace.
* Added the API doc.
* Changing default out_dtype to int8. Adding a test with in/out_dtype as uint8.
* Style fixes. Better error messages.
* Adding documentation.
* More documentation fixes.
* Adding out dtype check for requantize.
* Adding corner case for FP32 to fixed point conversion.
* Adding extra line.
* Documentation fix.
* Adding static inline.
* Incorporating jackwish comment. Removed idtype from requantize lowering.
* Removing Quantize/Dequantize code. Restricting Requantize to (u)int8/int32.
* Style fixes.
* Fix the docs.
* Move to Legalize API.
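
For readers unfamiliar with the arithmetic, the sketch below illustrates what the commit message describes: requantization as a rescale by input_scale / output_scale, done once in floating point (the reference) and once with an integer fixed point multiplier, under the two rounding modes. It is a minimal scalar illustration in plain Python, not the code in src/relay/qnn. The function names, the "upward" / "away_from_zero" strings, and the single-step rounding are assumptions made for this example; `get_fixed_point_multiplier_shift` is only a hypothetical stand-in loosely named after the GetFixedPointMultiplierShift helper mentioned above.

```python
import math

# Saturation ranges for the output dtypes this commit restricts requantize to.
DTYPE_RANGE = {
    "int8": (-128, 127),
    "uint8": (0, 255),
    "int32": (-(1 << 31), (1 << 31) - 1),
}


def get_fixed_point_multiplier_shift(multiplier):
    """Hypothetical helper: split a non-negative real multiplier into
    (significand, shift) with multiplier ~= significand * 2**(shift - 31)."""
    if multiplier == 0.0:
        return 0, 0
    frac, shift = math.frexp(multiplier)  # multiplier = frac * 2**shift, 0.5 <= frac < 1
    significand = math.floor(frac * (1 << 31) + 0.5)
    if significand == (1 << 31):
        # Corner case: rounding pushed the fraction up to 1.0, so renormalize.
        significand //= 2
        shift += 1
    return significand, shift


def _round_shift(value, right_shift, rounding):
    """Divide value by 2**right_shift, rounding ties as requested."""
    if right_shift <= 0:
        return value << -right_shift
    half = 1 << (right_shift - 1)
    if rounding == "upward":
        # Ties go towards positive infinity.
        return (value + half) >> right_shift
    # "away_from_zero": ties go away from zero, like std::round.
    if value >= 0:
        return (value + half) >> right_shift
    return -((-value + half) >> right_shift)


def requantize_fixed_point(q_in, input_scale, input_zero_point,
                           output_scale, output_zero_point,
                           out_dtype="int8", rounding="away_from_zero"):
    """Integer-only requantize of one value (assumed simplification:
    a single rounding step on the wide product)."""
    significand, shift = get_fixed_point_multiplier_shift(input_scale / output_scale)
    product = (q_in - input_zero_point) * significand  # would need int64 in C++
    scaled = _round_shift(product, 31 - shift, rounding)
    lo, hi = DTYPE_RANGE[out_dtype]
    return max(lo, min(hi, scaled + output_zero_point))


def requantize_float(q_in, input_scale, input_zero_point,
                     output_scale, output_zero_point,
                     out_dtype="int8", rounding="away_from_zero"):
    """Floating point reference for the same computation."""
    real = (q_in - input_zero_point) * (input_scale / output_scale)
    if rounding == "upward":
        scaled = math.floor(real + 0.5)
    else:
        scaled = int(math.copysign(math.floor(abs(real) + 0.5), real))
    lo, hi = DTYPE_RANGE[out_dtype]
    return max(lo, min(hi, scaled + output_zero_point))
```

As a usage example, requantize_fixed_point(-3, 0.5, 0, 1.0, 0, rounding="upward") rescales -3 to -1.5 and rounds the tie up to -1, while the same call with rounding="away_from_zero" gives -2. The two modes only differ on negative ties, which is why the choice matters when matching a reference framework bit for bit.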
Showing
include/tvm/relay/qnn/attrs.h
0 → 100644
python/tvm/relay/qnn/__init__.py
0 → 100644
python/tvm/relay/qnn/op/__init__.py
0 → 100644
python/tvm/relay/qnn/op/_make.py
0 → 100644
python/tvm/relay/qnn/op/qnn.py
0 → 100644
src/relay/qnn/op/requantize.cc
0 → 100644
src/relay/qnn/util.h
0 → 100644
tests/python/relay/test_qnn_requantize.py
0 → 100644