Commit c6cb641c by Klin

doc: use png to show table

parent 81b919fa
@@ -3,12 +3,18 @@
## ptq part
+ INT/POT/FLOAT quantization all share the same framework; the quantization mode is selected via `quant_type`
+ Quantization range: signed symmetric quantization is used throughout, with the zero point fixed at 0
+ Quantization strategy: the first forward pass performs fake quantization and collects the ranges of x and weight for every layer; the subsequent inference interface then computes with the quantized values in the convolution/pooling layers. Quantization is implemented by scaling a value into the target range and snapping it to the nearest point in the quantization point list (see the sketch after this list).
+ Bias handling: under every quantization mode, bias uses the same fixed strategy (INT: 32-bit quantization, POT: 8-bit quantization, FP8: FP16-E7 quantization). The quantization loss on bias has little effect on accuracy, and this strategy has little impact on the hardware implementation, while making the code implementation more efficient, which is why it was adopted. (NVIDIA's quantization scheme even discards the bias outright.)
+ To change these strategy settings, modify the `bias_qmax` function in `module.py` and the `build_bias_list` function in `utils.py`
+ Since INT quantization uses a relatively high bit width, a quantization lookup table would be too costly, so the plain `round_` operation is used directly
+ Quantization point selection:
  + INT: INT2-INT16 (INT16 and above show no loss relative to full precision)
  + POT: POT2-POT8 (beyond POT8, overflow occurs easily)
  + FP8: E1-E6 (E0 is equivalent to INT quantization and E7 to POT quantization; using those strategies directly gives better results)
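A minimal sketch of the "scale into the target range, then snap to the nearest point in the list" step described above, assuming a `plist` tensor such as the one produced by `build_list`; the name `fake_quantize` and its signature are illustrative, not the repository's actual API:

```python
import torch

def fake_quantize(x: torch.Tensor, plist: torch.Tensor) -> torch.Tensor:
    # Signed symmetric quantization with the zero point fixed at 0:
    # map the tensor's range onto the range covered by the point list,
    # snap every element to its nearest point, then scale back.
    scale = x.abs().max() / plist.abs().max()
    q = x / scale
    idx = (q.unsqueeze(-1) - plist).abs().argmin(dim=-1)  # nearest-point lookup
    return plist[idx] * scale
```

For INT quantization the lookup table is skipped and `q` is simply rounded in place with `round_`, as noted above.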
@@ -19,36 +25,7 @@
FP32 acc: 85.08
| title | js_flops | js_param | ptq_acc | acc_loss |
| ---------- | ----------- | ----------- | ------- | ----------- |
| INT_2 | 7507.750226 | 7507.750226 | 10 | 0.882463564 |
| INT_3 | 2739.698391 | 2739.698391 | 10.16 | 0.880582981 |
| INT_4 | 602.561331 | 602.561331 | 51.21 | 0.39809591 |
| INT_5 | 140.9219722 | 140.9219722 | 77.39 | 0.09038552 |
| INT_6 | 34.51721888 | 34.51721888 | 83.03 | 0.024094969 |
| INT_7 | 8.518508719 | 8.518508719 | 84.73 | 0.004113775 |
| INT_8 | 2.135373288 | 2.135373288 | 84.84 | 0.002820874 |
| INT_9 | 0.531941163 | 0.531941163 | 85.01 | 0.000822755 |
| INT_10 | 0.131627102 | 0.131627102 | 85.08 | 0 |
| INT_11 | 0.032495647 | 0.032495647 | 85.07 | 0.000117536 |
| INT_12 | 0.008037284 | 0.008037284 | 85.06 | 0.000235073 |
| INT_13 | 0.00204601 | 0.00204601 | 85.08 | 0 |
| INT_14 | 0.000418678 | 0.000418678 | 85.08 | 0 |
| INT_15 | 0.000132161 | 0.000132161 | 85.08 | 0 |
| INT_16 | 5.84143E-06 | 5.84143E-06 | 85.08 | 0 |
| POT_2 | 7507.667349 | 7507.667349 | 10 | 0.882463564 |
| POT_3 | 1654.377593 | 1654.377593 | 14.32 | 0.831687823 |
| POT_4 | 136.7401731 | 136.7401731 | 72.49 | 0.147978373 |
| POT_5 | 134.578297 | 134.578297 | 72.65 | 0.14609779 |
| POT_6 | 134.5784142 | 134.5784142 | 72.95 | 0.142571697 |
| POT_7 | 134.5783939 | 134.5783939 | 72.08 | 0.152797367 |
| POT_8 | 134.5782946 | 134.5782946 | 72.23 | 0.151034321 |
| FLOAT_8_E1 | 33.31638902 | 33.31638902 | 82.73 | 0.027621063 |
| FLOAT_8_E2 | 32.12034309 | 32.12034309 | 83.3 | 0.020921486 |
| FLOAT_8_E3 | 0.654188087 | 0.654188087 | 85.01 | 0.000822755 |
| FLOAT_8_E4 | 2.442034365 | 2.442034365 | 84.77 | 0.00364363 |
| FLOAT_8_E5 | 9.68811736 | 9.68811736 | 59.86 | 0.296426892 |
| FLOAT_8_E6 | 37.70544899 | 37.70544899 | 51.87 | 0.390338505 |
<img src="image/table.png" alt="table" style="zoom: 33%;" />
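The `acc_loss` column appears to be the relative accuracy drop against the FP32 baseline, i.e. `acc_loss = (85.08 - ptq_acc) / 85.08`; for example, INT_4 gives (85.08 - 51.21) / 85.08 ≈ 0.398.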
+ Data fitting:
@@ -76,5 +53,4 @@
- [x] center and scale
![fig4](image/fig4.png)
from model import *
from utils import *
import gol
import sys
import argparse
@@ -10,29 +10,6 @@ from torchvision import datasets, transforms
import os
import os.path as osp

def build_list(num_bits, e_bits):
    # Enumerate every value representable by a signed floating-point format with
    # 1 sign bit, e_bits exponent bits and (num_bits - 1 - e_bits) mantissa bits.
    m_bits = num_bits - 1 - e_bits
    plist = [0.]
    # spacing between adjacent mantissa values
    dist_m = 2 ** (-m_bits)
    e = -2 ** (e_bits - 1) + 1
    # subnormal values: smallest exponent, no implicit leading 1
    for m in range(1, 2 ** m_bits):
        frac = m * dist_m  # mantissa part
        expo = 2 ** e      # exponent part
        flt = frac * expo
        plist.append(flt)
        plist.append(-flt)
    # normal values: implicit leading 1 in the mantissa
    for e in range(-2 ** (e_bits - 1) + 2, 2 ** (e_bits - 1) + 1):
        expo = 2 ** e
        for m in range(0, 2 ** m_bits):
            frac = 1. + m * dist_m
            flt = frac * expo
            plist.append(flt)
            plist.append(-flt)
    plist = torch.Tensor(list(set(plist)))
    return plist
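
# Illustrative usage sketch (not part of the original file): enumerate the
# quantization points of an FP8-style E4M3 format. num_bits includes the sign
# bit, so num_bits=8 with e_bits=4 leaves m_bits = 3 mantissa bits.
#   plist = build_list(num_bits=8, e_bits=4)
#   plist.numel()       # number of distinct representable values (including 0.)
#   plist.abs().max()   # largest representable magnitude
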
def quantize_aware_training(model, device, train_loader, optimizer, epoch):
    lossLayer = torch.nn.CrossEntropyLoss()
......