Commit 38c41268 by Klin

feat: ALL: new frame and ptq for all model

parent 55f47e27
# Model Integration Notes
+ This folder integrates AlexNet, AlexNet_BN, VGG_16, VGG_19, and Inception_BN, all on the CIFAR-10 dataset, into one common framework.
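### Deployment
#### cfg_table
Under this common framework, deploying any of the current models only requires a cfg_table. If a model contains a special structure (such as the multi-branch inception block of Inception_BN), supply an additional cfg_table for that structure. See `cfg.py` for details.
The rules for writing a cfg_table are as follows:
+ Each entry corresponds to one unit that is merged for quantization. For example, a Conv followed by BN or by BN+ReLU is merged into a single quantized layer, so it appears as one entry in cfg_table, with the optional parameter ''/'B'/'BR'.
+ Operations that are not quantized but whose position must be specified explicitly, such as flatten and dropout, are also listed as separate entries in cfg_table.
Based on cfg_table, the fold_ratio and fold_model methods were simplified accordingly. They merge the parameter values, parameter counts, and computation of the full-precision layers that are compared against each quantized layer.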
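As a rough illustration of the entry format described above, the sketch below turns cfg_table entries into readable summaries. The `describe` helper and `demo_cfg` are hypothetical and are not part of `cfg.py`. The field order follows the legend comments in `cfg.py`, and the demo entries only show the format; they are not a complete network.

```python
# Hypothetical helper (not in cfg.py): summarize one cfg_table entry.
# Field order follows the legend comments at the top of cfg.py.
def describe(entry):
    kind = entry[0]
    if kind == 'C':
        _, fuse, qi, in_ch, out_ch, k, stride, pad, bias = entry
        fused = {'': 'Conv', 'B': 'Conv+BN', 'BR': 'Conv+BN+ReLU'}[fuse]
        return (f"{fused}({in_ch}->{out_ch}, k={k}, s={stride}, "
                f"p={pad}, bias={bias}, qi={qi})")
    if kind == 'FC':
        _, in_f, out_f, bias = entry
        return f"Linear({in_f}->{out_f}, bias={bias})"
    if kind == 'MP':
        _, k, stride, pad = entry
        return f"MaxPool(k={k}, s={stride}, p={pad})"
    if kind == 'AAP':
        return f"AdaptiveAvgPool(output_size={entry[1]})"
    if kind == 'D':
        return f"Dropout(p={entry[1]})"
    if kind == 'Inc':
        return f"Inception(block={entry[1]})"
    # Remaining single-token entries
    return {'R': 'ReLU', 'FT': 'Flatten'}[kind]


# Illustrative entries only; shapes are not meant to chain into a valid model.
demo_cfg = [
    ['C', 'BR', True, 3, 32, 3, 1, 1, True],
    ['MP', 2, 2, 0],
    ['FT'],
    ['D', 0.5],
    ['FC', 2304, 1024, True],
    ['R'],
]
summary = [describe(e) for e in demo_cfg]
for line in summary:
    print(line)
```

Note that the fused Conv+BN+ReLU unit occupies a single entry, while flatten and dropout still appear as their own entries so that their position is fixed.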
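#### Training scheme
+ Each model has a stepped learning-rate training schedule to reach a better full-precision accuracy. The schedule is looked up automatically in epochs_cfg_table and lr_cfg_table by the model name passed in. See `train.py` for details.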
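The lookup could work roughly as sketched below. This is an assumption about the table layout, not the code in `train.py`: it supposes that each model maps to a list of stage lengths in epochs_cfg_table and a matching list of learning rates in lr_cfg_table. The table values and the `lr_for_epoch` helper are hypothetical.

```python
# Hypothetical tables; the real ones live in train.py and may differ
# in both layout and values.
epochs_cfg_table = {'AlexNet': [20, 30, 20]}
lr_cfg_table = {'AlexNet': [0.01, 0.005, 0.001]}


def lr_for_epoch(model_name, epoch):
    """Return the learning rate for a 0-based global epoch index."""
    start = 0
    stages = zip(epochs_cfg_table[model_name], lr_cfg_table[model_name])
    for n_epochs, lr in stages:
        if epoch < start + n_epochs:
            return lr
        start += n_epochs
    raise ValueError("epoch is beyond the configured schedule")


print(lr_for_epoch('AlexNet', 0), lr_for_epoch('AlexNet', 25))
```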
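### PTQ overview
#### MATLAB scripts
+ `flops\param`, `flops\param_all`, and `flops_all_weight` fit a single model, all models together, and all models together after weighting by computation, respectively.
For a single model, the script colors the points by quantization type; for all models, it colors the points by model.
+ The scripts constrain the fit so that the fitted curve is monotonically non-decreasing. To allow the curve to dip slightly at some point, change tolerance to a small negative number such as -1e-5.
+ The constraint is enforced by sampling: the sampling step is 0.1 for fakefreeze and 10 for the L2 method.
+ The scripts also print fit-quality metrics on the figure, such as SSE, R-squared, and RMSE.
+ The fitting model is selectable: poly can currently be set to 2/3/4, which gives the polynomial degree of the numerator and denominator of the rational model.
+ Because each fit gives slightly different results, the scripts could later be changed so that after each fit they run three more constraint-satisfying fits. If none of them beats the current fit, the process stops; otherwise the better result is kept and the process repeats. This step is currently done by hand.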
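The sampled monotonicity constraint and the proposed refit loop can be sketched in Python as follows. This is only an illustration of the two ideas, not a port of the MATLAB scripts. The function names are hypothetical, and `fit_once` stands in for whatever routine returns one constraint-satisfying fit as an `(sse, params)` pair.

```python
def rational(x, num, den):
    """Evaluate p(x)/q(x); coefficient lists are ordered highest degree first."""
    p = 0.0
    for c in num:
        p = p * x + c
    q = 0.0
    for c in den:
        q = q * x + c
    return p / q


def is_monotone(num, den, x_max, step, tolerance=0.0):
    """Sample the curve on [0, x_max]; every step must rise by at least `tolerance`.

    tolerance=0 demands a non-decreasing curve; a small negative value
    such as -1e-5 permits a slight dip between neighbouring samples.
    """
    n_steps = int(round(x_max / step))
    prev = rational(0.0, num, den)
    for i in range(1, n_steps + 1):
        cur = rational(i * step, num, den)
        if cur - prev < tolerance:
            return False
        prev = cur
    return True


def refit_until_stable(fit_once, tries=3):
    """Keep refitting until `tries` fresh fits all fail to lower the SSE."""
    best = fit_once()
    improved = True
    while improved:
        improved = False
        candidates = [fit_once() for _ in range(tries)]
        challenger = min(candidates, key=lambda r: r[0])
        if challenger[0] < best[0]:
            best = challenger
            improved = True
    return best


# x / (x + 1) rises everywhere on [0, 10]; sampled with the fakefreeze step 0.1.
print(is_monotone([1.0, 0.0], [1.0, 1.0], x_max=10.0, step=0.1))
```

Checking only at sample points keeps the constraint cheap, at the cost of missing dips narrower than one sampling step.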
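#### Fitting methods and summary of results
+ Two methods were used to rescale the quantized-layer distribution to match the full-precision layer: fakefreeze and the L2 norm.
+ The L2 norm gives better fits for single models with a plain (sequential) structure.
+ fakefreeze gives better results on the non-plain model (Inception_BN) and when all models are fitted together.
+ For generality across models, fakefreeze was chosen overall.
+ The combined flops distribution under fakefreeze shows that models with more computation reach a large accuracy loss at a smaller JS divergence, so the divergence is additionally weighted by the model's computation.
Specifically, LOG10(Mac) and Mac^(1/3) were each used as the divergence coefficient, and both improved the fit to some extent.
PS: applying the same approach to param made the fit worse, which needs further experiments. For the L2 method, the divergence was multiplied by sqrt(Mac), which also improved the fit somewhat.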
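The three coefficients mentioned above amount to a simple per-model scaling of the divergence. The sketch below assumes Mac is expressed in MMac, as returned by `extract_ratio`; the actual unit used for the fits is not stated here. The function name and the sample divergence value are hypothetical.

```python
import math


def weight_divergence(js_div, mac, scheme):
    """Scale a JS divergence by a coefficient derived from the model's MACs."""
    if scheme == 'log':        # LOG10(Mac), tried with fakefreeze
        return js_div * math.log10(mac)
    if scheme == 'cuberoot':   # Mac^(1/3), tried with fakefreeze
        return js_div * mac ** (1.0 / 3.0)
    if scheme == 'sqrt':       # sqrt(Mac), tried with the L2 method
        return js_div * math.sqrt(mac)
    raise ValueError(f"unknown scheme: {scheme}")


# 70.08 and 418.4 MMac are the AlexNet and VGG_19 totals from the ptflops reports.
# The divergence 0.05 is a made-up value for illustration.
for name, mac in [('AlexNet', 70.08), ('VGG_19', 418.4)]:
    print(name, weight_divergence(0.05, mac, 'log'),
          weight_divergence(0.05, mac, 'cuberoot'))
```

Both coefficients grow slowly with Mac, so a larger model's points are shifted to the right without swamping the smaller models.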
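#### Other attempts based on fakefreeze
+ Weighting by the weight-to-bias parameter ratio
Preliminary experiments give R-squared values of 0.78 and 0.81 for the combined-model flops fit before and after weighting, respectively. A possible reason is that the weight/bias parameter ratio is too close to 1, so the JS-divergence loss of the bias is effectively ignored.
+ Changing the weight:bias weighting from 0.5:0.5 to 1:1, so that layers with a bias have a larger influence.
Experiments show a better fit than the original scheme. The single-model and combined-model fitting results are listed below.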
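To make the three weightings concrete, here is a minimal sketch. The conv ratio follows the formula in `extract_ratio` (`wr = in_ch * k * k / (1 + in_ch * k * k)`). How the ratio is actually applied to the two divergences is an assumption, since `ptq.py` is not shown here, and the `combine` helper and the sample divergence values are hypothetical.

```python
def conv_weight_ratio(in_ch, kernel_size):
    """Share of weights among (weights + 1 bias) per output channel."""
    w = in_ch * kernel_size * kernel_size
    return w / (1 + w)


def combine(js_weight, js_bias, scheme, wr=None):
    """Combine the weight and bias divergences of one layer."""
    if scheme == 'half':          # original fakefreeze, 0.5 : 0.5
        return 0.5 * js_weight + 0.5 * js_bias
    if scheme == 'nodiv':         # fakefreeze-nodiv, 1 : 1
        return js_weight + js_bias
    if scheme == 'weightratio':   # assumed form: wr : (1 - wr)
        return wr * js_weight + (1.0 - wr) * js_bias
    raise ValueError(f"unknown scheme: {scheme}")


# A 256-channel 3x3 conv: the bias share is 1/2305, so its divergence
# contributes almost nothing under the 'weightratio' scheme.
wr = conv_weight_ratio(256, 3)
print(wr, combine(0.2, 0.4, 'weightratio', wr))
```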
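#### Problems found during fitting
In the fakefreeze fits for VGG_16, VGG_19, and Inception_BN, the POT quantization points were observed to cluster (slightly different acc_loss and similar js_div, which appears in the plot as a continuous vertical column of points), which hurt the results.
Inspecting the weight distributions of these models shows that the affected models have weight distributions without a sharp peak. The figures below show how peaked and non-peaked distributions look under different quantization methods: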
![diff1](image/diff1.png)
![diff2](image/diff2.png)
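Given the characteristics of each model's weight distribution, a likely major cause of the large, clustered POT divergence in the affected models is that the quantized distribution follows a different trend from the original one. On that basis, beyond similarity we may also need to consider how well the model's parameter distribution matches the quantization method. This needs experimental verification, for example by directly measuring a coefficient that captures the distribution trend between the full-precision and quantized models, or by measuring the sharpness of the full-precision weights and of the quantization table. The resulting value would then be applied to the JS divergence computed earlier.
+ Option 1: measure the similarity in distribution trend between the full-precision and quantized models
Use the Pearson correlation coefficient or the cosine similarity and apply it to the JS divergence. For example, if the cosine similarity under POT quantization is small (the trends differ strongly), consider multiplying the JS divergence by the cosine similarity to correct the divergence that is inflated by the trend difference.
+ Option 2: consider sharpness
A distribution without a sharp peak that meets a POT quantization-point list with an extremely sharp peak ends up with a different trend, so this option starts from the distribution and the quantization points. For example, measure the proportion of values that fall within a range around the mean; a large difference may indicate a large difference in sharpness, which can then be used to correct the JS divergence. Alternatively, split the original distribution into bins and, for each bin that contains a quantization point, count the elements in that bin. The number of points that share a bin with a quantization point measures how well the distribution matches the quantization method.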
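Both options can be prototyped with a few lines of plain Python before touching the fitting pipeline. The sketch below is one possible reading of the ideas above, not an implemented part of this repository: it computes the JS divergence of two histograms, scales it by their cosine similarity (Option 1), and measures the share of values that fall in a bin containing a quantization point (the bins variant of Option 2). All function names are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) of two histograms."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cosine_similarity(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def corrected_js(p, q):
    """Option 1: damp the divergence when the two trends disagree."""
    return js_divergence(p, q) * cosine_similarity(p, q)


def bin_coverage(values, quant_points, n_bins):
    """Option 2 (bins): fraction of values sharing a bin with a quantization point."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo

    def index(x):
        return min(int((x - lo) / width), n_bins - 1)

    covered = {index(q) for q in quant_points if lo <= q <= hi}
    hits = sum(1 for v in values if index(v) in covered)
    return hits / len(values)


print(corrected_js([1, 4, 1], [2, 2, 2]))
print(bin_coverage([0.0, 0.1, 0.3, 0.6, 1.0], [0.05], n_bins=4))
```

Whether multiplication is the right way to apply either correction is exactly what the proposed experiments would need to establish.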
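#### Plans for improving the fit
+ For the clustering of POT quantization points, consider correcting the JS divergence with trend-oriented measures such as the Pearson correlation coefficient or the cosine similarity, or by splitting the quantization range into bins and evaluating the quantization-point coverage.
+ Use a more reasonable weighting between weight and bias
+ Weight by the effect on accuracy (hard to measure, and hard to fix a baseline)
+ For combined-model fitting, try more effective weighting schemes
+ Since acc_loss stops rising once the JS divergence reaches a certain value (the worst case is random classification, which still gives 10% accuracy), use piecewise fitting.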
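## PTQ fitting result figures
+ Data fitting
+ L2: rescale the quantized-layer parameters using the L2 norm
+ fakefreeze: rescale the quantized-layer parameters using dequantize_tensor
+ fakefreeze-nodiv: weight and bias are weighted 1:1 instead of 0.5:0.5
+ fakefreeze-weightratio: weight and bias are weighted by their parameter ratio, which is usually close to 1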
### L2
+ Fit over all models
![L2_param](image/L2_param.png)
![L2_flops](image/L2_flops.png)
![L2_flops_weighted](image/L2_flops_weighted.png)
+ Single-model fits
![L2_AlexNet](image/L2_AlexNet.png)
![L2_AlexNet_BN](image/L2_AlexNet_BN.png)
![L2_VGG_16](image/L2_VGG_16.png)
![L2_VGG_19](image/L2_VGG_19.png)
![L2_Inception_BN](image/L2_Inception_BN.png)
### fakefreeze
+ Fit over all models
![fakefreeze_param](image/fakefreeze_param.png)
![fakefreeze_flops](image/fakefreeze_flops.png)
![fakefreeze_flops_weighted_log](image/fakefreeze_flops_weighted_log.png)
![fakefreeze_flops_weighted_cuberoot](image/fakefreeze_flops_weighted_cuberoot.png)
+ Single-model fits
![fakefreeze_AlexNet](image/fakefreeze_AlexNet.png)
![fakefreeze_AlexNet_BN](image/fakefreeze_AlexNet_BN.png)
![fakefreeze_VGG_16](image/fakefreeze_VGG_16.png)
![fakefreeze_VGG_19](image/fakefreeze_VGG_19.png)
![fakefreeze_Inception_BN](image/fakefreeze_Inception_BN.png)
### fakefreeze_nodiv
+ Fit over all models
![fakefreeze_nodiv_param](image/fakefreeze_nodiv_param.png)
![fakefreeze_nodiv_flops](image/fakefreeze_nodiv_flops.png)
![fakefreeze_nodiv_flops_weighted_log](image/fakefreeze_nodiv_flops_weighted_log.png)
![fakefreeze_nodiv_flops_weighted_cuderoot](image/fakefreeze_nodiv_flops_weighted_cuderoot.png)
+ Single-model fits
![fakefreeze_nodiv_AlexNet](image/fakefreeze_nodiv_AlexNet.png)
![fakefreeze_nodiv_AlexNet_BN](image/fakefreeze_nodiv_AlexNet_BN.png)
![fakefreeze_nodiv_VGG_16](image/fakefreeze_nodiv_VGG_16.png)
![fakefreeze_nodiv_VGG_19](image/fakefreeze_nodiv_VGG_19.png)
![fakefreeze_nodiv_Inception_BN](image/fakefreeze_nodiv_Inception_BN.png)
# conv: 'C',''/'B'/'BR',qi,in_ch,out_ch,kernel_size,stride,padding,bias
# relu: 'R'
# inception: 'Inc',index
# maxpool: 'MP',kernel_size,stride,padding
# adaptiveavgpool: 'AAP',output_size
# flatten: 'FT'
# dropout: 'D',p
# fc: 'FC',in_ch,out_ch,bias
# class 10
AlexNet_cfg_table = [
['C','',True,3,32,3,1,1,True],
['R'],
['MP',2,2,0],
['C','',False,32,64,3,1,1,True],
['R'],
['MP',2,2,0],
['C','',False,64,128,3,1,1,True],
['R'],
['C','',False,128,256,3,1,1,True],
['R'],
['C','',False,256,256,3,1,1,True],
['R'],
['MP',3,2,0],
['FT'],
['D',0.5],
['FC',2304,1024,True],
['R'],
['D',0.5],
['FC',1024,512,True],
['R'],
['FC',512,10,True]
]
AlexNet_BN_cfg_table = [
['C','BR',True,3,32,3,1,1,True],
['MP',2,2,0],
['C','BR',False,32,64,3,1,1,True],
['MP',2,2,0],
['C','BR',False,64,128,3,1,1,True],
['C','BR',False,128,256,3,1,1,True],
['C','BR',False,256,256,3,1,1,True],
['MP',3,2,0],
['FT'],
['D',0.5],
['FC',2304,1024,True],
['R'],
['D',0.5],
['FC',1024,512,True],
['R'],
['FC',512,10,True]
]
VGG_16_cfg_table = [
['C','BR',True,3,64,3,1,1,True],
['C','BR',False,64,64,3,1,1,True],
['MP',2,2,0],
['C','BR',False,64,128,3,1,1,True],
['C','BR',False,128,128,3,1,1,True],
['MP',2,2,0],
['C','BR',False,128,256,3,1,1,True],
['C','BR',False,256,256,3,1,1,True],
['C','BR',False,256,256,3,1,1,True],
['MP',2,2,0],
['C','BR',False,256,512,3,1,1,True],
['C','BR',False,512,512,3,1,1,True],
['C','BR',False,512,512,3,1,1,True],
['MP',2,2,0],
['C','BR',False,512,512,3,1,1,True],
['C','BR',False,512,512,3,1,1,True],
['C','BR',False,512,512,3,1,1,True],
['MP',2,2,0],
['FT'],
['FC',512,4096,True],
['R'],
['D',0.5],
['FC',4096,4096,True],
['R'],
['D',0.5],
['FC',4096,10,True]
]
VGG_19_cfg_table = [
['C','BR',True,3,64,3,1,1,True],
['C','BR',False,64,64,3,1,1,True],
['MP',2,2,0],
['C','BR',False,64,128,3,1,1,True],
['C','BR',False,128,128,3,1,1,True],
['MP',2,2,0],
['C','BR',False,128,256,3,1,1,True],
['C','BR',False,256,256,3,1,1,True],
['C','BR',False,256,256,3,1,1,True],
['C','BR',False,256,256,3,1,1,True],
['MP',2,2,0],
['C','BR',False,256,512,3,1,1,True],
['C','BR',False,512,512,3,1,1,True],
['C','BR',False,512,512,3,1,1,True],
['C','BR',False,512,512,3,1,1,True],
['MP',2,2,0],
['C','BR',False,512,512,3,1,1,True],
['C','BR',False,512,512,3,1,1,True],
['C','BR',False,512,512,3,1,1,True],
['C','BR',False,512,512,3,1,1,True],
['MP',2,2,0],
['FT'],
['FC',512,4096,True],
['R'],
['D',0.5],
['FC',4096,4096,True],
['R'],
['D',0.5],
['FC',4096,10,True]
]
Inception_BN_cfg_table = [
['C','',True,3,64,3,1,1,True],
['R'],
['C','',False,64,64,3,1,1,True],
['R'],
['Inc',0],
['Inc',1],
['MP',3,2,1],
['Inc',2],
['Inc',3],
['Inc',4],
['Inc',5],
['Inc',6],
['MP',3,2,1],
['Inc',7],
['Inc',8],
['AAP',1],
['C','',False,1024,10,1,1,0,True],
['FT']
]
model_cfg_table = {
'AlexNet' : AlexNet_cfg_table,
'AlexNet_BN' : AlexNet_BN_cfg_table,
'VGG_16' : VGG_16_cfg_table,
'VGG_19' : VGG_19_cfg_table,
'Inception_BN' : Inception_BN_cfg_table
}
inc_ch_table=[
[64, 64, 96,128, 16, 32, 32],#3a
[256,128,128,192, 32, 96, 64],#3b
[480,192, 96,208, 16, 48, 64],#4a
[512,160,112,224, 24, 64, 64],#4b
[512,128,128,256, 24, 64, 64],#4c
[512,112,144,288, 32, 64, 64],#4d
[528,256,160,320, 32,128,128],#4e
[832,256,160,320, 32,128,128],#5a
[832,384,192,384, 48,128,128] #5b
]
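# br0,br1,br2,br3 <- br1x1,br3x3,br5x5,brM
# The 2nd and 3rd parameters here are indices into the channel list (a row of inc_ch_table)
# For this cfg extension, treat every 'C' as having the 'BR' option with bias=False. Each entry here is one structure that can be fused after quantization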
inc_cfg_table = [
[['C',0,1,1,1,0]],
[['C',0,2,1,1,0],
['C',2,3,3,1,1]],
[['C',0,4,1,1,0],
['C',4,5,5,1,2]],
[['MP',3,1,1],
['R'],
['C',0,6,1,1,0]]
]
import sys
import os


# Extract the parameter count and computation (MACs) from param_flops/<model_name>.txt,
# which holds the redirected output of get_param_flops.py
def extract_ratio(model_name):
    with open('param_flops/'+model_name+'.txt','r') as fr:
        lines = fr.readlines()

    # Total MACs from the summary line, expressed in MMac
    Mac = lines[1].split('Mac,')[0].split(',')[-1]
    if 'M' in Mac:
        Mac = Mac.split('M')[0]
        Mac = float(Mac)
    elif 'G' in Mac:
        Mac = Mac.split('G')[0]
        Mac = float(Mac)
        Mac *= 1024  # convert GMac to MMac

    # Total parameter count, in M
    Param = lines[1].split('M,')[0]
    Param = float(Param)

    layer = []
    par_ratio = []
    flop_ratio = []
    weight_ratio = []
    for line in lines:
        # Per-layer lines look like "(conv0): Conv2d(...)"
        if '(' in line and ')' in line:
            layer.append(line.split(')')[0].split('(')[1])
            r1 = line.split('%')[0].split(',')[-1]
            r1 = float(r1)
            par_ratio.append(r1)
            r2 = line.split('%')[-2].split(',')[-1]
            r2 = float(r2)
            flop_ratio.append(r2)
            if 'conv' in line:
                # Computed whether or not bias=False; after folding, the conv approximation is used directly
                inch = line.split(',')[4]
                # outch = line.split(',')[5]
                klsz = line.split(',')[6].split('(')[-1]
                inch = float(inch)
                # outch = float(outch)
                klsz = float(klsz)
                wr = inch * klsz * klsz
                wr = wr / (1+wr)
                weight_ratio.append(wr)
            elif 'fc' in line:
                inch = line.split(',')[4].split('=')[-1]
                inch = float(inch)
                wr = inch / (1+inch)
                weight_ratio.append(wr)
            else:
                weight_ratio.append(0)

    return Mac, Param, layer, par_ratio, flop_ratio, weight_ratio


if __name__ == "__main__":
    Mac, Param, layer, par_ratio, flop_ratio, weight_ratio = extract_ratio('Inception_BN')
    print(Mac)
    print(Param)
    print(layer)
    print(par_ratio)
    print(flop_ratio)
    print(weight_ratio)
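from torch.autograd import Function


class FakeQuantize(Function):

    @staticmethod
    def forward(ctx, x, qparam):
        # Quantize then dequantize, so the output carries the quantization error
        x = qparam.quantize_tensor(x)
        x = qparam.dequantize_tensor(x)
        return x

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the gradient to x unchanged; qparam gets no gradient
        return grad_output, None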
from model import *

import sys
import torch
from ptflops import get_model_complexity_info

if __name__ == "__main__":
    model_name = sys.argv[1]
    model = Model(model_name)

    full_file = 'ckpt/cifar10_'+model_name+'.pt'
    model.load_state_dict(torch.load(full_file))

    flops, params = get_model_complexity_info(model, (3, 32, 32), as_strings=True, print_per_layer_stat=True)
#!/bin/bash
#- Job parameters
# (TODO)
# Please modify job name
#SBATCH -J ALL # The job name
#SBATCH -o ret/ret-%j.out # Write the standard output to file named 'ret-<job_number>.out'
#SBATCH -e ret/ret-%j.err # Write the standard error to file named 'ret-<job_number>.err'
#- Resources
# (TODO)
# Please modify your requirements
#SBATCH -p nv-gpu # Submit to 'nv-gpu' Partition
#SBATCH -t 0-01:30:00 # Run for a maximum time of 0 days, 01 hours, 30 mins, 00 secs
#SBATCH --nodes=1 # Request N nodes
#SBATCH --gres=gpu:1 # Request M GPU per node
#SBATCH --gres-flags=enforce-binding # CPU-GPU Affinity
#SBATCH --qos=gpu-debug # Request QOS Type
###
### The system will alloc 8 or 16 cores per gpu by default.
### If you need more or less, use following:
### #SBATCH --cpus-per-task=K # Request K cores
###
###
### Without specifying the constraint, any available nodes that meet the requirement will be allocated
### You can specify the characteristics of the compute nodes, and even the names of the compute nodes
###
### #SBATCH --nodelist=gpu-v00 # Request a specific list of hosts
### #SBATCH --constraint="Volta|RTX8000" # Request GPU Type: Volta(V100 or V100S) or RTX8000
###
# set constraint for RTX8000 to meet my cuda
#SBATCH --constraint="Ampere|RTX8000|T4"
#- Log information
echo "Job start at $(date "+%Y-%m-%d %H:%M:%S")"
echo "Job run at:"
echo "$(hostnamectl)"
#- Load environments
source /tools/module_env.sh
module list # list modules loaded
##- Tools
module load cluster-tools/v1.0
module load slurm-tools/v1.0
module load cmake/3.15.7
module load git/2.17.1
module load vim/8.1.2424
##- language
module load python3/3.6.8
##- CUDA
# module load cuda-cudnn/10.2-7.6.5
# module load cuda-cudnn/11.2-8.2.1
module load cuda-cudnn/11.1-8.2.1
##- virtualenv
# source xxxxx/activate
echo $(module list) # list modules loaded
echo $(which gcc)
echo $(which python)
echo $(which python3)
cluster-quota # nas quota
nvidia-smi --format=csv --query-gpu=name,driver_version,power.limit # gpu info
#- Warning! Please do not change your CUDA_VISIBLE_DEVICES
#- in `.bashrc`, `env.sh`, or your job script
echo "Use GPU ${CUDA_VISIBLE_DEVICES}" # which gpus
#- The CUDA_VISIBLE_DEVICES variable is assigned and specified by SLURM
#- Job step
# [EDIT HERE(TODO)]
name_list="AlexNet AlexNet_BN VGG_16 VGG_19 Inception_BN"
for name in $name_list; do
    if [ -f "param_flops/$name.txt" ]; then
        echo "$name: param_flops exists"
    elif [ ! -f "ckpt/cifar10_$name.pt" ]; then
        echo "$name: ckpt does not exist"
    else
        python get_param_flops.py "$name" > "param_flops/$name.txt"
    fi
done
#- End
echo "Job end at $(date "+%Y-%m-%d %H:%M:%S")"
# -*- coding: utf-8 -*-

# Shares global variables across multiple modules

def _init():  # initialize
    global _global_dict
    _global_dict = {}


def set_value(value,is_bias=False):
    # Define a global variable: key 0 is used for bias, key 1 for everything else
    if is_bias:
        _global_dict[0] = value
    else:
        _global_dict[1] = value


def get_value(is_bias=False):  # bias gets its own precision, independent of the other variables
    if is_bias:
        return _global_dict[0]
    else:
        return _global_dict[1]
Model(
3.87 M, 100.000% Params, 70.08 MMac, 100.000% MACs,
(conv0): Conv2d(896, 0.023% Params, 917.5 KMac, 1.309% MACs, 3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu1): ReLU(0, 0.000% Params, 32.77 KMac, 0.047% MACs, inplace=True)
(pool2): MaxPool2d(0, 0.000% Params, 32.77 KMac, 0.047% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv3): Conv2d(18.5 k, 0.478% Params, 4.73 MMac, 6.756% MACs, 32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu4): ReLU(0, 0.000% Params, 16.38 KMac, 0.023% MACs, inplace=True)
(pool5): MaxPool2d(0, 0.000% Params, 16.38 KMac, 0.023% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv6): Conv2d(73.86 k, 1.909% Params, 4.73 MMac, 6.745% MACs, 64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu7): ReLU(0, 0.000% Params, 8.19 KMac, 0.012% MACs, inplace=True)
(conv8): Conv2d(295.17 k, 7.630% Params, 18.89 MMac, 26.955% MACs, 128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu9): ReLU(0, 0.000% Params, 16.38 KMac, 0.023% MACs, inplace=True)
(conv10): Conv2d(590.08 k, 15.252% Params, 37.77 MMac, 53.887% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu11): ReLU(0, 0.000% Params, 16.38 KMac, 0.023% MACs, inplace=True)
(pool12): MaxPool2d(0, 0.000% Params, 16.38 KMac, 0.023% MACs, kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
(drop14): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc15): Linear(2.36 M, 61.010% Params, 2.36 MMac, 3.368% MACs, in_features=2304, out_features=1024, bias=True)
(relu16): ReLU(0, 0.000% Params, 1.02 KMac, 0.001% MACs, inplace=True)
(drop17): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc18): Linear(524.8 k, 13.565% Params, 524.8 KMac, 0.749% MACs, in_features=1024, out_features=512, bias=True)
(relu19): ReLU(0, 0.000% Params, 512.0 Mac, 0.001% MACs, inplace=True)
(fc20): Linear(5.13 k, 0.133% Params, 5.13 KMac, 0.007% MACs, in_features=512, out_features=10, bias=True)
)
Model(
3.87 M, 100.000% Params, 70.26 MMac, 100.000% MACs,
(conv0): Conv2d(896, 0.023% Params, 917.5 KMac, 1.306% MACs, 3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn0): BatchNorm2d(64, 0.002% Params, 65.54 KMac, 0.093% MACs, 32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu0): ReLU(0, 0.000% Params, 32.77 KMac, 0.047% MACs, inplace=True)
(pool1): MaxPool2d(0, 0.000% Params, 32.77 KMac, 0.047% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv2d(18.5 k, 0.478% Params, 4.73 MMac, 6.739% MACs, 32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn2): BatchNorm2d(128, 0.003% Params, 32.77 KMac, 0.047% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): ReLU(0, 0.000% Params, 16.38 KMac, 0.023% MACs, inplace=True)
(pool3): MaxPool2d(0, 0.000% Params, 16.38 KMac, 0.023% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv4): Conv2d(73.86 k, 1.908% Params, 4.73 MMac, 6.727% MACs, 64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn4): BatchNorm2d(256, 0.007% Params, 16.38 KMac, 0.023% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu4): ReLU(0, 0.000% Params, 8.19 KMac, 0.012% MACs, inplace=True)
(conv5): Conv2d(295.17 k, 7.627% Params, 18.89 MMac, 26.886% MACs, 128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn5): BatchNorm2d(512, 0.013% Params, 32.77 KMac, 0.047% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu5): ReLU(0, 0.000% Params, 16.38 KMac, 0.023% MACs, inplace=True)
(conv6): Conv2d(590.08 k, 15.247% Params, 37.77 MMac, 53.748% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn6): BatchNorm2d(512, 0.013% Params, 32.77 KMac, 0.047% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu6): ReLU(0, 0.000% Params, 16.38 KMac, 0.023% MACs, inplace=True)
(pool7): MaxPool2d(0, 0.000% Params, 16.38 KMac, 0.023% MACs, kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
(drop9): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc10): Linear(2.36 M, 60.987% Params, 2.36 MMac, 3.359% MACs, in_features=2304, out_features=1024, bias=True)
(relu11): ReLU(0, 0.000% Params, 1.02 KMac, 0.001% MACs, inplace=True)
(drop12): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc13): Linear(524.8 k, 13.560% Params, 524.8 KMac, 0.747% MACs, in_features=1024, out_features=512, bias=True)
(relu14): ReLU(0, 0.000% Params, 512.0 Mac, 0.001% MACs, inplace=True)
(fc15): Linear(5.13 k, 0.133% Params, 5.13 KMac, 0.007% MACs, in_features=512, out_features=10, bias=True)
)
Model(
33.65 M, 100.000% Params, 333.36 MMac, 100.000% MACs,
(conv0): Conv2d(1.79 k, 0.005% Params, 1.84 MMac, 0.550% MACs, 3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn0): BatchNorm2d(128, 0.000% Params, 131.07 KMac, 0.039% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu0): ReLU(0, 0.000% Params, 65.54 KMac, 0.020% MACs, inplace=True)
(conv1): Conv2d(36.93 k, 0.110% Params, 37.81 MMac, 11.343% MACs, 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn1): BatchNorm2d(128, 0.000% Params, 131.07 KMac, 0.039% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): ReLU(0, 0.000% Params, 65.54 KMac, 0.020% MACs, inplace=True)
(pool2): MaxPool2d(0, 0.000% Params, 65.54 KMac, 0.020% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv3): Conv2d(73.86 k, 0.220% Params, 18.91 MMac, 5.672% MACs, 64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn3): BatchNorm2d(256, 0.001% Params, 65.54 KMac, 0.020% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu3): ReLU(0, 0.000% Params, 32.77 KMac, 0.010% MACs, inplace=True)
(conv4): Conv2d(147.58 k, 0.439% Params, 37.78 MMac, 11.334% MACs, 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn4): BatchNorm2d(256, 0.001% Params, 65.54 KMac, 0.020% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu4): ReLU(0, 0.000% Params, 32.77 KMac, 0.010% MACs, inplace=True)
(pool5): MaxPool2d(0, 0.000% Params, 32.77 KMac, 0.010% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv6): Conv2d(295.17 k, 0.877% Params, 18.89 MMac, 5.667% MACs, 128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn6): BatchNorm2d(512, 0.002% Params, 32.77 KMac, 0.010% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu6): ReLU(0, 0.000% Params, 16.38 KMac, 0.005% MACs, inplace=True)
(conv7): Conv2d(590.08 k, 1.754% Params, 37.77 MMac, 11.329% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn7): BatchNorm2d(512, 0.002% Params, 32.77 KMac, 0.010% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu7): ReLU(0, 0.000% Params, 16.38 KMac, 0.005% MACs, inplace=True)
(conv8): Conv2d(590.08 k, 1.754% Params, 37.77 MMac, 11.329% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn8): BatchNorm2d(512, 0.002% Params, 32.77 KMac, 0.010% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu8): ReLU(0, 0.000% Params, 16.38 KMac, 0.005% MACs, inplace=True)
(pool9): MaxPool2d(0, 0.000% Params, 16.38 KMac, 0.005% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv10): Conv2d(1.18 M, 3.508% Params, 18.88 MMac, 5.664% MACs, 256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn10): BatchNorm2d(1.02 k, 0.003% Params, 16.38 KMac, 0.005% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu10): ReLU(0, 0.000% Params, 8.19 KMac, 0.002% MACs, inplace=True)
(conv11): Conv2d(2.36 M, 7.013% Params, 37.76 MMac, 11.326% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn11): BatchNorm2d(1.02 k, 0.003% Params, 16.38 KMac, 0.005% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu11): ReLU(0, 0.000% Params, 8.19 KMac, 0.002% MACs, inplace=True)
(conv12): Conv2d(2.36 M, 7.013% Params, 37.76 MMac, 11.326% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn12): BatchNorm2d(1.02 k, 0.003% Params, 16.38 KMac, 0.005% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu12): ReLU(0, 0.000% Params, 8.19 KMac, 0.002% MACs, inplace=True)
(pool13): MaxPool2d(0, 0.000% Params, 8.19 KMac, 0.002% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv14): Conv2d(2.36 M, 7.013% Params, 9.44 MMac, 2.832% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn14): BatchNorm2d(1.02 k, 0.003% Params, 4.1 KMac, 0.001% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu14): ReLU(0, 0.000% Params, 2.05 KMac, 0.001% MACs, inplace=True)
(conv15): Conv2d(2.36 M, 7.013% Params, 9.44 MMac, 2.832% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn15): BatchNorm2d(1.02 k, 0.003% Params, 4.1 KMac, 0.001% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu15): ReLU(0, 0.000% Params, 2.05 KMac, 0.001% MACs, inplace=True)
(conv16): Conv2d(2.36 M, 7.013% Params, 9.44 MMac, 2.832% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn16): BatchNorm2d(1.02 k, 0.003% Params, 4.1 KMac, 0.001% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu16): ReLU(0, 0.000% Params, 2.05 KMac, 0.001% MACs, inplace=True)
(pool17): MaxPool2d(0, 0.000% Params, 2.05 KMac, 0.001% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(fc19): Linear(2.1 M, 6.245% Params, 2.1 MMac, 0.630% MACs, in_features=512, out_features=4096, bias=True)
(relu20): ReLU(0, 0.000% Params, 4.1 KMac, 0.001% MACs, inplace=True)
(drop21): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc22): Linear(16.78 M, 49.875% Params, 16.78 MMac, 5.034% MACs, in_features=4096, out_features=4096, bias=True)
(relu23): ReLU(0, 0.000% Params, 4.1 KMac, 0.001% MACs, inplace=True)
(drop24): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc25): Linear(40.97 k, 0.122% Params, 40.97 KMac, 0.012% MACs, in_features=4096, out_features=10, bias=True)
)
Model(
38.96 M, 100.000% Params, 418.4 MMac, 100.000% MACs,
(conv0): Conv2d(1.79 k, 0.005% Params, 1.84 MMac, 0.439% MACs, 3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn0): BatchNorm2d(128, 0.000% Params, 131.07 KMac, 0.031% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu0): ReLU(0, 0.000% Params, 65.54 KMac, 0.016% MACs, inplace=True)
(conv1): Conv2d(36.93 k, 0.095% Params, 37.81 MMac, 9.038% MACs, 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn1): BatchNorm2d(128, 0.000% Params, 131.07 KMac, 0.031% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): ReLU(0, 0.000% Params, 65.54 KMac, 0.016% MACs, inplace=True)
(pool2): MaxPool2d(0, 0.000% Params, 65.54 KMac, 0.016% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv3): Conv2d(73.86 k, 0.190% Params, 18.91 MMac, 4.519% MACs, 64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn3): BatchNorm2d(256, 0.001% Params, 65.54 KMac, 0.016% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu3): ReLU(0, 0.000% Params, 32.77 KMac, 0.008% MACs, inplace=True)
(conv4): Conv2d(147.58 k, 0.379% Params, 37.78 MMac, 9.030% MACs, 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn4): BatchNorm2d(256, 0.001% Params, 65.54 KMac, 0.016% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu4): ReLU(0, 0.000% Params, 32.77 KMac, 0.008% MACs, inplace=True)
(pool5): MaxPool2d(0, 0.000% Params, 32.77 KMac, 0.008% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv6): Conv2d(295.17 k, 0.758% Params, 18.89 MMac, 4.515% MACs, 128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn6): BatchNorm2d(512, 0.001% Params, 32.77 KMac, 0.008% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu6): ReLU(0, 0.000% Params, 16.38 KMac, 0.004% MACs, inplace=True)
(conv7): Conv2d(590.08 k, 1.515% Params, 37.77 MMac, 9.026% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn7): BatchNorm2d(512, 0.001% Params, 32.77 KMac, 0.008% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu7): ReLU(0, 0.000% Params, 16.38 KMac, 0.004% MACs, inplace=True)
(conv8): Conv2d(590.08 k, 1.515% Params, 37.77 MMac, 9.026% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn8): BatchNorm2d(512, 0.001% Params, 32.77 KMac, 0.008% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu8): ReLU(0, 0.000% Params, 16.38 KMac, 0.004% MACs, inplace=True)
(conv9): Conv2d(590.08 k, 1.515% Params, 37.77 MMac, 9.026% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn9): BatchNorm2d(512, 0.001% Params, 32.77 KMac, 0.008% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu9): ReLU(0, 0.000% Params, 16.38 KMac, 0.004% MACs, inplace=True)
(pool10): MaxPool2d(0, 0.000% Params, 16.38 KMac, 0.004% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv11): Conv2d(1.18 M, 3.029% Params, 18.88 MMac, 4.513% MACs, 256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn11): BatchNorm2d(1.02 k, 0.003% Params, 16.38 KMac, 0.004% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu11): ReLU(0, 0.000% Params, 8.19 KMac, 0.002% MACs, inplace=True)
(conv12): Conv2d(2.36 M, 6.057% Params, 37.76 MMac, 9.024% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn12): BatchNorm2d(1.02 k, 0.003% Params, 16.38 KMac, 0.004% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu12): ReLU(0, 0.000% Params, 8.19 KMac, 0.002% MACs, inplace=True)
(conv13): Conv2d(2.36 M, 6.057% Params, 37.76 MMac, 9.024% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn13): BatchNorm2d(1.02 k, 0.003% Params, 16.38 KMac, 0.004% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu13): ReLU(0, 0.000% Params, 8.19 KMac, 0.002% MACs, inplace=True)
(conv14): Conv2d(2.36 M, 6.057% Params, 37.76 MMac, 9.024% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn14): BatchNorm2d(1.02 k, 0.003% Params, 16.38 KMac, 0.004% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu14): ReLU(0, 0.000% Params, 8.19 KMac, 0.002% MACs, inplace=True)
(pool15): MaxPool2d(0, 0.000% Params, 8.19 KMac, 0.002% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv16): Conv2d(2.36 M, 6.057% Params, 9.44 MMac, 2.256% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn16): BatchNorm2d(1.02 k, 0.003% Params, 4.1 KMac, 0.001% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu16): ReLU(0, 0.000% Params, 2.05 KMac, 0.000% MACs, inplace=True)
(conv17): Conv2d(2.36 M, 6.057% Params, 9.44 MMac, 2.256% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn17): BatchNorm2d(1.02 k, 0.003% Params, 4.1 KMac, 0.001% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu17): ReLU(0, 0.000% Params, 2.05 KMac, 0.000% MACs, inplace=True)
(conv18): Conv2d(2.36 M, 6.057% Params, 9.44 MMac, 2.256% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn18): BatchNorm2d(1.02 k, 0.003% Params, 4.1 KMac, 0.001% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu18): ReLU(0, 0.000% Params, 2.05 KMac, 0.000% MACs, inplace=True)
(conv19): Conv2d(2.36 M, 6.057% Params, 9.44 MMac, 2.256% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn19): BatchNorm2d(1.02 k, 0.003% Params, 4.1 KMac, 0.001% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu19): ReLU(0, 0.000% Params, 2.05 KMac, 0.000% MACs, inplace=True)
(pool20): MaxPool2d(0, 0.000% Params, 2.05 KMac, 0.000% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(fc22): Linear(2.1 M, 5.393% Params, 2.1 MMac, 0.502% MACs, in_features=512, out_features=4096, bias=True)
(relu23): ReLU(0, 0.000% Params, 4.1 KMac, 0.001% MACs, inplace=True)
(drop24): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc25): Linear(16.78 M, 43.074% Params, 16.78 MMac, 4.011% MACs, in_features=4096, out_features=4096, bias=True)
(relu26): ReLU(0, 0.000% Params, 4.1 KMac, 0.001% MACs, inplace=True)
(drop27): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc28): Linear(40.97 k, 0.105% Params, 40.97 KMac, 0.010% MACs, in_features=4096, out_features=10, bias=True)
)
#!/bin/bash
#- Job parameters
# (TODO)
# Please modify job name
#SBATCH -J ALL # The job name
#SBATCH -o ret/ret-%j.out # Write the standard output to file named 'ret-<job_number>.out'
#SBATCH -e ret/ret-%j.err # Write the standard error to file named 'ret-<job_number>.err'
#- Resources
# (TODO)
# Please modify your requirements
#SBATCH -p nv-gpu # Submit to 'nv-gpu' Partition
#SBATCH -t 3-00:00:00 # Run for a maximum time of 3 days, 00 hours, 00 mins, 00 secs
#SBATCH --nodes=1 # Request N nodes
#SBATCH --gres=gpu:1 # Request M GPU per node
#SBATCH --gres-flags=enforce-binding # CPU-GPU Affinity
#SBATCH --qos=gpu-long # Request QOS Type
###
### The system will alloc 8 or 16 cores per gpu by default.
### If you need more or less, use following:
### #SBATCH --cpus-per-task=K # Request K cores
###
###
### Without specifying the constraint, any available nodes that meet the requirement will be allocated
### You can specify the characteristics of the compute nodes, and even the names of the compute nodes
###
### #SBATCH --nodelist=gpu-v00 # Request a specific list of hosts
### #SBATCH --constraint="Volta|RTX8000" # Request GPU Type: Volta(V100 or V100S) or RTX8000
###
# set constraint for RTX8000 to meet my cuda
#SBATCH --constraint="Ampere|RTX8000|T4"
#- Log information
echo "Job start at $(date "+%Y-%m-%d %H:%M:%S")"
echo "Job run at:"
echo "$(hostnamectl)"
#- Load environments
source /tools/module_env.sh
module list # list modules loaded
##- Tools
module load cluster-tools/v1.0
module load slurm-tools/v1.0
module load cmake/3.15.7
module load git/2.17.1
module load vim/8.1.2424
##- language
module load python3/3.6.8
##- CUDA
# module load cuda-cudnn/10.2-7.6.5
# module load cuda-cudnn/11.2-8.2.1
module load cuda-cudnn/11.1-8.2.1
##- virtualenv
# source xxxxx/activate
echo $(module list) # list modules loaded
echo $(which gcc)
echo $(which python)
echo $(which python3)
cluster-quota # nas quota
nvidia-smi --format=csv --query-gpu=name,driver_version,power.limit # gpu info
#- Warning! Please do not change your CUDA_VISIBLE_DEVICES
#- in `.bashrc`, `env.sh`, or your job script
echo "Use GPU ${CUDA_VISIBLE_DEVICES}" # which gpus
#- The CUDA_VISIBLE_DEVICES variable is assigned and specified by SLURM
#- Job step
# [EDIT HERE(TODO)]
python ptq.py
#- End
echo "Job end at $(date "+%Y-%m-%d %H:%M:%S")"
#!/bin/bash
#- Job parameters
# (TODO)
# Please modify job name
#SBATCH -J ALL-L2 # The job name
#SBATCH -o ret/ret-%j.out # Write the standard output to file named 'ret-<job_number>.out'
#SBATCH -e ret/ret-%j.err # Write the standard error to file named 'ret-<job_number>.err'
#- Resources
# (TODO)
# Please modify your requirements
#SBATCH -p nv-gpu # Submit to 'nv-gpu' Partition
#SBATCH -t 3-00:00:00 # Run for a maximum time of 3 days, 00 hours, 00 mins, 00 secs
#SBATCH --nodes=1 # Request N nodes
#SBATCH --gres=gpu:1 # Request M GPU per node
#SBATCH --gres-flags=enforce-binding # CPU-GPU Affinity
#SBATCH --qos=gpu-long # Request QOS Type
###
### The system will alloc 8 or 16 cores per gpu by default.
### If you need more or less, use following:
### #SBATCH --cpus-per-task=K # Request K cores
###
###
### Without specifying the constraint, any available nodes that meet the requirement will be allocated
### You can specify the characteristics of the compute nodes, and even the names of the compute nodes
###
### #SBATCH --nodelist=gpu-v00 # Request a specific list of hosts
### #SBATCH --constraint="Volta|RTX8000" # Request GPU Type: Volta(V100 or V100S) or RTX8000
###
# set constraint for RTX8000 to meet my cuda
#SBATCH --constraint="Ampere|RTX8000|T4"
#- Log information
echo "Job start at $(date "+%Y-%m-%d %H:%M:%S")"
echo "Job run at:"
echo "$(hostnamectl)"
#- Load environments
source /tools/module_env.sh
module list # list modules loaded
##- Tools
module load cluster-tools/v1.0
module load slurm-tools/v1.0
module load cmake/3.15.7
module load git/2.17.1
module load vim/8.1.2424
##- language
module load python3/3.6.8
##- CUDA
# module load cuda-cudnn/10.2-7.6.5
# module load cuda-cudnn/11.2-8.2.1
module load cuda-cudnn/11.1-8.2.1
##- virtualenv
# source xxxxx/activate
echo $(module list) # list modules loaded
echo $(which gcc)
echo $(which python)
echo $(which python3)
cluster-quota # nas quota
nvidia-smi --format=csv --query-gpu=name,driver_version,power.limit # gpu info
#- Warning! Please do not change your CUDA_VISIBLE_DEVICES
#- in `.bashrc`, `env.sh`, or your job script
echo "Use GPU ${CUDA_VISIBLE_DEVICES}" # which gpus
#- The CUDA_VISIBLE_DEVICES variable is assigned and specified by SLURM
#- Job step
# [EDIT HERE(TODO)]
python ptq_L2.py
#- End
echo "Job end at $(date "+%Y-%m-%d %H:%M:%S")"
#!/bin/bash
#- Job parameters
# (TODO)
# Please modify job name
#SBATCH -J ALL-nodiv # The job name
#SBATCH -o ret/ret-%j.out # Write the standard output to file named 'ret-<job_number>.out'
#SBATCH -e ret/ret-%j.err # Write the standard error to file named 'ret-<job_number>.err'
#- Resources
# (TODO)
# Please modify your requirements
#SBATCH -p nv-gpu # Submit to 'nv-gpu' Partition
#SBATCH -t 3-00:00:00 # Run for a maximum time of 3 days, 00 hours, 00 mins, 00 secs
#SBATCH --nodes=1 # Request N nodes
#SBATCH --gres=gpu:1 # Request M GPU per node
#SBATCH --gres-flags=enforce-binding # CPU-GPU Affinity
#SBATCH --qos=gpu-long # Request QOS Type
###
### The system will alloc 8 or 16 cores per gpu by default.
### If you need more or less, use following:
### #SBATCH --cpus-per-task=K # Request K cores
###
###
### Without specifying the constraint, any available nodes that meet the requirement will be allocated
### You can specify the characteristics of the compute nodes, and even the names of the compute nodes
###
### #SBATCH --nodelist=gpu-v00 # Request a specific list of hosts
### #SBATCH --constraint="Volta|RTX8000" # Request GPU Type: Volta(V100 or V100S) or RTX8000
###
# set constraint for RTX8000 to meet my cuda
#SBATCH --constraint="Ampere|RTX8000|T4"
#- Log information
echo "Job start at $(date "+%Y-%m-%d %H:%M:%S")"
echo "Job run at:"
echo "$(hostnamectl)"
#- Load environments
source /tools/module_env.sh
module list # list modules loaded
##- Tools
module load cluster-tools/v1.0
module load slurm-tools/v1.0
module load cmake/3.15.7
module load git/2.17.1
module load vim/8.1.2424
##- language
module load python3/3.6.8
##- CUDA
# module load cuda-cudnn/10.2-7.6.5
# module load cuda-cudnn/11.2-8.2.1
module load cuda-cudnn/11.1-8.2.1
##- virtualenv
# source xxxxx/activate
echo $(module list) # list modules loaded
echo $(which gcc)
echo $(which python)
echo $(which python3)
cluster-quota # nas quota
nvidia-smi --format=csv --query-gpu=name,driver_version,power.limit # gpu info
#- Warning! Please do not change your CUDA_VISIBLE_DEVICES
#- in `.bashrc`, `env.sh`, or your job script
echo "Use GPU ${CUDA_VISIBLE_DEVICES}" # which gpus
#- The CUDA_VISIBLE_DEVICES variable is assigned and specified by SLURM
#- Job step
# [EDIT HERE(TODO)]
python ptq_nodiv.py
#- End
echo "Job end at $(date "+%Y-%m-%d %H:%M:%S")"
#!/bin/bash
#- Job parameters
# (TODO)
# Please modify job name
#SBATCH -J ALL-weightratio # The job name
#SBATCH -o ret/ret-%j.out # Write the standard output to file named 'ret-<job_number>.out'
#SBATCH -e ret/ret-%j.err # Write the standard error to file named 'ret-<job_number>.err'
#- Resources
# (TODO)
# Please modify your requirements
#SBATCH -p nv-gpu # Submit to 'nv-gpu' Partition
#SBATCH -t 3-00:00:00 # Run for a maximum time of 3 days, 00 hours, 00 mins, 00 secs
#SBATCH --nodes=1 # Request N nodes
#SBATCH --gres=gpu:1 # Request M GPU per node
#SBATCH --gres-flags=enforce-binding # CPU-GPU Affinity
#SBATCH --qos=gpu-long # Request QOS Type
###
### The system will alloc 8 or 16 cores per gpu by default.
### If you need more or less, use following:
### #SBATCH --cpus-per-task=K # Request K cores
###
###
### Without specifying the constraint, any available nodes that meet the requirement will be allocated
### You can specify the characteristics of the compute nodes, and even the names of the compute nodes
###
### #SBATCH --nodelist=gpu-v00 # Request a specific list of hosts
### #SBATCH --constraint="Volta|RTX8000" # Request GPU Type: Volta(V100 or V100S) or RTX8000
###
# set constraint for RTX8000 to meet my cuda
#SBATCH --constraint="Ampere|RTX8000|T4"
#- Log information
echo "Job start at $(date "+%Y-%m-%d %H:%M:%S")"
echo "Job run at:"
echo "$(hostnamectl)"
#- Load environments
source /tools/module_env.sh
module list # list modules loaded
##- Tools
module load cluster-tools/v1.0
module load slurm-tools/v1.0
module load cmake/3.15.7
module load git/2.17.1
module load vim/8.1.2424
##- language
module load python3/3.6.8
##- CUDA
# module load cuda-cudnn/10.2-7.6.5
# module load cuda-cudnn/11.2-8.2.1
module load cuda-cudnn/11.1-8.2.1
##- virtualenv
# source xxxxx/activate
echo $(module list) # list modules loaded
echo $(which gcc)
echo $(which python)
echo $(which python3)
cluster-quota # nas quota
nvidia-smi --format=csv --query-gpu=name,driver_version,power.limit # gpu info
#- Warning! Please do not change your CUDA_VISIBLE_DEVICES
#- in `.bashrc`, `env.sh`, or your job script
echo "Use GPU ${CUDA_VISIBLE_DEVICES}" # which gpus
#- The CUDA_VISIBLE_DEVICES variable is assigned and specified by SLURM
#- Job step
# [EDIT HERE(TODO)]
python ptq_weightratio.py
#- End
echo "Job end at $(date "+%Y-%m-%d %H:%M:%S")"
from model import *
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torchvision.transforms.functional import InterpolationMode
import os
import os.path as osp
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    lossLayer = torch.nn.CrossEntropyLoss()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = lossLayer(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 50 == 0:
            print('Train Epoch: {} [{}/{}]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset), loss.item()
            ))
def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    lossLayer = torch.nn.CrossEntropyLoss(reduction='sum')
    # Evaluation needs no gradients; no_grad() avoids building the autograd graph.
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += lossLayer(output, target).item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {:.2f}%\n'.format(
        test_loss, 100. * correct / len(test_loader.dataset)
    ))
# Number of epochs per training stage, keyed by model name
epochs_cfg_table = {
    'AlexNet' : [20, 30, 20, 20, 10],
    'AlexNet_BN' : [15, 20, 20, 20, 10, 10],
    'VGG_16' : [25, 30, 30, 20, 20, 10, 10],
    'VGG_19' : [30, 40, 30, 20, 20, 10, 10],
    'Inception_BN' : [20, 30, 30, 20, 20, 10, 10]
}
# Learning rate per training stage; each list matches the stage count above
lr_cfg_table = {
    'AlexNet' : [0.01, 0.005, 0.001, 0.0005, 0.0001],
    'AlexNet_BN' : [0.01, 0.005, 0.002, 0.001, 0.0005, 0.0001],
    'VGG_16' : [0.01, 0.008, 0.005, 0.002, 0.001, 0.0005, 0.0001],
    'VGG_19' : [0.01, 0.008, 0.005, 0.002, 0.001, 0.0005, 0.0001],
    'Inception_BN' : [0.01, 0.008, 0.005, 0.002, 0.001, 0.0005, 0.0001]
}
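The two tables pair up stage by stage: stage k of a model trains for `epochs_cfg_table[name][k]` epochs at learning rate `lr_cfg_table[name][k]`. As a minimal sketch of that pairing (the helper `expand_schedule` is hypothetical and not part of `train.py`), the staged configuration can be expanded into one learning rate per epoch:

```python
def expand_schedule(epochs_cfg, lr_cfg):
    """Expand per-stage (epochs, lr) pairs into one learning rate per epoch."""
    if len(epochs_cfg) != len(lr_cfg):
        raise ValueError("epochs_cfg and lr_cfg must have the same number of stages")
    schedule = []
    for epochs, lr in zip(epochs_cfg, lr_cfg):
        schedule.extend([lr] * epochs)
    return schedule


# Values copied from the 'AlexNet' rows of the two tables.
alexnet_schedule = expand_schedule([20, 30, 20, 20, 10],
                                   [0.01, 0.005, 0.001, 0.0005, 0.0001])
print(len(alexnet_schedule))   # total number of training epochs
print(alexnet_schedule[20])    # learning rate of the first epoch of stage 2
```

The length check matters because `zip` silently truncates to the shorter list, so a mismatched pair of rows would otherwise drop training stages without any error.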
if __name__ == "__main__":
    batch_size = 32
    seed = 1
    momentum = 0.5
    save_model = True
    # When True, models that already have a checkpoint are skipped
    append = True
    torch.manual_seed(seed)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    train_loader = torch.utils.data.DataLoader(
        datasets.CIFAR10('../data', train=True, download=True,
                         transform=transforms.Compose([
                             transforms.Resize((32, 32), interpolation=InterpolationMode.BICUBIC),
                             transforms.RandomHorizontalFlip(),
                             transforms.ToTensor(),
                             transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
                         ])),
        batch_size=batch_size, shuffle=True, num_workers=1, pin_memory=True
    )
    test_loader = torch.utils.data.DataLoader(
        datasets.CIFAR10('../data', train=False, transform=transforms.Compose([
            transforms.Resize((32, 32), interpolation=InterpolationMode.BICUBIC),
            transforms.ToTensor(),
            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
        ])),
        batch_size=batch_size, shuffle=True, num_workers=1, pin_memory=True
    )
    if save_model:
        if not osp.exists('ckpt'):
            os.makedirs('ckpt')
    model_name_list = ['AlexNet', 'AlexNet_BN', 'VGG_16', 'VGG_19', 'Inception_BN']
    for model_name in model_name_list:
        save_path = 'ckpt/cifar10_'+model_name+'.pt'
        if os.path.exists(save_path) and append:
            continue
        model = Model(model_name).to(device)
        epoch_start = 1
        epochs_cfg = epochs_cfg_table[model_name]
        lr_cfg = lr_cfg_table[model_name]
        # One optimizer per stage, each with its own learning rate
        for epochs,lr in zip(epochs_cfg,lr_cfg):
            optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)
            epoch_end = epoch_start+epochs
            for epoch in range(epoch_start,epoch_end):
                train(model, device, train_loader, optimizer, epoch)
                test(model, device, test_loader)
            epoch_start += epochs
        if save_model:
            torch.save(model.state_dict(), save_path)
\ No newline at end of file
#!/bin/bash
#- Job parameters
# (TODO)
# Please modify job name
#SBATCH -J ALL # The job name
#SBATCH -o ret/ret-%j.out # Write the standard output to file named 'ret-<job_number>.out'
#SBATCH -e ret/ret-%j.err # Write the standard error to file named 'ret-<job_number>.err'
#- Resources
# (TODO)
# Please modify your requirements
#SBATCH -p nv-gpu # Submit to 'nv-gpu' Partition
#SBATCH -t 3-00:00:00 # Run for a maximum time of 3 days, 00 hours, 00 mins, 00 secs
#SBATCH --nodes=1 # Request N nodes
#SBATCH --gres=gpu:1 # Request M GPU per node
#SBATCH --gres-flags=enforce-binding # CPU-GPU Affinity
#SBATCH --qos=gpu-long # Request QOS Type
###
### The system will alloc 8 or 16 cores per gpu by default.
### If you need more or less, use following:
### #SBATCH --cpus-per-task=K # Request K cores
###
###
### Without specifying the constraint, any available nodes that meet the requirement will be allocated
### You can specify the characteristics of the compute nodes, and even the names of the compute nodes
###
### #SBATCH --nodelist=gpu-v00 # Request a specific list of hosts
### #SBATCH --constraint="Volta|RTX8000" # Request GPU Type: Volta(V100 or V100S) or RTX8000
###
# set constraint for RTX8000 to meet my cuda
#SBATCH --constraint="Ampere|RTX8000|T4"
#- Log information
echo "Job start at $(date "+%Y-%m-%d %H:%M:%S")"
echo "Job run at:"
echo "$(hostnamectl)"
#- Load environments
source /tools/module_env.sh
module list # list modules loaded
##- Tools
module load cluster-tools/v1.0
module load slurm-tools/v1.0
module load cmake/3.15.7
module load git/2.17.1
module load vim/8.1.2424
##- language
module load python3/3.6.8
##- CUDA
# module load cuda-cudnn/10.2-7.6.5
# module load cuda-cudnn/11.2-8.2.1
module load cuda-cudnn/11.1-8.2.1
##- virtualenv
# source xxxxx/activate
echo $(module list) # list modules loaded
echo $(which gcc)
echo $(which python)
echo $(which python3)
cluster-quota # nas quota
nvidia-smi --format=csv --query-gpu=name,driver_version,power.limit # gpu info
#- Warning! Please do not change your CUDA_VISIBLE_DEVICES
#- in `.bashrc`, `env.sh`, or your job script
echo "Use GPU ${CUDA_VISIBLE_DEVICES}" # which gpus
#- The CUDA_VISIBLE_DEVICES variable is assigned and specified by SLURM
#- Job step
# [EDIT HERE(TODO)]
python train.py
#- End
echo "Job end at $(date "+%Y-%m-%d %H:%M:%S")"
import torch
import torch.nn as nn
def ebit_list(quant_type, num_bits):
    if quant_type == 'FLOAT':
        e_bit_list = list(range(1,num_bits-1))
    else:
        e_bit_list = [0]
    return e_bit_list
def numbit_list(quant_type):
    if quant_type == 'INT':
        num_bit_list = list(range(2,17))
        # num_bit_list = [4,5]
    elif quant_type == 'POT':
        num_bit_list = list(range(2,9))
        # num_bit_list = [5]
    else:
        num_bit_list = list(range(2,9))
        # num_bit_list = [8]
    return num_bit_list
def build_bias_list(quant_type):
    if quant_type == 'POT':
        return build_pot_list(8)
    else:
        return build_float_list(16,7)
def build_list(quant_type, num_bits, e_bits):
    if quant_type == 'POT':
        return build_pot_list(num_bits)
    else:
        return build_float_list(num_bits,e_bits)
def build_pot_list(num_bits):
    plist = [0.]
    for i in range(-2 ** (num_bits-1) + 2, 1):
        # i goes up to 0, so the largest POT quantization value is 1
        plist.append(2. ** i)
        plist.append(-2. ** i)
    plist = torch.Tensor(list(set(plist)))
    # plist = plist.mul(1.0 / torch.max(plist))
    return plist
def build_float_list(num_bits,e_bits):
    m_bits = num_bits - 1 - e_bits
    plist = [0.]
    # Spacing between adjacent mantissa values
    dist_m = 2 ** (-m_bits)
    # Subnormal values: no implicit leading 1, smallest exponent
    e = -2 ** (e_bits - 1) + 1
    for m in range(1, 2 ** m_bits):
        frac = m * dist_m # mantissa part
        expo = 2 ** e # exponent part
        flt = frac * expo
        plist.append(flt)
        plist.append(-flt)
    # Normal values: implicit leading 1
    for e in range(-2 ** (e_bits - 1) + 2, 2 ** (e_bits - 1) + 1):
        expo = 2 ** e
        for m in range(0, 2 ** m_bits):
            frac = 1. + m * dist_m
            flt = frac * expo
            plist.append(flt)
            plist.append(-flt)
    plist = torch.Tensor(list(set(plist)))
    return plist
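To make the layout of the FLOAT value table concrete, here is a minimal pure-Python sketch that mirrors the two loops of `build_float_list` without `torch`. The name `float_value_table` is hypothetical and only serves this illustration; it returns a sorted list instead of a tensor.

```python
def float_value_table(num_bits, e_bits):
    """Pure-Python mirror of build_float_list; returns sorted representable values."""
    m_bits = num_bits - 1 - e_bits           # one bit is reserved for the sign
    step = 2.0 ** (-m_bits)                  # spacing between adjacent mantissa values
    values = {0.0}
    # Subnormal range: no implicit leading 1, fixed smallest exponent.
    e_min = -2 ** (e_bits - 1) + 1
    for m in range(1, 2 ** m_bits):
        v = m * step * 2.0 ** e_min
        values.update((v, -v))
    # Normal range: implicit leading 1.
    for e in range(-2 ** (e_bits - 1) + 2, 2 ** (e_bits - 1) + 1):
        for m in range(2 ** m_bits):
            v = (1.0 + m * step) * 2.0 ** e
            values.update((v, -v))
    return sorted(values)


table = float_value_table(4, 2)   # the FLOAT_4_E2 configuration
print(table)
```

For 4 bits with 2 exponent bits the table holds 15 distinct values rather than 16, because +0 and -0 collapse into a single entry.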
# No cfg is needed here: layers sharing the same prefix and suffix are matched directly.
# ReLU layers are folded in as well.
def fold_ratio(layer, par_ratio, flop_ratio):
    for name in layer:
        if 'conv' in name:
            conv_idx = layer.index(name)
            [prefix,suffix] = name.split('conv')
            bn_name = prefix+'bn'+suffix
            relu_name = prefix+'relu'+suffix
            if bn_name in layer:
                bn_idx = layer.index(bn_name)
                par_ratio[conv_idx]+=par_ratio[bn_idx]
                flop_ratio[conv_idx]+=flop_ratio[bn_idx]
            if relu_name in layer:
                relu_idx = layer.index(relu_name)
                par_ratio[conv_idx]+=par_ratio[relu_idx]
                # ReLU FLOPs are indexed by relu_idx; a matching BN layer may not exist
                flop_ratio[conv_idx]+=flop_ratio[relu_idx]
    return par_ratio,flop_ratio
def fold_model(model):
    for name, module in model.named_modules():
        if 'conv' in name:
            [prefix,suffix] = name.split('conv')
            bn_name = prefix+'bn'+suffix
            if hasattr(model,bn_name):
                bn_layer = getattr(model,bn_name)
                fold_bn(module,bn_layer)
def fold_bn(conv, bn):
    # Fetch the BN layer's parameters
    mean = bn.running_mean
    var = bn.running_var
    eps = bn.eps
    std = torch.sqrt(var + eps)
    if bn.affine:
        gamma_ = bn.weight / std
        weight = conv.weight * gamma_.view(conv.out_channels, 1, 1, 1)
        if conv.bias is not None:
            bias = gamma_ * conv.bias - gamma_ * mean + bn.bias
        else:
            bias = bn.bias - gamma_ * mean
    else:
        gamma_ = 1 / std
        # Reshape so the per-channel scale broadcasts over the output-channel axis
        weight = conv.weight * gamma_.view(conv.out_channels, 1, 1, 1)
        if conv.bias is not None:
            bias = gamma_ * conv.bias - gamma_ * mean
        else:
            bias = -gamma_ * mean
    # Set the new weight and bias
    conv.weight.data = weight.data
    if conv.bias is not None:
        conv.bias.data = bias.data
    else:
        conv.bias = torch.nn.Parameter(bias)
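The affine branch of `fold_bn` relies on the identity gamma * ((w*x + b) - mean) / sqrt(var + eps) + beta = w'*x + b', with w' = w * gamma / sqrt(var + eps) and b' = gamma / sqrt(var + eps) * (b - mean) + beta. Below is a minimal scalar sketch of that identity for a single channel, in pure Python; the names `fold_bn_scalar` and `conv_then_bn` are hypothetical and the numbers are arbitrary illustration values.

```python
import math


def fold_bn_scalar(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a per-channel BN (inference mode) into the preceding weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, scale * (b - mean) + beta


def conv_then_bn(x, w, b, gamma, beta, mean, var, eps=1e-5):
    """Reference: apply the (scalar) convolution, then batch normalization."""
    return gamma * ((w * x + b) - mean) / math.sqrt(var + eps) + beta


w, b = 0.8, -0.1
gamma, beta, mean, var = 1.5, 0.2, 0.3, 4.0
w_folded, b_folded = fold_bn_scalar(w, b, gamma, beta, mean, var)

# The folded layer must reproduce conv + BN for any input.
max_err = max(
    abs(conv_then_bn(x, w, b, gamma, beta, mean, var) - (w_folded * x + b_folded))
    for x in (-2.0, 0.0, 3.5)
)
print(max_err)
```

The same algebra holds per output channel in the tensor version, which is why the scale is reshaped to `(out_channels, 1, 1, 1)` before multiplying the convolution weight.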
@@ -159,9 +159,10 @@ def model_quantize(model,cfg_table,quant_type,num_bits,e_bits):
 # Supports choosing the dequantization position for debugging; can be removed at release time
 # end_pos == -1 means dequantize only at the very end; otherwise dequantize at layer i
+# Added func='fakefreeze'
 def model_utils(model,cfg_table,func,x=None):
     end_flag = False
-    end_pos = 6
+    end_pos = -1
     last_qo = None
     for i in range(len(cfg_table)):
         cfg = cfg_table[i]
@@ -171,7 +172,7 @@ def model_utils(model,cfg_table,func,x=None):
                 x = last_qo.dequantize_tensor(x)
         if cfg[0] == 'Inc':
             if end_flag:
-                if func != 'freeze':
+                if func == 'inference' or func == 'forward':
                     x = inc_forward(model,cfg[1],x)
                 continue
             x,last_qo = inc_utils(model,cfg[1],func,x,last_qo)
@@ -179,7 +180,7 @@ def model_utils(model,cfg_table,func,x=None):
             if end_flag:
                 name = 'conv%d'%i
                 layer = getattr(model,name)
-                if func != 'freeze':
+                if func == 'inference' or func == 'forward':
                     x = layer(x)
                 continue
             qname = 'q_conv%d'%i
@@ -191,14 +192,16 @@ def model_utils(model,cfg_table,func,x=None):
                 if cfg[2]:
                     x = qlayer.qi.quantize_tensor(x)
                 x = qlayer.quantize_inference(x)
-            else: #freeze
+            elif func == 'freeze':
                 qlayer.freeze(last_qo)
+            elif func == 'fakefreeze':
+                qlayer.fakefreeze()
             last_qo = qlayer.qo
         elif cfg[0] == 'R':
             if end_flag:
                 name = 'relu%d'%i
                 layer = getattr(model,name)
-                if func != 'freeze':
+                if func == 'inference' or func == 'forward':
                     x = layer(x)
                 continue
             qname = 'q_relu%d'%i
@@ -207,13 +210,15 @@ def model_utils(model,cfg_table,func,x=None):
                 x = qlayer(x)
             elif func == 'inference':
                 x = qlayer.quantize_inference(x)
-            else: #freeze
+            elif func == 'freeze':
                 qlayer.freeze(last_qo)
+            elif func == 'fakefreeze':
+                qlayer.fakefreeze()
         elif cfg[0] == 'MP':
             if end_flag:
                 name = 'pool%d'%i
                 layer = getattr(model,name)
-                if func != 'freeze':
+                if func == 'inference' or func == 'forward':
                     x = layer(x)
                 continue
             qname = 'q_pool%d'%i
@@ -222,13 +227,15 @@ def model_utils(model,cfg_table,func,x=None):
                 x = qlayer(x)
             elif func == 'inference':
                 x = qlayer.quantize_inference(x)
-            else: #freeze
+            elif func == 'freeze':
                 qlayer.freeze(last_qo)
+            elif func == 'fakefreeze':
+                qlayer.fakefreeze()
         elif cfg[0] == 'AAP':
             if end_flag:
                 name = 'aap%d'%i
                 layer = getattr(model,name)
-                if func != 'freeze':
+                if func == 'inference' or func == 'forward':
                     x = layer(x)
                 continue
             qname = 'q_aap%d'%i
@@ -237,11 +244,13 @@ def model_utils(model,cfg_table,func,x=None):
                 x = qlayer(x)
             elif func == 'inference':
                 x = qlayer.quantize_inference(x)
-            else: #freeze
+            elif func == 'freeze':
                 qlayer.freeze(last_qo)
+            elif func == 'fakefreeze':
+                qlayer.fakefreeze()
             last_qo = qlayer.qo
         elif cfg[0] == 'F':
-            if func != 'freeze':
+            if func == 'inference' or func == 'forward':
                 x = torch.flatten(x,start_dim=1)
             if func == 'inference' and not end_flag:
@@ -351,8 +360,10 @@ def inc_utils(model,inc_idx,func,x=None,qo=None):
                     tmp = qlayer(tmp)
                 elif func == 'inference':
                     tmp = qlayer.quantize_inference(tmp)
-                else: #freeze
+                elif func == 'freeze':
                     qlayer.freeze(last_qo)
+                elif func == 'fakefreeze':
+                    qlayer.fakefreeze()
             elif cfg[0] == 'R':
                 qname = qprefix+'relu%d'%j
                 qlayer = getattr(model,qname)
@@ -360,8 +371,10 @@ def inc_utils(model,inc_idx,func,x=None,qo=None):
                     tmp = qlayer(tmp)
                 elif func == 'inference':
                     tmp = qlayer.quantize_inference(tmp)
-                else: #freeze
+                elif func == 'freeze':
                     qlayer.freeze(last_qo)
+                elif func == 'fakefreeze':
+                    qlayer.fakefreeze()
             else:
                 qname = qprefix+'conv%d'%j
                 qlayer = getattr(model,qname)
@@ -369,8 +382,10 @@ def inc_utils(model,inc_idx,func,x=None,qo=None):
                     tmp = qlayer(tmp)
                 elif func == 'inference':
                     tmp = qlayer.quantize_inference(tmp)
-                else: #freeze
+                elif func == 'freeze':
                     qlayer.freeze(last_qo)
+                elif func == 'fakefreeze':
+                    qlayer.fakefreeze()
                 last_qo = qlayer.qo
         outs.append(tmp)
@@ -411,6 +426,8 @@ class Inception_BN(nn.Module):
     def quantize_inference(self,x):
         return model_utils(self,self.cfg_table,func='inference',x=x)
+    def fakefreeze(self):
+        model_utils(self,self.cfg_table,func='fakefreeze')
 if __name__ == "__main__":
     model = Inception_BN()
......
@@ -8,6 +8,50 @@ from torch.autograd import Variable
 from function import FakeQuantize
+def mid_ratio(x):
+    x = x.view(-1)
+    std = torch.std(x)
+    max = 3*std#.item()
+    min = 3*(-std)#.item()
+    print("%f %f"%(max,min))
+    max = torch.max(torch.abs(max),torch.abs(min))
+    mid = max/2
+    cond = torch.logical_and(x>=-mid,x<=mid)
+    cnt = torch.sum(cond).item()
+    ratio = cnt/len(x)
+    return ratio
+def pearson_corr(tensor1, tensor2):
+    """
+    Compute the Pearson correlation coefficient between tensor1 and tensor2
+    """
+    # Flatten tensor1 and tensor2 to 2-D
+    tensor1 = tensor1.view(-1, tensor1.size(-1))
+    tensor2 = tensor2.view(-1, tensor2.size(-1))
+    # Compute the means of tensor1 and tensor2
+    tensor1_mean = torch.mean(tensor1, dim=0)
+    tensor2_mean = torch.mean(tensor2, dim=0)
+    # Compute the centered tensors
+    tensor1_c = tensor1 - tensor1_mean
+    tensor2_c = tensor2 - tensor2_mean
+    # Compute the covariance matrix
+    cov_mat = torch.matmul(tensor1_c.t(), tensor2_c) / (tensor1.size(0) - 1)
+    # Compute the correlation coefficients: entry [i, j] is divided by std1[i] * std2[j]
+    corr_mat = cov_mat / torch.std(tensor1, dim=0).unsqueeze(1) / torch.std(tensor2, dim=0).unsqueeze(0)
+    pearson = torch.mean(corr_mat)
+    return pearson.item()
+def cos_sim(a,b):
+    a = a.view(-1)
+    b = b.view(-1)
+    cossim = torch.cosine_similarity(a, b, dim=0, eps=1e-6)
+    # cossim = (cossim-0.97)/(1-0.97)
+    return cossim.item()
 def js_div(p_output, q_output, get_softmax=True):
     """
@@ -176,6 +220,8 @@ class QModule(nn.Module):
     def quantize_inference(self, x):
         raise NotImplementedError('quantize_inference should be implemented.')
+    def fakefreeze(self):
+        pass
 """
 QModule quantized convolution
@@ -227,6 +273,11 @@ class QConv2d(QModule):
             self.conv_module.bias.data, scale=self.qi.scale * self.qw.scale,
             zero_point=0.,qmax=self.bias_qmax, is_bias=True)
+    def fakefreeze(self):
+        self.conv_module.weight.data = self.qw.dequantize_tensor(self.conv_module.weight.data)
+        if self.conv_module.bias is not None:
+            self.conv_module.bias.data = dequantize_tensor(self.conv_module.bias.data,scale=self.qi.scale*self.qw.scale,zero_point=0.)
     def forward(self, x): # Forward pass; the input tensor x is floating-point data
         if hasattr(self, 'qi'):
             self.qi.update(x)
@@ -292,6 +343,11 @@ class QLinear(QModule):
             self.fc_module.bias.data, scale=self.qi.scale * self.qw.scale,
             zero_point=0., qmax=self.bias_qmax, is_bias=True)
+    def fakefreeze(self):
+        self.fc_module.weight.data = self.qw.dequantize_tensor(self.fc_module.weight.data)
+        if self.fc_module.bias is not None:
+            self.fc_module.bias.data = dequantize_tensor(self.fc_module.bias.data,scale=self.qi.scale*self.qw.scale,zero_point=0.)
     def forward(self, x):
         if hasattr(self, 'qi'):
             self.qi.update(x)
@@ -483,6 +539,10 @@ class QConvBNReLU(QModule):
             zero_point=0., qmax=self.bias_qmax,is_bias=True)
         self.conv_module.bias = torch.nn.Parameter(bias)
+    def fakefreeze(self):
+        self.conv_module.weight.data = self.qw.dequantize_tensor(self.conv_module.weight.data)
+        self.conv_module.bias.data = dequantize_tensor(self.conv_module.bias.data,scale=self.qi.scale*self.qw.scale,zero_point=0.)
     def forward(self, x):
         if hasattr(self, 'qi'):
@@ -601,6 +661,10 @@ class QConvBN(QModule):
             zero_point=0., qmax=self.bias_qmax,is_bias=True)
         self.conv_module.bias = torch.nn.Parameter(bias)
+    def fakefreeze(self):
+        self.conv_module.weight.data = self.qw.dequantize_tensor(self.conv_module.weight.data)
+        self.conv_module.bias.data = dequantize_tensor(self.conv_module.bias.data,scale=self.qi.scale*self.qw.scale,zero_point=0.)
     def forward(self, x):
         if hasattr(self, 'qi'):
......
@@ -83,7 +83,7 @@ if __name__ == "__main__":
     model.to(device)
     load_ptq = True
-    store_ptq = False
+    store_ptq = True
     ptq_file_prefix = 'ckpt/cifar10_Inception_BN_ptq_'
     model.eval()
@@ -96,12 +96,18 @@ if __name__ == "__main__":
     full_names = []
     full_params = []
+    # full_mid_ratios = []
+    # full_params_norm = []
     for name, param in model.named_parameters():
         if 'conv' in name or 'fc' in name:
             full_names.append(name)
-            param_norm = F.normalize(param.data.cpu(),p=2,dim=-1)
-            full_params.append(param_norm)
+            # param_norm = F.normalize(param.data.cpu(),p=2,dim=-1)
+            full_params.append(param.data.cpu())
+            # full_mr = mid_ratio(param.data.cpu())
+            # full_mid_ratios.append(full_mr)
+            # print(name+':%f'%full_mr)
+            # full_params_norm.append(param_norm)
             writer.add_histogram(tag='Full_' + name + '_data', values=param.data)
     # Count the number of parameters of the layer each parameter belongs to
@@ -120,9 +126,10 @@ if __name__ == "__main__":
     # input()
     gol._init()
-    # quant_type_list = ['INT','POT','FLOAT']
+    quant_type_list = ['INT','POT','FLOAT']
     # quant_type_list = ['INT']
-    quant_type_list = ['POT']
+    # quant_type_list = ['POT']
+    # quant_type_list = ['INT','POT']
     title_list = []
     js_flops_list = []
     js_param_list = []
@@ -151,8 +158,11 @@ if __name__ == "__main__":
                 # Set up the quantization table
                 if quant_type != 'INT':
                     plist = build_list(quant_type, num_bits, e_bits)
+                    # list_mid_ratio = mid_ratio(plist)
                     gol.set_value(plist)
+                # else:
+                #     list_mid_ratio = 0.5
+                # print('list_mid_ratio:%f'%list_mid_ratio)
                 # Check whether a saved PTQ model should be loaded
                 if load_ptq is True and osp.exists(ptq_file_prefix + title + '.pt'):
                     model_ptq.quantize(quant_type,num_bits,e_bits)
@@ -174,6 +184,7 @@ if __name__ == "__main__":
                 acc_loss = (full_acc - ptq_acc) / full_acc
                 acc_loss_list.append(acc_loss)
+                model_ptq.fakefreeze()
                 # Get the js-div weighted by FLOPs / parameter count
                 js_flops = 0.
                 js_param = 0.
@@ -188,10 +199,24 @@ if __name__ == "__main__":
                     layer_idx = layer.index(prefix)
                     ptq_param = param.data.cpu()
                     # Take the L2 norm
-                    ptq_norm = F.normalize(ptq_param,p=2,dim=-1)
+                    # ptq_norm = F.normalize(ptq_param,p=2,dim=-1)
                     writer.add_histogram(tag=title +':'+ name + '_data', values=ptq_param)
-                    js = js_div(ptq_norm,full_params[name_idx])
+                    js = js_div(ptq_param,full_params[name_idx])
                     js /= full_par_num[name_idx]
+                    # if 'weight' in name:
+                    # Correlation coefficient between the distributions before and after quantization, range [0,1]; measures how similar the distribution trends are
+                    # coeff = pearson_corr(ptq_param,full_params[name_idx])
+                    # print(name+':%f'%coeff)
+                    # js /= coeff
+                    # cossim = cos_sim(ptq_param,full_params[name_idx])
+                    # print(name+':%f'%cossim)
+                    # js /= cossim
+                    # mr_dist = torch.abs(torch.tensor(full_mid_ratios[name_idx]-list_mid_ratio))
+                    # print(name+":%f"%mr_dist)
+                    # js /= mr_dist
                     js = js.item()
                     if js < 0.:
                         js = 0.
......
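The loop above scores each layer by the JS divergence between its quantized and full-precision parameters and clamps small negative values to zero. As a minimal illustration of the quantity itself, here is a pure-Python sketch on two already-normalized discrete distributions. It is an assumption-laden stand-in: the repository's `js_div` works on tensors (with an optional softmax) and may differ in detail, and the names `kl_divergence` and `js_divergence` are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded by ln(2) with natural logs."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


p = [0.1, 0.4, 0.5]
q = [0.3, 0.3, 0.4]
print(js_divergence(p, q))                    # small positive value
print(js_divergence(p, p))                    # identical distributions
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # disjoint support: the upper bound
```

Mathematically the divergence is never negative, so the clamp in the loop only guards against floating-point round-off.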
 title_list:
-POT_2 POT_3 POT_4 POT_5 POT_6 POT_7 POT_8
+INT_2 INT_3 INT_4 INT_5 INT_6 INT_7 INT_8 INT_9 INT_10 INT_11 INT_12 INT_13 INT_14 INT_15 INT_16 POT_2 POT_3 POT_4 POT_5 POT_6 POT_7 POT_8 FLOAT_3_E1 FLOAT_4_E1 FLOAT_4_E2 FLOAT_5_E1 FLOAT_5_E2 FLOAT_5_E3 FLOAT_6_E1 FLOAT_6_E2 FLOAT_6_E3 FLOAT_6_E4 FLOAT_7_E1 FLOAT_7_E2 FLOAT_7_E3 FLOAT_7_E4 FLOAT_7_E5 FLOAT_8_E1 FLOAT_8_E2 FLOAT_8_E3 FLOAT_8_E4 FLOAT_8_E5 FLOAT_8_E6
 js_flops_list:
import torch
import torch.nn.functional as F
def pearson_corr(tensor1, tensor2):
    """
    Compute the Pearson correlation coefficient between tensor1 and tensor2
    """
    # Flatten tensor1 and tensor2 to 2-D
    tensor1 = tensor1.view(-1, tensor1.size(-1))
    tensor2 = tensor2.view(-1, tensor2.size(-1))
    # Compute the means of tensor1 and tensor2
    tensor1_mean = torch.mean(tensor1, dim=0)
    tensor2_mean = torch.mean(tensor2, dim=0)
    # Compute the centered tensors
    tensor1_c = tensor1 - tensor1_mean
    tensor2_c = tensor2 - tensor2_mean
    # Compute the covariance matrix
    cov_mat = torch.matmul(tensor1_c.t(), tensor2_c) / (tensor1.size(0) - 1)
    print(cov_mat)
    print('----')
    # Compute the correlation coefficients: entry [i, j] is divided by std1[i] * std2[j]
    corr_mat = cov_mat / torch.std(tensor1, dim=0).unsqueeze(1) / torch.std(tensor2, dim=0).unsqueeze(0)
    print(corr_mat)
    pearson = torch.mean(corr_mat)
    return pearson
def cos_sim(a,b):
    a = a.view(-1)
    b = b.view(-1)
    cossim = torch.cosine_similarity(a, b, dim=0, eps=1e-6)
    # cossim = (cossim-0.975)/(1-0.975)
    return cossim
def mid_ratio(x):
    x = x.view(-1)
    # Largest absolute value in x
    max_abs = torch.max(torch.abs(torch.max(x)), torch.abs(torch.min(x)))
    mid = max_abs / 2
    # Fraction of elements that fall inside [-mid, mid]
    cond = torch.logical_and(x >= -mid, x <= mid)
    print(cond)
    cnt = torch.sum(cond).item()
    print(cnt)
    ratio = cnt / len(x)
    print(ratio)
    return ratio
if __name__ == "__main__":
    # Create two 3x3 tensors and one 1-D tensor
    x = torch.tensor([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
    y = torch.tensor([[2, 3, 4], [3, 4, 5], [4, 5, 6]])
    z = torch.tensor([-3,-2,-1,1,2,3])
    x = x.float()
    y = y.float()
    # x = F.normalize(x,p=2,dim=-1)
    # y = F.normalize(y,p=2,dim=-1)
    # Compute the correlation coefficient
    # r = pearson_corr(x, y)
    # r = cos_sim(x,y)
    mid_ratio(y)
    # Print the correlation coefficient
    # print(r)
\ No newline at end of file
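The `mid_ratio` helper in the file above measures how much of a distribution sits in the inner half of its range. The same idea can be sketched without PyTorch on plain lists; `mid_ratio_list` and the sample data below are hypothetical and only illustrate the measure, they are not part of the commit.

```python
def mid_ratio_list(values):
    """Fraction of values lying inside [-max_abs/2, max_abs/2]."""
    limit = max(abs(v) for v in values) / 2
    inside = sum(1 for v in values if -limit <= v <= limit)
    return inside / len(values)


# Made-up samples: one clustered around zero, one spread evenly.
peaked = [0.0, 0.1, -0.1, 0.05, -0.05, 0.2, -0.2, 1.0, -1.0]
flat = [-4, -3, -2, -1, 0, 1, 2, 3, 4]

print(mid_ratio_list(peaked))  # 7 of the 9 values are within [-0.5, 0.5]
print(mid_ratio_list(flat))    # 5 of the 9 values are within [-2, 2]
```

A sharply peaked set of values yields a ratio close to 1, while an evenly spread one stays near 0.5, which is the property the comparison between weights and quantization tables relies on.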
import torch
from utils import *
from module import *
from torch.utils.tensorboard import SummaryWriter
def build_int_list(num_bits):
    plist = [0.]
    for i in range(0,2**(num_bits-1)):
        # Add both +i and -i so the table is symmetric around 0
        # (assumed intent for signed INT quantization)
        plist.append(i)
        plist.append(-i)
    plist = torch.Tensor(list(set(plist)))
    # plist = plist.mul(1.0 / torch.max(plist))
    return plist
if __name__ == "__main__":
    writer = SummaryWriter(log_dir='./log')
    quant_type_list = ['INT','POT','FLOAT']
    for quant_type in quant_type_list:
        num_bit_list = numbit_list(quant_type)
        for num_bits in num_bit_list:
            e_bit_list = ebit_list(quant_type,num_bits)
            for e_bits in e_bit_list:
                if quant_type == 'FLOAT':
                    title = '%s_%d_E%d' % (quant_type, num_bits, e_bits)
                else:
                    title = '%s_%d' % (quant_type, num_bits)
                print('\nPTQ: '+title)
                # Set up the quantization table
                if quant_type != 'INT':
                    plist = build_list(quant_type, num_bits, e_bits)
                    list_mid_ratio = mid_ratio(plist)
                else:
                    plist = build_int_list(num_bits)
                    list_mid_ratio = mid_ratio(plist)
                writer.add_histogram(tag=title, values=plist)
                print(list_mid_ratio)
    writer.close()
\ No newline at end of file
from torch.serialization import load
from model import *
from extract_ratio import *
from utils import *
import gol
import openpyxl
import sys
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torchvision.transforms.functional import InterpolationMode
import torch.utils.bottleneck as bn
import os
import os.path as osp
from torch.utils.tensorboard import SummaryWriter
def direct_quantize(model, test_loader,device):
    for i, (data, target) in enumerate(test_loader, 1):
        data = data.to(device)
        output = model.quantize_forward(data).cpu()
        if i % 500 == 0:
            break
    print('direct quantization finish')
def full_inference(model, test_loader, device):
    correct = 0
    for i, (data, target) in enumerate(test_loader, 1):
        data = data.to(device)
        output = model(data).cpu()
        pred = output.argmax(dim=1, keepdim=True)
        # print(pred)
        correct += pred.eq(target.view_as(pred)).sum().item()
    print('\nTest set: Full Model Accuracy: {:.2f}%'.format(100. * correct / len(test_loader.dataset)))
    return 100. * correct / len(test_loader.dataset)
def quantize_inference(model, test_loader, device):
    correct = 0
    for i, (data, target) in enumerate(test_loader, 1):
        data = data.to(device)
        output = model.quantize_inference(data).cpu()
        pred = output.argmax(dim=1, keepdim=True)
        correct += pred.eq(target.view_as(pred)).sum().item()
    print('Test set: Quant Model Accuracy: {:.2f}%'.format(100. * correct / len(test_loader.dataset)))
    return 100. * correct / len(test_loader.dataset)
if __name__ == "__main__":
    batch_size = 32
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(device)
    model = Inception_BN()
    writer = SummaryWriter(log_dir='./log')
    full_file = 'ckpt/cifar10_Inception_BN.pt'
    model.load_state_dict(torch.load(full_file))
    model.to(device)
    load_ptq = True
    store_ptq = False
    ptq_file_prefix = 'ckpt/cifar10_Inception_BN_ptq_'
    model.eval()
    # The model is modified in place once passed in
    fold_model(model)
    layer, par_ratio, flop_ratio = extract_ratio()
    par_ratio, flop_ratio = fold_ratio(layer, par_ratio, flop_ratio)
    full_names = []
    full_params = []
    full_mid_ratios = []
    full_params_norm = []
    for name, param in model.named_parameters():
        if 'conv' in name or 'fc' in name:
            full_names.append(name)
            param_norm = F.normalize(param.data.cpu(),p=2,dim=-1)
            full_params.append(param.data.cpu())
            full_mr = mid_ratio(param.data.cpu())
            full_mid_ratios.append(full_mr)
            print(name+':%f'%full_mr)
            full_params_norm.append(param_norm)
            writer.add_histogram(tag='Full_' + name + '_data', values=param.data)
\ No newline at end of file
from torch.serialization import load
from model import *
from extract_ratio import *
from utils import *
import gol
import openpyxl
import sys
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torchvision.transforms.functional import InterpolationMode
import torch.utils.bottleneck as bn
import os
import os.path as osp
from torch.utils.tensorboard import SummaryWriter
def build_int_list(num_bits):
    plist = [0.]
    for i in range(0,2**(num_bits-1)):
        # Add both +i and -i so the table is symmetric around 0
        # (assumed intent for signed INT quantization)
        plist.append(i)
        plist.append(-i)
    plist = torch.Tensor(list(set(plist)))
    # plist = plist.mul(1.0 / torch.max(plist))
    return plist
def js_div_diff(p_output, q_output, get_softmax=True):
    """
    Function that measures JS divergence between target and output logits:
    """
    KLDivLoss = nn.KLDivLoss(reduction='sum')
    if get_softmax:
        p_output = F.softmax(p_output)
        q_output = F.softmax(q_output)
    p_output = p_output.view(-1)
    q_output = q_output.view(-1)
    # log_mean_output = ((p_output + q_output)/2).log()
    log_mean_output = 0.
    log_mean_output += p_output.log()
    log_mean_output += (q_output / q_output.size(0)).log()
    log_mean_output /= 2.0
    return (KLDivLoss(log_mean_output, p_output) + KLDivLoss(log_mean_output, q_output))/2
if __name__ == "__main__":
    batch_size = 32
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(device)
    model = Inception_BN()
    full_file = 'ckpt/cifar10_Inception_BN.pt'
    model.load_state_dict(torch.load(full_file))
    model.to(device)
    model.eval()
    # The model is modified in place once passed in
    fold_model(model)
    layer, par_ratio, flop_ratio = extract_ratio()
    par_ratio, flop_ratio = fold_ratio(layer, par_ratio, flop_ratio)
    full_names = []
    full_params = []
    for name, param in model.named_parameters():
        if 'conv' in name or 'fc' in name:
            full_names.append(name)
            param_norm = F.normalize(param.data.cpu(),p=2,dim=-1)
            full_params.append(param_norm)
    # For each parameter, count how many parameters belong to its layer
    full_par_num=[]
    for name in full_names:
        prefix = name.rsplit('.',1)[0]
        cnt = 0
        # The loop variable must not be called `str`: the builtin str() is needed below
        for other_name in full_names:
            sprefix = other_name.rsplit('.',1)[0]
            if prefix == sprefix:
                cnt += 1
        full_par_num.append(cnt)
    gol._init()
    # quant_type_list = ['INT','POT','FLOAT']
    # quant_type_list = ['INT']
    # quant_type_list = ['POT']
    quant_type_list = ['INT','POT']
    title_list = []
    js_flops_list = []
    js_param_list = []
    for quant_type in quant_type_list:
        num_bit_list = numbit_list(quant_type)
        # The bias quantization table only needs to be set once per quantization type
        # INT has large bit widths, so a quantization table is too costly; plain _round is enough
        if quant_type != 'INT':
            bias_list = build_bias_list(quant_type)
            gol.set_value(bias_list, is_bias=True)
        else:
            bias_list = build_int_list(16)
            gol.set_value(bias_list, is_bias=True)
        for num_bits in num_bit_list:
            e_bit_list = ebit_list(quant_type,num_bits)
            for e_bits in e_bit_list:
                if quant_type == 'FLOAT':
                    title = '%s_%d_E%d' % (quant_type, num_bits, e_bits)
                else:
                    title = '%s_%d' % (quant_type, num_bits)
                print('\nDIFF: '+title)
                title_list.append(title)
                # Set up the quantization table
                if quant_type != 'INT':
                    plist = build_list(quant_type, num_bits, e_bits)
                    gol.set_value(plist)
                else:
                    plist = build_int_list(num_bits)
                    gol.set_value(plist)
                # Accumulate the js-div weighted by FLOPs and by parameter count
                js_flops = 0.
                js_param = 0.
                for name, _ in model.named_parameters():
                    if 'conv' not in name and 'fc' not in name:
                        continue
                    if 'weight' in name:
                        plist = gol.get_value(False)
                    else:
                        plist = gol.get_value(True)
                    prefix = name.rsplit('.',1)[0]
                    layer_idx = layer.index(prefix)
                    name_idx = full_names.index(name)
                    # ptq_param = param.data.cpu()
                    # Take the L2 norm
                    plist_norm = F.normalize(plist,p=2,dim=-1)
                    print(name)
                    print(plist_norm)
                    print(full_params[name_idx])
                    js = js_div_diff(plist_norm,full_params[name_idx])
                    js /= full_par_num[name_idx]
                    js = js.item()
                    if js < 0.:
                        js = 0.
                    js_flops = js_flops + js * flop_ratio[layer_idx]
                    js_param = js_param + js * par_ratio[layer_idx]
                js_flops_list.append(js_flops)
                js_param_list.append(js_param)
                print(title + ': js_flops: %f js_param: %f ' % (js_flops, js_param))
    # Write to xlsx
    workbook = openpyxl.Workbook()
    worksheet = workbook.active
    worksheet.cell(row=3,column=1,value='title')
    worksheet.cell(row=3,column=2,value='js_flops')
    worksheet.cell(row=3,column=3,value='js_param')
    for i in range(len(title_list)):
        worksheet.cell(row=i+4, column=1, value=title_list[i])
        worksheet.cell(row=i+4, column=2, value=js_flops_list[i])
        worksheet.cell(row=i+4, column=3, value=js_param_list[i])
    ft = open('diff_result.txt','w')
    print('title_list:',file=ft)
    print(" ".join(title_list),file=ft)
    print('js_flops_list:',file=ft)
    print(" ".join(str(i) for i in js_flops_list), file=ft)
    print('js_param_list:',file=ft)
    print(" ".join(str(i) for i in js_param_list), file=ft)
    ft.close()
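For comparison with `js_div_diff` above, whose commented-out line points at the usual midpoint form, the textbook Jensen-Shannon divergence can be sketched in plain Python. This is only a dependency-free illustration; `kl_divergence`, `js_divergence` and the sample distributions are hypothetical names and data, not code from this commit.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing, so they are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Made-up distributions, for illustration only.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.2, 0.7]
print(js_divergence(p, q))
print(js_divergence(q, p))
```

With natural logarithms the value is bounded above by ln 2, which is reached when the two distributions do not overlap at all.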
...@@ -12,8 +12,10 @@ def ebit_list(quant_type, num_bits): ...@@ -12,8 +12,10 @@ def ebit_list(quant_type, num_bits):
def numbit_list(quant_type): def numbit_list(quant_type):
if quant_type == 'INT': if quant_type == 'INT':
num_bit_list = list(range(2,17)) num_bit_list = list(range(2,17))
# num_bit_list = [4,5]
elif quant_type == 'POT': elif quant_type == 'POT':
num_bit_list = list(range(2,9)) num_bit_list = list(range(2,9))
# num_bit_list = [5]
else: else:
num_bit_list = list(range(2,9)) num_bit_list = list(range(2,9))
# num_bit_list = [8] # num_bit_list = [8]
......
% Import the data table % Import the data table
file_data = xlsread('D:\Desktop\ptq_result.xlsx','Sheet','B4:E46'); file_data = xlsread('D:\Desktop\ptq_result.xlsx','Inception_BN','B4:E46');
js_flops = file_data(:,1); js_flops = file_data(:,1);
js_param = file_data(:,2); js_param = file_data(:,2);
ptq_acc = file_data(:,3); ptq_acc = file_data(:,3);
acc_loss = file_data(:,4); acc_loss = file_data(:,4);
x = js_flops;
y = acc_loss;
% Define the color vector and the class of each data point % Define the color vector and the class of each data point
colors = ['r', 'g', 'm']; colors = ['r', 'g', 'm'];
class = [ones(16,1); 2*ones(6,1); 3*ones(21,1)]; class = [ones(16,1); 2*ones(6,1); 3*ones(21,1)];
...@@ -12,15 +15,25 @@ class = [ones(16,1); 2*ones(6,1); 3*ones(21,1)]; ...@@ -12,15 +15,25 @@ class = [ones(16,1); 2*ones(6,1); 3*ones(21,1)];
% Specify the fit model % Specify the fit model
rational_model = fittype('(p1*js_flops.^2 + p2*js_flops + p3) / (q1*js_flops.^2 + q2*js_flops + q3)', 'independent', 'js_flops', 'coefficients', {'p1', 'p2', 'p3', 'q1', 'q2', 'q3'}); rational_model = fittype('(p1*js_flops.^2 + p2*js_flops + p3) / (q1*js_flops.^2 + q2*js_flops + q3)', 'independent', 'js_flops', 'coefficients', {'p1', 'p2', 'p3', 'q1', 'q2', 'q3'});
% Run the fit % Initial fit
[fitresult,gof] = fit(js_flops, acc_loss, rational_model); [fitresult, gof] = fit(x, y, rational_model);
% Make sure the fitted curve is monotonically non-decreasing
tolerance = 0;
x_range = min(x):0.1:max(x);
y_fit = fitresult(x_range);
while any(diff(y_fit) < tolerance)
    % Refit whenever the fitted curve is not monotonically non-decreasing
    [fitresult, gof] = fit(x, y, rational_model);
    y_fit = fitresult(x_range);
end
% Visualize the data points and the fitted curve % Visualize the data points and the fitted curve
scatter(js_flops(1:15), acc_loss(1:15), [], colors(1), 'filled'); scatter(x(1:15), y(1:15), [], colors(1), 'filled');
hold on; hold on;
scatter(js_flops(16:22), acc_loss(16:22), [], colors(2), 'filled'); scatter(x(16:22), y(16:22), [], colors(2), 'filled');
scatter(js_flops(23:43), acc_loss(23:43), [], colors(3), 'filled'); scatter(x(23:43), y(23:43), [], colors(3), 'filled');
plot(fitresult,'k',js_flops,acc_loss); plot(fitresult,'k',x,y);
xlabel('js\_flops'); xlabel('js\_flops');
ylabel('acc\_loss'); ylabel('acc\_loss');
legend('INT', 'POT', 'FLOAT','ALL', 'Rational-Fit', 'Location', 'Northeast'); legend('INT', 'POT', 'FLOAT','ALL', 'Rational-Fit', 'Location', 'Northeast');
......
% Import the data tables
data0 = xlsread('D:\Desktop\ptq_result.xlsx','AlexNet','B4:E46');
data1 = xlsread('D:\Desktop\ptq_result.xlsx','AlexNet_BN','B4:E46');
data2 = xlsread('D:\Desktop\ptq_result.xlsx','VGG_16','B4:E46');
data3 = xlsread('D:\Desktop\ptq_result.xlsx','VGG_19','B4:E46');
data4 = xlsread('D:\Desktop\ptq_result.xlsx','Inception_BN','B4:E46');
file_data = vertcat(data0,data1,data2,data3,data4);
js_flops = file_data(:,1);
js_param = file_data(:,2);
ptq_acc = file_data(:,3);
acc_loss = file_data(:,4);
% Choose the x and y data and the polynomial degree
x = js_flops;
y = acc_loss;
poly = 4;
% Define the color vector and the class of each data point
colors = ['r', 'g', 'b','m','o'];
class = [ones(43,1); 2*ones(43,1); 3*ones(43,1);4*ones(43,1);5*ones(43,1);];
% Specify the fit model
if poly == 2
rational_model = fittype('(p1*x.^2 + p2*x + p3) / (q1*x.^2 + q2*x + q3)', 'independent', 'x', 'coefficients', {'p1', 'p2', 'p3', 'q1', 'q2', 'q3'});
elseif poly == 3
rational_model = fittype('(p1*x.^3 + p2*x.^2 + p3*x + p4) / (q1*x.^3 + q2*x.^2 + q3*x + q4)', 'independent', 'x', 'coefficients', {'p1', 'p2', 'p3','p4', 'q1', 'q2', 'q3','q4'});
elseif poly == 4
rational_model = fittype('(p0*x.^4 + p1*x.^3 + p2*x.^2 + p3*x + p4) / (q0*x.^4 + q1*x.^3 + q2*x.^2 + q3*x + q4)', 'independent', 'x', 'coefficients', {'p0', 'p1', 'p2', 'p3','p4','q0', 'q1', 'q2', 'q3','q4'});
end
% Initial fit
[fitresult, gof] = fit(x, y, rational_model);
% Make sure the fitted curve is monotonically non-decreasing
tolerance = 0;
x_range = min(x):0.1:max(x);
y_fit = fitresult(x_range);
while any(diff(y_fit) < tolerance)
    % Refit whenever the fitted curve is not monotonically non-decreasing
    [fitresult, gof] = fit(x, y, rational_model);
    y_fit = fitresult(x_range);
end
% Visualize the data points and the fitted curve
scatter(x(1:43), y(1:43), [], colors(1), 'filled');
hold on;
scatter(x(44:86), y(44:86), [], colors(2), 'filled');
scatter(x(87:129), y(87:129), [], colors(3), 'filled');
scatter(x(130:172), y(130:172), [], colors(4), 'filled');
scatter(x(173:215), y(173:215), [], colors(5), 'filled');
plot(fitresult,'k',x,y);
xlabel('js\_flops');
ylabel('acc\_loss')
legend('AlexNet', 'AlexNet\_BN','VGG\_16','VGG\_19','Inception\_BN','ALL', 'Rational-Fit', 'Location', 'Northeast');
% Get the goodness-of-fit metrics
SSE=gof.sse;
R_square = gof.rsquare;
RMSE = gof.rmse;
% Show SSE, R-square and RMSE on the figure
text(0.65, 0.2, sprintf('Goodness of fit:\n SSE:%.4f\n R-square:%.4f\n RMSE:%.4f', SSE, R_square, RMSE), 'Units', 'normalized', 'FontSize', 11);
hold off;
\ No newline at end of file
% Import the data tables
data0 = xlsread('D:\Desktop\ptq_result_weighted.xlsx','AlexNet','B4:F46');
data1 = xlsread('D:\Desktop\ptq_result_weighted.xlsx','AlexNet_BN','B4:F46');
data2 = xlsread('D:\Desktop\ptq_result_weighted.xlsx','VGG_16','B4:F46');
data3 = xlsread('D:\Desktop\ptq_result_weighted.xlsx','VGG_19','B4:F46');
data4 = xlsread('D:\Desktop\ptq_result_weighted.xlsx','Inception_BN','B4:F46');
file_data = vertcat(data0,data1,data2,data3,data4);
js_flops = file_data(:,1);
js_flops_weighted = file_data(:,2);
js_param = file_data(:,3);
%js_param_weighted = file_data(:,4);
ptq_acc = file_data(:,4);
acc_loss = file_data(:,5);
% Choose the x and y data and the polynomial degree
x = js_flops_weighted;
y = acc_loss;
poly = 4;
% Define the color vector and the class of each data point
colors = ['r', 'g', 'b','m','o'];
class = [ones(43,1); 2*ones(43,1); 3*ones(43,1);4*ones(43,1);5*ones(43,1);];
% Specify the fit model
if poly == 2
rational_model = fittype('(p1*x.^2 + p2*x + p3) / (q1*x.^2 + q2*x + q3)', 'independent', 'x', 'coefficients', {'p1', 'p2', 'p3', 'q1', 'q2', 'q3'});
elseif poly == 3
rational_model = fittype('(p1*x.^3 + p2*x.^2 + p3*x + p4) / (q1*x.^3 + q2*x.^2 + q3*x + q4)', 'independent', 'x', 'coefficients', {'p1', 'p2', 'p3','p4', 'q1', 'q2', 'q3','q4'});
elseif poly == 4
rational_model = fittype('(p0*x.^4 + p1*x.^3 + p2*x.^2 + p3*x + p4) / (q0*x.^4 + q1*x.^3 + q2*x.^2 + q3*x + q4)', 'independent', 'x', 'coefficients', {'p0', 'p1', 'p2', 'p3','p4','q0', 'q1', 'q2', 'q3','q4'});
end
% Initial fit
[fitresult, gof] = fit(x, y, rational_model);
% Make sure the fitted curve is monotonically non-decreasing
tolerance = 0;
x_range = min(x):0.1:max(x);
y_fit = fitresult(x_range);
while any(diff(y_fit) < tolerance)
    % Refit whenever the fitted curve is not monotonically non-decreasing
    [fitresult, gof] = fit(x, y, rational_model);
    y_fit = fitresult(x_range);
end
% Visualize the data points and the fitted curve
scatter(x(1:43), y(1:43), [], colors(1), 'filled');
hold on;
scatter(x(44:86), y(44:86), [], colors(2), 'filled');
scatter(x(87:129), y(87:129), [], colors(3), 'filled');
scatter(x(130:172), y(130:172), [], colors(4), 'filled');
scatter(x(173:215), y(173:215), [], colors(5), 'filled');
plot(fitresult,'k',x,y);
xlabel('js\_flops\_weighted');
ylabel('acc\_loss')
legend('AlexNet', 'AlexNet\_BN','VGG\_16','VGG\_19','Inception\_BN','ALL', 'Rational-Fit', 'Location', 'Northeast');
% Get the goodness-of-fit metrics
SSE=gof.sse;
R_square = gof.rsquare;
RMSE = gof.rmse;
% Show SSE, R-square and RMSE on the figure
text(0.65, 0.2, sprintf('Goodness of fit:\n SSE:%.4f\n R-square:%.4f\n RMSE:%.4f', SSE, R_square, RMSE), 'Units', 'normalized', 'FontSize', 11);
hold off;
\ No newline at end of file
% Import the data table % Import the data table
file_data = xlsread('D:\Desktop\ptq_result.xlsx','Sheet','B4:E46'); file_data = xlsread('D:\Desktop\ptq_result.xlsx','Inception_BN','B4:E46');
js_flops = file_data(:,1); js_flops = file_data(:,1);
js_param = file_data(:,2); js_param = file_data(:,2);
ptq_acc = file_data(:,3); ptq_acc = file_data(:,3);
acc_loss = file_data(:,4); acc_loss = file_data(:,4);
x = js_param;
y = acc_loss;
% Define the color vector and the class of each data point % Define the color vector and the class of each data point
colors = ['r', 'g', 'm']; colors = ['r', 'g', 'm'];
class = [ones(16,1); 2*ones(6,1); 3*ones(21,1)]; class = [ones(16,1); 2*ones(6,1); 3*ones(21,1)];
...@@ -12,15 +15,25 @@ class = [ones(16,1); 2*ones(6,1); 3*ones(21,1)]; ...@@ -12,15 +15,25 @@ class = [ones(16,1); 2*ones(6,1); 3*ones(21,1)];
% Specify the fit model % Specify the fit model
rational_model = fittype('(p1*js_flops.^2 + p2*js_flops + p3) / (q1*js_flops.^2 + q2*js_flops + q3)', 'independent', 'js_flops', 'coefficients', {'p1', 'p2', 'p3', 'q1', 'q2', 'q3'}); rational_model = fittype('(p1*js_flops.^2 + p2*js_flops + p3) / (q1*js_flops.^2 + q2*js_flops + q3)', 'independent', 'js_flops', 'coefficients', {'p1', 'p2', 'p3', 'q1', 'q2', 'q3'});
% Run the fit % Initial fit
[fitresult,gof] = fit(js_param, acc_loss, rational_model); [fitresult, gof] = fit(x, y, rational_model);
% Make sure the fitted curve is monotonically non-decreasing
tolerance = 0;
x_range = min(x):0.1:max(x);
y_fit = fitresult(x_range);
while any(diff(y_fit) < tolerance)
    % Refit whenever the fitted curve is not monotonically non-decreasing
    [fitresult, gof] = fit(x, y, rational_model);
    y_fit = fitresult(x_range);
end
% Visualize the data points and the fitted curve % Visualize the data points and the fitted curve
scatter(js_param(1:15), acc_loss(1:15), [], colors(1), 'filled'); scatter(x(1:15), y(1:15), [], colors(1), 'filled');
hold on; hold on;
scatter(js_param(16:22), acc_loss(16:22), [], colors(2), 'filled'); scatter(x(16:22), y(16:22), [], colors(2), 'filled');
scatter(js_param(23:43), acc_loss(23:43), [], colors(3), 'filled'); scatter(x(23:43), y(23:43), [], colors(3), 'filled');
plot(fitresult,'k',js_param,acc_loss); plot(fitresult,'k',x,y);
xlabel('js\_param'); xlabel('js\_param');
ylabel('acc\_loss'); ylabel('acc\_loss');
legend('INT', 'POT', 'FLOAT','ALL', 'Rational-Fit', 'Location', 'Northeast'); legend('INT', 'POT', 'FLOAT','ALL', 'Rational-Fit', 'Location', 'Northeast');
......
% Import the data tables
data0 = xlsread('D:\Desktop\ptq_result.xlsx','AlexNet','B4:E46');
data1 = xlsread('D:\Desktop\ptq_result.xlsx','AlexNet_BN','B4:E46');
data2 = xlsread('D:\Desktop\ptq_result.xlsx','VGG_16','B4:E46');
data3 = xlsread('D:\Desktop\ptq_result.xlsx','VGG_19','B4:E46');
data4 = xlsread('D:\Desktop\ptq_result.xlsx','Inception_BN','B4:E46');
file_data = vertcat(data0,data1,data2,data3,data4);
js_flops = file_data(:,1);
js_param = file_data(:,2);
ptq_acc = file_data(:,3);
acc_loss = file_data(:,4);
% Choose the x and y data and the polynomial degree
x = js_param;
y = acc_loss;
poly = 4;
% Define the color vector and the class of each data point
colors = ['r', 'g', 'b','m','o'];
class = [ones(43,1); 2*ones(43,1); 3*ones(43,1);4*ones(43,1);5*ones(43,1);];
% Specify the fit model
if poly == 2
rational_model = fittype('(p1*x.^2 + p2*x + p3) / (q1*x.^2 + q2*x + q3)', 'independent', 'x', 'coefficients', {'p1', 'p2', 'p3', 'q1', 'q2', 'q3'});
elseif poly == 3
rational_model = fittype('(p1*x.^3 + p2*x.^2 + p3*x + p4) / (q1*x.^3 + q2*x.^2 + q3*x + q4)', 'independent', 'x', 'coefficients', {'p1', 'p2', 'p3','p4', 'q1', 'q2', 'q3','q4'});
elseif poly == 4
rational_model = fittype('(p0*x.^4 + p1*x.^3 + p2*x.^2 + p3*x + p4) / (q0*x.^4 + q1*x.^3 + q2*x.^2 + q3*x + q4)', 'independent', 'x', 'coefficients', {'p0', 'p1', 'p2', 'p3','p4','q0', 'q1', 'q2', 'q3','q4'});
end
% Initial fit
[fitresult, gof] = fit(x, y, rational_model);
% Make sure the fitted curve is monotonically non-decreasing
tolerance = 0;
x_range = min(x):0.1:max(x);
y_fit = fitresult(x_range);
while any(diff(y_fit) < tolerance)
    % Refit whenever the fitted curve is not monotonically non-decreasing
    [fitresult, gof] = fit(x, y, rational_model);
    y_fit = fitresult(x_range);
end
% Visualize the data points and the fitted curve
scatter(x(1:43), y(1:43), [], colors(1), 'filled');
hold on;
scatter(x(44:86), y(44:86), [], colors(2), 'filled');
scatter(x(87:129), y(87:129), [], colors(3), 'filled');
scatter(x(130:172), y(130:172), [], colors(4), 'filled');
scatter(x(173:215), y(173:215), [], colors(5), 'filled');
plot(fitresult,'k',x,y);
xlabel('js\_param');
ylabel('acc\_loss')
legend('AlexNet', 'AlexNet\_BN','VGG\_16','VGG\_19','Inception\_BN','ALL', 'Rational-Fit', 'Location', 'Northeast');
% Get the goodness-of-fit metrics
SSE=gof.sse;
R_square = gof.rsquare;
RMSE = gof.rmse;
% Show SSE, R-square and RMSE on the figure
text(0.65, 0.2, sprintf('Goodness of fit:\n SSE:%.4f\n R-square:%.4f\n RMSE:%.4f', SSE, R_square, RMSE), 'Units', 'normalized', 'FontSize', 11);
hold off;
\ No newline at end of file
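# Change notes # Change notes
## update: 2023/04/26
+ Applied the new framework to all models (AlexNet, AlexNet_BN, VGG_16, VGG_19, Inception_BN), fitting each model individually with different methods and fitting all models jointly. See ALL for details.
+ Updated the Matlab scripts to support single-model and multi-model fitting and to allow choosing the polynomial degree of the fit model. Added a constraint that keeps the fitted curve monotonically non-decreasing.
+ On the issue of inconsistent POT points mentioned in the README of ALL:
  + For the discussion of accuracy, see the README of the Inception_BN part.
  + For the discussion of causes and possible solutions, see the README of ALL.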
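
The monotonicity constraint described in these notes amounts to sampling the fitted curve at a fixed step and checking consecutive differences against a tolerance. A minimal Python sketch of that check, assuming the curve is available as a callable; `is_non_decreasing`, `rising` and `bumpy` are hypothetical names that do not appear in the Matlab scripts:

```python
def is_non_decreasing(f, x_min, x_max, step=0.1, tolerance=0.0):
    """Sample f on [x_min, x_max] and check that no step falls below tolerance.

    tolerance = 0 demands a non-decreasing curve; a small negative value such
    as -1e-5 permits tiny dips between neighbouring samples.
    """
    n = int((x_max - x_min) / step + 1e-9)  # whole steps that fit in the range
    xs = [x_min + i * step for i in range(n + 1)]
    ys = [f(x) for x in xs]
    return all(b - a >= tolerance for a, b in zip(ys, ys[1:]))


# Hypothetical curves, used only to exercise the check.
def rising(x):
    return x / (1.0 + x)      # increasing for x >= 0


def bumpy(x):
    return (x - 1.0) ** 2     # falls until x = 1, then rises


print(is_non_decreasing(rising, 0.0, 5.0))
print(is_non_decreasing(bumpy, 0.0, 5.0))
```

A fit that fails the check would be discarded and recomputed, which is what the `while` loop in the scripts does. Because the check only looks at sampled points, a dip narrower than the step can go unnoticed.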
## update: 2023/04/22 ## update: 2023/04/22
+ Added the Inception BN model, with the following changes to the framework + Added the Inception BN model, with the following changes to the framework
...@@ -12,7 +25,7 @@ ...@@ -12,7 +25,7 @@
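+ Because conv.bias=None is now allowed, the fold_bn method of the full-precision model was changed accordingly, so that the parameters compared before and after quantization are the same. It is rewritten the same way as in the quantized layers. + Because conv.bias=None is now allowed, the fold_bn method of the full-precision model was changed accordingly, so that the parameters compared before and after quantization are the same. It is rewritten the same way as in the quantized layers.
+ Changed how js_div is computed: when a layer has several parameters, for example weight and bias, their weights should sum to 1. For now a plain average is used (js is divided by the number of parameter tensors in that layer); weighting will be considered later. PS: in Inception_BN the outer conv layers have a bias, while inside the Inception modules bias is false because a bn layer follows. + Changed how js_div is computed: when a layer has several parameters, for example weight and bias, their weights should sum to 1. For now a plain average is used (js is divided by the number of parameter tensors in that layer); weighting will be considered later. PS: in Inception_BN the outer conv layers have a bias, while inside the Inception modules bias is false because a bn layer follows.
+ Because the length of the named_parameters iterator is not fixed, the parameters must first be collected into a fixed list before processing, so that the number of parameters belonging to the same layer can be obtained; see ptq.py for the change. Doing this for the full-precision model is sufficient. + Because the length of the named_parameters iterator is not fixed, the parameters must first be collected into a fixed list before processing, so that the number of parameters belonging to the same layer can be obtained; see ptq.py for the change. Doing this for the full-precision model is sufficient.
+ The model_utils method in the new framework can help locate bugs by adjusting where dequantization happens. Based on the experiments so far, the accuracy problem can tentatively be attributed to the inception structure; see the Inception_BN part for details. After investigation, the quantization framework itself shows no problem; the issue may lie in a mismatch between this model's parameter distribution and the concentrated POT distribution. + The model_utils method in the Inception_BN framework can help locate bugs by adjusting where dequantization happens. Based on the experiments so far, the accuracy problem can tentatively be attributed to the inception structure; see the Inception_BN part for details. After investigation, the quantization framework itself shows no problem; the issue may lie in a mismatch between this model's parameter distribution and the distribution of POT quantization points.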
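
The per-layer averaging of js_div described above can be illustrated with a small sketch: parameters are grouped by their layer prefix, and each parameter's js value is divided by the size of its group. The function `params_per_layer`, the parameter names and the js values below are hypothetical and only mirror the idea, not the code in ptq.py.

```python
from collections import Counter


def params_per_layer(param_names):
    """For each parameter name, count the parameters sharing its layer prefix."""
    prefixes = [name.rsplit('.', 1)[0] for name in param_names]
    counts = Counter(prefixes)
    return [counts[prefix] for prefix in prefixes]


# Hypothetical names: conv1 and fc have weight and bias, the inner conv has no bias.
names = ['conv1.weight', 'conv1.bias',
         'inception.branch1.conv.weight',
         'fc.weight', 'fc.bias']
counts = params_per_layer(names)
print(counts)  # [2, 2, 1, 2, 2]

# Made-up js values; dividing by the count gives every layer a total weight of 1.
js_values = [0.4, 0.2, 0.3, 0.1, 0.5]
averaged = [js / n for js, n in zip(js_values, counts)]
print(averaged)
```

Collecting the names into a list first, as the notes recommend, lets the counting pass see every parameter of a layer before any js value is scaled.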
## update: 2023/04/17 ## update: 2023/04/17
...@@ -26,6 +39,6 @@ ...@@ -26,6 +39,6 @@
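+ Typo in the js_param computation in ptq.py: flop_ratio should be par_ratio. Otherwise the flops fit and the param fit are identical. + Typo in the js_param computation in ptq.py: flop_ratio should be par_ratio. Otherwise the flops fit and the param fit are identical.
+ In the bias_qmax method of module.py, the float type should be passed num_bits=16 and e_bits=7. + In the bias_qmax method of module.py, the float type should be passed num_bits=16 and e_bits=7.
+ The focus here is e_bits: the outliers of the fit are mainly FLOAT_7_E5 / FLOAT_8_E5 / FLOAT_8_E6, where the bias shows a bimodal distribution, similar to the earlier bias overflow problem with int quantization. + The focus here is e_bits: the outliers of the fit are mainly FLOAT_7_E5 / FLOAT_8_E5 / FLOAT_8_E6, where the bias shows a bimodal distribution, similar to the earlier bias overflow problem with int quantization.
+ e_bits was previously set to 5. Since the scale of the bias is the product of the input scale and the weight scale, the bias quantization range should be roughly the square of the quantization range of x and weight. The largest x and weight quantization range the code currently supports is about $2^{2^{6}}$, so the bias range should reach approximately $2^{2^7}$, i.e. e_bits is set to 7. + e_bits was previously set to 5. Since the scale of the bias is the product of the input scale and the weight scale, the bias quantization range should be roughly the square of the quantization range of x and weight. The largest x and weight quantization range the code currently supports is about 2^(2^6), so the bias range should reach approximately 2^(2^7), i.e. e_bits is set to 7.
+ After the change the outliers disappeared and the fit improved markedly. + After the change the outliers disappeared and the fit improved markedly.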
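
The e_bits reasoning above can be checked with exact integer arithmetic. The sketch below uses a deliberately rough model in which e_bits exponent bits give a dynamic range of about 2^(2^e_bits); the helper `approx_range` is hypothetical and ignores mantissa bits and exponent bias.

```python
def approx_range(e_bits):
    """Rough dynamic range reachable with e_bits exponent bits: 2**(2**e_bits)."""
    return 2 ** (2 ** e_bits)


# Largest range assumed for x and weight (e_bits up to 6).
operand_range = approx_range(6)
# The bias scale is the product of both scales, so its range is about the square.
bias_range = operand_range ** 2

# Smallest e_bits whose range covers the bias range under this rough model.
needed_e_bits = next(e for e in range(1, 16) if approx_range(e) >= bias_range)
print(needed_e_bits)  # 7
print(approx_range(5) < bias_range)  # True: e_bits = 5 falls short
```

Squaring 2^(2^6) doubles the exponent to 2^7, which is why one extra exponent bit is enough under this model.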