Commit 55cf8622 by Klin

feat: support ResNet/MobileNet and cifar100. Details in README

parent 2d918317
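# Model Integration Notes
+ This folder integrates AlexNet, AlexNet_BN, VGG_16, VGG_19, and Inception_BN on the cifar10 dataset.
+ ResNet_18, ResNet_50, ResNet_152, and MobileNetV2 have now been added.
### Deployment
#### cfg_table
Under this general framework, deploying any of the current models only requires a cfg_table. If the model contains a special structure (such as the multi-branch inception structure of Inception_BN), provide an additional cfg_table for that structure. See `cfg.py` for details.
The rules for writing a cfg_table are as follows:
+ Each entry corresponds to one fused quantization unit. For example, a Conv followed by BN, BN+ReLU, or BN+ReLU6 is fused into a single quantized layer and appears as one entry in the cfg_table, with the optional parameter ''/'B'/'BRL'/'BRS'.
+ Operations that are not quantized but whose position must be specified explicitly, such as drop, view, and softmax, are also listed as separate entries in the cfg_table.

The fold_ratio and fold_model methods were simplified accordingly: based on the cfg_table, they merge the parameter values, parameter counts, and computation of the full-precision layers that are compared against each quantized layer.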
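The entry format described above can be illustrated with a small reader. This is a minimal sketch, not code from the repository: `describe_entry` and `FUSION` are hypothetical names, the field order is taken from the comments at the top of `cfg.py`, and the example rows are copied from individual cfg tables rather than forming a complete network.

```python
# Hypothetical helper (not part of the repository): it only reads the list
# format documented in cfg.py and does not build any layers.
FUSION = {
    '': 'Conv',
    'B': 'Conv+BN',
    'BRL': 'Conv+BN+ReLU',
    'BRS': 'Conv+BN+ReLU6',
}

def describe_entry(entry):
    """Return a short text description of one cfg_table entry."""
    kind = entry[0]
    if kind == 'C':
        # ['C', fusion, qi, in_ch, out_ch, kernel_size, stride, padding, bias]
        _, fusion, qi, in_ch, out_ch, k, stride, pad, bias = entry
        return f"{FUSION[fusion]} {in_ch}->{out_ch} k={k} s={stride} p={pad}"
    if kind == 'FC':
        # ['FC', in_features, out_features, bias]
        _, in_f, out_f, bias = entry
        return f"FC {in_f}->{out_f}"
    # Unquantized operations (MP, VW, D, SM, ...) are identified by position only.
    return kind

# Illustrative rows only; they do not describe a full model.
example_cfg_table = [
    ['C', 'BRL', True, 3, 32, 3, 1, 1, True],
    ['MP', 2, 2, 0],
    ['VW'],
    ['FC', 2304, 1024, True],
]
summary = [describe_entry(e) for e in example_cfg_table]
```

Each fused Conv unit takes one row regardless of whether BN or an activation follows it, which is why the cfg tables for the BN variants are shorter than those for the plain models.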
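#### Training scheme
+ A new learning-rate scheme is used, summarized as follows:
  + Larger batch_size (32->128): lets the model learn the data distribution more fully, with the aim of better generalization and better quantization results.
  + Smaller initial learning rate with the cosine-annealing scheduler `CosineAnnealingLR` (period of 500 epochs): gives smoother parameter updates and suits the larger batch_size.
  + Data augmentation: the dataloader transform was changed from Resize to RandomCrop.
  + Validation set: 20% of the training samples are split off as a validation set, which prevents overfitting to the test set and avoids test-set noise biasing the quantization results.
  + weight_decay: penalizes overly large weights to avoid overfitting, and to some extent improves quantization results.
  + "Early stopping": when accuracy has not improved for patience consecutive epochs and the minimum accuracy has already been reached, training stops early to avoid overfitting.
  + "Revival": when the minimum accuracy bound has not been reached and the condition above is met, training does not stop but reloads the bestpoint instead. This method was not adopted in the end.
+ Full-precision accuracy and PTQ fitting results for the different training setups (in the table, the S, S_m9, S_m95, and A columns correspond to the four setups listed here, in order):
  + SGD optimizer (initial lr=0.005, end_lr_bound=0.00005)
  + SGD+momentum=0.9
  + SGD+momentum=0.95+weight_decay=1e-4+nesterov. This is the setup finally adopted.
  + Adam optimizer (initial lr=0.001, end_lr_bound=0.00001)
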
| Model | S_acc | S_R2 | S_m9_acc | S_m9_R2 | S_m95_acc | S_m95_R2 | A_acc | A_R2 |
| ------------ | ----- | ---------------------------- | -------- | ------- | --------- | -------- | ----- | ------ |
| AlexNet | 87.33 | 0.9535 | 88.91 | 0.9202 | 87.15 | 0.9587 | 87.75 | 0.4999 |
| AlexNet_BN | 88.17 | 0.9641 | 90.15 | 0.9678 | 89.02 | 0.9508 | 89.86 | 0.8492 |
| VGG_16 | 89.03 | 0.8411 | 92.14 | 0.7813 | 91.67 | 0.8929 | 92.75 | 0.7334 |
| VGG_19 | 88.67 | 0.8750 | 92.27 | 0.7829 | 91.68 | 0.9155 | 92.26 | 0.6578 |
| Inception_BN | 88.55 | 0.9218 | 92.78 | 0.9725 | 92.40 | 0.9776 | 93.77 | 0.9121 |
| ResNet_18 | 84.32 | 0.9493 | 89.24 | 0.9699 | 87.72 | 0.9538 | 90.34 | 0.9585 |
| ResNet_50 | 82.10 | 0.9498 | 89.85 | 0.9693 | 89.52 | 0.9692 | 91.61 | 0.9594 |
| ResNet_152 | 79.76 | 0.8947 | 89.35 | 0.7978 | 88.92 | 0.8519 | 91.78 | 0.9083 |
| MobileNetV2 | 84.61 | 0.9918<br />(no tolerance) | 88.98 | 0.9632 | 88.93 | 0.9882 | 91.93 | 0.8082 |
| ALL | | 0.8014 | | 0.6787 | | 0.7189 | | 0.6726 |
+ Results:
  + sgd-AlexNet
  <img src="image\sgd-AlexNet.png" alt="sgd-AlexNet" style="zoom:50%;" />
  + sgd+momentum95+...-AlexNet
  <img src="image\sgd-m-AlexNet.png" alt="sgd-m-AlexNet" style="zoom:50%;" />
  + adam-AlexNet
  <img src="image\adam-AlexNet.png" alt="adam-AlexNet" style="zoom:50%;" />
+ SGD+momentum=0.95+weight_decay=1e-4+nesterov was finally adopted.
  The fitting results on cifar10 follow; all of them use js_flops directly, without weighting.
  + Unsegmented
  <img src="image\sgd-ALL.png" alt="sgd-ALL" style="zoom:50%;" />
  + Piecewise (breakpoint at 3.5)
  <img src="image\sgd-sep.png" alt="sgd-sep" style="zoom:50%;" />
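+ Analysis of the likely reasons SGD and Adam differ:
  1. Adam adjusts the learning rate dynamically, which usually gives the updated weights a larger range of variation, whereas SGD uses a fixed learning rate with smaller update steps. As a result, models trained with Adam have a more concentrated parameter distribution, while models trained with SGD have a more dispersed one, which favors quantization.
  2. During backpropagation Adam applies extra smoothing and bias correction to the gradients, which mitigates vanishing and exploding gradients; its parameter updates take the first and second moments into account and are therefore fairly smooth. SGD uses the raw gradients directly. This also makes the parameters of Adam-trained models more concentrated, which is unfavorable for quantization.
  3. Adam is more sensitive to small gradients and can find a good solution faster, but this is unfavorable for quantization. SGD takes comparatively larger update steps, which favors quantization.

  Overall, Adam's smooth, gradual update strategy yields models with a more concentrated parameter distribution, and such models are poorly suited to lossy quantization. SGD's more aggressive updates yield models with a more dispersed parameter distribution, which are better suited to quantization.
+ To speed up training, reduce the number of epochs per training cycle and increase the dataloader's num_workers to shorten data-loading time. (Parallel loading with num_workers occasionally triggers shared-memory access errors. If the random seed is not set manually, each worker is automatically assigned a different seed, which avoids this error to some extent.)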
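The learning-rate schedule and the early-stopping rule described in this section can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code in `train.py`: `cosine_lr` uses the standard closed form of cosine annealing and assumes that end_lr_bound plays the role of the scheduler's minimum learning rate, and `EarlyStopper` is a hypothetical class name.

```python
import math

def cosine_lr(epoch, base_lr=0.005, end_lr=0.00005, t_max=500):
    """Closed-form cosine annealing from base_lr down to end_lr over t_max epochs.

    Assumption: end_lr corresponds to the README's end_lr_bound.
    """
    return end_lr + 0.5 * (base_lr - end_lr) * (1 + math.cos(math.pi * epoch / t_max))

class EarlyStopper:
    """Stop after `patience` epochs without improvement, but only once
    `min_acc` has been reached at least once (hypothetical helper)."""

    def __init__(self, patience, min_acc):
        self.patience = patience
        self.min_acc = min_acc
        self.best_acc = 0.0
        self.bad_epochs = 0

    def step(self, val_acc):
        """Record one validation accuracy; return True if training should stop."""
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience and self.best_acc >= self.min_acc
```

When the minimum accuracy has not been reached, `step` keeps returning False; the "revival" variant would reload the best checkpoint at that point instead, which the README says was not adopted.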
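### PTQ overview
#### matlab scripts
+ `flops\param`, `flops\param_all`, and `flops_all_weight` fit a single model, all models, and all models after computation weighting, respectively.
  For a single model, the script colors the points by quantization type; for all models, it colors the points by model.
+ The scripts add a fitting constraint that guarantees the fitted curve is monotonically non-decreasing. To allow the curve to dip slightly at some position, change tolerance to a small negative number such as -1e-5.
+ The constraint is enforced by sampling: the sampling step is 0.1 for fakefreeze and 10 for the L2 method.
+ The scripts also print fit-quality metrics on the figure, such as SSE, R-squared, and RMSE.
+ The fitting model is selectable: poly can currently be 2/3/4, giving the polynomial degree of the numerator and denominator of the rational model.
+ Because each fit differs slightly from the last, the script could later be changed so that after each fit it runs three more constrained fits and stops only if none of them beats the current fit; otherwise it takes the better fit and repeats the process. This step is currently done by hand.
+ flops_sep was added, providing a piecewise fit for poly2.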
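The sampled monotonicity constraint can be sketched as follows. This is an illustration in Python rather than the MATLAB scripts themselves: `rational` and `is_monotone` are hypothetical names, and the check assumes the constraint compares consecutive samples of the fitted rational function against `tolerance`.

```python
def rational(x, p, q):
    """Evaluate p(x)/q(x); p and q are coefficient lists, highest degree first."""
    num = 0.0
    for c in p:
        num = num * x + c
    den = 0.0
    for c in q:
        den = den * x + c
    return num / den

def is_monotone(p, q, x_max, step, tolerance=0.0):
    """True if every sampled increment f(x+step) - f(x) on [0, x_max] is >= tolerance."""
    n = int(round(x_max / step))
    prev = rational(0.0, p, q)
    for i in range(1, n + 1):
        cur = rational(i * step, p, q)
        if cur - prev < tolerance:
            return False
        prev = cur
    return True

# f(x) = x / (x + 1) rises everywhere on [0, 10].
rising = is_monotone([1.0, 0.0], [1.0, 1.0], x_max=10.0, step=0.1)
# f(x) = -x^2 + 2x rises until x = 1 and then falls.
falling = is_monotone([-1.0, 2.0, 0.0], [1.0], x_max=3.0, step=0.1)
```

A small negative `tolerance` accepts curves whose sampled increments dip by less than that amount, which matches the role the README gives to values such as -1e-5.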
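#### Fitting approach and summary of results
+ The fakefreeze method rescales the quantized-layer parameters to match the full-precision layers.
+ The JS-divergence weight ratio of weight:bias is set to 1:1; that is, for layers with a bias, the weight's contribution is unchanged and the bias contributes additionally.
+ Model computation is weighted with log10 and cuberoot.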
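For reference, the JS divergence between two discrete distributions and the 1:1 weight:bias combination can be written as below. This is a minimal sketch assuming the distributions are already given as normalized histograms; how the repository builds those histograms is not shown here, and `js_div` and `layer_js` are hypothetical names.

```python
import math

def kl(p, q):
    """KL divergence sum(p * ln(p/q)); terms with p == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_div(p, q):
    """Jensen-Shannon divergence (natural log) of two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def layer_js(js_weight, js_bias=None):
    """1:1 weighting: the weight term is unchanged, the bias term is added if present."""
    return js_weight if js_bias is None else js_weight + js_bias
```

With the natural logarithm the divergence is bounded above by ln 2, reached when the two distributions do not overlap at all.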
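#### Problems found during fitting
With fakefreeze on VGG_16, VGG_19, and Inception_BN, the POT quantization points cluster together (acc_loss differs slightly while js_div is nearly the same, so they appear in the plot as a continuous vertical column of points), which hurts the results.
Inspecting the weight distributions of these models shows that the problematic models have weight distributions without a sharp peak. The way peaked and peakless distributions behave under different quantization schemes is shown below:
![diff1](image\diff1.png)
![diff2](image\diff2.png)
Given the characteristics of each model's weight distribution, an important reason the POT divergence of the problematic models is large and clustered is probably that the distribution after quantization follows a different trend from the original. On this basis, we may need to consider, beyond similarity, how well the model's parameter distribution matches the quantization scheme. This needs experimental verification, for example by directly measuring a coefficient of distribution trend between the full-precision and quantized models, or by measuring the sharpness of the full-precision weights and of the quantization table, and then applying the resulting value to the JS divergence computed earlier.
+ Option 1: measure the similarity of distribution trend between the full-precision and quantized models
  Use the Pearson correlation coefficient or cosine similarity and apply it to the JS divergence. For example, if the cosine similarity under POT quantization is small (the trends differ a lot), consider multiplying the JS divergence by the cosine similarity to correct a divergence inflated by the trend difference.
+ Option 2: consider sharpness
  Since a peakless distribution that meets a POT quantization-point list with a very sharp peak produces differing trends, approach the problem from the distribution and the quantization points. For example, measure the proportion of values that fall in the central range; a large difference may indicate a large difference in sharpness, which can then be used to correct the JS divergence. Alternatively, cut the original distribution into bins, count the elements of every bin that contains a quantization point, and use the number of elements that share a bin with a quantization point to measure how well the distribution matches the quantization scheme.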
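The bin-based idea in Option 2 can be sketched in a few lines. This is only one possible reading of the proposal, written as a hypothetical `bin_coverage` function: it splits the value range into equal-width bins and returns the fraction of values that share a bin with at least one quantization point. The equal-width choice and the function name are assumptions.

```python
def bin_coverage(values, quant_points, num_bins):
    """Fraction of values lying in a bin that also contains a quantization point.

    Assumes max(values) > min(values) so that the bin width is positive.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins

    def bin_of(v):
        # The right edge of the range is assigned to the last bin.
        return min(int((v - lo) / width), num_bins - 1)

    # Bins that hold at least one quantization point inside the value range.
    covered = {bin_of(q) for q in quant_points if lo <= q <= hi}
    hits = sum(1 for v in values if bin_of(v) in covered)
    return hits / len(values)

# Five values over four unit-width bins; quantization points fall in bins 0 and 3.
coverage = bin_coverage([0.0, 1.0, 2.0, 3.0, 4.0], [0.5, 3.5], num_bins=4)
```

A low coverage value would indicate that many weights lie in regions with no nearby quantization point, which is the mismatch the section attributes to POT quantization of peakless distributions.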
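#### Plans for further improving the fit
+ For the clustering of POT quantization points, consider correcting the JS divergence with trend-oriented measures such as the Pearson correlation coefficient or cosine similarity, or by splitting the quantization range into bins and evaluating quantization-point coverage.
+ Weight the weight and bias in a more reasonable way
  + By their effect on accuracy (hard to measure, and hard to fix a baseline)
+ Try more effective weighting schemes for model integration
+ Since acc_loss stops rising once the JS divergence reaches a certain value (the worst case is random classification, which still gives 10% accuracy), use piecewise fitting.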
#!/bin/bash
#- Job parameters
# (TODO)
# Please modify job name
#- Resources
# (TODO)
# Please modify your requirements
#SBATCH -p nv-gpu # Submit to 'nv-gpu' Partition
#SBATCH -t 0-08:00:00
### #SBATCH -t 1-06:00:00 # Run for a maximum time of 1 day, 06 hours, 00 mins, 00 secs
#SBATCH --nodes=1 # Request N nodes
#SBATCH --gres=gpu:1 # Request M GPU per node
#SBATCH --gres-flags=enforce-binding # CPU-GPU Affinity
#SBATCH --qos=gpu-short
### #SBATCH --qos=gpu-normal # Request QOS Type
###
### The system will alloc 8 or 16 cores per gpu by default.
### If you need more or less, use following:
### #SBATCH --cpus-per-task=K # Request K cores
###
###
### Without specifying the constraint, any available nodes that meet the requirement will be allocated
### You can specify the characteristics of the compute nodes, and even the names of the compute nodes
###
### #SBATCH --nodelist=gpu-v00 # Request a specific list of hosts
### #SBATCH --constraint="Volta|RTX8000" # Request GPU Type: Volta(V100 or V100S) or RTX8000
###
# set constraint for RTX8000 to meet my cuda
#SBATCH --constraint="Ampere|RTX8000"
#- Log information
echo "Job start at $(date "+%Y-%m-%d %H:%M:%S")"
echo "Job run at:"
echo "$(hostnamectl)"
#- Load environments
source /tools/module_env.sh
module list # list modules loaded
##- Tools
module load cluster-tools/v1.0
module load slurm-tools/v1.0
module load cmake/3.15.7
module load git/2.17.1
module load vim/8.1.2424
##- language
module load python3/3.6.8
##- CUDA
# module load cuda-cudnn/10.2-7.6.5
# module load cuda-cudnn/11.2-8.2.1
module load cuda-cudnn/11.1-8.2.1
##- virtualenv
# source xxxxx/activate
echo $(module list) # list modules loaded
echo $(which gcc)
echo $(which python)
echo $(which python3)
cluster-quota # nas quota
nvidia-smi --format=csv --query-gpu=name,driver_version,power.limit # gpu info
#- Warning! Please do not change your CUDA_VISIBLE_DEVICES
#- in `.bashrc`, `env.sh`, or your job script
echo "Use GPU ${CUDA_VISIBLE_DEVICES}" # which gpus
#- The CUDA_VISIBLE_DEVICES variable is assigned and specified by SLURM
#- Job step
# [EDIT HERE(TODO)]
python analyse.py $Model
#- End
echo "Job end at $(date "+%Y-%m-%d %H:%M:%S")"
import torch


def build_int_list(num_bits):
    plist = [0.]
    for i in range(0,2**(num_bits-1)):
        # i runs up to 2**(num_bits-1)-1, the largest magnitude in the integer list
        plist.append(i)
        plist.append(-i)
    plist = torch.Tensor(list(set(plist)))
    # plist = plist.mul(1.0 / torch.max(plist))
    return plist


def std_mid_ratio(x):
    # fraction of elements within half of the 3-sigma bound
    x = x.view(-1)
    std = torch.std(x)
    upper = 3*std
    lower = 3*(-std)
    bound = torch.max(torch.abs(upper),torch.abs(lower))
    mid = bound/2
    cond = torch.logical_and(x>=-mid,x<=mid)
    cnt = torch.sum(cond).item()
    ratio = cnt/len(x)
    return ratio


def range_mid_ratio(x):
    # fraction of elements within half of the largest absolute value
    x = x.view(-1)
    upper = torch.max(x)
    lower = torch.min(x)
    bound = torch.max(torch.abs(upper),torch.abs(lower))
    mid = bound/2
    cond = torch.logical_and(x>=-mid,x<=mid)
    cnt = torch.sum(cond).item()
    ratio = cnt/len(x)
    return ratio


def pearson_corr(tensor1, tensor2):
    """
    Compute the Pearson correlation coefficient of tensor1 and tensor2
    """
    if torch.equal(tensor1,tensor2):
        return 1.0
    # flatten tensor1 and tensor2 to 2-D
    tensor1 = tensor1.view(-1, tensor1.size(-1))
    tensor2 = tensor2.view(-1, tensor2.size(-1))
    # means of tensor1 and tensor2
    tensor1_mean = torch.mean(tensor1, dim=0)
    tensor2_mean = torch.mean(tensor2, dim=0)
    # centered tensors
    tensor1_c = tensor1 - tensor1_mean
    tensor2_c = tensor2 - tensor2_mean
    # covariance matrix
    cov_mat = torch.matmul(tensor1_c.t(), tensor2_c) / (tensor1.size(0) - 1)
    # correlation coefficients
    corr_mat = cov_mat / torch.std(tensor1, dim=0) / torch.std(tensor2, dim=0)
    pearson = torch.mean(corr_mat)
    return pearson.item()


def cos_sim(a,b):
    a = a.view(-1)
    b = b.view(-1)
    cossim = torch.cosine_similarity(a, b, dim=0, eps=1e-6)
    # cossim = (cossim-0.97)/(1-0.97)
    return cossim.item()


def kurtosis(tensor):
    # excess kurtosis: fourth central moment over squared variance, minus 3
    mean = tensor.mean()
    std = tensor.std(unbiased=False)
    n = tensor.numel()
    fourth_moment = ((tensor - mean)**4).sum() / n
    second_moment = std**2
    kurt = (fourth_moment / second_moment**2) - 3
    return kurt.item()
\ No newline at end of file
# conv: 'C',''/'B'/'BRL'/'BRS',qi,in_ch,out_ch,kernel_size,stride,padding,bias
# fc: 'FC',in_features,out_features,bias
# relu: 'RL'
# relu6: 'RS'
# inception: 'Inc'
# maxpool: 'MP',kernel_size,stride,padding
# adaptiveavgpool: 'AAP',output_size
# view: 'VW':
# default: x = x.view(x.size(0),-1)
# dropout: 'D',p
# MakeLayer: 'ML','BBLK'/'BTNK'/'IRES', ml_idx, blocks
# softmax: 'SM'
# class 10
ResNet_18_cfg_table = [
['C','BRL',True,3,16,3,1,1,True],
['ML','BBLK',0,2],
['ML','BBLK',1,2],
['ML','BBLK',2,2],
['ML','BBLK',3,2],
['AAP',1],
['VW'],
['FC',128,10,True],
['SM']
]
ResNet_50_cfg_table = [
['C','BRL',True,3,16,3,1,1,True],
['ML','BTNK',0,3],
['ML','BTNK',1,4],
['ML','BTNK',2,6],
['ML','BTNK',3,3],
['AAP',1],
['VW'],
['FC',512,10,True],
['SM']
]
ResNet_152_cfg_table = [
['C','BRL',True,3,16,3,1,1,True],
['ML','BTNK',0,3],
['ML','BTNK',1,8],
['ML','BTNK',2,36],
['ML','BTNK',3,3],
['AAP',1],
['VW'],
['FC',512,10,True],
['SM']
]
MobileNetV2_cfg_table = [
['C','BRS',True,3,32,3,1,1,True],
['ML','IRES',0,1],
['ML','IRES',1,2],
['ML','IRES',2,3],
['ML','IRES',3,3],
['ML','IRES',4,3],
['ML','IRES',5,1],
['C','',False,320,1280,1,1,0,True],
['AAP',1],
['VW'],
['FC',1280,10,True]
]
AlexNet_cfg_table = [
['C','',True,3,32,3,1,1,True],
['RL'],
['MP',2,2,0],
['C','',False,32,64,3,1,1,True],
['RL'],
['MP',2,2,0],
['C','',False,64,128,3,1,1,True],
['RL'],
['C','',False,128,256,3,1,1,True],
['RL'],
['C','',False,256,256,3,1,1,True],
['RL'],
['MP',3,2,0],
['VW'],
['D',0.5],
['FC',2304,1024,True],
['RL'],
['D',0.5],
['FC',1024,512,True],
['RL'],
['FC',512,10,True]
]
AlexNet_BN_cfg_table = [
['C','BRL',True,3,32,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,32,64,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,64,128,3,1,1,True],
['C','BRL',False,128,256,3,1,1,True],
['C','BRL',False,256,256,3,1,1,True],
['MP',3,2,0],
['VW'],
['D',0.5],
['FC',2304,1024,True],
['RL'],
['D',0.5],
['FC',1024,512,True],
['RL'],
['FC',512,10,True]
]
VGG_16_cfg_table = [
['C','BRL',True,3,64,3,1,1,True],
['C','BRL',False,64,64,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,64,128,3,1,1,True],
['C','BRL',False,128,128,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,128,256,3,1,1,True],
['C','BRL',False,256,256,3,1,1,True],
['C','BRL',False,256,256,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,256,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['MP',2,2,0],
['VW'],
['FC',512,4096,True],
['RL'],
['D',0.5],
['FC',4096,4096,True],
['RL'],
['D',0.5],
['FC',4096,10,True]
]
VGG_19_cfg_table = [
['C','BRL',True,3,64,3,1,1,True],
['C','BRL',False,64,64,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,64,128,3,1,1,True],
['C','BRL',False,128,128,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,128,256,3,1,1,True],
['C','BRL',False,256,256,3,1,1,True],
['C','BRL',False,256,256,3,1,1,True],
['C','BRL',False,256,256,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,256,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['MP',2,2,0],
['VW'],
['FC',512,4096,True],
['RL'],
['D',0.5],
['FC',4096,4096,True],
['RL'],
['D',0.5],
['FC',4096,10,True]
]
Inception_BN_cfg_table = [
['C','',True,3,64,3,1,1,True],
['RL'],
['C','',False,64,64,3,1,1,True],
['RL'],
['Inc',0],
['Inc',1],
['MP',3,2,1],
['Inc',2],
['Inc',3],
['Inc',4],
['Inc',5],
['Inc',6],
['MP',3,2,1],
['Inc',7],
['Inc',8],
['AAP',1],
['C','',False,1024,10,1,1,0,True],
['VW']
]
model_cfg_table = {
'AlexNet' : AlexNet_cfg_table,
'AlexNet_BN' : AlexNet_BN_cfg_table,
'VGG_16' : VGG_16_cfg_table,
'VGG_19' : VGG_19_cfg_table,
'Inception_BN' : Inception_BN_cfg_table,
'ResNet_18' : ResNet_18_cfg_table,
'ResNet_50' : ResNet_50_cfg_table,
'ResNet_152' : ResNet_152_cfg_table,
'MobileNetV2' : MobileNetV2_cfg_table
}
# Each row is the (channel) parameter table of one Inc structure
inc_ch_table=[
[ 64, 64, 96,128, 16, 32, 32],#3a
[256,128,128,192, 32, 96, 64],#3b
[480,192, 96,208, 16, 48, 64],#4a
[512,160,112,224, 24, 64, 64],#4b
[512,128,128,256, 24, 64, 64],#4c
[512,112,144,288, 32, 64, 64],#4d
[528,256,160,320, 32,128,128],#4e
[832,256,160,320, 32,128,128],#5a
[832,384,192,384, 48,128,128] #5b
]
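# br0,br1,br2,br3 <- br1x1,br3x3,br5x5,brM
# Each sub-array describes one branch of the Inc structure; every conv takes 'BRL' by default and bias is False
# The 2nd and 3rd parameters of a conv entry are indices into the corresponding Inc structure (one row of ch_table)
# Every Inc structure performs the same operations and only the weights differ, so indices are used instead of concrete values for easy reuse
# The branches are followed by a Concat; since that structure is unique, it is not listed explicitly
# conv: 'C', ('BRL' default), in_ch_idx, out_ch_idx, kernel_size, stride, padding, (bias: True default)
# maxpool: 'MP', kernel_size, stride, padding
# relu: 'RL'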
inc_cfg_table = [
[
['C',0,1,1,1,0]
],
[
['C',0,2,1,1,0],
['C',2,3,3,1,1]
],
[
['C',0,4,1,1,0],
['C',4,5,5,1,2]
],
[
['MP',3,1,1],
['RL'],
['C',0,6,1,1,0]
]
]
# ml_cfg_table = []
#BasicBlock
#value: downsample,inplanes,planes,planes*expansion,stride,1(default stride and group)
bblk_ch_table = [
[False, 16, 16, 16,1,1], #layer1,first
[False, 16, 16, 16,1,1], # other
[True, 16, 32, 32,2,1], #layer2
[False, 32, 32, 32,1,1],
[True, 32, 64, 64,2,1], #layer3
[False, 64, 64, 64,1,1],
[True, 64,128,128,2,1], #layer4
[False,128,128,128,1,1]
]
#conv: 'C','B'/'BRL'/'BRS', in_ch_idx, out_ch_idx, kernel_sz, stride_idx, padding, groups_idx (bias: True default)
#add: 'AD', unconditional. The two elements in outs are added when unconditional is true or flag is true
bblk_cfg_table = [
[
['C','BRL',1,2,3,4,1,5],
['C','B' ,2,2,3,5,1,5],
],
# downsample, used only when downsample is passed in as True
[
['C','B' ,1,3,1,4,0,5]
],
# operations after the branches merge
[
['AD',True],
['RL']
]
]
#BottleNeck
#value: downsample,inplanes,planes,planes*expansion,stride,1(default stride and group)
btnk_ch_table = [
[True, 16, 16, 64,1,1], #layer1,first
[False, 64, 16, 64,1,1], # other
[True, 64, 32,128,2,1], #layer2
[False,128, 32,128,1,1],
[True, 128, 64,256,2,1], #layer3
[False,256, 64,256,1,1],
[True, 256,128,512,2,1], #layer4
[False,512,128,512,1,1]
]
#conv: 'C','B'/'BRL'/'BRS', in_ch_idx, out_ch_idx, kernel_sz, stride_idx, padding, groups_idx (bias: True default)
#add: 'AD', unconditional. The two elements in outs are added when unconditional is true or flag is true
btnk_cfg_table = [
[
['C','BRL',1,2,1,5,0,5],
['C','BRL',2,2,3,4,1,5],
['C','B' ,2,3,1,5,0,5]
],
# downsample, used only when downsample is passed in as True
[
['C','B' ,1,3,1,4,0,5]
],
# operations after the branches merge
[
['AD',True],
['RL']
]
]
#InvertedResidual
#value: identity_flag, in_ch, out_ch, in_ch*expand_ratio, stride, 1(default stride and group)
ires_ch_table = [
[False, 32, 16, 32,1,1], #layer1,first
[ True, 16, 16, 16,1,1], # other
[False, 16, 24, 96,2,1], #layer2
[ True, 24, 24, 144,1,1],
[False, 24, 32, 144,2,1], #layer3
[ True, 32, 32, 192,1,1],
[False, 32, 96, 192,1,1], #layer4
[ True, 96, 96, 576,1,1],
[False, 96,160, 576,2,1], #layer5
[ True,160,160, 960,1,1],
[False,160,320, 960,1,1], #layer6
[ True,320,320,1920,1,1]
]
#conv: 'C','B'/'BRL'/'BRS', in_ch_idx, out_ch_idx, kernel_sz, stride_idx, padding, groups_idx (bias: True default)
#add: 'AD', unconditional. The two elements in outs are added when unconditional is true or flag is true
ires_cfg_table = [
[
['C','BRS',1,3,1,5,0,5],
['C','BRS',3,3,3,4,1,3],
['C','B' ,3,2,1,5,0,5]
],
# identity_br empty
[
],
# operations after the branches merge
[
['AD',False] # conditional add
]
]
\ No newline at end of file
......@@ -6,8 +6,13 @@ import os
def extract_ratio(model_name):
fr = open('param_flops/'+model_name+'.txt','r')
lines = fr.readlines()
Mac = lines[1].split('Mac,')[0].split(',')[-1]
# skip warnings
for i in range(len(lines)):
if 'Model' in lines[i]:
head = i+1
break
Mac = lines[head].split('Mac,')[0].split(',')[-1]
if 'M' in Mac:
Mac = Mac.split('M')[0]
Mac = float(Mac)
......@@ -16,13 +21,18 @@ def extract_ratio(model_name):
Mac = float(Mac)
Mac *= 1024
Param = lines[1].split('M,')[0]
Param = float(Param)
Param = lines[head].split(',')[0]
if 'M' in Param:
Param = Param.split('M')[0]
Param = float(Param)
elif 'k' in Param:
Param = Param.split('k')[0]
Param = float(Param)
Param /= 1024
layer = []
par_ratio = []
flop_ratio = []
weight_ratio = []
for line in lines:
if '(' in line and ')' in line:
layer.append(line.split(')')[0].split('(')[1])
......@@ -32,34 +42,14 @@ def extract_ratio(model_name):
r2 = line.split('%')[-2].split(',')[-1]
r2 = float(r2)
flop_ratio.append(r2)
if 'conv' in line:
# computed whether or not bias=False; after folding, use the conv approximation directly
inch = line.split(',')[4]
# outch = line.split(',')[5]
klsz = line.split(',')[6].split('(')[-1]
inch = float(inch)
# outch = float(outch)
klsz = float(klsz)
wr = inch * klsz * klsz
wr = wr / (1+wr)
weight_ratio.append(wr)
elif 'fc' in line:
inch = line.split(',')[4].split('=')[-1]
inch = float(inch)
wr = inch / (1+inch)
weight_ratio.append(wr)
else:
weight_ratio.append(0)
return Mac, Param, layer, par_ratio, flop_ratio, weight_ratio
return Mac, Param, layer, par_ratio, flop_ratio
if __name__ == "__main__":
Mac, Param, layer, par_ratio, flop_ratio, weight_ratio = extract_ratio('Inception_BN')
Mac, Param, layer, par_ratio, flop_ratio = extract_ratio('Inception_BN')
print(Mac)
print(Param)
print(layer)
print(par_ratio)
print(flop_ratio)
print(weight_ratio)
\ No newline at end of file
print(flop_ratio)
\ No newline at end of file
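# Model Integration Notes
# fit_bkp
+ This folder integrates AlexNet, AlexNet_BN, VGG_16, VGG_19, and Inception_BN on the cifar10 dataset.
+ Backup of the earlier data-fitting analysis results
### Deployment
#### cfg_table
Under this general framework, deploying any of the current models only requires a cfg_table. If the model contains a special structure (such as the multi-branch inception structure of Inception_BN), provide an additional cfg_table for that structure. See `cfg.py` for details.
The rules for writing a cfg_table are as follows:
+ Each entry corresponds to one fused quantization unit. For example, a Conv followed by BN or BN+ReLU is fused into a single quantized layer and appears as one entry in the cfg_table, with the optional parameter ''/'B'/'BR'.
+ Operations that are not quantized but whose position must be specified explicitly, such as flatten and drop, are also listed as separate entries in the cfg_table.
The fold_ratio and fold_model methods were simplified accordingly: based on the cfg_table, they merge the parameter values, parameter counts, and computation of the full-precision layers that are compared against each quantized layer.
#### Training scheme
+ A stepped learning-rate training schedule is provided for each model to reach better full-precision accuracy. It is looked up automatically in epochs_cfg_table and lr_cfg_table by the model name passed in; see `train.py` for details.
### PTQ overview
#### matlab scripts
+ `flops\param`, `flops\param_all`, and `flops_all_weight` fit a single model, all models, and all models after computation weighting, respectively.
For a single model, the script colors the points by quantization type; for all models, it colors the points by model.
+ The scripts add a fitting constraint that guarantees the fitted curve is monotonically non-decreasing. To allow the curve to dip slightly at some position, change tolerance to a small negative number such as -1e-5.
+ The constraint is enforced by sampling: the sampling step is 0.1 for fakefreeze and 10 for the L2 method.
+ The scripts also print fit-quality metrics on the figure, such as SSE, R-squared, and RMSE.
+ The fitting model is selectable: poly can currently be 2/3/4, giving the polynomial degree of the numerator and denominator of the rational model.
+ Because each fit differs slightly from the last, the script could later be changed so that after each fit it runs three more constrained fits and stops only if none of them beats the current fit; otherwise it takes the better fit and repeats the process. This step is currently done by hand.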
## update:2023/04/26
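#### Fitting approach and summary of results
......@@ -63,9 +34,9 @@ The rules for writing a cfg_table are as follows:
Inspecting the weight distributions of these models shows that the problematic models have weight distributions without a sharp peak. The way peaked and peakless distributions behave under different quantization schemes is shown below: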
![diff1](image/diff1.png)
![diff1](image\diff1.png)
![diff2](image/diff2.png)
![diff2](image\diff2.png)
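Given the characteristics of each model's weight distribution, an important reason the POT divergence of the problematic models is large and clustered is probably that the distribution after quantization follows a different trend from the original. On this basis, we may need to consider, beyond similarity, how well the model's parameter distribution matches the quantization scheme. This needs experimental verification, for example by directly measuring a coefficient of distribution trend between the full-precision and quantized models, or by measuring the sharpness of the full-precision weights and of the quantization table, and then applying the resulting value to the JS divergence computed earlier.
......@@ -109,23 +80,23 @@ The rules for writing a cfg_table are as follows:
+ Fit over all models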
![L2_param](image/L2_param.png)
![L2_param](image\L2_param.png)
![L2_flops](image/L2_flops.png)
![L2_flops](image\L2_flops.png)
![L2_flops_weighted](image/L2_flops_weighted.png)
![L2_flops_weighted](image\L2_flops_weighted.png)
+ Fit of individual models
![L2_AlexNet](image/L2_AlexNet.png)
![L2_AlexNet](image\L2_AlexNet.png)
![L2_AlexNet_BN](image/L2_AlexNet_BN.png)
![L2_AlexNet_BN](image\L2_AlexNet_BN.png)
![L2_VGG_16](image/L2_VGG_16.png)
![L2_VGG_16](image\L2_VGG_16.png)
![L2_VGG_19](image/L2_VGG_19.png)
![L2_VGG_19](image\L2_VGG_19.png)
![L2_Inception_BN](image/L2_Inception_BN.png)
![L2_Inception_BN](image\L2_Inception_BN.png)
......@@ -133,25 +104,25 @@ The rules for writing a cfg_table are as follows:
+ Fit over all models
![fakefreeze_param](image/fakefreeze_param.png)
![fakefreeze_param](image\fakefreeze_param.png)
![fakefreeze_flops](image/fakefreeze_flops.png)
![fakefreeze_flops](image\fakefreeze_flops.png)
![fakefreeze_flops_weighted_log](image/fakefreeze_flops_weighted_log.png)
![fakefreeze_flops_weighted_log](image\fakefreeze_flops_weighted_log.png)
![fakefreeze_flops_weighted_cuberoot](image/fakefreeze_flops_weighted_cuberoot.png)
![fakefreeze_flops_weighted_cuberoot](image\fakefreeze_flops_weighted_cuberoot.png)
+ Fit of individual models
![fakefreeze_AlexNet](image/fakefreeze_AlexNet.png)
![fakefreeze_AlexNet](image\fakefreeze_AlexNet.png)
![fakefreeze_AlexNet_BN](image/fakefreeze_AlexNet_BN.png)
![fakefreeze_AlexNet_BN](image\fakefreeze_AlexNet_BN.png)
![fakefreeze_VGG_16](image/fakefreeze_VGG_16.png)
![fakefreeze_VGG_16](image\fakefreeze_VGG_16.png)
![fakefreeze_VGG_19](image/fakefreeze_VGG_19.png)
![fakefreeze_VGG_19](image\fakefreeze_VGG_19.png)
![fakefreeze_Inception_BN](image/fakefreeze_Inception_BN.png)
![fakefreeze_Inception_BN](image\fakefreeze_Inception_BN.png)
......@@ -159,23 +130,23 @@ The rules for writing a cfg_table are as follows:
+ Fit over all models
![fakefreeze_nodiv_param](image/fakefreeze_nodiv_param.png)
![fakefreeze_nodiv_param](image\fakefreeze_nodiv_param.png)
![fakefreeze_nodiv_flops](image/fakefreeze_nodiv_flops.png)
![fakefreeze_nodiv_flops](image\fakefreeze_nodiv_flops.png)
![fakefreeze_nodiv_flops_weighted_log](image/fakefreeze_nodiv_flops_weighted_log.png)
![fakefreeze_nodiv_flops_weighted_log](image\fakefreeze_nodiv_flops_weighted_log.png)
![fakefreeze_nodiv_flops_weighted_cuderoot](image/fakefreeze_nodiv_flops_weighted_cuderoot.png)
![fakefreeze_nodiv_flops_weighted_cuderoot](image\fakefreeze_nodiv_flops_weighted_cuderoot.png)
+ Fit of individual models
![fakefreeze_nodiv_AlexNet](image/fakefreeze_nodiv_AlexNet.png)
![fakefreeze_nodiv_AlexNet](image\fakefreeze_nodiv_AlexNet.png)
![fakefreeze_nodiv_AlexNet_BN](image/fakefreeze_nodiv_AlexNet_BN.png)
![fakefreeze_nodiv_AlexNet_BN](image\fakefreeze_nodiv_AlexNet_BN.png)
![fakefreeze_nodiv_VGG_16](image/fakefreeze_nodiv_VGG_16.png)
![fakefreeze_nodiv_VGG_16](image\fakefreeze_nodiv_VGG_16.png)
![fakefreeze_nodiv_VGG_19](image/fakefreeze_nodiv_VGG_19.png)
![fakefreeze_nodiv_VGG_19](image\fakefreeze_nodiv_VGG_19.png)
![fakefreeze_nodiv_Inception_BN](image/fakefreeze_nodiv_Inception_BN.png)
![fakefreeze_nodiv_Inception_BN](image\fakefreeze_nodiv_Inception_BN.png)
......@@ -9,8 +9,8 @@ if __name__ == "__main__":
model_name = sys.argv[1]
model = Model(model_name)
full_file = 'ckpt/cifar10_'+model_name+'.pt'
model.load_state_dict(torch.load(full_file))
# full_file = 'ckpt/cifar10_'+model_name+'.pt'
# model.load_state_dict(torch.load(full_file))
flops, params = get_model_complexity_info(model, (3, 32, 32), as_strings=True, print_per_layer_stat=True)
......@@ -82,12 +82,13 @@ echo "Use GPU ${CUDA_VISIBLE_DEVICES}" # which gpus
#- Job step
# [EDIT HERE(TODO)]
name_list="AlexNet AlexNet_BN VGG_16 VGG_19 Inception_BN"
name_list="AlexNet AlexNet_BN VGG_16 VGG_19 Inception_BN ResNet_18 ResNet_50 ResNet_152 MobileNetV2"
# name_list="MobileNetV2"
for name in $name_list; do
if [ -f "param_flops/$name.txt" ];then
echo "$name: param_flops exists"
elif [ ! -f "ckpt/cifar10_$name.pt" ];then
echo "$name: ckpt not exists"
# elif [ ! -f "ckpt/cifar10_$name.pt" ];then
# echo "$name: ckpt not exists"
else
python get_param_flops.py $name > param_flops/$name.txt
fi
......
import torch.nn as nn
from cfg import *
from module import *
from model_deployment import *
class Model(nn.Module):
    def __init__(self,model_name,num_classes=10):
        super(Model, self).__init__()
        self.cfg_table = model_cfg_table[model_name]
        make_layers(self,self.cfg_table)
        # # parameter initialization
        # for m in self.modules():
        #     if isinstance(m, nn.Conv2d):
        #         nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        #     elif isinstance(m, nn.BatchNorm2d):
        #         nn.init.constant_(m.weight, 1)
        #         nn.init.constant_(m.bias, 0)
        #     elif isinstance(m, nn.Linear):
        #         nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def forward(self,x):
        x = model_forward(self,self.cfg_table,x)
        return x

    def quantize(self, quant_type, num_bits=8, e_bits=3):
        model_quantize(self,self.cfg_table,quant_type,num_bits,e_bits)

    def quantize_forward(self,x):
        return model_utils(self,self.cfg_table,func='forward',x=x)

    def freeze(self):
        model_utils(self,self.cfg_table,func='freeze')

    def quantize_inference(self,x):
        return model_utils(self,self.cfg_table,func='inference',x=x)

    def fakefreeze(self):
        model_utils(self,self.cfg_table,func='fakefreeze')
# if __name__ == "__main__":
# model = Inception_BN()
# model.quantize('INT',8,3)
# print(model.named_modules)
# print('-------')
# print(model.named_parameters)
# print(len(model.conv0.named_parameters()))
\ No newline at end of file
Model(
702.67 k, 100.000% Params, 35.65 MMac, 100.000% MACs,
(conv0): Conv2d(448, 0.064% Params, 458.75 KMac, 1.287% MACs, 3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn0): BatchNorm2d(32, 0.005% Params, 32.77 KMac, 0.092% MACs, 16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu0): ReLU(0, 0.000% Params, 16.38 KMac, 0.046% MACs, inplace=True)
(ml0_blk0_ma_conv0): Conv2d(2.32 k, 0.330% Params, 2.38 MMac, 6.664% MACs, 16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml0_blk0_ma_bn0): BatchNorm2d(32, 0.005% Params, 32.77 KMac, 0.092% MACs, 16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml0_blk0_ma_relu0): ReLU(0, 0.000% Params, 16.38 KMac, 0.046% MACs, inplace=True)
(ml0_blk0_ma_conv1): Conv2d(2.32 k, 0.330% Params, 2.38 MMac, 6.664% MACs, 16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml0_blk0_ma_bn1): BatchNorm2d(32, 0.005% Params, 32.77 KMac, 0.092% MACs, 16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml0_blk0_relu1): ReLU(0, 0.000% Params, 16.38 KMac, 0.046% MACs, inplace=True)
(ml0_blk1_ma_conv0): Conv2d(2.32 k, 0.330% Params, 2.38 MMac, 6.664% MACs, 16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml0_blk1_ma_bn0): BatchNorm2d(32, 0.005% Params, 32.77 KMac, 0.092% MACs, 16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml0_blk1_ma_relu0): ReLU(0, 0.000% Params, 16.38 KMac, 0.046% MACs, inplace=True)
(ml0_blk1_ma_conv1): Conv2d(2.32 k, 0.330% Params, 2.38 MMac, 6.664% MACs, 16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml0_blk1_ma_bn1): BatchNorm2d(32, 0.005% Params, 32.77 KMac, 0.092% MACs, 16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml0_blk1_relu1): ReLU(0, 0.000% Params, 16.38 KMac, 0.046% MACs, inplace=True)
(ml1_blk0_ma_conv0): Conv2d(4.64 k, 0.660% Params, 1.19 MMac, 3.332% MACs, 16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(ml1_blk0_ma_bn0): BatchNorm2d(64, 0.009% Params, 16.38 KMac, 0.046% MACs, 32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml1_blk0_ma_relu0): ReLU(0, 0.000% Params, 8.19 KMac, 0.023% MACs, inplace=True)
(ml1_blk0_ma_conv1): Conv2d(9.25 k, 1.316% Params, 2.37 MMac, 6.641% MACs, 32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml1_blk0_ma_bn1): BatchNorm2d(64, 0.009% Params, 16.38 KMac, 0.046% MACs, 32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml1_blk0_ds_conv0): Conv2d(544, 0.077% Params, 139.26 KMac, 0.391% MACs, 16, 32, kernel_size=(1, 1), stride=(2, 2))
(ml1_blk0_ds_bn0): BatchNorm2d(64, 0.009% Params, 16.38 KMac, 0.046% MACs, 32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml1_blk0_relu1): ReLU(0, 0.000% Params, 8.19 KMac, 0.023% MACs, inplace=True)
(ml1_blk1_ma_conv0): Conv2d(9.25 k, 1.316% Params, 2.37 MMac, 6.641% MACs, 32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml1_blk1_ma_bn0): BatchNorm2d(64, 0.009% Params, 16.38 KMac, 0.046% MACs, 32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml1_blk1_ma_relu0): ReLU(0, 0.000% Params, 8.19 KMac, 0.023% MACs, inplace=True)
(ml1_blk1_ma_conv1): Conv2d(9.25 k, 1.316% Params, 2.37 MMac, 6.641% MACs, 32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml1_blk1_ma_bn1): BatchNorm2d(64, 0.009% Params, 16.38 KMac, 0.046% MACs, 32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml1_blk1_relu1): ReLU(0, 0.000% Params, 8.19 KMac, 0.023% MACs, inplace=True)
(ml2_blk0_ma_conv0): Conv2d(18.5 k, 2.632% Params, 1.18 MMac, 3.321% MACs, 32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(ml2_blk0_ma_bn0): BatchNorm2d(128, 0.018% Params, 8.19 KMac, 0.023% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml2_blk0_ma_relu0): ReLU(0, 0.000% Params, 4.1 KMac, 0.011% MACs, inplace=True)
(ml2_blk0_ma_conv1): Conv2d(36.93 k, 5.255% Params, 2.36 MMac, 6.630% MACs, 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml2_blk0_ma_bn1): BatchNorm2d(128, 0.018% Params, 8.19 KMac, 0.023% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml2_blk0_ds_conv0): Conv2d(2.11 k, 0.301% Params, 135.17 KMac, 0.379% MACs, 32, 64, kernel_size=(1, 1), stride=(2, 2))
(ml2_blk0_ds_bn0): BatchNorm2d(128, 0.018% Params, 8.19 KMac, 0.023% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml2_blk0_relu1): ReLU(0, 0.000% Params, 4.1 KMac, 0.011% MACs, inplace=True)
(ml2_blk1_ma_conv0): Conv2d(36.93 k, 5.255% Params, 2.36 MMac, 6.630% MACs, 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml2_blk1_ma_bn0): BatchNorm2d(128, 0.018% Params, 8.19 KMac, 0.023% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml2_blk1_ma_relu0): ReLU(0, 0.000% Params, 4.1 KMac, 0.011% MACs, inplace=True)
(ml2_blk1_ma_conv1): Conv2d(36.93 k, 5.255% Params, 2.36 MMac, 6.630% MACs, 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml2_blk1_ma_bn1): BatchNorm2d(128, 0.018% Params, 8.19 KMac, 0.023% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml2_blk1_relu1): ReLU(0, 0.000% Params, 4.1 KMac, 0.011% MACs, inplace=True)
(ml3_blk0_ma_conv0): Conv2d(73.86 k, 10.511% Params, 1.18 MMac, 3.315% MACs, 64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(ml3_blk0_ma_bn0): BatchNorm2d(256, 0.036% Params, 4.1 KMac, 0.011% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml3_blk0_ma_relu0): ReLU(0, 0.000% Params, 2.05 KMac, 0.006% MACs, inplace=True)
(ml3_blk0_ma_conv1): Conv2d(147.58 k, 21.003% Params, 2.36 MMac, 6.624% MACs, 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml3_blk0_ma_bn1): BatchNorm2d(256, 0.036% Params, 4.1 KMac, 0.011% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml3_blk0_ds_conv0): Conv2d(8.32 k, 1.184% Params, 133.12 KMac, 0.373% MACs, 64, 128, kernel_size=(1, 1), stride=(2, 2))
(ml3_blk0_ds_bn0): BatchNorm2d(256, 0.036% Params, 4.1 KMac, 0.011% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml3_blk0_relu1): ReLU(0, 0.000% Params, 2.05 KMac, 0.006% MACs, inplace=True)
(ml3_blk1_ma_conv0): Conv2d(147.58 k, 21.003% Params, 2.36 MMac, 6.624% MACs, 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml3_blk1_ma_bn0): BatchNorm2d(256, 0.036% Params, 4.1 KMac, 0.011% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml3_blk1_ma_relu0): ReLU(0, 0.000% Params, 2.05 KMac, 0.006% MACs, inplace=True)
(ml3_blk1_ma_conv1): Conv2d(147.58 k, 21.003% Params, 2.36 MMac, 6.624% MACs, 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml3_blk1_ma_bn1): BatchNorm2d(256, 0.036% Params, 4.1 KMac, 0.011% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml3_blk1_relu1): ReLU(0, 0.000% Params, 2.05 KMac, 0.006% MACs, inplace=True)
(aap5): AdaptiveAvgPool2d(0, 0.000% Params, 2.05 KMac, 0.006% MACs, output_size=1)
(fc7): Linear(1.29 k, 0.184% Params, 1.29 KMac, 0.004% MACs, in_features=128, out_features=10, bias=True)
)
......@@ -5,7 +5,7 @@
# (TODO)
# Please modify job name
#SBATCH -J ALL-L2 # The job name
#SBATCH -J PTQ # The job name
#SBATCH -o ret/ret-%j.out # Write the standard output to file named 'ret-<job_number>.out'
#SBATCH -e ret/ret-%j.err # Write the standard error to file named 'ret-<job_number>.err'
......@@ -36,7 +36,8 @@
###
# set constraint for RTX8000 to meet my cuda
#SBATCH --constraint="Ampere|RTX8000|T4"
### #SBATCH --constraint="Ampere|RTX8000|T4"
#SBATCH --constraint="Ampere"
#- Log information
......@@ -82,7 +83,7 @@ echo "Use GPU ${CUDA_VISIBLE_DEVICES}" # which gpus
#- Job step
# [EDIT HERE(TODO)]
python ptq_L2.py
python ptq.py
#- End
echo "Job end at $(date "+%Y-%m-%d %H:%M:%S")"
......@@ -5,66 +5,98 @@ import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torchvision.transforms.functional import InterpolationMode
from torch.optim.lr_scheduler import CosineAnnealingLR
import os
import os.path as osp
import time
# import sys
def train(model, device, train_loader, optimizer, epoch):
model.train()
lossLayer = torch.nn.CrossEntropyLoss()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
total_loss = 0.
lossLayer = nn.CrossEntropyLoss()
start_time = time.time()
for batch_idx, (data, targets) in enumerate(train_loader):
data,targets = data.to(device), targets.to(device)
optimizer.zero_grad()
output = model(data)
loss = lossLayer(output, target)
loss = lossLayer(output, targets)
loss.backward()
total_loss += loss.item() * len(data)
optimizer.step()
if batch_idx % 50 == 0:
print('Train Epoch: {} [{}/{}]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset), loss.item()
))
pred = output.argmax(dim=1, keepdim=True)
if batch_idx % 200 == 0 and batch_idx > 0:
cur_loss = total_loss / 200
elapsed = time.time() - start_time
lr = optimizer.param_groups[0]['lr']
print('| epoch {:3d} | {:5d}/{:5d} batches | lr {:02.4f} | ms/batch {:5.2f} | '
'loss {:5.2f}'.format(
epoch, batch_idx, len(train_loader.dataset) // len(data), lr,
elapsed * 1000 / 200, cur_loss))
total_loss = 0.
correct = 0
start_time = time.time()
def test(model, device, test_loader):
def evaluate(model, device, eval_loader):
model.eval()
test_loss = 0
total_loss = 0
correct = 0
lossLayer = torch.nn.CrossEntropyLoss(reduction='sum')
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += lossLayer(output, target).item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
lossLayer = nn.CrossEntropyLoss()
with torch.no_grad():
for data, targets in eval_loader:
data,targets = data.to(device), targets.to(device)
output = model(data)
total_loss += len(data) * lossLayer(output, targets).item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(targets.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
test_loss = total_loss / len(eval_loader.dataset)
test_acc = 100. * correct / len(eval_loader.dataset)
return test_loss,test_acc
print('\nTest set: Average loss: {:.4f}, Accuracy: {:.2f}%\n'.format(
test_loss, 100. * correct / len(test_loader.dataset)
))
epochs_cfg_table = {
'AlexNet' : [20, 30, 20, 20, 10],
'AlexNet_BN' : [15, 20, 20, 20, 10, 10],
'VGG_16' : [25, 30, 30, 20, 20, 10, 10],
'VGG_19' : [30, 40, 30, 20, 20, 10, 10],
'Inception_BN' : [20, 30, 30, 20, 20, 10, 10]
'Inception_BN' : [20, 30, 30, 20, 20, 10, 10],
'ResNet_18' : [30, 25, 25, 20, 10, 10],
'ResNet_50' : [30, 40, 35, 25, 15, 10, 10],
'ResNet_152' : [50, 60, 50, 40, 25, 15, 10, 10],
'MobileNetV2' : [25, 35, 30, 20, 10, 10],
}
lr_cfg_table = {
'AlexNet' : [0.01, 0.005, 0.001, 0.0005, 0.0001],
'AlexNet_BN' : [0.01, 0.005, 0.002, 0.001, 0.0005, 0.0001],
'VGG_16' : [0.01, 0.008, 0.005, 0.002, 0.001, 0.0005, 0.0001],
'VGG_19' : [0.01, 0.008, 0.005, 0.002, 0.001, 0.0005, 0.0001],
'Inception_BN' : [0.01, 0.008, 0.005, 0.002, 0.001, 0.0005, 0.0001]
'Inception_BN' : [0.01, 0.008, 0.005, 0.002, 0.001, 0.0005, 0.0001],
'ResNet_18' : [0.01, 0.005, 0.002, 0.001, 0.0005, 0.0001],
'ResNet_50' : [0.01, 0.008, 0.005, 0.002, 0.001, 0.0005, 0.0001],
'ResNet_152' : [0.01, 0.008, 0.005, 0.003, 0.002, 0.001, 0.0005, 0.0001],
'MobileNetV2' : [0.01, 0.008, 0.005, 0.002, 0.001, 0.0001],
}
if __name__ == "__main__":
# sys.stdout = open(sys.stdout.fileno(), mode='w', buffering=1)
batch_size = 32
seed = 1
momentum = 0.5
seed = 1111
seed_gpu = 1111
lr = 0.05 # origin lr
# momentum = 0.5
t_epochs = 300 # learning-rate decay period (T_max of the cosine schedule)
patience = 30 # early-stopping patience
save_model = True
append = True
torch.manual_seed(seed)
append = False
torch.manual_seed(seed)
torch.cuda.manual_seed(seed_gpu)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
train_loader = torch.utils.data.DataLoader(
......@@ -91,23 +123,49 @@ if __name__ == "__main__":
if not osp.exists('ckpt'):
os.makedirs('ckpt')
model_name_list = ['AlexNet', 'AlexNet_BN', 'VGG_16', 'VGG_19', 'Inception_BN']
# model_name_list = ['AlexNet', 'AlexNet_BN', 'VGG_16', 'VGG_19', 'Inception_BN',
# 'ResNet_18', 'ResNet_50', 'ResNet_152', 'MobileNetV2']
# model_name_list = ['ResNet_18', 'ResNet_50', 'ResNet_152', 'MobileNetV2']
model_name_list = ['ResNet_152']
for model_name in model_name_list:
save_path = 'ckpt/cifar10_'+model_name+'.pt'
if os.path.exists(save_path) and append:
continue
model = Model(model_name).to(device)
else:
print('>>>>>>>>>>>>>>>>>>>>>>>> Train: '+model_name+' <<<<<<<<<<<<<<<<<<<<<<<<')
model = Model(model_name).to(device)
epoch_start = 1
epochs_cfg = epochs_cfg_table[model_name]
lr_cfg = lr_cfg_table[model_name]
for epochs,lr in zip(epochs_cfg,lr_cfg):
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)
epoch_end = epoch_start+epochs
for epoch in range(epoch_start,epoch_end):
best_val_acc = None
optimizer = optim.SGD(model.parameters(), lr=lr)
lr_scheduler = CosineAnnealingLR(optimizer, T_max=t_epochs)
weak_cnt = 0 # number of consecutive epochs below the best accuracy
epoch = 0
while weak_cnt < patience:
epoch += 1
epoch_start_time = time.time()
train(model, device, train_loader, optimizer, epoch)
test(model, device, test_loader)
epoch_start += epochs
val_loss, val_acc = evaluate(model, device, test_loader)
if not best_val_acc or val_acc > best_val_acc:
best_val_acc = val_acc
weak_cnt = 0
if save_model:
torch.save(model.state_dict(), save_path)
else:
weak_cnt += 1
print('-' * 89)
print('| end of epoch {:3d} | time: {:5.2f}s | test loss {:5.2f} | '
'test acc {:.2f} | weak_cnt {:d}'.format(epoch, (time.time() - epoch_start_time),
val_loss, val_acc, weak_cnt))
print('-' * 89)
lr_scheduler.step()
print('>>> Early Stop: No improvement after patience(%d) epochs.'%patience)
if save_model:
torch.save(model.state_dict(), save_path)
\ No newline at end of file
model = Model(model_name).to(device)
model.load_state_dict(torch.load(save_path))
test_loss,test_acc = evaluate(model, device, test_loader)
print('=' * 89)
print('| Test on {:s} | test loss {:5.2f} | test acc {:.2f}'.format(
model_name, test_loss, test_acc))
print('=' * 89)
\ No newline at end of file
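The new training loop replaces the per-stage learning-rate tables with `CosineAnnealingLR(optimizer, T_max=t_epochs)`. As a rough illustration of what that schedule does to the learning rate, here is a standalone sketch of the closed-form cosine curve, assuming `eta_min = 0` (the PyTorch default). The function name `cosine_lr` is hypothetical and is not part of the repository.

```python
import math

def cosine_lr(base_lr, epoch, t_max, eta_min=0.0):
    """Closed-form cosine annealing: learning rate after `epoch` scheduler steps."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

if __name__ == "__main__":
    # lr=0.05 and t_epochs=300 mirror the values set in the script above.
    for epoch in (0, 75, 150, 225, 300):
        print(epoch, round(cosine_lr(0.05, epoch, t_max=300), 5))
```

The curve starts at the base learning rate, passes through half of it at `t_max / 2`, and reaches `eta_min` at `t_max`, so early stopping with `patience` well below `t_epochs` can end training before the rate has decayed much.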
from model import *
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import CosineAnnealingLR
import os
import os.path as osp
import time
import sys
def train(model, device, train_loader, optimizer, epoch):
model.train()
total_loss = 0.
lossLayer = nn.CrossEntropyLoss()
start_time = time.time()
for batch_idx, (data, targets) in enumerate(train_loader):
data,targets = data.to(device), targets.to(device)
optimizer.zero_grad()
output = model(data)
loss = lossLayer(output, targets)
loss.backward()
total_loss += loss.item()
optimizer.step()
if batch_idx % 50 == 0 and batch_idx > 0:
cur_loss = total_loss / 50
elapsed = time.time() - start_time
lr = optimizer.param_groups[0]['lr']
print('| epoch {:3d} | {:5d}/{:5d} batches | lr {:02.7f} | ms/batch {:5.2f} | '
'loss {:5.2f}'.format(
epoch, batch_idx, len(train_loader.dataset) // len(data), lr,
elapsed * 1000 / 50, cur_loss))
total_loss = 0.
start_time = time.time()
def evaluate(model, device, eval_loader):
model.eval()
total_loss = 0
correct = 0
lossLayer = nn.CrossEntropyLoss()
with torch.no_grad():
for data, targets in eval_loader:
data,targets = data.to(device), targets.to(device)
output = model(data)
total_loss += len(data) * lossLayer(output, targets).item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(targets.view_as(pred)).sum().item()
test_loss = total_loss / len(eval_loader.dataset)
test_acc = 100. * correct / len(eval_loader.dataset)
return test_loss,test_acc
if __name__ == "__main__":
# sys.stdout = open(sys.stdout.fileno(), mode='w', buffering=1)
batch_size = 128
optim_type = 'adam'
# optim_type = 'sgd'
if optim_type == 'adam':
lr = 0.001
opt_path = 'adam_lr'+str(lr).split('.')[-1]
elif optim_type == 'sgd':
lr = 0.01
momentum = 0.9
weight_decay = 1e-4
nesterov = True
opt_path = 'sgd_lr'+str(lr).split('.')[-1]+'_wd'+str(weight_decay).split('.')[-1]+'_ntr'
# lr = 0.001 # origin lr
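# end_lr_bound = 0.00001 # when early stopping triggers, lr should already be below this value so that training has converged
# momentum = 0.9
# weight_decay = 1e-4
# nesterov = True
t_epochs = 500 # learning-rate decay period (T_max of the cosine schedule)
patience = 50 # early-stopping patience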
print(opt_path)
print('t_epochs:%d patience:%d'%(t_epochs,patience))
save_model = True
append = False
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
train_transfrom = transforms.Compose([
transforms.RandomCrop(32, padding=2),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize(
(0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])
eval_transfrom = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465),
(0.2023, 0.1994, 0.2010))
])
alltrainset = datasets.CIFAR10(root='/lustre/datasets/CIFAR10',train=True,download=True,transform=train_transfrom)
train_size = (int)(0.8 * len(alltrainset))
val_size = (int)(0.2 * len(alltrainset))
train_idx, val_idx = torch.utils.data.random_split(range(train_size+val_size),[train_size,val_size])
trainset = torch.utils.data.Subset(alltrainset,train_idx)
valset = torch.utils.data.Subset(alltrainset,val_idx)
train_loader = torch.utils.data.DataLoader(
trainset,
batch_size=batch_size, shuffle=True, num_workers=4, pin_memory=True
)
val_loader = torch.utils.data.DataLoader(
valset,
batch_size=batch_size, shuffle=False, num_workers=4, pin_memory=True
)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('/lustre/datasets/CIFAR10', train=False, download=False,
transform=eval_transfrom),
batch_size=batch_size, shuffle=False, num_workers=4, pin_memory=True
)
# ckpt_path = 'ckpt_sgd_1_momentum_9_ntb'
ckpt_path = 'ckpt_'+opt_path
if save_model:
if not osp.exists(ckpt_path):
os.makedirs(ckpt_path)
model_name = sys.argv[1]
save_path = ckpt_path+'/cifar10_'+model_name+'.pt'
if os.path.exists(save_path) and append:
pass
else:
print('>>>>>>>>>>>>>>>>>>>>>>>> Train: '+model_name+' <<<<<<<<<<<<<<<<<<<<<<<<')
model = Model(model_name).to(device)
best_val_acc = None
if 'adam' in opt_path:
optimizer = optim.Adam(model.parameters(), lr=lr)
elif 'sgd' in opt_path:
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum,weight_decay = weight_decay,nesterov=nesterov)
else:
raise ValueError('Illegal opttype')
lr_scheduler = CosineAnnealingLR(optimizer, T_max=t_epochs)
weak_cnt = 0 # number of consecutive epochs below the best accuracy
epoch = 0
while True:
epoch += 1
epoch_start_time = time.time()
train(model, device, train_loader, optimizer, epoch)
val_loss, val_acc = evaluate(model, device, val_loader)
if not best_val_acc or val_acc > best_val_acc:
best_val_acc = val_acc
weak_cnt = 0
if save_model:
torch.save(model.state_dict(), save_path)
else:
weak_cnt += 1
print('-' * 89)
print('| end of epoch {:3d} | time: {:5.2f}s | val loss {:5.2f} | '
'val acc {:.2f} | weak_cnt {:d}'.format(epoch, (time.time() - epoch_start_time),
val_loss, val_acc, weak_cnt))
print('-' * 89)
lr_scheduler.step()
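# An extra condition for early stopping: lr must have dropped below a bound, to ensure sufficient training (currently disabled below)
if weak_cnt >= patience:
break
# if optimizer.param_groups[0]['lr'] < end_lr_bound:
# break
# elif epoch > t_epochs: # one full period has been trained and the minimum bound has been reached
# break
# else: # no improvement for several epochs: reload the best checkpoint and continue training
# print('>> Turn Back: No improvement after %d epochs, back to BestPoint'%patience)
# weak_cnt = 0
# model.load_state_dict(torch.load(save_path))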
print('>>> Early Stop: No improvement after patience(%d) epochs'%patience)
# print('>>> Early Stop: No improvement after patience(%d) epochs And lr under bound(%f) '%(patience,end_lr_bound))
model = Model(model_name).to(device)
model.load_state_dict(torch.load(save_path))
test_loss,test_acc = evaluate(model, device, test_loader)
print('=' * 89)
print('| Test on {:s} | test loss {:5.2f} | test acc {:.2f}'.format(
model_name, test_loss, test_acc))
print('=' * 89)
\ No newline at end of file
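The `while True` loop in `train_one.py` stops once `weak_cnt` reaches `patience` and then evaluates the checkpoint saved at the best validation accuracy. The bookkeeping can be sketched without PyTorch as below. The helper name `early_stop_epoch` and the accuracy values in the usage lines are made up for illustration, and the sketch tests `best_acc is None` rather than `not best_val_acc`, which behaves differently only when an accuracy of exactly 0 is seen.

```python
def early_stop_epoch(val_accs, patience):
    """Return (stop_epoch, best_epoch, best_acc) for per-epoch validation accuracies."""
    best_acc = None
    best_epoch = 0
    weak_cnt = 0  # consecutive epochs without a new best accuracy
    for epoch, acc in enumerate(val_accs, start=1):
        if best_acc is None or acc > best_acc:
            # New best: this is where the script saves the checkpoint.
            best_acc, best_epoch, weak_cnt = acc, epoch, 0
        else:
            weak_cnt += 1
        if weak_cnt >= patience:
            return epoch, best_epoch, best_acc
    # The sequence ended before patience ran out.
    return len(val_accs), best_epoch, best_acc

if __name__ == "__main__":
    print(early_stop_epoch([50, 60, 59, 61, 60, 60, 60], patience=3))
```

With these toy values the loop stops at epoch 7 and reports epoch 4 as the best point, which corresponds to the checkpoint the script would reload for the final test.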
#!/bin/bash
#- Job parameters
# (TODO)
# Please modify job name
#- Resources
# (TODO)
# Please modify your requirements
#SBATCH -p nv-gpu # Submit to 'nv-gpu' Partition
#SBATCH -t 1-06:00:00 # Run for a maximum time of 1 day, 06 hours, 00 mins, 00 secs
#SBATCH --nodes=1 # Request N nodes
#SBATCH --gres=gpu:1 # Request M GPU per node
#SBATCH --gres-flags=enforce-binding # CPU-GPU Affinity
#SBATCH --qos=gpu-normal # Request QOS Type
###
### The system will alloc 8 or 16 cores per gpu by default.
### If you need more or less, use following:
### #SBATCH --cpus-per-task=K # Request K cores
###
###
### Without specifying the constraint, any available nodes that meet the requirement will be allocated
### You can specify the characteristics of the compute nodes, and even the names of the compute nodes
###
### #SBATCH --nodelist=gpu-v00 # Request a specific list of hosts
### #SBATCH --constraint="Volta|RTX8000" # Request GPU Type: Volta(V100 or V100S) or RTX8000
###
# set GPU constraint (Ampere) to meet my cuda
#SBATCH --constraint="Ampere"
#- Log information
echo "Job start at $(date "+%Y-%m-%d %H:%M:%S")"
echo "Job run at:"
echo "$(hostnamectl)"
#- Load environments
source /tools/module_env.sh
module list # list modules loaded
##- Tools
module load cluster-tools/v1.0
module load slurm-tools/v1.0
module load cmake/3.15.7
module load git/2.17.1
module load vim/8.1.2424
##- language
module load python3/3.6.8
##- CUDA
# module load cuda-cudnn/10.2-7.6.5
# module load cuda-cudnn/11.2-8.2.1
module load cuda-cudnn/11.1-8.2.1
##- virtualenv
# source xxxxx/activate
echo $(module list) # list modules loaded
echo $(which gcc)
echo $(which python)
echo $(which python3)
cluster-quota # nas quota
nvidia-smi --format=csv --query-gpu=name,driver_version,power.limit # gpu info
#- Warning! Please do not change your CUDA_VISIBLE_DEVICES
#- in `.bashrc`, `env.sh`, or your job script
echo "Use GPU ${CUDA_VISIBLE_DEVICES}" # which gpus
#- The CUDA_VISIBLE_DEVICES variable is assigned and specified by SLURM
#- Job step
# [EDIT HERE(TODO)]
python train_one.py $Model
#- End
echo "Job end at $(date "+%Y-%m-%d %H:%M:%S")"
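The job step above runs `python train_one.py $Model`, so the model name has to reach the job through the environment. One way to do that, assuming Slurm's `sbatch --export` option and a hypothetical script file name `train_one.slurm` (the real file name is not shown in this diff), is to generate one submission command per model. The sketch only builds and prints the commands and does not submit anything.

```python
import shlex

def build_submit_commands(model_names, script='train_one.slurm'):
    """Build one sbatch command per model; `script` is a hypothetical file name."""
    cmds = []
    for name in model_names:
        # -J sets the job name; --export passes Model into the job environment.
        cmds.append(['sbatch', '-J', name, '--export=ALL,Model=' + name, script])
    return cmds

if __name__ == "__main__":
    for cmd in build_submit_commands(['ResNet_18', 'MobileNetV2']):
        print(' '.join(shlex.quote(part) for part in cmd))
```

Keeping `ALL` in the export list preserves the submitting shell's environment while adding `Model`.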
......@@ -76,6 +76,7 @@ def fold_ratio(layer, par_ratio, flop_ratio):
[prefix,suffix] = name.split('conv')
bn_name = prefix+'bn'+suffix
relu_name = prefix+'relu'+suffix
relus_name = prefix+'relus'+suffix
if bn_name in layer:
bn_idx = layer.index(bn_name)
par_ratio[conv_idx]+=par_ratio[bn_idx]
......@@ -83,7 +84,11 @@ def fold_ratio(layer, par_ratio, flop_ratio):
if relu_name in layer:
relu_idx = layer.index(relu_name)
par_ratio[conv_idx]+=par_ratio[relu_idx]
flop_ratio[conv_idx]+=flop_ratio[bn_idx]
flop_ratio[conv_idx]+=flop_ratio[relu_idx]
elif relus_name in layer:
relus_idx = layer.index(relus_name)
par_ratio[conv_idx]+=par_ratio[relus_idx]
flop_ratio[conv_idx]+=flop_ratio[relus_idx]
return par_ratio,flop_ratio
def fold_model(model):
......
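As far as the hunk above shows, `fold_ratio` now adds the ReLU's own MACs share to the conv entry (the old line added the BN share a second time) and handles a `relus` (ReLU6) partner as well. The folding idea can be sketched as a standalone function. This is a reconstruction under assumptions: the original indentation is lost in the diff, so whether the ReLU check is nested under the BN check is a guess, and the sample numbers are copied from the `ml3_blk0_ma_*` lines of the ptflops log.

```python
def fold_ratio(layer, par_ratio, flop_ratio):
    """Add each conv's BN and ReLU/ReLU6 shares onto the conv entry (in place)."""
    for conv_idx, name in enumerate(layer):
        if 'conv' not in name:
            continue
        prefix, suffix = name.split('conv')
        bn_name = prefix + 'bn' + suffix
        relu_name = prefix + 'relu' + suffix
        relus_name = prefix + 'relus' + suffix
        partners = [bn_name] if bn_name in layer else []
        if relu_name in layer:
            partners.append(relu_name)
        elif relus_name in layer:
            partners.append(relus_name)
        for partner in partners:
            idx = layer.index(partner)
            par_ratio[conv_idx] += par_ratio[idx]
            flop_ratio[conv_idx] += flop_ratio[idx]
    return par_ratio, flop_ratio

if __name__ == "__main__":
    layer = ['ml3_blk0_ma_conv0', 'ml3_blk0_ma_bn0', 'ml3_blk0_ma_relu0']
    par, flop = fold_ratio(layer, [10.511, 0.036, 0.0], [3.315, 0.011, 0.006])
    print(par[0], flop[0])
```

A standalone ReLU such as `relu1` after `conv0` in AlexNet is not folded, because its suffix does not match the conv's suffix.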
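# ALL-cifar100 quantization
+ In cfg, the number of classes in the final output layer is changed from 10 (cifar10) to 100.
+ A validation set is split off from the training set.
+ Optimizer: Adam, lr=0.0001
+ Full-precision accuracy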
| AlexNet | AlexNet_BN | VGG_16 | VGG_19 | Inception_BN | ResNet_18 | ResNet_50 | ResNet_152 | MobileNetV2 |
| ------- | ---------- | ------ | ------ | ------------ | --------- | --------- | ---------- | ----------- |
| 56.88 | 61.60 | 63.29 | 60.84 | 68.44 | 43.26 | 37.1 | 38.56 | 50.3 |
+ Fit result: R2=0.7989
<img src="image\adam-ALL.png" alt="adam-ALL" />
\ No newline at end of file
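The README reports the quality of the fit as an R2 value. The fitting code itself is not part of this section, so the following is only the textbook coefficient of determination, shown to make the number easier to interpret. The function name `r_squared` and the toy inputs are illustrative and are not taken from the repository.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

if __name__ == "__main__":
    print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit
    print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # no better than the mean
```

A value of 1 means the fitted curve reproduces every observation, while 0 means it does no better than predicting the mean.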
# conv: 'C',''/'B'/'BRL'/'BRS',qi,in_ch,out_ch,kernel_size,stride,padding,bias
# relu: 'RL'
# relu6: 'RS'
# inception: 'Inc'
# maxpool: 'MP',kernel_size,stride,padding
# adaptiveavgpool: 'AAP',output_size
# view: 'VW'
# default: x = x.view(x.size(0),-1)
# dropout: 'D',p
# fc: 'FC',in_features,out_features,bias
# MakeLayer: 'ML','BBLK'/'BTNK'/'IRES', ml_idx, blocks
# softmax: 'SM'
# class 100
ResNet_18_cfg_table = [
['C','BRL',True,3,16,3,1,1,True],
['ML','BBLK',0,2],
['ML','BBLK',1,2],
['ML','BBLK',2,2],
['ML','BBLK',3,2],
['AAP',1],
['VW'],
['FC',128,100,True],
['SM']
]
ResNet_50_cfg_table = [
['C','BRL',True,3,16,3,1,1,True],
['ML','BTNK',0,3],
['ML','BTNK',1,4],
['ML','BTNK',2,6],
['ML','BTNK',3,3],
['AAP',1],
['VW'],
['FC',512,100,True],
['SM']
]
ResNet_152_cfg_table = [
['C','BRL',True,3,16,3,1,1,True],
['ML','BTNK',0,3],
['ML','BTNK',1,8],
['ML','BTNK',2,36],
['ML','BTNK',3,3],
['AAP',1],
['VW'],
['FC',512,100,True],
['SM']
]
MobileNetV2_cfg_table = [
['C','BRS',True,3,32,3,1,1,True],
['ML','IRES',0,1],
['ML','IRES',1,2],
['ML','IRES',2,3],
['ML','IRES',3,3],
['ML','IRES',4,3],
['ML','IRES',5,1],
['C','',False,320,1280,1,1,0,True],
['AAP',1],
['VW'],
['FC',1280,100,True]
]
AlexNet_cfg_table = [
['C','',True,3,32,3,1,1,True],
['RL'],
['MP',2,2,0],
['C','',False,32,64,3,1,1,True],
['RL'],
['MP',2,2,0],
['C','',False,64,128,3,1,1,True],
['RL'],
['C','',False,128,256,3,1,1,True],
['RL'],
['C','',False,256,256,3,1,1,True],
['RL'],
['MP',3,2,0],
['VW'],
['D',0.5],
['FC',2304,1024,True],
['RL'],
['D',0.5],
['FC',1024,512,True],
['RL'],
['FC',512,100,True]
]
AlexNet_BN_cfg_table = [
['C','BRL',True,3,32,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,32,64,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,64,128,3,1,1,True],
['C','BRL',False,128,256,3,1,1,True],
['C','BRL',False,256,256,3,1,1,True],
['MP',3,2,0],
['VW'],
['D',0.5],
['FC',2304,1024,True],
['RL'],
['D',0.5],
['FC',1024,512,True],
['RL'],
['FC',512,100,True]
]
VGG_16_cfg_table = [
['C','BRL',True,3,64,3,1,1,True],
['C','BRL',False,64,64,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,64,128,3,1,1,True],
['C','BRL',False,128,128,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,128,256,3,1,1,True],
['C','BRL',False,256,256,3,1,1,True],
['C','BRL',False,256,256,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,256,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['MP',2,2,0],
['VW'],
['FC',512,4096,True],
['RL'],
['D',0.5],
['FC',4096,4096,True],
['RL'],
['D',0.5],
['FC',4096,100,True]
]
VGG_19_cfg_table = [
['C','BRL',True,3,64,3,1,1,True],
['C','BRL',False,64,64,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,64,128,3,1,1,True],
['C','BRL',False,128,128,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,128,256,3,1,1,True],
['C','BRL',False,256,256,3,1,1,True],
['C','BRL',False,256,256,3,1,1,True],
['C','BRL',False,256,256,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,256,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['MP',2,2,0],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['C','BRL',False,512,512,3,1,1,True],
['MP',2,2,0],
['VW'],
['FC',512,4096,True],
['RL'],
['D',0.5],
['FC',4096,4096,True],
['RL'],
['D',0.5],
['FC',4096,100,True]
]
Inception_BN_cfg_table = [
['C','',True,3,64,3,1,1,True],
['RL'],
['C','',False,64,64,3,1,1,True],
['RL'],
['Inc',0],
['Inc',1],
['MP',3,2,1],
['Inc',2],
['Inc',3],
['Inc',4],
['Inc',5],
['Inc',6],
['MP',3,2,1],
['Inc',7],
['Inc',8],
['AAP',1],
['C','',False,1024,100,1,1,0,True],
['VW']
]
model_cfg_table = {
'AlexNet' : AlexNet_cfg_table,
'AlexNet_BN' : AlexNet_BN_cfg_table,
'VGG_16' : VGG_16_cfg_table,
'VGG_19' : VGG_19_cfg_table,
'Inception_BN' : Inception_BN_cfg_table,
'ResNet_18' : ResNet_18_cfg_table,
'ResNet_50' : ResNet_50_cfg_table,
'ResNet_152' : ResNet_152_cfg_table,
'MobileNetV2' : MobileNetV2_cfg_table
}
# Each row is the channel parameter table of one Inc structure
inc_ch_table=[
[ 64, 64, 96,128, 16, 32, 32],#3a
[256,128,128,192, 32, 96, 64],#3b
[480,192, 96,208, 16, 48, 64],#4a
[512,160,112,224, 24, 64, 64],#4b
[512,128,128,256, 24, 64, 64],#4c
[512,112,144,288, 32, 64, 64],#4d
[528,256,160,320, 32,128,128],#4e
[832,256,160,320, 32,128,128],#5a
[832,384,192,384, 48,128,128] #5b
]
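# br0,br1,br2,br3 <- br1x1,br3x3,br5x5,brM
# Each sub-list describes one branch of the Inc structure; every conv uses 'BRL' by default, with bias=False
# The 2nd and 3rd Conv parameters are indices into the matching Inc structure (i.e. one row of ch_table)
# All Inc structures perform the same operations and differ only in their weights, so indices are used instead of concrete values to allow reuse
# The branches are followed by a Concat; since there is only one such structure, it is not listed explicitly
# conv: 'C', ('BRL' default), in_ch_idx, out_ch_idx, kernel_size, stride, padding, (bias: True default)
# maxpool: 'MP', kernel_size, stride, padding
# relu: 'RL'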
inc_cfg_table = [
[
['C',0,1,1,1,0]
],
[
['C',0,2,1,1,0],
['C',2,3,3,1,1]
],
[
['C',0,4,1,1,0],
['C',4,5,5,1,2]
],
[
['MP',3,1,1],
['RL'],
['C',0,6,1,1,0]
]
]
# ml_cfg_table = []
#BasicBlock
#value: downsample,inplanes,planes,planes*expansion,stride,1(default stride and group)
bblk_ch_table = [
[False, 16, 16, 16,1,1], #layer1,first
[False, 16, 16, 16,1,1], # other
[True, 16, 32, 32,2,1], #layer2
[False, 32, 32, 32,1,1],
[True, 32, 64, 64,2,1], #layer3
[False, 64, 64, 64,1,1],
[True, 64,128,128,2,1], #layer4
[False,128,128,128,1,1]
]
#conv: 'C','B'/'BRL'/'BRS', in_ch_idx, out_ch_idx, kernel_sz, stride_idx, padding, groups_idx (bias: True default)
#add: 'AD', unconditional. When unconditional is True or the flag is True, the two elements in outs are added
bblk_cfg_table = [
[
['C','BRL',1,2,3,4,1,5],
['C','B' ,2,2,3,5,1,5],
],
# downsample, used only when downsample is passed as True
[
['C','B' ,1,3,1,4,0,5]
],
# operations after the branches merge
[
['AD',True],
['RL']
]
]
#BottleNeck
#value: downsample,inplanes,planes,planes*expansion,stride,1(default stride and group)
btnk_ch_table = [
[True, 16, 16, 64,1,1], #layer1,first
[False, 64, 16, 64,1,1], # other
[True, 64, 32,128,2,1], #layer2
[False,128, 32,128,1,1],
[True, 128, 64,256,2,1], #layer3
[False,256, 64,256,1,1],
[True, 256,128,512,2,1], #layer4
[False,512,128,512,1,1]
]
#conv: 'C','B'/'BRL'/'BRS', in_ch_idx, out_ch_idx, kernel_sz, stride_idx, padding, groups_idx (bias: True default)
#add: 'AD', unconditional. When unconditional is True or the flag is True, the two elements in outs are added
btnk_cfg_table = [
[
['C','BRL',1,2,1,5,0,5],
['C','BRL',2,2,3,4,1,5],
['C','B' ,2,3,1,5,0,5]
],
# downsample, used only when downsample is passed as True
[
['C','B' ,1,3,1,4,0,5]
],
# operations after the branches merge
[
['AD',True],
['RL']
]
]
#InvertedResidual
#value: identity_flag, in_ch, out_ch, in_ch*expand_ratio, stride, 1(default stride and group)
ires_ch_table = [
[False, 32, 16, 32,1,1], #layer1,first
[ True, 16, 16, 16,1,1], # other
[False, 16, 24, 96,2,1], #layer2
[ True, 24, 24, 144,1,1],
[False, 24, 32, 144,2,1], #layer3
[ True, 32, 32, 192,1,1],
[False, 32, 96, 192,1,1], #layer4
[ True, 96, 96, 576,1,1],
[False, 96,160, 576,2,1], #layer5
[ True,160,160, 960,1,1],
[False,160,320, 960,1,1], #layer6
[ True,320,320,1920,1,1]
]
#conv: 'C','B'/'BRL'/'BRS', in_ch_idx, out_ch_idx, kernel_sz, stride_idx, padding, groups_idx (bias: True default)
#add: 'AD', unconditional. When unconditional is True or the flag is True, the two elements in outs are added
ires_cfg_table = [
[
['C','BRS',1,3,1,5,0,5],
['C','BRS',3,3,3,4,1,3],
['C','B' ,3,2,1,5,0,5]
],
# identity_br empty
[
],
# operations after the branches merge
[
['AD',False] # conditional add
]
]
\ No newline at end of file
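The block tables in `cfg.py` store indices rather than concrete channel counts: each `'C'` entry points into one row of the matching `*_ch_table`. The code that consumes these tables (`model_deployment`) is not shown in this diff, so the following is only a sketch of how such an entry could be expanded, based on the comment `in_ch_idx, out_ch_idx, kernel_sz, stride_idx, padding, groups_idx`. The helper name `resolve_conv` and the returned dictionary layout are hypothetical.

```python
def resolve_conv(entry, ch_row):
    """Expand one index-based 'C' entry against one row of a channel table."""
    tag, fold, in_idx, out_idx, kernel, stride_idx, padding, groups_idx = entry
    if tag != 'C':
        raise ValueError('expected a conv entry')
    return {
        'fold': fold,                  # 'B' / 'BRL' / 'BRS'
        'in_ch': ch_row[in_idx],
        'out_ch': ch_row[out_idx],
        'kernel': kernel,              # literal value, not an index
        'stride': ch_row[stride_idx],
        'padding': padding,            # literal value, not an index
        'groups': ch_row[groups_idx],
    }

if __name__ == "__main__":
    # First block of layer2 in btnk_ch_table and the main branch of btnk_cfg_table.
    row = [True, 64, 32, 128, 2, 1]
    main_branch = [
        ['C', 'BRL', 1, 2, 1, 5, 0, 5],
        ['C', 'BRL', 2, 2, 3, 4, 1, 5],
        ['C', 'B', 2, 3, 1, 5, 0, 5],
    ]
    for entry in main_branch:
        print(resolve_conv(entry, row))
```

Under this reading, the middle entry of `ires_cfg_table` uses `groups_idx = 3`, so its group count equals the expanded channel count, which makes it a depthwise convolution.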
import sys
import os

# Extract the parameter count and MACs from the redirected output of
# get_param_flops.py (param_flops/<model_name>.txt)
def extract_ratio(model_name):
    with open('param_flops/'+model_name+'.txt','r') as fr:
        lines = fr.readlines()

    # skip any warnings printed before the model summary
    for i in range(len(lines)):
        if 'Model' in lines[i]:
            head = i+1
            break

    # total MACs of the model (GMac is converted with the original factor of 1024)
    Mac = lines[head].split('Mac,')[0].split(',')[-1]
    if 'M' in Mac:
        Mac = Mac.split('M')[0]
        Mac = float(Mac)
    elif 'G' in Mac:
        Mac = Mac.split('G')[0]
        Mac = float(Mac)
        Mac *= 1024

    # total parameter count of the model (k is converted with the original factor of 1024)
    Param = lines[head].split(',')[0]
    if 'M' in Param:
        Param = Param.split('M')[0]
        Param = float(Param)
    elif 'k' in Param:
        Param = Param.split('k')[0]
        Param = float(Param)
        Param /= 1024

    layer = []
    par_ratio = []
    flop_ratio = []
    for line in lines:
        # per-layer lines look like "(name): Type(..., x% Params, ..., y% MACs, ...)"
        if '(' in line and ')' in line:
            layer.append(line.split(')')[0].split('(')[1])
            r1 = line.split('%')[0].split(',')[-1]
            r1 = float(r1)
            par_ratio.append(r1)
            r2 = line.split('%')[-2].split(',')[-1]
            r2 = float(r2)
            flop_ratio.append(r2)

    return Mac, Param, layer, par_ratio, flop_ratio

if __name__ == "__main__":
    Mac, Param, layer, par_ratio, flop_ratio = extract_ratio('Inception_BN')
    print(Mac)
    print(Param)
    print(layer)
    print(par_ratio)
    print(flop_ratio)
\ No newline at end of file
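`extract_ratio` pulls the layer name and the two percentages out of each ptflops line with chained `split` calls. An alternative that may be easier to audit is a single regular expression. The sketch below is not the repository's parser. The names `LINE_RE` and `parse_layer_line` are hypothetical, and the sample line is copied from the AlexNet log in this commit.

```python
import re

# "(name): Type(<params>, <p>% Params, <macs>, <m>% MACs, ..."
LINE_RE = re.compile(r'\((\w+)\): \w+\(.*?([\d.]+)% Params, .*?([\d.]+)% MACs')

def parse_layer_line(line):
    """Return (layer_name, param_percent, mac_percent), or None for non-layer lines."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return m.group(1), float(m.group(2)), float(m.group(3))

if __name__ == "__main__":
    sample = ("(conv0): Conv2d(896, 0.023% Params, 917.5 KMac, 1.308% MACs, "
              "3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))")
    print(parse_layer_line(sample))
    print(parse_layer_line('Model('))
```

Lines without a `(name): Type(` prefix, such as the `Model(` header and the totals line, return `None` instead of raising.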
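from torch.autograd import Function

class FakeQuantize(Function):
    # Forward: quantize then dequantize, so the output carries the rounding error
    @staticmethod
    def forward(ctx, x, qparam):
        x = qparam.quantize_tensor(x)
        x = qparam.dequantize_tensor(x)
        return x

    # Backward: pass the gradient through unchanged (qparam gets no gradient)
    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None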
\ No newline at end of file
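`FakeQuantize.forward` delegates to `qparam.quantize_tensor` and `qparam.dequantize_tensor`, whose definitions are not part of this section. To illustrate the quantize-then-dequantize round trip in general terms, here is a pure-Python sketch of a uniform affine scheme on a list of floats. It is an assumed, generic scheme rather than the repository's quantizer, and the name `fake_quantize` is hypothetical.

```python
def fake_quantize(values, num_bits=8):
    """Quantize to unsigned integers with an affine mapping, then map back to floats."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo = min(min(values), 0.0)   # keep 0.0 inside the representable range
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0   # avoid a zero scale for all-zero input
    zero_point = round(qmin - lo / scale)
    out = []
    for v in values:
        q = min(max(round(v / scale) + zero_point, qmin), qmax)  # quantize and clamp
        out.append((q - zero_point) * scale)                     # dequantize
    return out

if __name__ == "__main__":
    xs = [-1.0, 0.0, 0.5, 1.0]
    for x, y in zip(xs, fake_quantize(xs)):
        print(x, y, abs(x - y))
```

Each output differs from its input by at most about one quantization step, which is the error that the straight-through `backward` above lets gradients ignore.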
from model import *
import sys
import torch
from ptflops import get_model_complexity_info

if __name__ == "__main__":
    model_name = sys.argv[1]
    model = Model(model_name)
    # full_file = 'ckpt/cifar10_'+model_name+'.pt'
    # model.load_state_dict(torch.load(full_file))
    flops, params = get_model_complexity_info(model, (3, 32, 32), as_strings=True, print_per_layer_stat=True)
#!/bin/bash
#- Job parameters
# (TODO)
# Please modify job name
#SBATCH -J ALL # The job name
#SBATCH -o ret/ret-%j.out # Write the standard output to file named 'ret-<job_number>.out'
#SBATCH -e ret/ret-%j.err # Write the standard error to file named 'ret-<job_number>.err'
#- Resources
# (TODO)
# Please modify your requirements
#SBATCH -p nv-gpu # Submit to 'nv-gpu' Partition
#SBATCH -t 0-01:30:00 # Run for a maximum time of 0 days, 01 hours, 30 mins, 00 secs
#SBATCH --nodes=1 # Request N nodes
#SBATCH --gres=gpu:1 # Request M GPU per node
#SBATCH --gres-flags=enforce-binding # CPU-GPU Affinity
#SBATCH --qos=gpu-debug # Request QOS Type
###
### The system will alloc 8 or 16 cores per gpu by default.
### If you need more or less, use following:
### #SBATCH --cpus-per-task=K # Request K cores
###
###
### Without specifying the constraint, any available nodes that meet the requirement will be allocated
### You can specify the characteristics of the compute nodes, and even the names of the compute nodes
###
### #SBATCH --nodelist=gpu-v00 # Request a specific list of hosts
### #SBATCH --constraint="Volta|RTX8000" # Request GPU Type: Volta(V100 or V100S) or RTX8000
###
# set GPU constraint (Ampere, RTX8000 or T4) to meet my cuda
#SBATCH --constraint="Ampere|RTX8000|T4"
#- Log information
echo "Job start at $(date "+%Y-%m-%d %H:%M:%S")"
echo "Job run at:"
echo "$(hostnamectl)"
#- Load environments
source /tools/module_env.sh
module list # list modules loaded
##- Tools
module load cluster-tools/v1.0
module load slurm-tools/v1.0
module load cmake/3.15.7
module load git/2.17.1
module load vim/8.1.2424
##- language
module load python3/3.6.8
##- CUDA
# module load cuda-cudnn/10.2-7.6.5
# module load cuda-cudnn/11.2-8.2.1
module load cuda-cudnn/11.1-8.2.1
##- virtualenv
# source xxxxx/activate
echo $(module list) # list modules loaded
echo $(which gcc)
echo $(which python)
echo $(which python3)
cluster-quota # nas quota
nvidia-smi --format=csv --query-gpu=name,driver_version,power.limit # gpu info
#- Warning! Please do not change your CUDA_VISIBLE_DEVICES
#- in `.bashrc`, `env.sh`, or your job script
echo "Use GPU ${CUDA_VISIBLE_DEVICES}" # which gpus
#- The CUDA_VISIBLE_DEVICES variable is assigned and specified by SLURM
#- Job step
# [EDIT HERE(TODO)]
name_list="AlexNet AlexNet_BN VGG_16 VGG_19 Inception_BN ResNet_18 ResNet_50 ResNet_152 MobileNetV2"
# name_list="MobileNetV2"
for name in $name_list; do
if [ -f "param_flops/$name.txt" ];then
echo "$name: param_flops exists"
# elif [ ! -f "ckpt/cifar10_$name.pt" ];then
# echo "$name: ckpt not exists"
else
python get_param_flops.py $name > param_flops/$name.txt
fi
done
#- End
echo "Job end at $(date "+%Y-%m-%d %H:%M:%S")"
# -*- coding: utf-8 -*-
# Shares global variables across multiple modules

def _init(): # initialize the shared dictionary
    global _global_dict
    _global_dict = {}

def set_value(value,is_bias=False):
    # define a global variable
    if is_bias:
        _global_dict[0] = value
    else:
        _global_dict[1] = value

def get_value(is_bias=False): # bias gets its own precision, independent of the other variables
    if is_bias:
        return _global_dict[0]
    else:
        return _global_dict[1]
import torch.nn as nn
from cfg import *
from module import *
from model_deployment import *

class Model(nn.Module):
    def __init__(self,model_name):
        super(Model, self).__init__()
        self.cfg_table = model_cfg_table[model_name]
        make_layers(self,self.cfg_table)
        # # parameter initialization
        # for m in self.modules():
        #     if isinstance(m, nn.Conv2d):
        #         nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        #     elif isinstance(m, nn.BatchNorm2d):
        #         nn.init.constant_(m.weight, 1)
        #         nn.init.constant_(m.bias, 0)
        #     elif isinstance(m, nn.Linear):
        #         nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def forward(self,x):
        x = model_forward(self,self.cfg_table,x)
        return x

    def quantize(self, quant_type, num_bits=8, e_bits=3):
        model_quantize(self,self.cfg_table,quant_type,num_bits,e_bits)

    def quantize_forward(self,x):
        return model_utils(self,self.cfg_table,func='forward',x=x)

    def freeze(self):
        model_utils(self,self.cfg_table,func='freeze')

    def quantize_inference(self,x):
        return model_utils(self,self.cfg_table,func='inference',x=x)

    def fakefreeze(self):
        model_utils(self,self.cfg_table,func='fakefreeze')

# if __name__ == "__main__":
#     model = Inception_BN()
#     model.quantize('INT',8,3)
#     print(model.named_modules)
#     print('-------')
#     print(model.named_parameters)
#     print(len(model.conv0.named_parameters()))
\ No newline at end of file
Model(
3.91 M, 100.000% Params, 70.13 MMac, 100.000% MACs,
(conv0): Conv2d(896, 0.023% Params, 917.5 KMac, 1.308% MACs, 3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu1): ReLU(0, 0.000% Params, 32.77 KMac, 0.047% MACs, inplace=True)
(pool2): MaxPool2d(0, 0.000% Params, 32.77 KMac, 0.047% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv3): Conv2d(18.5 k, 0.472% Params, 4.73 MMac, 6.752% MACs, 32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu4): ReLU(0, 0.000% Params, 16.38 KMac, 0.023% MACs, inplace=True)
(pool5): MaxPool2d(0, 0.000% Params, 16.38 KMac, 0.023% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv6): Conv2d(73.86 k, 1.887% Params, 4.73 MMac, 6.740% MACs, 64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu7): ReLU(0, 0.000% Params, 8.19 KMac, 0.012% MACs, inplace=True)
(conv8): Conv2d(295.17 k, 7.540% Params, 18.89 MMac, 26.937% MACs, 128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu9): ReLU(0, 0.000% Params, 16.38 KMac, 0.023% MACs, inplace=True)
(conv10): Conv2d(590.08 k, 15.073% Params, 37.77 MMac, 53.851% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu11): ReLU(0, 0.000% Params, 16.38 KMac, 0.023% MACs, inplace=True)
(pool12): MaxPool2d(0, 0.000% Params, 16.38 KMac, 0.023% MACs, kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
(drop14): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc15): Linear(2.36 M, 60.290% Params, 2.36 MMac, 3.366% MACs, in_features=2304, out_features=1024, bias=True)
(relu16): ReLU(0, 0.000% Params, 1.02 KMac, 0.001% MACs, inplace=True)
(drop17): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc18): Linear(524.8 k, 13.405% Params, 524.8 KMac, 0.748% MACs, in_features=1024, out_features=512, bias=True)
(relu19): ReLU(0, 0.000% Params, 512.0 Mac, 0.001% MACs, inplace=True)
(fc20): Linear(51.3 k, 1.310% Params, 51.3 KMac, 0.073% MACs, in_features=512, out_features=100, bias=True)
)
Model(
3.92 M, 100.000% Params, 70.31 MMac, 100.000% MACs,
(conv0): Conv2d(896, 0.023% Params, 917.5 KMac, 1.305% MACs, 3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn0): BatchNorm2d(64, 0.002% Params, 65.54 KMac, 0.093% MACs, 32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu0): ReLU(0, 0.000% Params, 32.77 KMac, 0.047% MACs, inplace=True)
(pool1): MaxPool2d(0, 0.000% Params, 32.77 KMac, 0.047% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv2d(18.5 k, 0.472% Params, 4.73 MMac, 6.735% MACs, 32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn2): BatchNorm2d(128, 0.003% Params, 32.77 KMac, 0.047% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): ReLU(0, 0.000% Params, 16.38 KMac, 0.023% MACs, inplace=True)
(pool3): MaxPool2d(0, 0.000% Params, 16.38 KMac, 0.023% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv4): Conv2d(73.86 k, 1.886% Params, 4.73 MMac, 6.723% MACs, 64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn4): BatchNorm2d(256, 0.007% Params, 16.38 KMac, 0.023% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu4): ReLU(0, 0.000% Params, 8.19 KMac, 0.012% MACs, inplace=True)
(conv5): Conv2d(295.17 k, 7.537% Params, 18.89 MMac, 26.868% MACs, 128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn5): BatchNorm2d(512, 0.013% Params, 32.77 KMac, 0.047% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu5): ReLU(0, 0.000% Params, 16.38 KMac, 0.023% MACs, inplace=True)
(conv6): Conv2d(590.08 k, 15.067% Params, 37.77 MMac, 53.713% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn6): BatchNorm2d(512, 0.013% Params, 32.77 KMac, 0.047% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu6): ReLU(0, 0.000% Params, 16.38 KMac, 0.023% MACs, inplace=True)
(pool7): MaxPool2d(0, 0.000% Params, 16.38 KMac, 0.023% MACs, kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
(drop9): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc10): Linear(2.36 M, 60.268% Params, 2.36 MMac, 3.357% MACs, in_features=2304, out_features=1024, bias=True)
(relu11): ReLU(0, 0.000% Params, 1.02 KMac, 0.001% MACs, inplace=True)
(drop12): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc13): Linear(524.8 k, 13.400% Params, 524.8 KMac, 0.746% MACs, in_features=1024, out_features=512, bias=True)
(relu14): ReLU(0, 0.000% Params, 512.0 Mac, 0.001% MACs, inplace=True)
(fc15): Linear(51.3 k, 1.310% Params, 51.3 KMac, 0.073% MACs, in_features=512, out_features=100, bias=True)
)
Model(
714.28 k, 100.000% Params, 35.66 MMac, 100.000% MACs,
(conv0): Conv2d(448, 0.063% Params, 458.75 KMac, 1.286% MACs, 3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn0): BatchNorm2d(32, 0.004% Params, 32.77 KMac, 0.092% MACs, 16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu0): ReLU(0, 0.000% Params, 16.38 KMac, 0.046% MACs, inplace=True)
(ml0_blk0_ma_conv0): Conv2d(2.32 k, 0.325% Params, 2.38 MMac, 6.662% MACs, 16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml0_blk0_ma_bn0): BatchNorm2d(32, 0.004% Params, 32.77 KMac, 0.092% MACs, 16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml0_blk0_ma_relu0): ReLU(0, 0.000% Params, 16.38 KMac, 0.046% MACs, inplace=True)
(ml0_blk0_ma_conv1): Conv2d(2.32 k, 0.325% Params, 2.38 MMac, 6.662% MACs, 16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml0_blk0_ma_bn1): BatchNorm2d(32, 0.004% Params, 32.77 KMac, 0.092% MACs, 16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml0_blk0_relu1): ReLU(0, 0.000% Params, 16.38 KMac, 0.046% MACs, inplace=True)
(ml0_blk1_ma_conv0): Conv2d(2.32 k, 0.325% Params, 2.38 MMac, 6.662% MACs, 16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml0_blk1_ma_bn0): BatchNorm2d(32, 0.004% Params, 32.77 KMac, 0.092% MACs, 16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml0_blk1_ma_relu0): ReLU(0, 0.000% Params, 16.38 KMac, 0.046% MACs, inplace=True)
(ml0_blk1_ma_conv1): Conv2d(2.32 k, 0.325% Params, 2.38 MMac, 6.662% MACs, 16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml0_blk1_ma_bn1): BatchNorm2d(32, 0.004% Params, 32.77 KMac, 0.092% MACs, 16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml0_blk1_relu1): ReLU(0, 0.000% Params, 16.38 KMac, 0.046% MACs, inplace=True)
(ml1_blk0_ma_conv0): Conv2d(4.64 k, 0.650% Params, 1.19 MMac, 3.331% MACs, 16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(ml1_blk0_ma_bn0): BatchNorm2d(64, 0.009% Params, 16.38 KMac, 0.046% MACs, 32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml1_blk0_ma_relu0): ReLU(0, 0.000% Params, 8.19 KMac, 0.023% MACs, inplace=True)
(ml1_blk0_ma_conv1): Conv2d(9.25 k, 1.295% Params, 2.37 MMac, 6.639% MACs, 32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml1_blk0_ma_bn1): BatchNorm2d(64, 0.009% Params, 16.38 KMac, 0.046% MACs, 32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml1_blk0_ds_conv0): Conv2d(544, 0.076% Params, 139.26 KMac, 0.391% MACs, 16, 32, kernel_size=(1, 1), stride=(2, 2))
(ml1_blk0_ds_bn0): BatchNorm2d(64, 0.009% Params, 16.38 KMac, 0.046% MACs, 32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml1_blk0_relu1): ReLU(0, 0.000% Params, 8.19 KMac, 0.023% MACs, inplace=True)
(ml1_blk1_ma_conv0): Conv2d(9.25 k, 1.295% Params, 2.37 MMac, 6.639% MACs, 32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml1_blk1_ma_bn0): BatchNorm2d(64, 0.009% Params, 16.38 KMac, 0.046% MACs, 32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml1_blk1_ma_relu0): ReLU(0, 0.000% Params, 8.19 KMac, 0.023% MACs, inplace=True)
(ml1_blk1_ma_conv1): Conv2d(9.25 k, 1.295% Params, 2.37 MMac, 6.639% MACs, 32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml1_blk1_ma_bn1): BatchNorm2d(64, 0.009% Params, 16.38 KMac, 0.046% MACs, 32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml1_blk1_relu1): ReLU(0, 0.000% Params, 8.19 KMac, 0.023% MACs, inplace=True)
(ml2_blk0_ma_conv0): Conv2d(18.5 k, 2.589% Params, 1.18 MMac, 3.319% MACs, 32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(ml2_blk0_ma_bn0): BatchNorm2d(128, 0.018% Params, 8.19 KMac, 0.023% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml2_blk0_ma_relu0): ReLU(0, 0.000% Params, 4.1 KMac, 0.011% MACs, inplace=True)
(ml2_blk0_ma_conv1): Conv2d(36.93 k, 5.170% Params, 2.36 MMac, 6.627% MACs, 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml2_blk0_ma_bn1): BatchNorm2d(128, 0.018% Params, 8.19 KMac, 0.023% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml2_blk0_ds_conv0): Conv2d(2.11 k, 0.296% Params, 135.17 KMac, 0.379% MACs, 32, 64, kernel_size=(1, 1), stride=(2, 2))
(ml2_blk0_ds_bn0): BatchNorm2d(128, 0.018% Params, 8.19 KMac, 0.023% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml2_blk0_relu1): ReLU(0, 0.000% Params, 4.1 KMac, 0.011% MACs, inplace=True)
(ml2_blk1_ma_conv0): Conv2d(36.93 k, 5.170% Params, 2.36 MMac, 6.627% MACs, 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml2_blk1_ma_bn0): BatchNorm2d(128, 0.018% Params, 8.19 KMac, 0.023% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml2_blk1_ma_relu0): ReLU(0, 0.000% Params, 4.1 KMac, 0.011% MACs, inplace=True)
(ml2_blk1_ma_conv1): Conv2d(36.93 k, 5.170% Params, 2.36 MMac, 6.627% MACs, 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml2_blk1_ma_bn1): BatchNorm2d(128, 0.018% Params, 8.19 KMac, 0.023% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml2_blk1_relu1): ReLU(0, 0.000% Params, 4.1 KMac, 0.011% MACs, inplace=True)
(ml3_blk0_ma_conv0): Conv2d(73.86 k, 10.340% Params, 1.18 MMac, 3.314% MACs, 64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(ml3_blk0_ma_bn0): BatchNorm2d(256, 0.036% Params, 4.1 KMac, 0.011% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml3_blk0_ma_relu0): ReLU(0, 0.000% Params, 2.05 KMac, 0.006% MACs, inplace=True)
(ml3_blk0_ma_conv1): Conv2d(147.58 k, 20.662% Params, 2.36 MMac, 6.622% MACs, 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml3_blk0_ma_bn1): BatchNorm2d(256, 0.036% Params, 4.1 KMac, 0.011% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml3_blk0_ds_conv0): Conv2d(8.32 k, 1.165% Params, 133.12 KMac, 0.373% MACs, 64, 128, kernel_size=(1, 1), stride=(2, 2))
(ml3_blk0_ds_bn0): BatchNorm2d(256, 0.036% Params, 4.1 KMac, 0.011% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml3_blk0_relu1): ReLU(0, 0.000% Params, 2.05 KMac, 0.006% MACs, inplace=True)
(ml3_blk1_ma_conv0): Conv2d(147.58 k, 20.662% Params, 2.36 MMac, 6.622% MACs, 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml3_blk1_ma_bn0): BatchNorm2d(256, 0.036% Params, 4.1 KMac, 0.011% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml3_blk1_ma_relu0): ReLU(0, 0.000% Params, 2.05 KMac, 0.006% MACs, inplace=True)
(ml3_blk1_ma_conv1): Conv2d(147.58 k, 20.662% Params, 2.36 MMac, 6.622% MACs, 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(ml3_blk1_ma_bn1): BatchNorm2d(256, 0.036% Params, 4.1 KMac, 0.011% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(ml3_blk1_relu1): ReLU(0, 0.000% Params, 2.05 KMac, 0.006% MACs, inplace=True)
(aap5): AdaptiveAvgPool2d(0, 0.000% Params, 2.05 KMac, 0.006% MACs, output_size=1)
(fc7): Linear(12.9 k, 1.806% Params, 12.9 KMac, 0.036% MACs, in_features=128, out_features=100, bias=True)
)
Model(
34.02 M, 100.000% Params, 333.73 MMac, 100.000% MACs,
(conv0): Conv2d(1.79 k, 0.005% Params, 1.84 MMac, 0.550% MACs, 3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn0): BatchNorm2d(128, 0.000% Params, 131.07 KMac, 0.039% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu0): ReLU(0, 0.000% Params, 65.54 KMac, 0.020% MACs, inplace=True)
(conv1): Conv2d(36.93 k, 0.109% Params, 37.81 MMac, 11.331% MACs, 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn1): BatchNorm2d(128, 0.000% Params, 131.07 KMac, 0.039% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): ReLU(0, 0.000% Params, 65.54 KMac, 0.020% MACs, inplace=True)
(pool2): MaxPool2d(0, 0.000% Params, 65.54 KMac, 0.020% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv3): Conv2d(73.86 k, 0.217% Params, 18.91 MMac, 5.665% MACs, 64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn3): BatchNorm2d(256, 0.001% Params, 65.54 KMac, 0.020% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu3): ReLU(0, 0.000% Params, 32.77 KMac, 0.010% MACs, inplace=True)
(conv4): Conv2d(147.58 k, 0.434% Params, 37.78 MMac, 11.321% MACs, 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn4): BatchNorm2d(256, 0.001% Params, 65.54 KMac, 0.020% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu4): ReLU(0, 0.000% Params, 32.77 KMac, 0.010% MACs, inplace=True)
(pool5): MaxPool2d(0, 0.000% Params, 32.77 KMac, 0.010% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv6): Conv2d(295.17 k, 0.868% Params, 18.89 MMac, 5.661% MACs, 128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn6): BatchNorm2d(512, 0.002% Params, 32.77 KMac, 0.010% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu6): ReLU(0, 0.000% Params, 16.38 KMac, 0.005% MACs, inplace=True)
(conv7): Conv2d(590.08 k, 1.735% Params, 37.77 MMac, 11.316% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn7): BatchNorm2d(512, 0.002% Params, 32.77 KMac, 0.010% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu7): ReLU(0, 0.000% Params, 16.38 KMac, 0.005% MACs, inplace=True)
(conv8): Conv2d(590.08 k, 1.735% Params, 37.77 MMac, 11.316% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn8): BatchNorm2d(512, 0.002% Params, 32.77 KMac, 0.010% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu8): ReLU(0, 0.000% Params, 16.38 KMac, 0.005% MACs, inplace=True)
(pool9): MaxPool2d(0, 0.000% Params, 16.38 KMac, 0.005% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv10): Conv2d(1.18 M, 3.469% Params, 18.88 MMac, 5.658% MACs, 256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn10): BatchNorm2d(1.02 k, 0.003% Params, 16.38 KMac, 0.005% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu10): ReLU(0, 0.000% Params, 8.19 KMac, 0.002% MACs, inplace=True)
(conv11): Conv2d(2.36 M, 6.937% Params, 37.76 MMac, 11.314% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn11): BatchNorm2d(1.02 k, 0.003% Params, 16.38 KMac, 0.005% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu11): ReLU(0, 0.000% Params, 8.19 KMac, 0.002% MACs, inplace=True)
(conv12): Conv2d(2.36 M, 6.937% Params, 37.76 MMac, 11.314% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn12): BatchNorm2d(1.02 k, 0.003% Params, 16.38 KMac, 0.005% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu12): ReLU(0, 0.000% Params, 8.19 KMac, 0.002% MACs, inplace=True)
(pool13): MaxPool2d(0, 0.000% Params, 8.19 KMac, 0.002% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv14): Conv2d(2.36 M, 6.937% Params, 9.44 MMac, 2.828% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn14): BatchNorm2d(1.02 k, 0.003% Params, 4.1 KMac, 0.001% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu14): ReLU(0, 0.000% Params, 2.05 KMac, 0.001% MACs, inplace=True)
(conv15): Conv2d(2.36 M, 6.937% Params, 9.44 MMac, 2.828% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn15): BatchNorm2d(1.02 k, 0.003% Params, 4.1 KMac, 0.001% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu15): ReLU(0, 0.000% Params, 2.05 KMac, 0.001% MACs, inplace=True)
(conv16): Conv2d(2.36 M, 6.937% Params, 9.44 MMac, 2.828% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn16): BatchNorm2d(1.02 k, 0.003% Params, 4.1 KMac, 0.001% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu16): ReLU(0, 0.000% Params, 2.05 KMac, 0.001% MACs, inplace=True)
(pool17): MaxPool2d(0, 0.000% Params, 2.05 KMac, 0.001% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(fc19): Linear(2.1 M, 6.177% Params, 2.1 MMac, 0.630% MACs, in_features=512, out_features=4096, bias=True)
(relu20): ReLU(0, 0.000% Params, 4.1 KMac, 0.001% MACs, inplace=True)
(drop21): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc22): Linear(16.78 M, 49.334% Params, 16.78 MMac, 5.028% MACs, in_features=4096, out_features=4096, bias=True)
(relu23): ReLU(0, 0.000% Params, 4.1 KMac, 0.001% MACs, inplace=True)
(drop24): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc25): Linear(409.7 k, 1.204% Params, 409.7 KMac, 0.123% MACs, in_features=4096, out_features=100, bias=True)
)
Model(
39.33 M, 100.000% Params, 418.77 MMac, 100.000% MACs,
(conv0): Conv2d(1.79 k, 0.005% Params, 1.84 MMac, 0.438% MACs, 3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn0): BatchNorm2d(128, 0.000% Params, 131.07 KMac, 0.031% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu0): ReLU(0, 0.000% Params, 65.54 KMac, 0.016% MACs, inplace=True)
(conv1): Conv2d(36.93 k, 0.094% Params, 37.81 MMac, 9.030% MACs, 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn1): BatchNorm2d(128, 0.000% Params, 131.07 KMac, 0.031% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): ReLU(0, 0.000% Params, 65.54 KMac, 0.016% MACs, inplace=True)
(pool2): MaxPool2d(0, 0.000% Params, 65.54 KMac, 0.016% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv3): Conv2d(73.86 k, 0.188% Params, 18.91 MMac, 4.515% MACs, 64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn3): BatchNorm2d(256, 0.001% Params, 65.54 KMac, 0.016% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu3): ReLU(0, 0.000% Params, 32.77 KMac, 0.008% MACs, inplace=True)
(conv4): Conv2d(147.58 k, 0.375% Params, 37.78 MMac, 9.022% MACs, 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn4): BatchNorm2d(256, 0.001% Params, 65.54 KMac, 0.016% MACs, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu4): ReLU(0, 0.000% Params, 32.77 KMac, 0.008% MACs, inplace=True)
(pool5): MaxPool2d(0, 0.000% Params, 32.77 KMac, 0.008% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv6): Conv2d(295.17 k, 0.751% Params, 18.89 MMac, 4.511% MACs, 128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn6): BatchNorm2d(512, 0.001% Params, 32.77 KMac, 0.008% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu6): ReLU(0, 0.000% Params, 16.38 KMac, 0.004% MACs, inplace=True)
(conv7): Conv2d(590.08 k, 1.500% Params, 37.77 MMac, 9.018% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn7): BatchNorm2d(512, 0.001% Params, 32.77 KMac, 0.008% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu7): ReLU(0, 0.000% Params, 16.38 KMac, 0.004% MACs, inplace=True)
(conv8): Conv2d(590.08 k, 1.500% Params, 37.77 MMac, 9.018% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn8): BatchNorm2d(512, 0.001% Params, 32.77 KMac, 0.008% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu8): ReLU(0, 0.000% Params, 16.38 KMac, 0.004% MACs, inplace=True)
(conv9): Conv2d(590.08 k, 1.500% Params, 37.77 MMac, 9.018% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn9): BatchNorm2d(512, 0.001% Params, 32.77 KMac, 0.008% MACs, 256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu9): ReLU(0, 0.000% Params, 16.38 KMac, 0.004% MACs, inplace=True)
(pool10): MaxPool2d(0, 0.000% Params, 16.38 KMac, 0.004% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv11): Conv2d(1.18 M, 3.001% Params, 18.88 MMac, 4.509% MACs, 256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn11): BatchNorm2d(1.02 k, 0.003% Params, 16.38 KMac, 0.004% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu11): ReLU(0, 0.000% Params, 8.19 KMac, 0.002% MACs, inplace=True)
(conv12): Conv2d(2.36 M, 6.000% Params, 37.76 MMac, 9.016% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn12): BatchNorm2d(1.02 k, 0.003% Params, 16.38 KMac, 0.004% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu12): ReLU(0, 0.000% Params, 8.19 KMac, 0.002% MACs, inplace=True)
(conv13): Conv2d(2.36 M, 6.000% Params, 37.76 MMac, 9.016% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn13): BatchNorm2d(1.02 k, 0.003% Params, 16.38 KMac, 0.004% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu13): ReLU(0, 0.000% Params, 8.19 KMac, 0.002% MACs, inplace=True)
(conv14): Conv2d(2.36 M, 6.000% Params, 37.76 MMac, 9.016% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn14): BatchNorm2d(1.02 k, 0.003% Params, 16.38 KMac, 0.004% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu14): ReLU(0, 0.000% Params, 8.19 KMac, 0.002% MACs, inplace=True)
(pool15): MaxPool2d(0, 0.000% Params, 8.19 KMac, 0.002% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv16): Conv2d(2.36 M, 6.000% Params, 9.44 MMac, 2.254% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn16): BatchNorm2d(1.02 k, 0.003% Params, 4.1 KMac, 0.001% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu16): ReLU(0, 0.000% Params, 2.05 KMac, 0.000% MACs, inplace=True)
(conv17): Conv2d(2.36 M, 6.000% Params, 9.44 MMac, 2.254% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn17): BatchNorm2d(1.02 k, 0.003% Params, 4.1 KMac, 0.001% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu17): ReLU(0, 0.000% Params, 2.05 KMac, 0.000% MACs, inplace=True)
(conv18): Conv2d(2.36 M, 6.000% Params, 9.44 MMac, 2.254% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn18): BatchNorm2d(1.02 k, 0.003% Params, 4.1 KMac, 0.001% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu18): ReLU(0, 0.000% Params, 2.05 KMac, 0.000% MACs, inplace=True)
(conv19): Conv2d(2.36 M, 6.000% Params, 9.44 MMac, 2.254% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bn19): BatchNorm2d(1.02 k, 0.003% Params, 4.1 KMac, 0.001% MACs, 512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu19): ReLU(0, 0.000% Params, 2.05 KMac, 0.000% MACs, inplace=True)
(pool20): MaxPool2d(0, 0.000% Params, 2.05 KMac, 0.000% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(fc22): Linear(2.1 M, 5.343% Params, 2.1 MMac, 0.502% MACs, in_features=512, out_features=4096, bias=True)
(relu23): ReLU(0, 0.000% Params, 4.1 KMac, 0.001% MACs, inplace=True)
(drop24): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc25): Linear(16.78 M, 42.671% Params, 16.78 MMac, 4.007% MACs, in_features=4096, out_features=4096, bias=True)
(relu26): ReLU(0, 0.000% Params, 4.1 KMac, 0.001% MACs, inplace=True)
(drop27): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(fc28): Linear(409.7 k, 1.042% Params, 409.7 KMac, 0.098% MACs, in_features=4096, out_features=100, bias=True)
)
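The per-layer figures in the listings above can be spot-checked by hand. The sketch below is a minimal illustration, not the profiler's own code: the helper names are hypothetical, the 32x32 output map for `conv0` is an assumption based on CIFAR-sized inputs with stride 1 and padding 1, and the convention that the bias add is counted once per output position is inferred from the listed numbers rather than taken from any tool documentation.

```python
def conv2d_params(in_ch, out_ch, kernel, bias=True):
    """Weights plus optional bias of a square-kernel Conv2d with groups=1."""
    return in_ch * out_ch * kernel * kernel + (out_ch if bias else 0)


def conv2d_macs(in_ch, out_ch, kernel, out_h, out_w, bias=True):
    """MACs per forward pass, counting the bias add once per output position.

    This counting convention is an assumption chosen to match the listing.
    """
    return conv2d_params(in_ch, out_ch, kernel, bias) * out_h * out_w


def linear_params(in_features, out_features, bias=True):
    """Weights plus optional bias of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# conv0 of the VGG listings: 3 -> 64 channels, 3x3 kernel, assumed 32x32 output
print(conv2d_params(3, 64, 3))          # listed as 1.79 k
print(conv2d_macs(3, 64, 3, 32, 32))    # listed as 1.84 MMac
# conv1 of the VGG listings: 64 -> 64 channels, same spatial size
print(conv2d_macs(64, 64, 3, 32, 32))   # listed as 37.81 MMac
# final classifier layer: 4096 -> 100
print(linear_params(4096, 100))         # listed as 409.7 k
```

Under these assumptions the helpers return 1792, 1835008, 37814272 and 409700, which round to the values shown in the listings.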
@@ -5,7 +5,7 @@
# (TODO)
# Please modify job name
-#SBATCH -J ALL-nodiv # The job name
+#SBATCH -J PTQ # The job name
#SBATCH -o ret/ret-%j.out # Write the standard output to file named 'ret-<job_number>.out'
#SBATCH -e ret/ret-%j.err # Write the standard error to file named 'ret-<job_number>.err'
@@ -36,7 +36,8 @@
###
# set constraint for RTX8000 to meet my cuda
-#SBATCH --constraint="Ampere|RTX8000|T4"
+### #SBATCH --constraint="Ampere|RTX8000|T4"
+#SBATCH --constraint="Ampere|RTX8000"
#- Log information
@@ -82,7 +83,7 @@ echo "Use GPU ${CUDA_VISIBLE_DEVICES}" # which gpus
#- Job step
# [EDIT HERE(TODO)]
-python ptq_nodiv.py
+python ptq.py
#- End
echo "Job end at $(date "+%Y-%m-%d %H:%M:%S")"
from model import *
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torchvision.transforms.functional import InterpolationMode
from torch.optim.lr_scheduler import CosineAnnealingLR
import os
import os.path as osp
import time
# import sys
def train(model, device, train_loader, optimizer, epoch):
    """Train for one epoch and log the running loss every `log_interval` batches."""
    model.train()
    log_interval = 200
    total_loss = 0.
    seen = 0  # samples accumulated since the last log line
    lossLayer = nn.CrossEntropyLoss()
    start_time = time.time()
    for batch_idx, (data, targets) in enumerate(train_loader):
        data, targets = data.to(device), targets.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = lossLayer(output, targets)
        loss.backward()
        # CrossEntropyLoss returns the batch mean, so weight it by the batch size.
        total_loss += loss.item() * len(data)
        seen += len(data)
        optimizer.step()
        if batch_idx % log_interval == 0 and batch_idx > 0:
            # Divide by the number of samples seen, not by the batch count,
            # so the printed value is the mean per-sample loss.
            cur_loss = total_loss / seen
            elapsed = time.time() - start_time
            lr = optimizer.param_groups[0]['lr']
            print('| epoch {:3d} | {:5d}/{:5d} batches | lr {:02.4f} | ms/batch {:5.2f} | '
                  'loss {:5.2f}'.format(
                      epoch, batch_idx, len(train_loader), lr,
                      elapsed * 1000 / log_interval, cur_loss))
            total_loss = 0.
            seen = 0
            start_time = time.time()
def evaluate(model, device, eval_loader):
    model.eval()
    total_loss = 0
    correct = 0
    lossLayer = nn.CrossEntropyLoss()
    with torch.no_grad():
        for data, targets in eval_loader:
            data, targets = data.to(device), targets.to(device)
            output = model(data)
            total_loss += len(data) * lossLayer(output, targets).item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(targets.view_as(pred)).sum().item()
    test_loss = total_loss / len(eval_loader.dataset)
    test_acc = 100. * correct / len(eval_loader.dataset)
    return test_loss, test_acc
epochs_cfg_table = {
    'AlexNet' : [20, 30, 20, 20, 10],
    'AlexNet_BN' : [15, 20, 20, 20, 10, 10],
    'VGG_16' : [25, 30, 30, 20, 20, 10, 10],
    'VGG_19' : [30, 40, 30, 20, 20, 10, 10],
    'Inception_BN' : [20, 30, 30, 20, 20, 10, 10],
    'ResNet_18' : [30, 25, 25, 20, 10, 10],
    'ResNet_50' : [30, 40, 35, 25, 15, 10, 10],
    'ResNet_152' : [50, 60, 50, 40, 25, 15, 10, 10],
    'MobileNetV2' : [25, 35, 30, 20, 10, 10],
}
lr_cfg_table = {
    'AlexNet' : [0.01, 0.005, 0.001, 0.0005, 0.0001],
    'AlexNet_BN' : [0.01, 0.005, 0.002, 0.001, 0.0005, 0.0001],
    'VGG_16' : [0.01, 0.008, 0.005, 0.002, 0.001, 0.0005, 0.0001],
    'VGG_19' : [0.01, 0.008, 0.005, 0.002, 0.001, 0.0005, 0.0001],
    'Inception_BN' : [0.01, 0.008, 0.005, 0.002, 0.001, 0.0005, 0.0001],
    'ResNet_18' : [0.01, 0.005, 0.002, 0.001, 0.0005, 0.0001],
    'ResNet_50' : [0.01, 0.008, 0.005, 0.002, 0.001, 0.0005, 0.0001],
    'ResNet_152' : [0.01, 0.008, 0.005, 0.003, 0.002, 0.001, 0.0005, 0.0001],
    'MobileNetV2' : [0.01, 0.008, 0.005, 0.002, 0.001, 0.0001],
}
if __name__ == "__main__":
    # sys.stdout = open(sys.stdout.fileno(), mode='w', buffering=1)
    batch_size = 32
    seed = 1111
    seed_gpu = 1111
    lr = 0.05 # origin lr
    # momentum = 0.5
    t_epochs = 300 # learning-rate decay period (T_max of the cosine schedule)
    patience = 30 # early-stopping patience
    save_model = True
    append = False
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed_gpu)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    train_loader = torch.utils.data.DataLoader(
        datasets.CIFAR10('../data', train=True, download=True,
                         transform=transforms.Compose([
                             transforms.Resize((32, 32), interpolation=InterpolationMode.BICUBIC),
                             transforms.RandomHorizontalFlip(),
                             transforms.ToTensor(),
                             transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
                         ])),
        batch_size=batch_size, shuffle=True, num_workers=1, pin_memory=True
    )
    test_loader = torch.utils.data.DataLoader(
        datasets.CIFAR10('../data', train=False, transform=transforms.Compose([
            transforms.Resize((32, 32), interpolation=InterpolationMode.BICUBIC),
            transforms.ToTensor(),
            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
        ])),
        batch_size=batch_size, shuffle=True, num_workers=1, pin_memory=True
    )
    if save_model:
        if not osp.exists('ckpt'):
            os.makedirs('ckpt')
    # model_name_list = ['AlexNet', 'AlexNet_BN', 'VGG_16', 'VGG_19', 'Inception_BN',
    #                    'ResNet_18', 'ResNet_50', 'ResNet_152', 'MobileNetV2']
    # model_name_list = ['ResNet_18', 'ResNet_50', 'ResNet_152', 'MobileNetV2']
    model_name_list = ['ResNet_152']
    for model_name in model_name_list:
        save_path = 'ckpt/cifar10_'+model_name+'.pt'
        if os.path.exists(save_path) and append:
            continue
        else:
            print('>>>>>>>>>>>>>>>>>>>>>>>> Train: '+model_name+' <<<<<<<<<<<<<<<<<<<<<<<<')
        model = Model(model_name).to(device)
        best_val_acc = None
        optimizer = optim.SGD(model.parameters(), lr=lr)
        lr_scheduler = CosineAnnealingLR(optimizer, T_max=t_epochs)
        weak_cnt = 0 # consecutive epochs that failed to beat the best accuracy
        epoch = 0
        while weak_cnt < patience:
            epoch += 1
            epoch_start_time = time.time()
            train(model, device, train_loader, optimizer, epoch)
            val_loss, val_acc = evaluate(model, device, test_loader)
            if not best_val_acc or val_acc > best_val_acc:
                best_val_acc = val_acc
                weak_cnt = 0
                if save_model:
                    torch.save(model.state_dict(), save_path)
            else:
                weak_cnt += 1
            print('-' * 89)
            print('| end of epoch {:3d} | time: {:5.2f}s | test loss {:5.2f} | '
                  'test acc {:.2f} | weak_cnt {:d}'.format(epoch, (time.time() - epoch_start_time),
                                                           val_loss, val_acc, weak_cnt))
            print('-' * 89)
            lr_scheduler.step()
        print('>>> Early Stop: No improvement after patience(%d) epochs.'%patience)
        # Reload the best checkpoint before the final evaluation.
        model = Model(model_name).to(device)
        model.load_state_dict(torch.load(save_path))
        test_loss,test_acc = evaluate(model, device, test_loader)
        print('=' * 89)
        print('| Test on {:s} | test loss {:5.2f} | test acc {:.2f}'.format(
            model_name, test_loss, test_acc))
        print('=' * 89)
\ No newline at end of file
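The training loop above pairs a cosine-annealed learning rate (`T_max=t_epochs`) with a patience counter (`weak_cnt`). The following is a minimal standalone sketch of those two rules in plain Python, not the script's own code: `cosine_lr` and `early_stop_epoch` are hypothetical helper names, `eta_min=0.0` assumes the scheduler's default floor, and the accuracy list is made-up illustration data.

```python
import math


def cosine_lr(epoch, base_lr=0.05, t_max=300, eta_min=0.0):
    """Closed-form cosine annealing value after `epoch` scheduler steps."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


def early_stop_epoch(val_accs, patience):
    """Apply the weak_cnt rule to a list of per-epoch accuracies.

    Returns (epoch at which training stops, best accuracy seen).
    Unlike the script, `best is None` is used instead of `not best`,
    so an accuracy of exactly 0.0 still counts as a recorded best.
    """
    best, weak_cnt = None, 0
    for epoch, acc in enumerate(val_accs, start=1):
        if best is None or acc > best:
            best, weak_cnt = acc, 0   # improvement: reset the counter
        else:
            weak_cnt += 1             # no improvement this epoch
        if weak_cnt >= patience:
            return epoch, best
    return len(val_accs), best


# Learning rate at the start, midpoint and end of one 300-epoch period.
print(cosine_lr(0), cosine_lr(150), cosine_lr(300))
# Hypothetical accuracies: best is reached at epoch 4, then three weak epochs.
print(early_stop_epoch([50.0, 60.0, 59.0, 61.0, 60.5, 60.9, 60.0], patience=3))
```

With these inputs the schedule starts at 0.05, passes through roughly 0.025 at the midpoint and reaches 0 at epoch 300, and the stopping rule returns `(7, 61.0)`. Because the `while` loop has no epoch cap, a run can continue beyond `t_epochs`; the closed form is periodic, so the value it gives rises again after that point.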