400-123-4657

咨询热线

400-123-4657
地址:广东省广州市天河区88号
QQ:1234567890
传真:+86-123-4567
邮箱:admin@youweb.com

行业资讯

当前位置: 首页 > 蓝狮动态 > 行业资讯

onnxoptimizer、onnxsim使用记录

发布时间:2024-03-11 点击量:58

onnxoptimizer、onnxsim被誉为onnx的优化利器,其中onnxsim可以优化常量,onnxoptimizer可以对节点进行压缩。为此以resnet18为例,测试onnxoptimizer、onnxsim对于模型的优化效果。onnxoptimizer、onnxsim的安装代码如下所示:

pip install onnxoptimizer

pip install onnxsim

1、resnet18的结构

resnet18的结构如下所,可见为多个CBR部件构成

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)

2、创建onnx模型

 

此时模型结构如下所示,下图仅为局部。全图请自行运行代码,并上传模型到Netron网站进行可视化。图中的Identity节点是BN运算。

2、onnxoptimizer优化

 

resnet18_optimize.onnx的结构如下所示,直观上感觉与resnet18.onnx没有任何区别。且两个模型的内存大小是一模一样的,也就是说onnxoptimizer没有优化BN运算。

?3、onnxsim优化

 

?resnet18_simplify.onnx的结构如下所示,直观上感觉与resnet18.onnx有一定区别,BN运算被合并到了模型中。

?4、运行时间与显存占用对比

下面代码涉及显存管理,请下载dll库vs2019导出动态链接库(dll)给其他vs项目及python代码使用_万里鹏程转瞬至的博客-CSDN博客通过vs可以导出动态链接库(dll文件)给其他c++项目、c#项目、python项目使用。本案例实现将vs项目导出为动态链接库,给c++项目与python项目使用。涉及全局变量、函数、自定义类的导出。项目创建完成后会得到以下结构, 可以将核心代码写在dllmain.cpp里面(原先的内容可以不用管),头文件信息写入在pch.h里面以下内容可以全部拷贝到pch.h中(博主的代码涉及到了cuda,所以需要配置以下cuda,cuda的配置可以参考libtorch显存管理示例_万里鹏程转瞬至的博客-CSDN博客,各https://hpg123.blog.csdn.net/article/details/125396626

 

对比结果如下所示,基本上没有区别。进行优化后,执行速度略快,但cuda占用没有显著优势?

Start run , used: 11289.984375 Mib free: 998.015625 Mib
resnet18.onnx (1, 1000) 0.03594543933868408
End run , used: 12257.984375 Mib free: 30.015625 Mib

Start run , used: 11289.984375 Mib free: 998.015625 Mib
resnet18_optimize.onnx (1, 1000) 0.024863476753234862
End run , used: 12183.984375 Mib free: 104.015625 Mib

Start run , used: 11289.984375 Mib free: 998.015625 Mib
resnet18_simplify.onnx (1, 1000) 0.026698846817016602
End run , used: 12239.984375 Mib free: 48.015625 Mib

平台注册入口