<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements.  See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership.  The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License.  You may obtain a copy of the License at -->

<!---   http://www.apache.org/licenses/LICENSE-2.0 -->

<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied.  See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->


# Performance Benchmark

## Results

See the results on the wiki page: https://github.com/dmlc/tvm/wiki/Benchmark

## How to Reproduce

To obtain the best performance, we always auto-tune for the specific device and obtain
the parameters for the kernels used. To make our results easy to reproduce, we release
pre-tuned parameters for popular networks on some common devices.
TVM downloads the related tuning cache files automatically during compilation.

If you don't have any of the devices listed below, you can still run these scripts:
pass the device that is most similar to yours as the argument.
In general, the performance should still be good.
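One way to choose that argument is a small lookup that falls back to the closest preset. The groupings in this sketch are illustrative assumptions, not official recommendations:

```python
# Illustrative fallback table: map an unlisted device to the most
# similar preset accepted by the benchmark scripts. These groupings
# are assumptions for demonstration only, not official mappings.
SIMILAR_PRESET = {
    "gtx1070": "1080ti",    # same Pascal generation as the 1080 Ti
    "rk3288": "rk3399",     # both Rockchip SoCs with Mali GPUs
    "rasp4b": "rasp3b",     # successive Raspberry Pi boards
}

def pick_preset(device):
    """Return the value to pass as --model for `device`."""
    return SIMILAR_PRESET.get(device, device)

print(pick_preset("gtx1070"))  # a GTX 1070 falls back to the 1080ti preset
print(pick_preset("rk3399"))   # an exact preset name passes through unchanged
```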

If you have a customized network or device, we recommend running the tuning yourself.
Please follow the tutorials for
[NVIDIA GPU](https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_cuda.html),
[ARM CPU](https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_arm.html), and
[Mobile GPU](https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_mobile_gpu.html).

### NVIDIA GPU

Build TVM with LLVM and CUDA enabled. [Help](https://docs.tvm.ai/install/from_source.html)

```bash
python3 gpu_imagenet_bench.py --model 1080ti
python3 gpu_imagenet_bench.py --model titanx

# For the NVIDIA Jetson TX2, you can run the following command directly on the board,
# or use cross compilation and RPC as we do for the ARM CPU.
python3 gpu_imagenet_bench.py --model tx2
```

### ARM CPU & Mali GPU
For embedded devices, we use TVM's RPC infrastructure to make management easy.
You will need it to reproduce the benchmark results.

**Note**: We use LLVM 4.0 in our tuning environment. A mismatch between the LLVM versions used for tuning and deployment can affect performance, so you must use the same version to reproduce our results.

0. Build TVM with LLVM enabled. [Help](https://docs.tvm.ai/install/from_source.html)

1. Start an RPC Tracker on the host machine
```bash
python3 -m tvm.exec.rpc_tracker
```

2. Register devices to the tracker
* For Linux devices
  * Build the TVM runtime on your device [Help](https://docs.tvm.ai/tutorials/nnvm/deploy_model_on_rasp.html#build-tvm-runtime-on-device)
  * Register your device to the tracker by
  ```bash
  python3 -m tvm.exec.rpc_server --tracker=[HOST_IP]:9190 --key=[DEVICE_KEY]
  ```
  Replace `[HOST_IP]` with the IP address of the host machine and `[DEVICE_KEY]` with the name of your device.

  For example, here is the command for an RK3399 board:
  `python3 -m tvm.exec.rpc_server --tracker=10.77.1.123:9190 --key=rk3399`, where 10.77.1.123 is the IP address of the tracker.

* For Android devices
   * Build and install the TVM RPC APK on your device [Help](https://github.com/dmlc/tvm/tree/master/apps/android_rpc).
     Make sure you can pass the Android RPC test; once you can, you already know how to register the device.

3. Verify the device registration
  We can query all registered devices by
  ```bash
  python3 -m tvm.exec.query_rpc_tracker
  ```
  You should be able to find your devices in `Queue Status`. Make sure the registration is correct before going ahead.

  For our test environment, one sample output is:
  ```bash
  Queue Status
  ----------------------------------
  key          total  free  pending
  ----------------------------------
  mate10pro    1      1     0
  p20pro       2      2     0
  pixel2       2      2     0
  rk3399       2      2     0
  rasp3b       8      8     0
  ```
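
  If you script the verification step, a small parser over that plain-text output can confirm your board is registered and free before you launch a long run. This is a sketch that assumes the table layout stays as printed above:

  ```python
  def parse_queue_status(text):
      """Parse the tracker's Queue Status table into {key: (total, free, pending)}."""
      devices = {}
      for line in text.splitlines():
          parts = line.split()
          # Data rows are a device key followed by three integers;
          # headers and separator lines don't match this shape.
          if len(parts) == 4 and all(p.isdigit() for p in parts[1:]):
              devices[parts[0]] = tuple(int(p) for p in parts[1:])
      return devices

  sample = """key          total  free  pending
  ----------------------------------
  rk3399       2      2     0
  rasp3b       8      8     0
  """
  status = parse_queue_status(sample)
  assert status["rk3399"][1] > 0  # at least one free rk3399 worker
  ```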

4. Run benchmark
  ```bash
  # ARM CPU
  python3 arm_cpu_imagenet_bench.py --model rasp3b --rpc-key rasp3b
  python3 arm_cpu_imagenet_bench.py --model rk3399 --rpc-key rk3399
  python3 arm_cpu_imagenet_bench.py --model pixel2 --rpc-key pixel2
  python3 arm_cpu_imagenet_bench.py --model p20pro --rpc-key p20pro
  python3 arm_cpu_imagenet_bench.py --model mate10pro --rpc-key mate10pro
  ```
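
  Since each board above uses its own name for both `--model` and `--rpc-key`, a short wrapper can sweep all of them. This sketch only builds and prints the commands from this README; pass each list to `subprocess.run` to actually execute it:

  ```python
  # Boards benchmarked above; each board name doubles as --model and --rpc-key.
  BOARDS = ["rasp3b", "rk3399", "pixel2", "p20pro", "mate10pro"]

  def arm_bench_command(board):
      """Build the argv for one ARM CPU benchmark run."""
      return ["python3", "arm_cpu_imagenet_bench.py",
              "--model", board, "--rpc-key", board]

  for board in BOARDS:
      # Print the command; use subprocess.run(arm_bench_command(board), check=True) to run it.
      print(" ".join(arm_bench_command(board)))
  ```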

  ```bash
  # Mali GPU
  # NOTE: To make the test environment more stable, we close the GUI and lock the frequency
  sudo /etc/init.d/lightdm stop
  sudo -i
  echo performance > /sys/class/misc/mali0/device/devfreq/ff9a0000.gpu/governor

  python3 mobile_gpu_imagenet_bench.py --model rk3399 --rpc-key rk3399
  python3 mobile_gpu_imagenet_bench.py --model rk3399 --rpc-key rk3399 --dtype float16
  ```

### AMD GPU

Build TVM with LLVM and ROCm enabled. [Help](https://docs.tvm.ai/install/from_source.html)
```bash
python3 gpu_imagenet_bench.py --model gfx900 --target rocm
```