<!--- Licensed to the Apache Software Foundation (ASF) under one --> <!--- or more contributor license agreements. See the NOTICE file --> <!--- distributed with this work for additional information --> <!--- regarding copyright ownership. The ASF licenses this file --> <!--- to you under the Apache License, Version 2.0 (the --> <!--- "License"); you may not use this file except in compliance --> <!--- with the License. You may obtain a copy of the License at --> <!--- http://www.apache.org/licenses/LICENSE-2.0 --> <!--- Unless required by applicable law or agreed to in writing, --> <!--- software distributed under the License is distributed on an --> <!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY --> <!--- KIND, either express or implied. See the License for the --> <!--- specific language governing permissions and limitations --> <!--- under the License. --> # TOPI: TVM Operator Inventory TOPI is the operator collection library for TVM intended at sharing the effort of crafting and optimizing tvm generated kernels. The goal: - Provide sugars for operator declaration - Give common primitives for fused op creation. - Provide commonly used schedules under each architectures ## Organization - [include](include) C++ library, header only - [python](python) python library - [recipe](recipe) Recipe collections containing useful operator examples. ## Guidelines - Use numpy-style naming convention for known ops - Seperate operator declaration from schedule when possible. - This can be inconvenient but enables more general scheduling across ops. - We can always recover the tensors from its outputs by traversing the tree. - Deliberately assert the requirements - Some kernels have requirements on shape and data layout, assert them - Data layout aware, if not specified in argument or in function, assume NCHW by default. ## Testcase - Add testcases to testout the schedule and dataflow in the TOPI workflow - Only do correctness testing without attaching compiler flags and only run it once. ## Performance Tuning Workflow Since TVM is work in progress, some optimization might not be perfect. One quick way I find useful is to do codegen plus manual modification. The workflow is: - Generate the GPU kernels, write them into a file, say ```perf/matexp_generated.cu``` - Copy the generated file into another one, say ```perf/matexp_manual.cu```, do modifications according to your intuition. - Set use_manual flag in the script to continue the codegen workflow as normal, but piggy back the manual written code instead. - Observe the performance difference. - If the performance improves, mark the manual code and think of optimization pass to generate the desired target code.