HLS Backend Example
===================

TVM supports Xilinx FPGA boards through SDAccel. This tutorial shows how to deploy TVM to an AWS F1 FPGA instance.

***Note***: This feature is still experimental. SDAccel cannot yet be used to deploy end-to-end neural networks.

This tutorial uses two Python scripts.

- `build.py`: a script that synthesizes the FPGA bitstream.
```python
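import tvm
from tvm.contrib import cc

tgt_host = "llvm"    # target for the host-side code
tgt = "sdaccel"      # target for the FPGA kernel

# Element-wise vector addition C = A + B over a symbolic length n.
n = tvm.var("n")
A = tvm.placeholder((n,), name='A')
B = tvm.placeholder((n,), name='B')
C = tvm.compute(A.shape, lambda i: A[i] + B[i], name="C")

s = tvm.create_schedule(C.op)
# Split the axis into one outer iteration (nparts=1) and an inner loop over
# all n elements, then bind the outer axis to the "pipeline" thread axis.
px, x = s[C].split(C.op.axis[0], nparts=1)
s[C].bind(px, tvm.thread_axis("pipeline"))

fadd = tvm.build(s, [A, B, C], tgt, target_host=tgt_host, name="myadd")

# Save the host code as an object file and the device code as an xclbin.
fadd.save("myadd.o")
fadd.imported_modules[0].save("myadd.xclbin")

# Link the host object into a shared library that run.py can load.
cc.create_shared("myadd.so", ["myadd.o"])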
```

- `run.py`: a script that uses the FPGA as an accelerator.
```python
import os

import numpy as np
import tvm
import tvm.testing

tgt = "sdaccel"

# Load the host library, then attach the device binary: the xclbin under
# emulation, or the awsxclbin generated for the FPGA image on F1.
fadd = tvm.module.load("myadd.so")
if os.environ.get("XCL_EMULATION_MODE"):
    fadd_dev = tvm.module.load("myadd.xclbin")
else:
    fadd_dev = tvm.module.load("myadd.awsxclbin")
fadd.import_module(fadd_dev)

ctx = tvm.context(tgt, 0)

# Allocate the inputs and the output on the FPGA device.
n = 1024
a = tvm.nd.array(np.random.uniform(size=n).astype("float32"), ctx)
b = tvm.nd.array(np.random.uniform(size=n).astype("float32"), ctx)
c = tvm.nd.array(np.zeros(n, dtype="float32"), ctx)

# Run the kernel and check the result against NumPy.
fadd(a, b, c)
tvm.testing.assert_allclose(c.asnumpy(), a.asnumpy() + b.asnumpy())
```
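
For readers new to TVM schedules, the plain-Python model below sketches the loop nest that `s[C].split(C.op.axis[0], nparts=1)` produces in `build.py`: an outer axis `px` whose extent is `nparts` (the axis bound to the `"pipeline"` thread axis) and an inner axis `x` of extent `ceil(n / nparts)`. It is an illustration only; the function name `vector_add_split` is hypothetical and nothing here calls TVM.

```python
import math


def vector_add_split(a, b, nparts=1):
    """Model of C[i] = A[i] + B[i] after splitting axis i into (px, x).

    Splitting by nparts gives an outer axis px of extent nparts and an
    inner axis x of extent ceil(n / nparts), so i = px * inner + x.
    """
    n = len(a)
    assert len(b) == n, "A and B must have the same shape"
    c = [0.0] * n
    inner = math.ceil(n / nparts)  # extent of the inner axis x
    for px in range(nparts):       # outer axis; bound to "pipeline" in build.py
        for x in range(inner):     # inner axis; covers one part of the vector
            i = px * inner + x
            if i < n:              # bounds guard when nparts does not divide n
                c[i] = a[i] + b[i]
    return c


# With nparts=1 (as in build.py) the outer loop runs once and the inner
# loop visits every element in order.
print(vector_add_split([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # prints [11.0, 22.0, 33.0]
```

Because `nparts=1`, the whole vector falls into a single part, so one pipelined kernel invocation processes all `n` elements.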

Setup
-----

- Launch an instance using the FPGA Developer AMI. Emulation and synthesis do not require an F1 instance, so a lower-cost instance type is recommended for those steps.

- Set up the AWS FPGA development kit.
```bash
git clone https://github.com/aws/aws-fpga.git
cd aws-fpga
source sdaccel_setup.sh
source ${XILINX_SDX}/settings64.sh
```

- Set up TVM with OpenCL enabled.
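
Before moving on, it can help to confirm that the setup scripts exported the variables the later commands rely on. The sketch below assumes, as the commands in this tutorial imply, that `XILINX_SDX` and `AWS_PLATFORM` should be set after sourcing `sdaccel_setup.sh` and `settings64.sh`; the helper name `missing_env_vars` is hypothetical.

```python
import os

# Variables referenced by the commands in this tutorial (settings64.sh and
# emconfigutil). This list is an assumption; extend it for your own setup.
REQUIRED_VARS = ("XILINX_SDX", "AWS_PLATFORM")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Running `missing_env_vars()` from a Python session started in the same shell should return an empty list once both variables are present.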

Emulation
---------

- Create emconfig.json for emulation.
```bash
emconfigutil --platform ${AWS_PLATFORM} --nd 1
```

- Copy `emconfig.json` to the directory that contains the Python binary. This is necessary because the current Xilinx toolkit assumes that the host binary and `emconfig.json` are in the same directory.
```bash
cp emconfig.json $(dirname $(which python))
```

- Run software emulation.
```bash
export XCL_EMULATION_MODE=1
export XCL_TARGET=sw_emu

python build.py
python run.py
```

- Run hardware emulation.
```bash
export XCL_EMULATION_MODE=1
export XCL_TARGET=hw_emu

python build.py
python run.py
```
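
The emulation steps above can also be driven from Python. This is a sketch under stated assumptions: it copies `emconfig.json` next to the running interpreter (`sys.executable`), which matches `$(dirname $(which python))` only when `python` on your `PATH` is that same interpreter, and the helper names `emulation_env`, `copy_emconfig`, and `run_emulation` are hypothetical.

```python
import os
import shutil
import subprocess
import sys

EMULATION_TARGETS = ("sw_emu", "hw_emu")


def emulation_env(target, base=None):
    """Return a copy of `base` (default: os.environ) with XCL_* set for emulation."""
    if target not in EMULATION_TARGETS:
        raise ValueError(f"unknown emulation target: {target!r}")
    env = dict(os.environ if base is None else base)
    env["XCL_EMULATION_MODE"] = "1"
    env["XCL_TARGET"] = target
    return env


def copy_emconfig(src="emconfig.json", dest_dir=None):
    """Copy emconfig.json into the directory of the current interpreter."""
    dest_dir = dest_dir or os.path.dirname(sys.executable)
    return shutil.copy(src, dest_dir)


def run_emulation(target, scripts=("build.py", "run.py")):
    """Run each script in order, in a child process with the emulation env."""
    env = emulation_env(target)
    for script in scripts:
        subprocess.run([sys.executable, script], env=env, check=True)
```

For example, `copy_emconfig()` followed by `run_emulation("sw_emu")` mirrors the software emulation commands. Because the variables are passed to child processes only, the calling shell's environment is left unchanged.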


Synthesis
---------

- Run synthesis with the following script.

```bash
unset XCL_EMULATION_MODE
export XCL_TARGET=hw

python build.py
```

- Create an Amazon FPGA Image (AFI) and upload it to Amazon S3.
```bash
${SDACCEL_DIR}/tools/create_sdaccel_afi.sh -xclbin=myadd.xclbin -o=myadd \
    -s3_bucket=<bucket-name> -s3_dcp_key=<dcp-folder-name> -s3_logs_key=<logs-folder-name>
```
This also generates `myadd.awsxclbin`, which is required to use the FPGA image on F1 instances.
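
The synthesis steps can be chained in the same way. In this sketch, `synthesis_env`, `afi_command`, and `synthesize_and_create_afi` are hypothetical helpers; the flags passed to `create_sdaccel_afi.sh` are exactly those in the command above, `SDACCEL_DIR` is assumed to be set by the development kit setup, and the bucket and key names remain placeholders you must supply.

```python
import os
import shlex
import subprocess
import sys


def synthesis_env(base=None):
    """Return a copy of `base` (default: os.environ) set up for synthesis."""
    env = dict(os.environ if base is None else base)
    env.pop("XCL_EMULATION_MODE", None)  # same effect as `unset XCL_EMULATION_MODE`
    env["XCL_TARGET"] = "hw"
    return env


def afi_command(xclbin, out, bucket, dcp_key, logs_key, sdaccel_dir=None):
    """Build the create_sdaccel_afi.sh argument list using the flags shown above."""
    sdaccel_dir = sdaccel_dir or os.environ["SDACCEL_DIR"]  # KeyError if unset
    return [
        os.path.join(sdaccel_dir, "tools", "create_sdaccel_afi.sh"),
        f"-xclbin={xclbin}",
        f"-o={out}",
        f"-s3_bucket={bucket}",
        f"-s3_dcp_key={dcp_key}",
        f"-s3_logs_key={logs_key}",
    ]


def synthesize_and_create_afi(bucket, dcp_key, logs_key):
    """Run build.py with XCL_TARGET=hw, then create the AFI from myadd.xclbin."""
    env = synthesis_env()
    subprocess.run([sys.executable, "build.py"], env=env, check=True)
    cmd = afi_command("myadd.xclbin", "myadd", bucket, dcp_key, logs_key)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, env=env, check=True)
```

Keeping the command as an argument list (rather than a single shell string) avoids quoting problems if a bucket or key name contains special characters.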

Run
---

- Launch an Amazon EC2 F1 instance.

- Copy `myadd.so`, `myadd.awsxclbin`, and `run.py` to the F1 instance.

- Set up the AWS FPGA development kit.
```bash
git clone https://github.com/aws/aws-fpga.git
cd aws-fpga
source sdaccel_setup.sh
```

- Set up TVM with OpenCL enabled.

- Become root and set up the environment variables.
```bash
sudo sh
source ${INSTALL_ROOT}/setup.sh
```

- Run `run.py`.
```bash
python run.py
```
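
If `run.py` fails to load a module on the F1 instance, a quick check is whether the files it expects are in the working directory. The sketch below mirrors the binary-selection logic of `run.py` (the `.awsxclbin` is loaded whenever `XCL_EMULATION_MODE` is unset or empty); the helper names `device_binary` and `missing_artifacts` are hypothetical.

```python
import os
from pathlib import Path


def device_binary(env=None):
    """Mirror run.py: the .xclbin under emulation, the .awsxclbin otherwise."""
    env = os.environ if env is None else env
    return "myadd.xclbin" if env.get("XCL_EMULATION_MODE") else "myadd.awsxclbin"


def missing_artifacts(directory=".", env=None):
    """List the files run.py will load that are absent from `directory`."""
    needed = ("myadd.so", device_binary(env))
    return [name for name in needed if not (Path(directory) / name).is_file()]
```

Calling `missing_artifacts()` from the directory where `run.py` lives should return an empty list on a correctly prepared F1 instance.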