* [Codegen] Support broadcast op with symbolic shape
* fix case where last dim = 1
* use enum; simplify stride calculation; improve doc
* fix lint
* improve py doc