* initialize conv2d transpose scheduling on x86
* refine the scheduler a bit
* fix lint
* address review comments; remove duplicate code
* fix lint