[perf] fix: set use_reentrant=False when enabling gradient checkpointing (#114)
- Set use_reentrant=False to avoid a duplicate allgather in the backward pass when gradient checkpointing is enabled.
- Optimize the temperature computation by using an in-place op.
- Fix the test logic.
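
The first two changes can be illustrated with a minimal single-process sketch. It assumes PyTorch's `torch.utils.checkpoint.checkpoint` API; the names `block` and `scale_logits_` are hypothetical and do not come from this change. The duplicate allgather itself only shows up with sharded parameters (for example under FSDP), so this example shows only how the flag is passed and what an in-place temperature scaling might look like.

```python
import torch
from torch.utils.checkpoint import checkpoint


def scale_logits_(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    # Hypothetical helper: in-place division, so no second
    # logits-sized tensor is allocated.
    return logits.div_(temperature)


torch.manual_seed(0)

# Hypothetical stand-in for a transformer block.
block = torch.nn.Sequential(
    torch.nn.Linear(8, 8),
    torch.nn.GELU(),
    torch.nn.Linear(8, 8),
)
x = torch.randn(4, 8, requires_grad=True)

# Non-reentrant checkpointing: activations inside `block` are not stored
# during forward and are recomputed when backward needs them.
out = checkpoint(block, x, use_reentrant=False)

# Scale by the temperature in place, then backpropagate.
logits = scale_logits_(out, temperature=0.7)
logits.sum().backward()
```

In-place scaling is safe here because the final `Linear` layer does not need its own output for its backward pass. If the tensor being scaled were saved for backward by the op that produced it, autograd would raise an error and an out-of-place division would be needed instead.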