[feat] Megatron checkpoint support for current Llama and Qwen models (#687) · 5d0a7eaf
# Intro

Support Megatron checkpoints for the model, optimizer states, and RNG states, with a new layer of abstraction, `MegatronCheckpointManager`, as with FSDP. Also add checkpoint tests.

# Involved Issues and PRs

This resolves issues #682 and #605, and includes PRs #510, #634, #368, and #330. Thanks to @uygnef, @ShareLer, and @caaatch22 for their great efforts in these contributions.

# TODOs

- [ ] Support the Megatron dist checkpointing mechanism; for now, `torch.save`/`torch.load` is used to store and restore model weights.
- [x] Quick: also store the model in HF format.

---------

Co-authored-by: caaatch22 <mr.liumingjie@gmail.com>
Co-authored-by: Yu Feng <admin@fengyu.org>
Co-authored-by: ShareLer <sharele@163.com>
Blue Space committed
| Name | Last commit | Last update |
|---|---|---|
| .. | | |
| fsdp_workers.rst | | |
| megatron_workers.rst | | |
| ray_trainer.rst | | |