| Name | Last commit | Last update |
|---|---|---|
| _static | ||
| advance | ||
| amd_tutorial | ||
| examples | ||
| experiment | ||
| faq | ||
| perf | ||
| preparation | ||
| start | ||
| workers | ||
| Makefile | ||
| README.md | ||
| README_vllm0.7.md | ||
| README_vllm0.8.md | ||
| conf.py | ||
| data.rst | ||
| hybrid_flow.rst | ||
| index.rst | ||
| requirements-docs.txt |
# Intro

Support Megatron checkpoints for model, optimizer states, and RNG states, with a new layer of abstraction, `MegatronCheckpointManager`, like FSDP. Also add checkpoint tests.

# Involved Issues and PRs

This solves issues #682 and #605, and includes PRs #510, #634, #368, and #330. Thanks to @uygnef, @ShareLer, and @caaatch22 for their great efforts in these contributions.

# TODOs

- [ ] Support the Megatron dist checkpointing mechanism; for now, torch.save/load is used to store and restore model weights.
- [x] Quick: Also store the hf format model.

---------

Co-authored-by: caaatch22 <mr.liumingjie@gmail.com>
Co-authored-by: Yu Feng <admin@fengyu.org>
Co-authored-by: ShareLer <sharele@163.com>
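
To illustrate why a checkpoint bundles RNG states together with model and optimizer states, here is a minimal, framework-free sketch. It is not the `MegatronCheckpointManager` API: the class `SimpleCheckpointManager`, its `save`/`load` methods, and the dictionary layout are hypothetical, and the standard-library `pickle` and `random` modules stand in for `torch.save`/`torch.load` and the framework RNG.

```python
import os
import pickle
import random
import tempfile


class SimpleCheckpointManager:
    """Hypothetical stand-in for illustration; not the MegatronCheckpointManager API."""

    def __init__(self, model_state, optimizer_state):
        self.model_state = model_state
        self.optimizer_state = optimizer_state

    def save(self, path):
        # Bundle model, optimizer, and RNG state into one file
        # (pickle stands in for torch.save here).
        payload = {
            "model": self.model_state,
            "optimizer": self.optimizer_state,
            "rng": random.getstate(),
        }
        with open(path, "wb") as f:
            pickle.dump(payload, f)

    def load(self, path):
        # Restore all three pieces; resetting the RNG makes the resumed
        # run draw the same random numbers as the original run would have.
        with open(path, "rb") as f:
            payload = pickle.load(f)
        self.model_state = payload["model"]
        self.optimizer_state = payload["optimizer"]
        random.setstate(payload["rng"])


def demo():
    random.seed(0)
    manager = SimpleCheckpointManager({"w": [0.1, 0.2]}, {"step": 3})
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ckpt.pkl")
        manager.save(path)
        # Random draws made right after the checkpoint was taken.
        expected = [random.random() for _ in range(3)]
        # Simulate further training that changes the in-memory state.
        manager.model_state["w"][0] = 9.9
        manager.optimizer_state["step"] = 99
        manager.load(path)
        # Draws after resuming should match the ones above.
        resumed = [random.random() for _ in range(3)]
    return manager, expected, resumed
```

Calling `demo()` returns the restored manager along with the two lists of random draws. If the RNG state were left out of the checkpoint, the resumed draws would generally differ from the original ones, which is the reproducibility gap that saving RNG states is meant to close.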