- 21 Apr, 2025 4 commits
- 18 Apr, 2025 2 commits
- 17 Apr, 2025 3 commits
- 16 Apr, 2025 1 commit
-
-
Yaoyu Zhu committed
-
- 13 Apr, 2025 3 commits
- 12 Apr, 2025 1 commit
-
-
Yaoyu Zhu committed
-
- 11 Apr, 2025 3 commits
-
-
Yaoyu Zhu committed
-
Shi wenxuan committed
-
Shi wenxuan committed
-
- 10 Apr, 2025 3 commits
- 09 Apr, 2025 5 commits
- 08 Apr, 2025 4 commits
- 07 Apr, 2025 1 commit
-
-
ZhangXiaoyun committed
-
- 27 Mar, 2025 1 commit
-
-
Shawn/Yuxuan Tong committed
-
- 25 Mar, 2025 2 commits
-
-
shengguangming committed
-
Add tqdm progress bar to RayPPOTrainer for training visualization

This PR enhances the `RayPPOTrainer` class by implementing a progress bar that visualizes the training process:

- Imported the `tqdm` module in `verl/trainer/ppo/ray_trainer.py` (line 27)
- Added progress bar initialization in the `fit()` method (line 781)
- Implemented progress updates during training iterations (line 931)
- Added proper cleanup by closing the progress bar at the end of training (line 928)

This improvement provides real-time feedback on training progress, making it easier to monitor long-running training sessions.

---------

Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
HangZhang committed
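The initialize, update, and close pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual `RayPPOTrainer.fit()` code: the function name `fit_sketch`, its parameters, and the bar description are assumptions made for the example, and the training step itself is left as a placeholder.

```python
from tqdm import tqdm


def fit_sketch(total_training_steps: int, steps_per_epoch: int) -> int:
    """Toy stand-in for a trainer loop (hypothetical, not verl's fit())."""
    global_steps = 0
    # Initialize the bar once, before the training loop starts.
    progress_bar = tqdm(total=total_training_steps, initial=global_steps,
                        desc="Training Progress")
    try:
        while global_steps < total_training_steps:
            for _ in range(steps_per_epoch):
                if global_steps >= total_training_steps:
                    break
                # ... one training step would run here ...
                global_steps += 1
                # Advance the bar by one completed step.
                progress_bar.update(1)
    finally:
        # Always close the bar, even if a step raises.
        progress_bar.close()
    return global_steps
```

Closing the bar in a `finally` block is one way to guarantee cleanup; the commit itself only states that the bar is closed at the end of training.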
-
- 23 Mar, 2025 6 commits
-
-
Jiawei Liu committed
-
# What does this PR do?

This PR does essentially the same thing as this [PR](https://github.com/volcengine/verl/pull/386), but replaces the rollout engine with SGLang.
mlmz committed -
# Intro

Support Megatron checkpointing for model, optimizer, and RNG states, with a new layer of abstraction, `MegatronCheckpointManager`, like the one for FSDP. Also add checkpoint tests.

# Involved Issues and PRs

This resolves issues #682 and #605, including PRs #510, #634, #368, and #330. Thanks for the great efforts of @uygnef, @ShareLer and @caaatch22 in these contributions.

# TODOs

- [ ] Support the Megatron dist checkpointing mechanism; for now, `torch.save`/`torch.load` are used to store and restore model weights.
- [x] Quick: also store the model in HF format.

---------

Co-authored-by: caaatch22 <mr.liumingjie@gmail.com>
Co-authored-by: Yu Feng <admin@fengyu.org>
Co-authored-by: ShareLer <sharele@163.com>
Blue Space committed -
For longer tests, see the `example/grpo_trainer` folder. The two backends align within 200 steps, but over more steps Megatron does not seem to reach loss convergence.

TODO: Extended testing over longer runs is required to validate this further.
Blue Space committed -
Special tokens should be skipped here, just as trl does: https://github.com/huggingface/trl/blob/fc2b041b58f6fbe766dceaec819bc5a8f9d209da/trl/trainer/grpo_trainer.py#L597

If `skip_special_tokens=False`, a completion such as

```
<think>...</think><answer>....</answer>
```

will be decoded as something like

```
<think>...</think><answer>....</answer><|im_end|><|endoftext|>
```

which causes a typical `format_reward_func` pattern to fail to match:

```python
r"^<think>.*?</think>\s*<answer>.*?</answer>$"
```
G.O.D committed -
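The mismatch described in this commit can be reproduced with the standard `re` module alone. The sketch below is illustrative: `format_reward` is a hypothetical helper rather than trl's or verl's actual reward function, and the two sample strings are made up to mirror the decoded outputs shown in the commit message.

```python
import re

# Pattern quoted in the commit message above.
FORMAT_PATTERN = r"^<think>.*?</think>\s*<answer>.*?</answer>$"


def format_reward(completion: str) -> float:
    """Hypothetical reward: 1.0 if the completion matches the format, else 0.0."""
    return 1.0 if re.match(FORMAT_PATTERN, completion) else 0.0


# Decoded with special tokens skipped (assumed sample text).
clean = "<think>reasoning</think><answer>42</answer>"
# Decoded with special tokens kept: trailing tokens follow </answer>.
with_special = clean + "<|im_end|><|endoftext|>"

print(format_reward(clean))
print(format_reward(with_special))
```

Because the pattern ends with `$` directly after `</answer>`, any trailing special tokens prevent a match, so the second completion gets no format reward even though its content is well formed.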
Haoyang Zou committed
-
- 22 Mar, 2025 1 commit
-