- 13 Apr, 2025 1 commit
Yaoyu Zhu committed
- 12 Apr, 2025 1 commit
Yaoyu Zhu committed
- 11 Apr, 2025 3 commits
Yaoyu Zhu committed
Shi wenxuan committed
Shi wenxuan committed
- 10 Apr, 2025 3 commits
- 09 Apr, 2025 5 commits
- 08 Apr, 2025 4 commits
- 07 Apr, 2025 1 commit
ZhangXiaoyun committed
- 27 Mar, 2025 1 commit
Shawn/Yuxuan Tong committed
- 25 Mar, 2025 2 commits
shengguangming committed
Add tqdm progress bar to RayPPOTrainer for training visualization

This PR enhances the RayPPOTrainer class with a progress bar that visualizes training:

- Imported the tqdm module in verl/trainer/ppo/ray_trainer.py (line 27)
- Added progress bar initialization in the fit() method (line 781)
- Added progress updates during training iterations (line 931)
- Closed the progress bar at the end of training (line 928)

This provides real-time feedback on training progress, making it easier to monitor long-running training sessions.

---------

Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
HangZhang committed
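A minimal sketch of the tqdm pattern described in the entry above; the loop body, step count, and postfix keys are illustrative and this is not verl's actual RayPPOTrainer.fit() code:

```python
# Illustrative sketch only; not verl's actual RayPPOTrainer.fit() implementation.
from tqdm import tqdm


def fit(total_training_steps: int = 100) -> None:
    # Initialize the bar once at the start of training.
    progress_bar = tqdm(total=total_training_steps, desc="Training Progress")
    for global_step in range(1, total_training_steps + 1):
        # ... generate rollouts, compute advantages, update the policy ...
        progress_bar.update(1)                       # advance once per iteration
        progress_bar.set_postfix({"step": global_step})
    progress_bar.close()                             # clean up at the end of training


if __name__ == "__main__":
    fit(10)
```

Keying a single bar to the total step count gives an ETA for long-running sessions without adding extra logging overhead.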
- 23 Mar, 2025 6 commits
Jiawei Liu committed
# What does this PR do?

This PR does essentially the same thing as this [PR](https://github.com/volcengine/verl/pull/386), but replaces the rollout engine with sglang.
mlmz committed
# Intro

Support Megatron checkpoints for model, optimizer states, and RNG states, with a new layer of abstraction, `MegatronCheckpointManager`, analogous to the FSDP one. Also add checkpoint tests.

# Involved Issues and PRs

This solves issues #682 and #605, including PRs #510, #634, #368, and #330. Thanks for the great efforts of @uygnef, @ShareLer, and @caaatch22 in these contributions.

# TODOs

- [ ] Support the Megatron dist checkpointing mechanism; for now torch.save/load is used to store/restore model weights.
- [x] Quick: also store the model in HF format.

---------

Co-authored-by: caaatch22 <mr.liumingjie@gmail.com>
Co-authored-by: Yu Feng <admin@fengyu.org>
Co-authored-by: ShareLer <sharele@163.com>
Blue Space committed
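A minimal sketch of what a checkpoint-manager abstraction of this kind can look like; the class and method names are hypothetical, not verl's actual `MegatronCheckpointManager` API:

```python
# Hypothetical sketch of a checkpoint-manager abstraction; names and signatures
# are illustrative, not verl's actual MegatronCheckpointManager.
import os
import torch


class CheckpointManagerSketch:
    def __init__(self, model, optimizer, lr_scheduler=None):
        self.model = model
        self.optimizer = optimizer
        self.lr_scheduler = lr_scheduler

    def save_checkpoint(self, path: str, step: int) -> None:
        os.makedirs(path, exist_ok=True)
        state = {
            "step": step,
            "model": self.model.state_dict(),
            "optimizer": self.optimizer.state_dict(),
            "rng": {
                "torch": torch.get_rng_state(),
                "cuda": torch.cuda.get_rng_state_all() if torch.cuda.is_available() else None,
            },
        }
        if self.lr_scheduler is not None:
            state["lr_scheduler"] = self.lr_scheduler.state_dict()
        # Per the PR's TODO, plain torch.save/load is used for now rather than
        # Megatron's dist checkpointing mechanism.
        torch.save(state, os.path.join(path, f"checkpoint_step_{step}.pt"))

    def load_checkpoint(self, ckpt_file: str) -> int:
        state = torch.load(ckpt_file, map_location="cpu")
        self.model.load_state_dict(state["model"])
        self.optimizer.load_state_dict(state["optimizer"])
        torch.set_rng_state(state["rng"]["torch"])
        if state["rng"]["cuda"] is not None and torch.cuda.is_available():
            torch.cuda.set_rng_state_all(state["rng"]["cuda"])
        if self.lr_scheduler is not None and "lr_scheduler" in state:
            self.lr_scheduler.load_state_dict(state["lr_scheduler"])
        return state["step"]
```

Bundling model, optimizer, and RNG state behind one save/load pair is what lets different backends expose the same checkpointing surface; the dist-checkpointing TODO would only change how the state dict is serialized.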
For longer tests, check the `example/grpo_trainer` folder: the two backends align within 200 steps, but over more steps Megatron does not seem to reach loss convergence. TODO: extended testing over longer runs is required for further validation.
Blue Space committed
It should skip special tokens here, just like TRL does: https://github.com/huggingface/trl/blob/fc2b041b58f6fbe766dceaec819bc5a8f9d209da/trl/trainer/grpo_trainer.py#L597

With `skip_special_tokens=False`, a completion

```
<think>...</think><answer>....</answer>
```

will be decoded as something like

```
<think>...</think><answer>....</answer><|im_end|><|endoftext|>
```

which makes a typical `format_reward_func` pattern fail to match:

```python
r"^<think>.*?</think>\s*<answer>.*?</answer>$"
```
G.O.D committed
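A small sketch of the pitfall described above, assuming a Qwen-style chat tokenizer; the model name and completion string are illustrative:

```python
# Illustrative sketch of the decoding pitfall; model name and completion are hypothetical.
import re
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

completion = "<think>reasoning</think><answer>42</answer>"
ids = tokenizer(completion)["input_ids"] + [tokenizer.eos_token_id]

format_reward_pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"

# With skip_special_tokens=False, trailing special tokens such as <|im_end|> /
# <|endoftext|> survive the round trip and break the anchored regex.
kept = tokenizer.decode(ids, skip_special_tokens=False)
print(bool(re.match(format_reward_pattern, kept)))      # expected: False

# With skip_special_tokens=True, special tokens are dropped and the regex matches.
stripped = tokenizer.decode(ids, skip_special_tokens=True)
print(bool(re.match(format_reward_pattern, stripped)))  # expected: True
```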
Haoyang Zou committed
- 22 Mar, 2025 1 commit
- 21 Mar, 2025 4 commits
Junrong Lin committed
Prevents training hangs by validating `num_key_value_heads % ulysses_sequence_parallel_size == 0` before training.
Yu Feng committed
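A minimal sketch of such a pre-flight check; the function and attribute names are assumptions for illustration, not verl's exact validation code:

```python
# Hypothetical sketch: fail fast if GQA KV heads cannot be evenly sharded across
# Ulysses sequence-parallel ranks; attribute names are illustrative.
def validate_ulysses_config(model_config, ulysses_sequence_parallel_size: int) -> None:
    if ulysses_sequence_parallel_size <= 1:
        return  # no sequence parallelism, nothing to check
    num_key_value_heads = getattr(
        model_config, "num_key_value_heads", model_config.num_attention_heads
    )
    if num_key_value_heads % ulysses_sequence_parallel_size != 0:
        raise ValueError(
            f"num_key_value_heads ({num_key_value_heads}) must be divisible by "
            f"ulysses_sequence_parallel_size ({ulysses_sequence_parallel_size}); "
            "otherwise head sharding stalls and training hangs."
        )
```

Running a check like this once before launching the workers turns a silent distributed hang into an immediate, explanatory error.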
## What does this PR do?

Add documentation for using vLLM 0.8 in verl.

## Who can review?

@eric-haibin-lin
hoshi-hiyouga committed
HL committed
- 20 Mar, 2025 8 commits
Adding Openmanus-RL: an LLM agent RL tuning repo built with verl
Kunlun Zhu committed
Add `verl` as the `framework` parameter in the SwanLab config table, so that more developers can see that a training run comes from `verl`.
Ze-Yi LIN committed
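A sketch of what tagging the framework in a SwanLab run config can look like; the project name and the config keys other than `framework` are hypothetical, and this is not verl's actual logger code:

```python
# Illustrative only; project name and keys other than "framework" are hypothetical.
import swanlab

swanlab.init(
    project="verl-grpo-demo",
    config={
        "framework": "verl",  # the tag surfaced in SwanLab's config table
        "algorithm": "grpo",
        "total_training_steps": 200,
    },
)
swanlab.log({"reward": 0.0})
```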
HL committed
https://github.com/volcengine/verl/issues/680

Changes:

- Move math-verify to the optional dependencies. It can now be installed via `cd verl && pip install -e .[math]`.
- Revert to the naive verifier for the math dataset. Users can switch to math-verify or provide a custom `compute_score` function.
Yuyang Ding committed
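A minimal sketch of declaring such an optional extra with setuptools; the package name `verl-example` and the bare `math-verify` requirement are placeholders, not verl's actual packaging metadata:

```python
# Illustrative setup.py fragment; names and pins are placeholders.
from setuptools import find_packages, setup

setup(
    name="verl-example",
    packages=find_packages(),
    install_requires=[
        # core dependencies only -- math-verify is intentionally not listed here
    ],
    extras_require={
        # pulled in only by `pip install -e .[math]`
        "math": ["math-verify"],
    },
)
```

With this layout, a plain `pip install -e .` leaves math-verify out and the naive verifier is used, while `pip install -e .[math]` opts into the stricter checker.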
Shawn/Yuxuan Tong committed
Shawn/Yuxuan Tong committed
Shawn/Yuxuan Tong committed
Shawn/Yuxuan Tong committed