Commits · a0930e4bdb7f27787d535c448fad16730aa22952 · ZhangXiaoyun / verl

21 Apr, 2025 4 commits
- update templates for rented servers · a0930e4b
  Yaoyu Zhu committed Apr 21, 2025
- no ref · ed46c1a3
  苏舞仙 committed Apr 21, 2025
- change config of blockelite scripts · 43b239ac
  Yaoyu Zhu committed Apr 21, 2025
- change plotting · afe1b27d
  Yaoyu Zhu committed Apr 21, 2025
18 Apr, 2025 2 commits
- merge small changes · 445e487c
  Yaoyu Zhu committed Apr 18, 2025
- json · 6276ae83
  苏舞仙 committed Apr 18, 2025
17 Apr, 2025 3 commits
- pad · 8d63ec21
  苏舞仙 committed Apr 17, 2025
- pad · e5f061dc
  苏舞仙 committed Apr 17, 2025
- filter flags · c3b216bd
  苏舞仙 committed Apr 17, 2025
16 Apr, 2025 1 commit
- update codev dataset and rl config (use_liger) · 014a39f4
  Yaoyu Zhu committed Apr 16, 2025
13 Apr, 2025 3 commits
- fix bugs in using acc as dynamic sampling metric · 31ee9176
  Yaoyu Zhu committed Apr 14, 2025
- change filter_groups.metric to acc · 119a9e2d
  Yaoyu Zhu committed Apr 13, 2025
- update config for dapo 3.1k · fa3bc56a
  Yaoyu Zhu committed Apr 13, 2025
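
The `filter_groups.metric` commits above concern DAPO-style dynamic sampling. As a minimal sketch, assuming the DAPO rule that a prompt's group of sampled responses is kept only when the chosen metric (here `acc`) is not identical across the group, the filter could look like this. The function name `filter_groups` and its argument layout are hypothetical and are not verl's API.

```python
from collections import defaultdict
from statistics import pstdev


def filter_groups(uids, metric_values):
    """Return the prompt uids whose responses disagree on the metric.

    Hypothetical helper: uids[i] identifies the prompt that produced
    response i, and metric_values[i] is that response's metric (e.g. acc).
    """
    by_uid = defaultdict(list)
    for uid, value in zip(uids, metric_values):
        by_uid[uid].append(value)
    # A group whose metric has zero spread (all 0.0 or all 1.0 for acc) yields a
    # zero group-relative advantage, so it carries no learning signal.
    # Single-response groups also have zero spread and are dropped here.
    return {uid for uid, values in by_uid.items() if pstdev(values) > 0}


uids = ["p0", "p0", "p1", "p1", "p2", "p2"]
acc = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
print(sorted(filter_groups(uids, acc)))  # ['p0']
```

Here only `p0` survives: `p1` is all correct and `p2` is all wrong, so neither would contribute gradient under a group-normalized advantage.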
12 Apr, 2025 1 commit
- add double ground truth to reward function · c3951916
  Yaoyu Zhu committed Apr 12, 2025
11 Apr, 2025 3 commits
- add pass@all in metrics · 76fbbcc5
  Yaoyu Zhu committed Apr 11, 2025
- feat: add new metrics · 0a9d70c4
  Shi wenxuan committed Apr 11, 2025
- feat: new metrics · e87d0923
  Shi wenxuan committed Apr 11, 2025
10 Apr, 2025 3 commits
- fix config for dapo · 56f07a24
  Yaoyu Zhu committed Apr 10, 2025
- fix bugs in dapo config (no dynamic sampling, no token-level loss) · 65ac1294
  Yaoyu Zhu committed Apr 10, 2025
- fix a bug in codev reward function · d58782a4
  Yaoyu Zhu committed Apr 10, 2025
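
The 65ac1294 message refers to DAPO's token-level loss. As a hedged sketch of the distinction, using plain Python lists instead of tensors: sequence-level aggregation averages each response's per-token losses and then averages across responses, whereas token-level aggregation averages over every valid token in the batch. The helpers `seq_mean_loss` and `token_mean_loss` are hypothetical names, not verl functions.

```python
def seq_mean_loss(token_losses, masks):
    """Average within each response first, then across responses."""
    per_seq = []
    for losses, mask in zip(token_losses, masks):
        valid = sum(mask)
        per_seq.append(sum(loss * m for loss, m in zip(losses, mask)) / valid)
    return sum(per_seq) / len(per_seq)


def token_mean_loss(token_losses, masks):
    """Average over all valid (mask == 1) tokens in the batch at once."""
    total = sum(
        loss * m
        for losses, mask in zip(token_losses, masks)
        for loss, m in zip(losses, mask)
    )
    count = sum(sum(mask) for mask in masks)
    return total / count


# A short response (2 valid tokens, right-padded) and a long one (4 valid tokens).
token_losses = [[2.0, 2.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
masks = [[1, 1, 0, 0], [1, 1, 1, 1]]
print(seq_mean_loss(token_losses, masks))    # 1.25
print(token_mean_loss(token_losses, masks))  # 1.0
```

Under token-level averaging each token carries equal weight, so tokens in longer responses are not diluted by a per-response mean.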
09 Apr, 2025 5 commits
- update gitignore · 87538066
  Yaoyu Zhu committed Apr 09, 2025
- fix a bug in reward · f0ee0af3
  Yaoyu Zhu committed Apr 09, 2025
- update gitignore and compatibility for blockelite server · 8f63f283
  Yaoyu Zhu committed Apr 09, 2025
- add data preprocess script · 83cce64a
  Yaoyu Zhu committed Apr 09, 2025
- add reward_mapping into reward function and add permission · a8d29994
  Yaoyu Zhu committed Apr 09, 2025
08 Apr, 2025 4 commits
- update gitignore · 85eb0b35
  Yaoyu Zhu committed Apr 08, 2025
- update gitignore · ee74bb52
  Yaoyu Zhu committed Apr 08, 2025
- update gitignore · f2982c41
  Yaoyu Zhu committed Apr 08, 2025
- update git ignore and template · a657af06
  Yaoyu Zhu committed Apr 08, 2025
07 Apr, 2025 1 commit
- dapo · 51c05054
  ZhangXiaoyun committed Apr 07, 2025
27 Mar, 2025 1 commit
- chore: wandb run of an early version · 66686b40
  Shawn/Yuxuan Tong committed Mar 27, 2025
25 Mar, 2025 2 commits

resolve conflict by merging main · 88cf46d5
shengguangming committed Mar 25, 2025

Add tqdm progress bar to RayPPOTrainer to visualize training progress (#615) · 36c10bff

Add tqdm progress bar to RayPPOTrainer for training visualization

This PR enhances the RayPPOTrainer class by implementing a progress bar
that visualizes the training process:

- Imported tqdm module in verl/trainer/ppo/ray_trainer.py (line 27)
- Added progress bar initialization in the fit() method (line 781)
- Implemented progress updates during training iterations (line 931)
- Added proper cleanup by closing the progress bar at the end of
training (line 928)

This improvement provides real-time feedback on training progress,
making it easier to monitor long-running training sessions.

---------

Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>

committed Mar 25, 2025

23 Mar, 2025 6 commits

fix: slicing returns DataProto not DataProtoItem (#718) · 44a65f95
Jiawei Liu committed Mar 23, 2025
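
The #718 fix concerns the return type of indexing: a slice should keep the batch dimension, while an integer index yields a single item. A minimal sketch of that convention, assuming plain Python lists in place of verl's tensor-backed `DataProto`; the classes `Proto` and `ProtoItem` are hypothetical stand-ins, not the real implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class ProtoItem:
    """Hypothetical stand-in for a single-sample view (cf. DataProtoItem)."""
    fields: Dict[str, object]


@dataclass
class Proto:
    """Hypothetical stand-in for a batched container (cf. DataProto)."""
    fields: Dict[str, List[object]]

    def __len__(self) -> int:
        return len(next(iter(self.fields.values()), []))

    def __getitem__(self, index: Union[int, slice]) -> Union["Proto", ProtoItem]:
        if isinstance(index, slice):
            # Slicing keeps the batch dimension, so return the batched type.
            return Proto({k: v[index] for k, v in self.fields.items()})
        # Integer indexing drops the batch dimension and yields one sample.
        return ProtoItem({k: v[index] for k, v in self.fields.items()})


batch = Proto({"prompt": ["a", "b", "c"], "reward": [0.0, 1.0, 0.5]})
print(type(batch[0:2]).__name__, len(batch[0:2]))  # Proto 2
print(type(batch[1]).__name__, batch[1].fields)    # ProtoItem {'prompt': 'b', 'reward': 1.0}
```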

[feat] support a basic utility of VLM RLHF with sglang (#714) · db1d3251

# What does this PR do?
This PR does essentially the same thing as
[PR #386](https://github.com/volcengine/verl/pull/386), but replaces the
rollout engine with sglang.

committed Mar 23, 2025

[feat] Megatron checkpoint support for current Llama and Qwen models (#687) · 5d0a7eaf

# Intro

Add Megatron checkpoint support for model weights, optimizer states, and RNG states
through a new abstraction layer, `MegatronCheckpointManager`, mirroring the FSDP one.
Checkpoint tests are also added.

# Involved Issues and PRs

This resolves issues #682 and #605 and incorporates PRs #510, #634, #368, and #330.
Thanks to @uygnef, @ShareLer, and @caaatch22 for their great efforts on these
contributions.

# TODOs

- [ ] Support Megatron's distributed checkpointing mechanism; for now,
torch.save/load are used to store and restore model weights.
- [x] Quick: also store the model in HF format.

---------

Co-authored-by: caaatch22 <mr.liumingjie@gmail.com>
Co-authored-by: Yu Feng <admin@fengyu.org>
Co-authored-by: ShareLer <sharele@163.com>

committed Mar 23, 2025
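
Saving RNG states alongside model and optimizer states is what lets a resumed run reproduce the random draws it would have made. A toy sketch of that idea, assuming Python's `random` module and `pickle` in place of the torch.save/load and Megatron RNG trackers the PR actually uses; `ToyCheckpointManager` is a hypothetical class, not `MegatronCheckpointManager`.

```python
import os
import pickle
import random
import tempfile


class ToyCheckpointManager:
    """Hypothetical sketch: save and restore model, optimizer and RNG state together."""

    def __init__(self, directory):
        self.directory = directory

    def save(self, step, model_state, optim_state):
        state = {
            "model": model_state,
            "optimizer": optim_state,
            # Capturing the RNG state lets a resumed run draw the same numbers.
            "rng": random.getstate(),
        }
        path = os.path.join(self.directory, f"step_{step}.pkl")
        with open(path, "wb") as f:
            pickle.dump(state, f)
        return path

    def load(self, path):
        with open(path, "rb") as f:
            state = pickle.load(f)
        random.setstate(state["rng"])
        return state["model"], state["optimizer"]


with tempfile.TemporaryDirectory() as tmp:
    manager = ToyCheckpointManager(tmp)
    random.seed(0)
    path = manager.save(step=10, model_state={"w": 1.5}, optim_state={"lr": 1e-6})
    expected = [random.random() for _ in range(3)]  # draws made after the checkpoint
    model, optim = manager.load(path)               # rewinds the RNG to the saved state
    resumed = [random.random() for _ in range(3)]
    print(expected == resumed, model, optim)  # True {'w': 1.5} {'lr': 1e-06}
```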

Add GRPO CI to FSDP and Megatron simple e2e. (#711) · 98a0208c

For longer tests, see the `example/grpo_trainer` folder. The two backends
align within 200 steps, but over more steps Megatron does not appear to
reach loss convergence.

TODO: Longer-running tests are needed to validate this further.

committed Mar 23, 2025

skip special tokens (#715) · a339f6ff

Special tokens should be skipped here, just as TRL does:
https://github.com/huggingface/trl/blob/fc2b041b58f6fbe766dceaec819bc5a8f9d209da/trl/trainer/grpo_trainer.py#L597


If `skip_special_tokens=False`, a completion such as

```
<think>...</think><answer>....</answer>
```

will be decoded as something like
```
<think>...</think><answer>....</answer><|im_end|><|endoftext|>
```

which causes a typical `format_reward_func` pattern like the following to fail to match:

```python
r"^<think>.*?</think>\s*<answer>.*?</answer>$"
```

committed Mar 23, 2025
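
The mismatch described in #715 can be reproduced with the regex quoted above. This sketch imitates the effect of decoding with `skip_special_tokens=True` by removing token strings after the fact; the `SPECIAL_TOKENS` list and `strip_special_tokens` helper are hypothetical illustrations, whereas the actual fix changes the tokenizer decode call itself.

```python
import re

# The format check quoted in the commit message above.
FORMAT_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$")

# Hypothetical list of special tokens, modeled on the ones shown above.
SPECIAL_TOKENS = ["<|im_end|>", "<|endoftext|>"]


def strip_special_tokens(text):
    """Crude string-level stand-in for decoding with skip_special_tokens=True."""
    for token in SPECIAL_TOKENS:
        text = text.replace(token, "")
    return text


raw = "<think>2+2=4</think><answer>4</answer><|im_end|><|endoftext|>"
print(bool(FORMAT_PATTERN.match(raw)))                        # False
print(bool(FORMAT_PATTERN.match(strip_special_tokens(raw))))  # True
```

The trailing `$` anchor is what makes the raw string fail: nothing after `</answer>` is allowed, so any leftover special token breaks the match.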

Fix checkpoint loading in fsdp_checkpoint_manager.py and ray_trainer.py (#712) · c523a314
Haoyang Zou committed Mar 23, 2025

22 Mar, 2025 1 commit
- fix: support transformers==4.50.0 (#704) · 3f6d45d9
  https://github.com/volcengine/verl/issues/703
  Lumeng Wu committed Mar 22, 2025