  03 Mar, 2025 (2 commits)
    • [feat] Initial support for VLMs, add Qwen2.5VL GRPO example (#386) · b46f55ec
      ## What does this PR do?
      
      This PR migrates the RL-on-VLMs feature from our
      [EasyR1](https://github.com/hiyouga/EasyR1) fork back to veRL. We have
      validated the feature with the Qwen2.5-VL 7B model on 8*H100 GPUs. The
      configuration and data-processing script are provided along with this
      PR for easy reproduction.
      
      ## How to reproduce?
      
      1. Download and preprocess the dataset
      
      ```bash
      python3 examples/data_preprocess/geo3k.py --local_dir ~/data/geo3k
      ```
      
      2. Start GRPO training
      
      ```bash
      bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh
      ```
      
      ## Dependencies
      
      - vllm>=0.7.3
      - transformers>=4.49.0
      - [qwen-vl-utils](https://pypi.org/project/qwen-vl-utils/)
      - [mathruler](https://pypi.org/project/mathruler/)
      
      ## Major Changes
      
      ### New dataflow for multimodal RL
      
      In this PR, we introduce two new concepts in the dataflow:
      `multi_modal_data` and `multi_modal_inputs`. The former holds the
      multi-modal features required by the **rollout** worker (such as vLLM),
      while the latter holds the multi-modal features required by the
      **actor/critic** worker (such as an HF model). They are kept separate
      because the rollout and actor workers have different data-format
      requirements.
      
      Taking Qwen2-VL + Hugging Face + vLLM as an example, the data
      structures are:
      
      - **multi_modal_data**: `{"image": [PIL.Image, PIL.Image, ...]}`
      - **multi_modal_inputs**: `{"pixel_values": torch.Tensor, "image_grid_thw": torch.Tensor}`
      
      Both of them are converted to numpy objects and placed in the non-tensor
      batch in DataProto.
      
      This design extends easily to other modalities and VLMs because it is
      model-agnostic.
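      
      As a rough illustration (a sketch only, not the PR's actual code: the
      processor call and the single-sample batch below are illustrative for
      Qwen2-VL), the two dicts could be built and stored as numpy object
      arrays like this:
      
      ```python
      import numpy as np
      from PIL import Image
      from transformers import AutoProcessor
      
      processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
      images = [Image.open("example.png").convert("RGB")]
      
      # Rollout worker (vLLM) consumes raw PIL images.
      multi_modal_data = {"image": images}
      
      # Actor/critic worker (HF model) consumes processed tensors.
      image_inputs = processor.image_processor(images, return_tensors="pt")
      multi_modal_inputs = {
          "pixel_values": image_inputs["pixel_values"],
          "image_grid_thw": image_inputs["image_grid_thw"],
      }
      
      # Both dicts go into the non-tensor batch as numpy object arrays
      # (batch size 1 shown here).
      non_tensor_batch = {
          "multi_modal_data": np.array([multi_modal_data], dtype=object),
          "multi_modal_inputs": np.array([multi_modal_inputs], dtype=object),
      }
      ```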
      
      ### Other changes
      
      - Data
        - Support pre-processing the
          [Geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k)
          dataset.
        - Support `config.data.image_key`; the corresponding dataset field
          should be **a list of Pillow images**.
      
      - Actor/Ref/Critic
        - Support `multi_modal_inputs` (see the sketch after this list).
        - Process position ids to adapt to the m-rope.
      
      - Rollout
        - Update the dtensor weight loader to adapt to the Qwen2-VL
          architecture in vLLM 0.7+.
        - Support `multi_modal_data`.
        - Use `raw_prompt_ids` as the vLLM inputs to **avoid unpadding** the
          input ids.
      
      - Reward Manager
        - Add **mathruler** for more accurate math scores on the Geometry3k
          dataset.
      
      - Models
        - Support calculating the position ids for the m-rope in Qwen2-VL.
        - Support removing padding in flash attention 2 for the m-rope
          (transformers itself **does not support it**).
      
      - Sharding Manager
        - Support all-gathering the non-tensor batch.
      
      - FSDP Workers / Checkpoint Merger
        - Support `AutoModelForVision2Seq` at model initialization.
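      
      The sketch below illustrates how the two dicts are consumed on each
      side. It is not the PR's actual worker code: the function names and
      arguments are placeholders; only the vLLM `generate` call with
      `prompt_token_ids`/`multi_modal_data` and the HF forward with
      `pixel_values`/`image_grid_thw` reflect real APIs.
      
      ```python
      from vllm import LLM, SamplingParams
      
      def rollout_generate(llm: LLM, raw_prompt_ids, multi_modal_data,
                           sampling_params: SamplingParams):
          """Rollout side: vLLM takes unpadded token ids plus raw PIL images."""
          return llm.generate(
              [{"prompt_token_ids": raw_prompt_ids,      # unpadded prompt ids
                "multi_modal_data": multi_modal_data}],  # {"image": [PIL.Image, ...]}
              sampling_params,
          )
      
      def actor_logits(hf_model, input_ids, attention_mask,
                       mrope_position_ids, multi_modal_inputs):
          """Actor/critic side: processed tensors join the usual LM inputs.
      
          For Qwen2-VL the position ids are the 3D m-rope ids,
          shaped [3, batch, seq_len].
          """
          return hf_model(
              input_ids=input_ids,
              attention_mask=attention_mask,
              position_ids=mrope_position_ids,
              **multi_modal_inputs,                      # pixel_values, image_grid_thw
              use_cache=False,
          ).logits
      ```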
      
      Note: Ulysses sequence parallelism is not supported yet; we will add it
      in the next update.
      
      ## Performance
      
      We provide the estimated MFU of the language-model part on H100 GPUs.
      These values understate the actual MFU because **we did not count the
      FLOPs of the vision tower**. A rough estimation formula is sketched
      after the list below.
      
      - `remove_padding=False`: MFU ~7%
      - `remove_padding=True`: MFU ~20%
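      
      For reference, MFU can be estimated with the standard 6 * N * (tokens/s)
      approximation for language-model training FLOPs. The sketch below is
      illustrative only: the parameter count, throughput, and peak-FLOPs
      numbers are placeholders, not measurements from this PR.
      
      ```python
      def estimate_mfu(n_params: float, tokens_per_sec: float, n_gpus: int,
                       peak_flops_per_gpu: float = 989e12) -> float:
          """Rough MFU: training FLOPs ~= 6 * params * tokens (fwd + bwd),
          divided by the aggregate peak throughput of the GPUs.
      
          The default peak is H100 dense BF16 (~989 TFLOP/s); adjust for your
          hardware/dtype. Vision-tower FLOPs are ignored, matching the caveat
          above.
          """
          achieved_flops_per_sec = 6.0 * n_params * tokens_per_sec
          return achieved_flops_per_sec / (n_gpus * peak_flops_per_gpu)
      
      # Placeholder numbers: a 7B-parameter LM on 8 GPUs.
      print(f"MFU ~ {estimate_mfu(7e9, 30_000, 8):.1%}")
      ```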
      
      The training and test reward score curves are shown below.
      
      
      ![image](https://github.com/user-attachments/assets/ecb9fc27-8591-4c5b-ae4b-4ba77c6e30f9)
      
      ## Who can review?
      
      @vermouth1992 @PeterSH6
      hoshi-hiyouga committed