1. 14 Mar, 2025 4 commits
  2. 13 Mar, 2025 6 commits
    • fix: remove redundant broadcast in fsdp vllm postprocess (#577) · f7e183e4
      Remove the redundant broadcast in the FSDP vLLM postprocess, since the
      vLLM output on each TP rank should be identical.
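
      A minimal sketch of the change; the function and variable names are
      illustrative, not the actual sharding-manager code:
      ```python
      def postprocess(rollout_output):
          # Previously the output was broadcast from TP rank 0 to the other
          # TP ranks (e.g. via torch.distributed.broadcast_object_list).
          # vLLM produces identical output on every TP rank, so the broadcast
          # is redundant and each rank can use its local copy directly.
          return rollout_output
      ```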
      Joel committed
    • fix: remove redundant torch.cuda.empty_cache() (#575) · 3fc3e2b7
      #556 attempted to remove unnecessary `empty_cache` calls, but this
      causes a CUDA OOM at vllm wake_up.
      ```text
        File "/opt/tiger/ray/session_2025-03-13_12-11-30_408315_2895/runtime_resources/working_dir_files/_ray_pkg_a64b690733067c5c/verl/workers/fsdp_workers.py", line 481, in generate_sequences
          with self.rollout_sharding_manager:
        File "/opt/tiger/ray/session_2025-03-13_12-11-30_408315_2895/runtime_resources/working_dir_files/_ray_pkg_a64b690733067c5c/verl/workers/sharding_manager/fsdp_vllm.py", line 82, in __enter__
          self.inference_engine.wake_up()
        File "/usr/local/lib/python3.11/dist-packages/vllm/entrypoints/llm.py", line 1244, in wake_up
          self.llm_engine.wake_up()
        File "/usr/local/lib/python3.11/dist-packages/vllm/engine/llm_engine.py", line 1859, in wake_up
          self.model_executor.wake_up()
        File "/usr/local/lib/python3.11/dist-packages/vllm/executor/executor_base.py", line 216, in wake_up
          self.collective_rpc("wake_up")
        File "/usr/local/lib/python3.11/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
          answer = run_method(self.driver_worker, method, args, kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.11/dist-packages/vllm/utils.py", line 2196, in run_method
          return func(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.11/dist-packages/vllm/worker/worker.py", line 140, in wake_up
          allocator.wake_up()
        File "/usr/local/lib/python3.11/dist-packages/vllm/device_allocator/cumem.py", line 207, in wake_up
          create_and_map(handle)
        File "/usr/local/lib/python3.11/dist-packages/vllm/device_allocator/cumem.py", line 75, in create_and_map
          python_create_and_map(*allocation_handle)
      RuntimeError: CUDA Error: out of memory at /workspace/csrc/cumem_allocator.cpp:62
      ```
      This PR removes all redundant `torch.cuda.empty_cache()` calls in the
      FSDP worker and only empties the cache before vllm wake_up and after
      vllm sleep, since vllm has its own caching memory allocator,
      [CuMemAllocator](https://github.com/vllm-project/vllm/blob/v0.7.3/vllm/device_allocator/cumem.py#L103).
      Outside the vllm scope, we should avoid emptying the cache so that
      PyTorch can use its caching allocator to speed up memory allocations.
      A sketch of the resulting pattern follows the checklist below.
      
      - [x] Cleanup FSDP worker torch.cuda.empty_cache()
      - [ ] Cleanup Megatron worker torch.cuda.empty_cache()
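
      A minimal sketch of the resulting pattern, assuming a sharding-manager
      context like the one in `fsdp_vllm.py`; the class and method bodies here
      are illustrative, not the exact verl code:
      ```python
      import torch

      class VLLMShardingManager:
          """Empty the PyTorch cache only around vLLM wake_up/sleep so that
          vLLM's CuMemAllocator can map the freed memory; everywhere else the
          PyTorch caching allocator keeps its blocks for fast reuse."""

          def __init__(self, inference_engine):
              self.inference_engine = inference_engine

          def __enter__(self):
              # Release PyTorch's cached blocks right before vLLM re-maps its
              # own pool; otherwise wake_up() can hit CUDA OOM.
              torch.cuda.empty_cache()
              self.inference_engine.wake_up()
              return self

          def __exit__(self, exc_type, exc_value, traceback):
              self.inference_engine.sleep(level=1)
              # Free whatever vLLM left cached before handing the GPU back to FSDP.
              torch.cuda.empty_cache()
      ```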
      Joel committed
    • [bugfix] PRIME filter overlong prompts & incorrect padding side & use xformers (#570) · 9bb02d27
      ### Description
      - fix filter_overlong_prompts setting in PRIME
      
      - fix incorrect padding side for Qwen in PRIME
      
      - When I use the PRIME recipe to train Qwen-series models, I get
      “*ValueError: You are attempting to perform batched generation with
      padding_side='right' this may lead to unexpected behaviour for Flash
      Attention version of Qwen2. Make sure to call tokenizer.padding_side =
      'left' before tokenizing the input.*” So I set `use_cache = False` when
      calling the model to compute the output logits.
      
      - fix CUDA error with vllm v0.6.3 
      
      - When I run PRIME, I sometimes hit *CUDA error: an illegal memory
      access was encountered*. Following
      https://github.com/vllm-project/vllm/issues/10389, I set
      `VLLM_ATTENTION_BACKEND=XFORMERS` (see the sketch below).
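
      A minimal sketch of the two workarounds; the helper name
      `compute_logits` is hypothetical and the PRIME recipe's actual code
      differs:
      ```python
      import os

      # Workaround for the vLLM v0.6.3 illegal-memory-access issue
      # (vllm-project/vllm#10389): select the xformers attention backend
      # before vLLM is initialized.
      os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

      def compute_logits(model, input_ids, attention_mask):
          """Score a right-padded batch with a Qwen2 + Flash Attention model.

          Disabling the KV cache sidesteps the padding_side='right' ValueError,
          since the cache is only needed for incremental generation.
          """
          output = model(
              input_ids=input_ids,
              attention_mask=attention_mask,
              use_cache=False,
          )
          return output.logits
      ```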
      CajZella committed
    • [bugfix] fix: generation script (#542) · 79e072f1
      # Description
      - Corrected dummy size to avoid faulty communication.
      - Fixed batch number calculation.
      - Adjusted worker group role to alleviate memory overhead.
      - Added `ray.init()` to prevent worker registration failures (see the sketch below).
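
      A minimal sketch of the `ray.init()` guard; the guard shown is
      illustrative rather than the script's exact code:
      ```python
      import ray

      # Initialize Ray before building the worker group; without this,
      # worker registration can fail when no cluster has been started yet.
      if not ray.is_initialized():
          ray.init()
      ```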
      Dai, Weinan committed
    • [rollout] feat: support sampling in validation stage (#553) · d5de9f4c
      Currently, eager mode is applied in the validation stage. However, for
      some reasoning tasks, we may need to generate n responses and average
      their scores.

      In this PR, we support non-eager sampling parameters during validation
      by specifying `val_kwargs` in the `actor_rollout_ref.rollout` config
      field, as sketched below.
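
      A minimal sketch of what such an override might look like, written
      against an OmegaConf-style config; the keys under `val_kwargs` here are
      assumptions rather than the exact schema added by this PR:
      ```python
      from omegaconf import OmegaConf

      # Illustrative rollout config: `val_kwargs` overrides the sampling
      # parameters used only during validation.
      rollout_config = OmegaConf.create({
          "temperature": 1.0,        # rollout sampling during training
          "val_kwargs": {
              "do_sample": True,     # sample instead of deterministic decoding
              "temperature": 0.6,
              "top_p": 0.95,
              "n": 4,                # generate n responses per prompt, then average scores
          },
      })

      # Plain-dict view of the validation-time sampling overrides.
      val_sampling_params = OmegaConf.to_container(rollout_config.val_kwargs, resolve=True)
      ```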
      
      
      **Future work**
      - [ ] Merge `vllm_rollout_spmd.py` and `vllm_rollout.py` into one file.
      Guangming Sheng committed
  3. 12 Mar, 2025 7 commits
  4. 11 Mar, 2025 2 commits
  5. 10 Mar, 2025 3 commits
  6. 08 Mar, 2025 2 commits
  7. 07 Mar, 2025 8 commits
  8. 06 Mar, 2025 6 commits
  9. 05 Mar, 2025 2 commits