1. 19 Feb, 2025 1 commit
  2. 18 Feb, 2025 3 commits
  3. 17 Feb, 2025 3 commits
    • Fix wrong args desc. (#294) · 0dfcb7f9
      1. Fix wrong notes description.
      2. Fix wrong code path.
      
      Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
      湛露先生 committed
    • [misc] feat: support offload parameter and optimizer during rollout (#284) · 9db52329
      - Fixed FSDP1 model offload
      - With `actor_rollout_ref.actor.fsdp_config.param_offload=True` and
      `actor_rollout_ref.actor.fsdp_config.optimizer_offload=True`, the GPU
      memory utilization can be increased to 0.9.
      - With actor, critic, and reference offload all enabled, there is only
      one model copy in GPU memory at a time, so we can further increase
      `micro_batch_size_per_gpu` or `max_token_per_gpu`.
      
      **Specifically:**
      - During rollout, only the rollout model and KVCache are in GPU memory.
      - During critic value computation, only the critic model stays in GPU
      memory, while its optimizer and other model states are in CPU main
      memory.
      - During actor update, the actor model and optimizer are stored on the
      GPU, while the reference model, critic model, and critic optimizer are
      offloaded to CPU.
      Guangming Sheng committed
    • Enhancement: Support for `extra_info` in Reward Calculation (#266) · f0e5bdf0
      ### **Enhancement: Support for `extra_info` in Reward Calculation**  
      
      #### **Summary**  
      This update enhances the reward computation process by introducing an
      additional `extra_info` parameter. This allows users to pass in more
      contextual information when calculating rewards, improving flexibility
      for different datasets.
      
      #### **Changes Made**  
      - **Updated `_default_compute_score`** to accept an `extra_info`
      argument:
        ```python
        def _default_compute_score(data_source, solution_str, ground_truth, extra_info):
        ```
      - **Modified the reward manager (`naive.py`)** to pass `extra_info` from
      `data_item.non_tensor_batch` to `compute_score`:
        ```python
        extra_info = data_item.non_tensor_batch['extra_info']
        score = self.compute_score(
            data_source=data_source,
            solution_str=sequences_str,
            ground_truth=ground_truth,
            extra_info=extra_info,
        )
        ```
        
      #### **Why This Change?**  
      - Some datasets require additional context beyond `data_source`,
      `solution_str`, and `ground_truth` for accurate reward computation.
      - The new `extra_info` field allows users to pass custom metadata,
      ideally in dictionary form, as specified in the [official
      documentation](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html).
      - This change maintains compatibility with existing dataset processing
      scripts, as they already include the `extra_info` field.
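
      For illustration, a hypothetical dataset-specific scoring function that
      consumes `extra_info` could look like the sketch below. The keys read
      from `extra_info` (e.g. `difficulty`) are assumptions made for this
      example and are not defined by this PR:

      ```python
      def my_dataset_compute_score(data_source, solution_str, ground_truth, extra_info):
          # Base score: exact match against the ground truth.
          score = 1.0 if solution_str.strip() == ground_truth.strip() else 0.0

          # Optionally use metadata carried in extra_info (assumed keys).
          if isinstance(extra_info, dict):
              score *= float(extra_info.get("difficulty", 1.0))
          return score
      ```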
      
      #### **Impact**  
      - **Improved flexibility**: Users can now pass additional contextual
      information, making reward computation more adaptable to different
      datasets.
      - **Backward compatibility**: Since all example datasets already include
      `extra_info`, this update should integrate seamlessly.
      
      Let me know if any modifications are needed!
      Taiwei Shi committed
  4. 16 Feb, 2025 1 commit
  5. 15 Feb, 2025 7 commits
  6. 14 Feb, 2025 5 commits
    • [testing][rollout] feat: support integration of vllm>=0.7.0 (spmd-version) (#209) · f8b4d085
      This PR aims to integrate vllm>=0.7.0 and preserve:
      **Backward compatibility**: 0.3.1, 0.4.2, 0.5.4, 0.6.3 are still
      supported
      **Forward compatibility**: Future versions of vllm (>= 0.7.0) will be
      supported without requiring manual maintenance for each new release.
      
      The README for this beta version is located at docs/README_vllm0.7.md,
      where users can find the installation method and related features. The
      README is copied below.
      
      ---
      # Readme for verl(vllm>=0.7) version
      ## Installation
      
      Note: This version of veRL supports **FSDP** for training and **vLLM**
      for rollout. (Megatron-LM is not supported yet.)
      
      ```
      # Create the conda environment
      conda create -n verl python==3.10
      conda activate verl
      
      # Install verl
      git clone https://github.com/volcengine/verl.git
      cd verl
      pip3 install -e .
      # Install vLLM>=0.7
      pip3 install vllm==0.7.0
      # Install flash-attn
      pip3 install flash-attn --no-build-isolation
      
      ```
      
      For existing stable vllm versions (<=0.7.2), you also need to apply a
      few small patches manually to vllm (under /path/to/site-packages/vllm
      after installation) after the above steps:
      
      - vllm/distributed/parallel_state.py: Remove the assertion below:
      
      ```
      if (world_size
              != tensor_model_parallel_size * pipeline_model_parallel_size):
          raise RuntimeError(
              f"world_size ({world_size}) is not equal to "
              f"tensor_model_parallel_size ({tensor_model_parallel_size}) x "
              f"pipeline_model_parallel_size ({pipeline_model_parallel_size})")
      
      ```
      
      - vllm/executor/uniproc_executor.py: change `local_rank = rank` to
      `local_rank = int(os.environ["LOCAL_RANK"])`
      - vllm/model_executor/model_loader/weight_utils.py: remove the
      `torch.cuda.empty_cache()` in `pt_weights_iterator`
      
      These modifications have already been merged into the main branch of
      vLLM. To avoid modifying these files manually, you can directly build
      vLLM from source.
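
      As a convenience, a quick check like the following (a sketch, not part
      of this PR; it assumes the `packaging` package is installed) can tell
      whether the installed vllm is one of the stable releases (<=0.7.2) that
      still requires the manual patches above:

      ```python
      from importlib.metadata import version

      from packaging.version import Version

      installed = Version(version("vllm"))
      # Releases up to 0.7.2 still need the manual patches described above;
      # a source build of the vLLM main branch already includes the fixes.
      needs_manual_patches = installed <= Version("0.7.2")
      print(f"vllm {installed}: manual patches needed = {needs_manual_patches}")
      ```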
      
      ## Features
      
      ### Use cuda graph
      
      After installation, the examples that use FSDP as the training backend
      can be run. By default, `enforce_eager` is set to True, which disables
      the CUDA graph. To enable CUDA graphs and the sleep mode of vLLM>=0.7,
      add the following lines to the bash script:
      
      ```
      actor_rollout_ref.rollout.enforce_eager=False \
      actor_rollout_ref.rollout.free_cache_engine=False \
      
      ```
      
      For a typical job like examples/ppo_trainer/run_qwen2-7b_seq_balance.sh,
      the rollout generation time is 115 seconds with vLLM 0.6.3 and 85
      seconds with vLLM 0.7.0. With CUDA graphs enabled, the generation time
      is further reduced to 62 seconds.
      
      **Note:** Currently, if `n` is greater than 1 in `SamplingParams` in
      vLLM>=0.7, there is a potential performance issue affecting the
      stability of rollout generation time (some iterations may see bursts in
      generation time). We are working with the vLLM team to investigate this
      issue.
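
      For reference, `n` here is the standard vLLM sampling parameter that
      requests multiple completions per prompt; a minimal sketch with
      illustrative values:

      ```python
      from vllm import SamplingParams

      # n > 1 asks vLLM to return several completions per prompt, which is
      # the setting affected by the timing issue noted above.
      sampling_params = SamplingParams(n=4, temperature=1.0, top_p=1.0, max_tokens=512)
      ```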
      
      ### Other features in vLLM
      
      1. **num_scheduler_step>1:** not supported yet (weight loading has not
      been aligned with `MultiStepModelRunner`)
      2. **Prefix caching:** not supported yet (vLLM sleep mode does not
      support prefix caching)
      3. **Chunked prefill:** supported
      
      ---------
      
      Co-authored-by: zhangshulai <zhangshulai@bytedance.com>
      ZSL98 committed
    • fix the file lock issue (#255) · 63f75138
      The previous FileLock usage in
      
      https://github.com/volcengine/verl/blob/c46f403479db5d7afca6388800503a3bfe393bf5/verl/utils/checkpoint/checkpoint_manager.py#L75
      may cause errors when the given path is too long. To fix this issue, the
      lock file name is now derived from a hash of the path, which avoids the
      conflict.
      
      For instance: `FileExistsError: [Errno 17] File exists` or
      `BlockingIOError: [Errno 11] Resource temporarily unavailable`.
      
      After this modification, the issue is avoided.
      
      ```
      # Requires: import os, tempfile; from filelock import FileLock
      @staticmethod
      def local_mkdir(path):
          if not os.path.isabs(path):
              working_dir = os.getcwd()
              path = os.path.join(working_dir, path)

          # Use the hash value of the path as the lock file name to avoid long file names
          lock_filename = f"ckpt_{hash(path) & 0xFFFFFFFF:08x}.lock"
          lock_path = os.path.join(tempfile.gettempdir(), lock_filename)

          try:
              with FileLock(lock_path, timeout=60):  # add a timeout
                  # make a new dir
                  os.makedirs(path, exist_ok=True)
          except Exception as e:
              print(f"Warning: Failed to acquire lock for {path}: {e}")
              # Even if the lock is not acquired, still try to create the directory
              os.makedirs(path, exist_ok=True)

          return path
      ```
      Wei Liu committed
    • [misc] Compatibility Issue with Python 3.9 in FSDP Worker for LLaMA Model (#268) · 7346ecf8
      **Fix: Compatibility Issue with Python 3.9 in FSDP Worker for LLaMA
      Model**
      
      When running the LLaMA model in the FSDP worker, an ImportError occurs
      due to the use of the Unpack type from the typing module. This type is
      only available in Python 3.11 and later, but the current environment
      uses Python 3.9, which does not support it.
      
      **Error Details:**
      ```
      File "/project/Logic-RL-main/verl/models/transformers/llama.py", line 17, in <module>
      from typing import Optional, List, Union, Tuple, Unpack, Callable
      ImportError: cannot import name 'Unpack' from 'typing' (/opt/miniconda3/envs/verl/lib/python3.9/typing.py)
      ```
      **Solution:**
      To resolve this issue, I added conditional imports to handle different
      Python versions. For Python versions lower than 3.11, the code now uses
      a fallback or alternative approach to avoid relying on Unpack.
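
      A common pattern for such a conditional import (a sketch of the general
      approach, assuming `typing_extensions` is available; the exact code in
      this PR may differ) is:

      ```python
      import sys

      if sys.version_info >= (3, 11):
          # Unpack was added to the standard typing module in Python 3.11.
          from typing import Unpack
      else:
          # On older interpreters such as Python 3.9, fall back to typing_extensions.
          from typing_extensions import Unpack
      ```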
      
      Co-authored-by: Yu Feng <fengyufengyu@didiglobal.com>
      Yu Feng committed
  7. 13 Feb, 2025 1 commit
  8. 12 Feb, 2025 4 commits
  9. 11 Feb, 2025 2 commits
  10. 10 Feb, 2025 5 commits
  11. 09 Feb, 2025 7 commits
  12. 08 Feb, 2025 1 commit
    • [ckpt] feat: integrate checkpoint resume in RL ray trainer (#222) · 5a400bf2
      **Features:**
      - Save actor and critic checkpoint:
        - Model
        - Optimizer
        - lr_scheduler
        - rng_state
        - dataloader
      - A complete checkpoint means that the dataloader, actor, and critic (if
      any) states are properly saved
      - By default, we do not save the dataset itself but only store the
      dataloader (with sampler) state
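
      As a conceptual sketch only (illustrative; verl's actual checkpoint
      layout and helper names may differ), the pieces listed above could be
      bundled into a single file like this:

      ```python
      import torch

      def save_rl_checkpoint(path, model, optimizer, lr_scheduler, dataloader_state):
          # Bundle the states listed above into one checkpoint file.
          torch.save(
              {
                  "model": model.state_dict(),
                  "optimizer": optimizer.state_dict(),
                  "lr_scheduler": lr_scheduler.state_dict(),
                  "rng_state": torch.get_rng_state(),
                  # Only the dataloader/sampler state is stored, not the dataset itself.
                  "dataloader": dataloader_state,
              },
              path,
          )
      ```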
      
      **Usage:**
      - Supported resume modes: auto, disable, and resume_from_path
        - auto: veRL automatically checks for the latest checkpoint in
      `trainer.default_local_dir`
        - disable: veRL always trains from scratch
        - resume_from_path: when `resume_from_path=True` is set, the user only
      needs to set `resume_mode` to the checkpoint path that should be loaded
      
      **TODO:**
      - Support SFT resume in the next PR
      - Support uploader
      
      **Relevant issue:**
      - https://github.com/volcengine/verl/issues/76
      - https://github.com/volcengine/verl/issues/143
      Guangming Sheng committed