- 02 Mar, 2025 5 commits
Weizhe Chen committed

ZSL98 committed
Specify the IP address when calling the bind method.
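A minimal sketch of the intent with a plain Python TCP socket (illustrative only; the actual call site in verl may differ, and the host/port are placeholders):

```python
import socket

# Bind the listening socket to an explicit IP address rather than the
# wildcard address, so the service is reachable on a known interface in
# multi-host setups. (Hypothetical host/port values.)
HOST, PORT = "127.0.0.1", 8000

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind((HOST, PORT))  # rather than sock.bind(("", PORT))
sock.listen()
sock.close()
```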
Willem Jiang committed

Guangming Sheng committed
Now the APIs can be displayed in the documentation.
HL committed
- 01 Mar, 2025 2 commits
Lumeng Wu committed
Because of ongoing updates in vLLM, veRL currently cannot integrate directly with the vLLM nightly build: the nightly's new DP feature can no longer be bypassed by simply adjusting the `data_parallel_size` parameter, and resolving this requires further investigation. As a temporary workaround, I recommend a customized installation of vLLM if the V1 engine is required; I have updated the relevant documentation to reflect this guidance.
ZSL98 committed
- 28 Feb, 2025 3 commits
Validation should not have shuffling.
Shawn/Yuxuan Tong committed
This is an enhancement to the single-batch strategy for `val_dataloader`, making https://github.com/volcengine/verl/pull/353 more robust.
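A minimal sketch of the idea in plain PyTorch (illustrative, not verl's actual code): validation uses a single batch covering the whole dataset, with shuffling disabled.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative stand-in for the real validation set.
val_dataset = TensorDataset(torch.arange(100).unsqueeze(1))

val_dataloader = DataLoader(
    val_dataset,
    batch_size=len(val_dataset),  # one batch for the entire dataset
    shuffle=False,                # validation should not be shuffled
)

assert len(val_dataloader) == 1  # the whole set arrives as a single batch
```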
Shawn/Yuxuan Tong committed

Willem Jiang committed
- 27 Feb, 2025 6 commits
Add TensorBoard to the Tracking backends. The user can set the environment variable TENSORBOARD_DIR to specify the TensorBoard log path.
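A hedged sketch of how the backend could be wired up (names are illustrative, not verl's actual Tracking implementation):

```python
import os
from torch.utils.tensorboard import SummaryWriter

# Honor TENSORBOARD_DIR when set; otherwise fall back to a default path.
log_dir = os.environ.get("TENSORBOARD_DIR", "./tensorboard_log")
writer = SummaryWriter(log_dir=log_dir)

# Example: log one scalar metric per training step.
writer.add_scalar("train/reward_mean", 0.42, global_step=1)
writer.close()
```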
Hongji Zhu committed

Chi Zhang committed
The current training script uses the same data file for both training and evaluation, which is suspected to be incorrect.
yaguang committed
[ckpt] Replace DataLoader with StatefulDataLoader to support resuming training with SequentialSampler (#389). This attempts to resolve [this issue](https://github.com/volcengine/verl/issues/356). As suggested in the issue discussion, the default DataLoader is replaced with StatefulDataLoader, which provides `state_dict` and `load_state_dict` methods that support resuming the iterator position from a mid-epoch checkpoint.
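A minimal sketch of the resume pattern with torchdata's StatefulDataLoader (requires the torchdata package; verl's checkpoint plumbing is more involved):

```python
import torch
from torch.utils.data import TensorDataset
from torchdata.stateful_dataloader import StatefulDataLoader

dataset = TensorDataset(torch.arange(10).unsqueeze(1))
loader = StatefulDataLoader(dataset, batch_size=2)

it = iter(loader)
next(it)                     # consume one batch mid-epoch
state = loader.state_dict()  # capture the iterator position (checkpoint)

# After a restart: restore the state and continue from the same position.
resumed = StatefulDataLoader(dataset, batch_size=2)
resumed.load_state_dict(state)
for batch in resumed:        # iteration resumes after the consumed batch
    print(batch)
```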
alexchiu committed
Thanks: @HillZhang1999. Related issue: https://github.com/volcengine/verl/issues/189

`(main_task pid=3523385) ValueError: max_num_batched_tokens (8192) is smaller than max_model_len (9216). This effectively limits the maximum sequence length to max_num_batched_tokens and makes vLLM reject longer sequences. Please increase max_num_batched_tokens or decrease max_model_len.`

When `enable_chunked_prefill` is activated, the error above is concealed. Please increase `max_num_batched_tokens` or decrease `max_model_len`.
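A hedged illustration of the constraint via vLLM's offline `LLM` entry point (the model name is a placeholder; the values match the error above):

```python
from vllm import LLM

# With chunked prefill disabled, vLLM requires
# max_num_batched_tokens >= max_model_len; otherwise it raises the
# ValueError quoted above.
llm = LLM(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model
    max_model_len=9216,
    max_num_batched_tokens=9216,         # raised from 8192 to match
    enable_chunked_prefill=False,
)
```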
Guangming Sheng committed

Chi Zhang committed
- 26 Feb, 2025 2 commits
apis: add DataProto to the documentation page; use copy_to_local instead of copy_local_path_from_hdfs (#358)
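A small hedged usage sketch of the renamed helper (assuming it lives under `verl.utils.fs`; the remote path is a placeholder):

```python
from verl.utils.fs import copy_to_local

# Copy a possibly remote (e.g. HDFS) path into a local cache and return
# the local path; replaces the older copy_local_path_from_hdfs.
local_path = copy_to_local("hdfs://some/remote/model/path")  # placeholder
print(local_path)
```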
HL committed
As titled.
Guangming Sheng committed
- 25 Feb, 2025 4 commits
See issue: https://github.com/volcengine/verl/issues/342
Mingjie Liu committed
#369
Co-authored-by: Thom <zhangyi@zhangyideMacBook-Pro.local>
_T_L_R_ committed

kriswang committed

Chi Zhang committed
- 24 Feb, 2025 5 commits
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
湛露先生 committed

BearBiscuit committed
Close #312: add support for Ulysses SP for transformers >= 4.48. I've tested transformers 4.45.0, 4.46.0, 4.47.0, 4.48.0, and 4.49.0 using sp=2 with the following script in my local env:

```bash
#!/bin/bash
set -ex

VERSIONS=("4.45.0" "4.46.0" "4.47.0" "4.48.0" "4.49.0")

for version in "${VERSIONS[@]}"; do
    echo "Testing with Transformers version ${version}"
    echo "----------------------------------------"
    pip install "transformers==${version}"
    PYTHONPATH=./ torchrun --nproc_per_node=2 tests/model/test_transformers_ulysses.py
    echo "----------------------------------------"
    echo "Completed testing for version ${version}"
    echo ""
done
```
zhou fan committed
Fix issue [#331](https://github.com/volcengine/verl/issues/331).
BearBiscuit committed
Validation datasets are sent to inference engines as a whole batch, and the engines schedule memory themselves; a sketch of the deprecation check is below.
- [x] Remove `val_batch_size` from examples
- [x] Set default values of `val_batch_size` in configs to `null` and add DEPRECATED comments
- [x] Add deprecation warnings about `val_batch_size` in `_validate_config`
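A hedged sketch of such a deprecation check (the config access pattern is illustrative, not verl's actual `_validate_config`):

```python
import warnings

def validate_config(config: dict) -> None:
    # val_batch_size is deprecated: validation now sends the whole
    # dataset to the inference engine as a single batch.
    if config.get("data", {}).get("val_batch_size") is not None:
        warnings.warn(
            "data.val_batch_size is deprecated and ignored; validation "
            "datasets are sent to the inference engine as a whole batch.",
            DeprecationWarning,
        )

validate_config({"data": {"val_batch_size": 32}})  # emits the warning
```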
Shawn/Yuxuan Tong committed
- 23 Feb, 2025 2 commits
Tracking backend: support vemlp wandb.
Co-authored-by: liudayuan.carrot <liudayuan.carrot@bytedance.com>
liudayuan-carrot committed

Ikko Eltociear Ashimine committed
- 22 Feb, 2025 2 commits
HL committed
Implement the RLOO algorithm according to https://arxiv.org/abs/2402.14740.
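A standalone hedged sketch of RLOO's leave-one-out baseline (not verl's implementation): with k sampled responses per prompt, each sample's advantage is its reward minus the mean reward of the other k-1 samples.

```python
import torch

def rloo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: shape (k,), one scalar reward per sampled response for
    the same prompt. Returns the leave-one-out advantages."""
    k = rewards.numel()
    baseline = (rewards.sum() - rewards) / (k - 1)  # mean of the other k-1
    return rewards - baseline

# Example: four sampled responses for one prompt.
print(rloo_advantages(torch.tensor([1.0, 0.0, 0.5, 0.5])))
```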
Zefan Wang committed
- 21 Feb, 2025 2 commits
HL committed
This PR adds Ray Serve to the requirements to enable support for multi-node training. It addresses the issue described here: https://github.com/volcengine/verl/issues/87#issuecomment-2659493418
Co-authored-by: Yu Feng <fengyufengyu@didiglobal.com>
Yu Feng committed
- 20 Feb, 2025 1 commit
HL committed
- 19 Feb, 2025 6 commits
Support Qwen2 Megatron backend. The code is primarily adapted from the llama folder, with modifications to use QKV bias and to remove the rope_scaling of RoPE in `verl/models/qwen2/megatron/layers/parallel_attention.py`.
- Training Qwen2-7B-Instruct with PPO, the GSM8k score can reach 0.87 at step 75.
- The saver is not supported yet.
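A minimal hedged sketch of the key architectural difference (illustrative dimensions, plain MHA shapes ignoring GQA; not the actual Megatron parallel layers):

```python
import torch.nn as nn

hidden_size = 3584  # illustrative Qwen2-7B-like width

# Qwen2 applies a bias on the fused QKV projection, unlike Llama, and its
# RoPE is used without rope_scaling; the rest mirrors the llama layout.
qkv_proj_qwen2 = nn.Linear(hidden_size, 3 * hidden_size, bias=True)
qkv_proj_llama = nn.Linear(hidden_size, 3 * hidden_size, bias=False)
```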
Kinman Lei committed

Chi Zhang committed
Willem Jiang committed
A working Slurm example adapted from https://docs.ray.io/en/latest/ray-core/starting-ray.html
Chenhui Zhang committed

HL committed
Willem Jiang committed