- 21 Mar, 2025 4 commits
Junrong Lin committed
Prevents training hangs by validating `num_key_value_heads % ulysses_sequence_parallel_size == 0` before training.
Yu Feng committed
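For illustration, a minimal sketch of the kind of pre-flight check described above (function and argument names are hypothetical):

```python
def validate_ulysses_config(num_key_value_heads: int, ulysses_sequence_parallel_size: int) -> None:
    # Fail fast at startup instead of hanging mid-training when KV heads
    # cannot be evenly sharded across Ulysses sequence-parallel ranks.
    if ulysses_sequence_parallel_size > 1:
        assert num_key_value_heads % ulysses_sequence_parallel_size == 0, (
            f"num_key_value_heads ({num_key_value_heads}) must be divisible by "
            f"ulysses_sequence_parallel_size ({ulysses_sequence_parallel_size})"
        )
```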
What does this PR do? Add a document for using vLLM 0.8 in verl. Who can review? @eric-haibin-lin
hoshi-hiyouga committed
HL committed
- 20 Mar, 2025 5 commits
Add Openmanus-RL: an LLM agent RL tuning repo using verl.
Kunlun Zhu committed
Add `verl` as the `framework` parameter in the SwanLab config table, so more developers can see that the training run comes from `verl`.
Ze-Yi LIN committed
HL committed
https://github.com/volcengine/verl/issues/680 Changes:
- Move math-verify to the optional dependencies; it can now be installed via `cd verl && pip install -e .[math]`.
- Revert to the naive verifier for the math dataset. Users can switch to math-verify or write a custom `compute_score` function (a sketch follows below).
Yuyang Ding committed
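For illustration, a sketch of a custom `compute_score` backed by math-verify; the exact signature verl expects of this hook is an assumption here:

```python
from math_verify import parse, verify  # optional dependency: pip install -e .[math]

def compute_score(data_source: str, solution_str: str, ground_truth: str) -> float:
    # Score 1.0 when math-verify judges the model's answer equivalent to the
    # ground truth, 0.0 otherwise (including parse failures).
    try:
        return float(verify(parse(ground_truth), parse(solution_str)))
    except Exception:
        return 0.0
```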
Chi Zhang committed
- 19 Mar, 2025 1 commit
We propose a more accurate description of DeepRetrieval. Thanks for your awesome work!
Patrick Jiang committed
- 18 Mar, 2025 3 commits
Yuqian Fu committed
Use a Ray actor instead of a task to run main_task:
- A Ray task is retried on system errors (OOM/segfault), which may cause unexpected behavior.
- An actor is easier to track in the Ray dashboard, e.g. logging/stack traces/profiling.
Closes #539 (see the sketch below)
Joel committed
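A minimal sketch of the switch, with hypothetical names; unlike tasks, actors are not retried on system errors by default and appear individually in the Ray dashboard:

```python
import ray

@ray.remote(num_cpus=1)
class MainTaskRunner:
    """Driver running as an actor: trackable in the dashboard, not auto-retried."""

    def run(self, config: dict) -> str:
        return f"trained with {config}"

if __name__ == "__main__":
    ray.init()
    runner = MainTaskRunner.remote()
    print(ray.get(runner.run.remote({"algorithm": "ppo"})))
```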
The rebase in commit c3420692 caused an error. Try to revert it and add an assertion check.
Blue Space committed
- 17 Mar, 2025 4 commits
Chi Zhang committed
This PR adds **DeepEnlighten** to the "Awesome Work Using Verl" section. Co-authored-by: yu_wang <yuwang@astri.com> Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
yuwang91 committed
As titled
Guangming Sheng committed
#22. WIP, will add more details tomorrow :) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Junrong Lin committed
- 16 Mar, 2025 3 commits
Chi Zhang committed
Add MetaSpatial to the Awesome Work using EasyR1 list.
PzySeere committed
Fengqing Jiang committed
- 15 Mar, 2025 2 commits
Guangming Sheng committed
What does this PR do? Use metric_utils to maintain the metric-computation logic, avoiding too many lines in the PPO trainer. Who can review? @vermouth1992 @PeterSH6
hoshi-hiyouga committed
- 14 Mar, 2025 8 commits
Support GRPO with the Megatron backend and fix a configuration bug when the virtual pipeline is not used. Calibrated against the FSDP backend.
Blue Space committed
Yuqian Fu committed
This PR adds the `lr_warmup_steps` configuration. Note that `num_warmup_steps` takes priority over `lr_warmup_steps_ratio` (see the sketch below).
Shawn/Yuxuan Tong committed
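A sketch of the precedence rule as described, assuming a negative value means "unset" (helper name hypothetical):

```python
def resolve_warmup_steps(total_steps: int,
                         lr_warmup_steps: int = -1,
                         lr_warmup_steps_ratio: float = 0.0) -> int:
    # An explicitly set lr_warmup_steps wins over the ratio-based fallback.
    if lr_warmup_steps >= 0:
        return lr_warmup_steps
    return int(lr_warmup_steps_ratio * total_steps)
```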
BearBiscuit committed
## Summary
Provide a config option to turn off the `torch.compile` used in `dp_actor.py`.
## Usage
Add the following line to the driver or CLI scripts to turn off `torch.compile`; otherwise `torch.compile` is used by default (a sketch of the gating follows below).
```python
+actor_rollout_ref.actor.use_torch_compile=False
```
## Related Issue
#354 #245
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Hongpeng Guo committed
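A sketch of how such a flag can gate compilation; only the `use_torch_compile` config key comes from the PR, the surrounding code is hypothetical:

```python
import torch
import torch.nn.functional as F

use_torch_compile = False  # mirrors actor_rollout_ref.actor.use_torch_compile

def log_probs_from_logits(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Gather the log-probability of each label token.
    logp = F.log_softmax(logits, dim=-1)
    return logp.gather(-1, labels.unsqueeze(-1)).squeeze(-1)

# Compile only when the flag is on; otherwise fall back to eager execution.
compute_fn = torch.compile(log_probs_from_logits, dynamic=True) if use_torch_compile else log_probs_from_logits
```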
On a `DataProto` instance, calling `to(device)` already moves `data.batch` to the specified device. https://github.com/volcengine/verl/blob/329dcfe1dd60f2d736ee55914e2a49e1887718eb/verl/protocol.py#L324-L336
Lumeng Wu committed
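An illustrative sketch of the redundancy being removed (the wrapper function is hypothetical):

```python
def move_to_device(data, device):  # data: verl.DataProto
    # Before this cleanup the call site also ran
    #   data.batch = data.batch.to(device)
    # which is redundant: DataProto.to(device) already moves data.batch.
    return data.to(device)
```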
#354
Joel committed
Follow-up to https://github.com/volcengine/verl/pull/309
Chenhui Zhang committed
- 13 Mar, 2025 6 commits
none0663 committed
Remove a redundant broadcast in the FSDP vLLM postprocess, since the vLLM output on each TP rank should be identical.
Joel committed
#556 took effort to remove unnecessary `empty_cache` calls, but caused CUDA OOM at vLLM wake_up:
```text
  File "/opt/tiger/ray/session_2025-03-13_12-11-30_408315_2895/runtime_resources/working_dir_files/_ray_pkg_a64b690733067c5c/verl/workers/fsdp_workers.py", line 481, in generate_sequences
    with self.rollout_sharding_manager:
  File "/opt/tiger/ray/session_2025-03-13_12-11-30_408315_2895/runtime_resources/working_dir_files/_ray_pkg_a64b690733067c5c/verl/workers/sharding_manager/fsdp_vllm.py", line 82, in __enter__
    self.inference_engine.wake_up()
  File "/usr/local/lib/python3.11/dist-packages/vllm/entrypoints/llm.py", line 1244, in wake_up
    self.llm_engine.wake_up()
  File "/usr/local/lib/python3.11/dist-packages/vllm/engine/llm_engine.py", line 1859, in wake_up
    self.model_executor.wake_up()
  File "/usr/local/lib/python3.11/dist-packages/vllm/executor/executor_base.py", line 216, in wake_up
    self.collective_rpc("wake_up")
  File "/usr/local/lib/python3.11/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/vllm/utils.py", line 2196, in run_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/vllm/worker/worker.py", line 140, in wake_up
    allocator.wake_up()
  File "/usr/local/lib/python3.11/dist-packages/vllm/device_allocator/cumem.py", line 207, in wake_up
    create_and_map(handle)
  File "/usr/local/lib/python3.11/dist-packages/vllm/device_allocator/cumem.py", line 75, in create_and_map
    python_create_and_map(*allocation_handle)
RuntimeError: CUDA Error: out of memory at /workspace/csrc/cumem_allocator.cpp:62
```
This PR removes all redundant `torch.cuda.empty_cache()` calls in the FSDP worker and only empties the cache before vLLM wake_up and after vLLM sleep, since vLLM has its own caching memory allocator, [CuMemAllocator](https://github.com/vllm-project/vllm/blob/v0.7.3/vllm/device_allocator/cumem.py#L103). Outside the vLLM scope, we should avoid emptying the cache and let PyTorch use its caching allocator to speed up memory allocations (see the sketch below).
- [x] Clean up FSDP worker `torch.cuda.empty_cache()`
- [ ] Clean up Megatron worker `torch.cuda.empty_cache()`
Joel committed
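A sketch of the resulting policy; the class shape is hypothetical, while `wake_up()` and `sleep()` are real vLLM 0.7.3 engine methods:

```python
import torch

class RolloutShardingManager:
    """Empty the PyTorch cache only around vLLM's sleep/wake transitions."""

    def __init__(self, inference_engine):
        self.inference_engine = inference_engine

    def __enter__(self):
        # Hand cached blocks back so vLLM's CuMemAllocator can map the memory.
        torch.cuda.empty_cache()
        self.inference_engine.wake_up()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.inference_engine.sleep(level=1)
        # Reclaim the memory vLLM just released; elsewhere, avoid empty_cache
        # so PyTorch's caching allocator can reuse blocks.
        torch.cuda.empty_cache()
```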
### Description
- Fix the `filter_overlong_prompts` setting in PRIME.
- Fix the incorrect padding side for Qwen in PRIME. When using the PRIME recipe to train Qwen-series models, I got "*ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Qwen2. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input.*", so I set `use_cache = False` when calling the model to calculate output logits.
- Fix a CUDA error with vLLM v0.6.3. Running PRIME could fail with "*CUDA error: an illegal memory access was encountered*"; following https://github.com/vllm-project/vllm/issues/10389, I set `VLLM_ATTENTION_BACKEND=XFORMERS` (see the sketch below).
CajZella committed
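A sketch of the two workarounds described above (the model path is illustrative):

```python
import os

# Must be set before vLLM initializes; avoids the illegal-memory-access crash on vLLM v0.6.3.
os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Math-7B-Instruct")
# Qwen2 with Flash Attention requires left padding for batched generation.
tokenizer.padding_side = "left"
```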
# Description
- Corrected the dummy size to avoid faulty communication.
- Fixed the batch number calculation.
- Adjusted the worker group role to alleviate memory overhead.
- Added `ray.init()` to prevent failing to register workers (see the sketch below).
Dai, Weinan committed
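For the last bullet, a minimal sketch of the guard:

```python
import ray

# Initialize Ray explicitly so workers can register before any remote call.
if not ray.is_initialized():
    ray.init()
```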
Currently, eager mode is applied in the validation stage. However, in some reasoning tasks, we may need to generate n times and average the scores. This PR supports non-eager sampling parameters during validation by specifying `val_kwargs` in the `actor_rollout_ref.rollout` config field (see the sketch below).
**Future work**
- [ ] Merge `vllm_rollout_spmd.py` and `vllm_rollout.py` into one file.
Guangming Sheng committed
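A sketch of the new knobs; the sub-key names are assumptions based on the description:

```python
# Overrides under actor_rollout_ref.rollout.val_kwargs switch validation from
# eager/greedy decoding to sampling n responses per prompt and averaging scores.
val_kwargs = {
    "do_sample": True,
    "temperature": 1.0,
    "top_p": 1.0,
    "n": 4,
}
```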
- 12 Mar, 2025 4 commits
Zheng-Yuxiang committed
BearBiscuit committed
As we're moving to vllm>=0.7.3, we should remove `verl/third_party` completely in the future.
Joel committed
# Description
https://github.com/volcengine/verl/issues/287, https://github.com/volcengine/verl/issues/295. This PR introduces support for [Math-Verify](https://github.com/huggingface/Math-Verify) as a new rule-based reward scorer, significantly improving evaluation accuracy.
# Key changes
- Added `math-verify` to the installation dependencies.
- Introduced `reward_score/math_verify.py` and updated `reward_score/__init__.py`.
# Test
Comparison between the existing scorer in math.py and the newly added `math_verify.py`, using Qwen2.5-Math-7B-Instruct:
```
# Use scorer in math.py (original)
{'val/test_score/DigitalLearningGmbH/MATH-lighteval': 0.803}

# Use scorer in math_verify.py (newly added)
{'val/test_score/DigitalLearningGmbH/MATH-lighteval': 0.8338}
```
Test scripts:
```bash
set -x

# Data Process
python examples/data_preprocess/math_dataset.py --local_dir /workspace/datasets/math

# Evaluation
export CUDA_VISIBLE_DEVICES=4,5,6,7
export VLLM_ATTENTION_BACKEND=XFORMERS

math_train_path=/workspace/datasets/math/train.parquet
math_test_path=/workspace/datasets/math/test.parquet

python3 -m verl.trainer.main_ppo \
    data.train_files="$math_train_path" \
    data.val_files="$math_test_path" \
    data.max_prompt_length=2048 \
    data.max_response_length=2048 \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-Math-7B-Instruct \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=1 \
    actor_rollout_ref.rollout.temperature=0 \
    trainer.logger=['console'] \
    trainer.project_name='test-math-verify' \
    trainer.experiment_name='test-math-verify' \
    +trainer.val_before_train=True \
    trainer.n_gpus_per_node=4 \
    trainer.nnodes=1 \
    trainer.total_epochs=0 \
    data.train_batch_size=1024 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    algorithm.adv_estimator=grpo $@
```
Yuyang Ding committed