1. 21 Mar, 2025 4 commits
  2. 20 Mar, 2025 5 commits
  3. 19 Mar, 2025 1 commit
  4. 18 Mar, 2025 3 commits
  5. 17 Mar, 2025 4 commits
  6. 16 Mar, 2025 3 commits
  7. 15 Mar, 2025 2 commits
  8. 14 Mar, 2025 8 commits
  9. 13 Mar, 2025 6 commits
    • fix: remove redundant broadcast in fsdp vllm postprocess (#577) · f7e183e4
      Remove the redundant broadcast in the FSDP vllm postprocess, since the vllm
      output on each TP rank should be identical.
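      The idea, as a hedged sketch (simplified names, not the actual verl code):
      every TP rank runs the same vLLM sampling and therefore already holds
      identical outputs, so the post-generation broadcast from the TP source rank
      can be dropped.
      ```python
      import torch.distributed as dist

      def postprocess_outputs(outputs, tp_group=None):  # tp_group: hypothetical handle
          # Before: results were broadcast from the TP source rank to every rank.
          # dist.broadcast_object_list(outputs, src=0, group=tp_group)
          # After: each rank simply keeps its local outputs, which are already identical.
          return outputs
      ```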
      Joel committed
    • fix: remove redundant torch.cuda.empty_cache() (#575) · 3fc3e2b7
      #556 attempted to remove unnecessary empty_cache calls, but doing so causes
      a CUDA OOM at vllm wake_up:
      ```text
        File "/opt/tiger/ray/session_2025-03-13_12-11-30_408315_2895/runtime_resources/working_dir_files/_ray_pkg_a64b690733067c5c/verl/workers/fsdp_workers.py", line 481, in generate_sequences
          with self.rollout_sharding_manager:
        File "/opt/tiger/ray/session_2025-03-13_12-11-30_408315_2895/runtime_resources/working_dir_files/_ray_pkg_a64b690733067c5c/verl/workers/sharding_manager/fsdp_vllm.py", line 82, in __enter__
          self.inference_engine.wake_up()
        File "/usr/local/lib/python3.11/dist-packages/vllm/entrypoints/llm.py", line 1244, in wake_up
          self.llm_engine.wake_up()
        File "/usr/local/lib/python3.11/dist-packages/vllm/engine/llm_engine.py", line 1859, in wake_up
          self.model_executor.wake_up()
        File "/usr/local/lib/python3.11/dist-packages/vllm/executor/executor_base.py", line 216, in wake_up
          self.collective_rpc("wake_up")
        File "/usr/local/lib/python3.11/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
          answer = run_method(self.driver_worker, method, args, kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.11/dist-packages/vllm/utils.py", line 2196, in run_method
          return func(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.11/dist-packages/vllm/worker/worker.py", line 140, in wake_up
          allocator.wake_up()
        File "/usr/local/lib/python3.11/dist-packages/vllm/device_allocator/cumem.py", line 207, in wake_up
          create_and_map(handle)
        File "/usr/local/lib/python3.11/dist-packages/vllm/device_allocator/cumem.py", line 75, in create_and_map
          python_create_and_map(*allocation_handle)
      RuntimeError: CUDA Error: out of memory at /workspace/csrc/cumem_allocator.cpp:62
      ```
      This PR removes all redundant `torch.cuda.empty_cache()` calls in the FSDP
      worker and only empties the cache before vllm wake_up and after vllm sleep,
      since vllm has its own caching memory allocator
      [CuMemAllocator](https://github.com/vllm-project/vllm/blob/v0.7.3/vllm/device_allocator/cumem.py#L103).
      Outside the vllm scope, we should avoid emptying the cache so that PyTorch
      can use its caching allocator to speed up memory allocations.
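      A minimal sketch of the intended pattern (names simplified from
      fsdp_vllm.py; this is illustrative, not the exact implementation):
      ```python
      import torch

      class RolloutShardingManager:
          """Empty the PyTorch cache only around vllm wake_up/sleep, nowhere else."""

          def __init__(self, inference_engine):
              self.inference_engine = inference_engine

          def __enter__(self):
              # Release PyTorch's cached blocks so CuMemAllocator can map its pool.
              torch.cuda.empty_cache()
              self.inference_engine.wake_up()

          def __exit__(self, exc_type, exc_value, exc_tb):
              self.inference_engine.sleep(level=1)
              # Reclaim fragments once vllm has released its memory.
              torch.cuda.empty_cache()
      ```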
      
      - [x] Cleanup FSDP worker torch.cuda.empty_cache()
      - [ ] Cleanup Megatron worker torch.cuda.empty_cache()
      Joel committed
    • [bugfix] PRIME filter overlong prompts & padding side incorrect & use xformers (#570) · 9bb02d27
      ### Description
      - fix filter_overlong_prompts setting in PRIME
      
      - fix incorrect padding side for Qwen in PRIME

      - When I used the PRIME recipe to train Qwen-series models, I got
      “*ValueError: You are attempting to perform batched generation with
      padding_side='right' this may lead to unexpected behaviour for Flash
      Attention version of Qwen2. Make sure to call tokenizer.padding_side =
      'left' before tokenizing the input.*” So I set `use_cache=False` when
      calling the model to compute output logits.
      
      - fix CUDA error with vllm v0.6.3

      - When running PRIME, I sometimes got *CUDA error: an illegal memory
      access was encountered*. Following
      https://github.com/vllm-project/vllm/issues/10389, I set
      `VLLM_ATTENTION_BACKEND=XFORMERS`. A sketch of both workarounds is shown
      below.
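      A hedged sketch of the two workarounds (illustrative only; function and
      variable names here are not the actual PRIME code):
      ```python
      import os
      import torch

      # Workaround for the vllm v0.6.3 illegal-memory-access error (vllm issue #10389):
      # force the xformers attention backend before vllm is initialized.
      os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

      def compute_logits(model, input_ids, attention_mask):
          """Compute output logits without the KV cache, avoiding the
          padding_side='right' batched-generation error for Flash-Attention Qwen2."""
          with torch.no_grad():
              output = model(input_ids=input_ids,
                             attention_mask=attention_mask,
                             use_cache=False)
          return output.logits
      ```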
      CajZella committed
    • [bugfix] fix: generation script (#542) · 79e072f1
      # Description
      - Corrected dummy size to avoid faulty communication.
      - Fixed batch number calculation.
      - Adjusted worker group role to alleviate memory overhead.
      - Added ray.init() to prevent failures when registering workers.
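      The last item amounts to explicitly initializing Ray before the worker
      groups are created; a minimal sketch:
      ```python
      import ray

      # Start (or attach to) a Ray runtime explicitly so worker registration
      # does not depend on implicit initialization elsewhere.
      if not ray.is_initialized():
          ray.init()
      ```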
      Dai, Weinan committed
    • [rollout] feat: support sampling in validation stage (#553) · d5de9f4c
      Currently, greedy decoding is applied in the validation stage. However, in
      some reasoning tasks, we may need to generate n samples and average the
      scores.

      In this PR, we support non-greedy sampling parameters during validation by
      specifying `val_kwargs` in the `actor_rollout_ref.rollout` config field, as
      sketched below.
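      A hypothetical sketch of what such validation-time overrides could look
      like; the exact keys under `val_kwargs` are assumptions here, not the
      verified verl schema:
      ```python
      # Validation-only sampling overrides (illustrative values).
      val_kwargs = {
          "do_sample": True,   # sample instead of greedy decoding
          "temperature": 0.6,
          "top_p": 0.95,
          "n": 4,              # generate n responses per prompt and average the scores
      }

      def get_sampling_params(default_params: dict, validate: bool) -> dict:
          """Apply the validation overrides only during the validation stage."""
          params = dict(default_params)
          if validate:
              params.update(val_kwargs)
          return params
      ```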
      
      
      **Future work**
      - [ ] Merge `vllm_rollout_spmd.py` and `vllm_rollout.py` into one file.
      Guangming Sheng committed
  10. 12 Mar, 2025 4 commits
    • refactor: remove custom vllm weight loader and use model.load_weights directly (#543) · 6680185c
      As we're moving to vllm>=0.7.3, we should remove `verl/third_party`
      completely in the future.
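      The gist, as a hedged sketch (argument names are illustrative, not the
      exact verl signature): feed the updated weights straight into the vllm
      model's own loader instead of going through a custom loader in
      `verl/third_party`.
      ```python
      def sync_weights_to_vllm(vllm_model, state_dict):
          # vllm models accept an iterable of (name, tensor) pairs here.
          vllm_model.load_weights(state_dict.items())
      ```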
      Joel committed
    • Add Math-Verify Support (#545) · d4a00ef0
      # Description
      
      https://github.com/volcengine/verl/issues/287,
      https://github.com/volcengine/verl/issues/295.
      This PR introduces support for
      [Math-Verify](https://github.com/huggingface/Math-Verify) as a new
      rule-based reward scorer, significantly improving evaluation accuracy.
      
      # Key changes
      
      - Added `math-verify` to the installation dependencies.
      - Introduced `reward_score/math_verify.py` and updated
      `reward_score/__init__.py`.
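      An illustrative sketch of what a Math-Verify based scorer can look like
      (not the exact contents of `reward_score/math_verify.py`):
      ```python
      from math_verify import parse, verify

      def compute_score(model_output: str, ground_truth: str) -> float:
          """Return 1.0 if the extracted answer matches the gold answer, else 0.0."""
          gold = parse(ground_truth)
          answer = parse(model_output)
          return 1.0 if verify(gold, answer) else 0.0
      ```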
      
      # Test
      
      Comparison between the existing scorer in `math.py` and the newly added
      `math_verify.py`, using Qwen2.5-Math-7B-Instruct:
      
      ```
      # Use scorer in math.py (original)
      {'val/test_score/DigitalLearningGmbH/MATH-lighteval': 0.803}
      
      # Use scorer in math_verify.py (newly added)
      {'val/test_score/DigitalLearningGmbH/MATH-lighteval': 0.8338}
      ```
      
      Test scripts:
      
      ```bash
      set -x
      
      # Data Process
      python examples/data_preprocess/math_dataset.py --local_dir /workspace/datasets/math
      
      # Evaluation
      export CUDA_VISIBLE_DEVICES=4,5,6,7
      export VLLM_ATTENTION_BACKEND=XFORMERS
      
      math_train_path=/workspace/datasets/math/train.parquet
      math_test_path=/workspace/datasets/math/test.parquet
      
      python3 -m verl.trainer.main_ppo \
          data.train_files="$math_train_path" \
          data.val_files="$math_test_path" \
          data.max_prompt_length=2048 \
          data.max_response_length=2048 \
          actor_rollout_ref.model.path=Qwen/Qwen2.5-Math-7B-Instruct \
          actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
          actor_rollout_ref.rollout.name=vllm \
          actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
          actor_rollout_ref.rollout.n=1 \
          actor_rollout_ref.rollout.temperature=0 \
          trainer.logger=['console'] \
          trainer.project_name='test-math-verify' \
          trainer.experiment_name='test-math-verify' \
          +trainer.val_before_train=True \
          trainer.n_gpus_per_node=4 \
          trainer.nnodes=1 \
          trainer.total_epochs=0 \
          data.train_batch_size=1024 \
          actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
          actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
          actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
          algorithm.adv_estimator=grpo $@
      ```
      Yuyang Ding committed