1. 19 Feb, 2025 1 commit
  2. 18 Feb, 2025 3 commits
  3. 17 Feb, 2025 3 commits
    • Fix wrong args desc. (#294) · 0dfcb7f9
      1. Fix wrong notes description.
      2. Fix wrong code path.
      
      Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
      湛露先生 committed
    • [misc] feat: support offload parameter and optimizer during rollout (#284) · 9db52329
      - Fixed FSDP1 model offload
      - With `actor_rollout_ref.actor.fsdp_config.param_offload=True` and
      `actor_rollout_ref.actor.fsdp_config.optimizer_offload=True`, the GPU
      memory utilization can be increased to 0.9.
      - With actor, critic, and reference offload all enabled, there is only
      one model copy in GPU memory at a time, so we can further increase
      `micro_batch_size_per_gpu` or `max_token_per_gpu`.
      
      **Specifically:**
      - During rollout, only the rollout model and KVCache are in GPU memory.
      - During critic value computation, only the critic model stays in GPU
      memory, while its optimizer and other model states are in CPU main
      memory.
      - During actor update, the actor model and optimizer are stored on the
      GPU, while the reference model, critic model, and critic optimizer are
      offloaded to CPU.
      Guangming Sheng committed
    • Enhancement: Support for `extra_info` in Reward Calculation (#266) · f0e5bdf0
      ### **Enhancement: Support for `extra_info` in Reward Calculation**  
      
      #### **Summary**  
      This update enhances the reward computation process by introducing an
      additional `extra_info` parameter. This allows users to pass in more
      contextual information when calculating rewards, improving flexibility
      for different datasets.
      
      #### **Changes Made**  
      - **Updated `_default_compute_score`** to accept an `extra_info`
      argument:
        ```python
        def _default_compute_score(data_source, solution_str, ground_truth, extra_info):
        ```
      - **Modified the reward manager (`naive.py`)** to pass `extra_info` from
      `data_item.non_tensor_batch` to `compute_score`:
        ```python
        extra_info = data_item.non_tensor_batch['extra_info']
        score = self.compute_score(
            data_source=data_source,
            solution_str=sequences_str,
            ground_truth=ground_truth,
            extra_info=extra_info,
        )
        ```
        
      #### **Why This Change?**  
      - Some datasets require additional context beyond `data_source`,
      `solution_str`, and `ground_truth` for accurate reward computation.
      - The new `extra_info` field allows users to pass custom metadata,
      ideally in dictionary form, as specified in the [official
      documentation](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html).
      - This change maintains compatibility with existing dataset processing
      scripts, as they already include the `extra_info` field.
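
      For illustration, a hypothetical dataset-specific scoring function that
      consumes `extra_info` could look like the sketch below. The keys read
      from `extra_info` (e.g. `difficulty`) are assumptions made for this
      example and are not defined by this PR:

      ```python
      def my_dataset_compute_score(data_source, solution_str, ground_truth, extra_info):
          # Base score: exact match against the ground truth.
          score = 1.0 if solution_str.strip() == ground_truth.strip() else 0.0

          # Optionally use metadata carried in extra_info (assumed keys).
          if isinstance(extra_info, dict):
              score *= float(extra_info.get("difficulty", 1.0))
          return score
      ```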
      
      #### **Impact**  
      - **Improved flexibility**: Users can now pass additional contextual
      information, making reward computation more adaptable to different
      datasets.
      - **Backward compatibility**: Since all example datasets already include
      `extra_info`, this update should integrate seamlessly.
      
      Let me know if any modifications are needed!
      Taiwei Shi committed
  4. 16 Feb, 2025 1 commit
  5. 15 Feb, 2025 7 commits
  6. 14 Feb, 2025 5 commits
    • [testing][rollout] feat: support integration of vllm>=0.7.0 (spmd-version) (#209) · f8b4d085
      This PR aims to integrate vllm>=0.7.0 and preserve:
      **Backward compatibility**: 0.3.1, 0.4.2, 0.5.4, 0.6.3 are still
      supported
      **Forward compatibility**: Future versions of vllm (>= 0.7.0) will be
      supported without requiring manual maintenance for each new release.
      
      The README for this beta version is located at docs/README_vllm0.7.md,
      where users can find the installation method and related features. The
      README is copied below.
      
      ---
      # Readme for verl(vllm>=0.7) version
      ## Installation
      
      Note: This version of veRL supports **FSDP** for training and **vLLM**
      for rollout. (Megatron-LM is not supported yet.)
      
      ```
      # Create the conda environment
      conda create -n verl python==3.10
      conda activate verl
      
      # Install verl
      git clone https://github.com/volcengine/verl.git
      cd verl
      pip3 install -e .
      # Install vLLM>=0.7
      pip3 install vllm==0.7.0
      # Install flash-attn
      pip3 install flash-attn --no-build-isolation
      
      ```
      
      For existing stable vllm versions (<=0.7.2), you also need to apply a
      few small patches manually to vllm (under /path/to/site-packages/vllm
      after installation) after the above steps:
      
      - vllm/distributed/parallel_state.py: Remove the assertion below:
      
      ```
      if (world_size
              != tensor_model_parallel_size * pipeline_model_parallel_size):
          raise RuntimeError(
              f"world_size ({world_size}) is not equal to "
              f"tensor_model_parallel_size ({tensor_model_parallel_size}) x "
              f"pipeline_model_parallel_size ({pipeline_model_parallel_size})")
      
      ```
      
      - vllm/executor/uniproc_executor.py: change `local_rank = rank` to
      `local_rank = int(os.environ["LOCAL_RANK"])`
      - vllm/model_executor/model_loader/weight_utils.py: remove the
      `torch.cuda.empty_cache()` in `pt_weights_iterator`
      
      These modifications have already been merged into the main branch of
      vLLM. To avoid modifying these files manually, you can directly build
      vLLM from source.
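
      As a convenience, a quick check like the following (a sketch, not part
      of this PR; it assumes the `packaging` package is installed) can tell
      whether the installed vllm is one of the stable releases (<=0.7.2) that
      still requires the manual patches above:

      ```python
      from importlib.metadata import version

      from packaging.version import Version

      installed = Version(version("vllm"))
      # Releases up to 0.7.2 still need the manual patches described above;
      # a source build of the vLLM main branch already includes the fixes.
      needs_manual_patches = installed <= Version("0.7.2")
      print(f"vllm {installed}: manual patches needed = {needs_manual_patches}")
      ```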
      
      ## Features
      
      ### Use cuda graph
      
      After installation, the examples that use FSDP as the training backend
      can be run. By default, `enforce_eager` is set to True, which disables
      the CUDA graph. To enable CUDA graphs and the sleep mode of vLLM>=0.7,
      add the following lines to the bash script:
      
      ```
      actor_rollout_ref.rollout.enforce_eager=False \
      actor_rollout_ref.rollout.free_cache_engine=False \
      
      ```
      
      For a typical job like examples/ppo_trainer/run_qwen2-7b_seq_balance.sh,
      the rollout generation time is 115 seconds with vLLM 0.6.3 and 85
      seconds with vLLM 0.7.0. With CUDA graphs enabled, the generation time
      is further reduced to 62 seconds.
      
      **Note:** Currently, if `n` is greater than 1 in `SamplingParams` in
      vLLM>=0.7, there is a potential performance issue affecting the
      stability of rollout generation time (some iterations may see bursts in
      generation time). We are working with the vLLM team to investigate this
      issue.
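
      For reference, `n` here is the standard vLLM sampling parameter that
      requests multiple completions per prompt; a minimal sketch with
      illustrative values:

      ```python
      from vllm import SamplingParams

      # n > 1 asks vLLM to return several completions per prompt, which is
      # the setting affected by the timing issue noted above.
      sampling_params = SamplingParams(n=4, temperature=1.0, top_p=1.0, max_tokens=512)
      ```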
      
      ### Other features in vLLM
      
      1. **num_scheduler_step>1:** not supported yet (weight loading has not
      been aligned with `MultiStepModelRunner`)
      2. **Prefix caching:** not supported yet (vLLM sleep mode does not
      support prefix caching)
      3. **Chunked prefill:** supported
      
      ---------
      
      Co-authored-by: zhangshulai <zhangshulai@bytedance.com>
      ZSL98 committed
    • fix the file lock issue (#255) · 63f75138
      The previous FileLock usage in
      
      https://github.com/volcengine/verl/blob/c46f403479db5d7afca6388800503a3bfe393bf5/verl/utils/checkpoint/checkpoint_manager.py#L75
      may cause errors when the given path is too long. To fix this issue, the
      lock file name is now derived from a hash of the path, which avoids the
      conflict.
      
      For instance: `FileExistsError: [Errno 17] File exists` or
      `BlockingIOError: [Errno 11] Resource temporarily unavailable`.
      
      After this modification, the issue is avoided.
      
      ```
      # Requires: import os, tempfile; from filelock import FileLock
      @staticmethod
      def local_mkdir(path):
          if not os.path.isabs(path):
              working_dir = os.getcwd()
              path = os.path.join(working_dir, path)

          # Use the hash value of the path as the lock file name to avoid long file names
          lock_filename = f"ckpt_{hash(path) & 0xFFFFFFFF:08x}.lock"
          lock_path = os.path.join(tempfile.gettempdir(), lock_filename)

          try:
              with FileLock(lock_path, timeout=60):  # add a timeout
                  # make a new dir
                  os.makedirs(path, exist_ok=True)
          except Exception as e:
              print(f"Warning: Failed to acquire lock for {path}: {e}")
              # Even if the lock is not acquired, still try to create the directory
              os.makedirs(path, exist_ok=True)

          return path
      ```
      Wei Liu committed
    • [misc] Compatibility Issue with Python 3.9 in FSDP Worker for LLaMA Model (#268) · 7346ecf8
      **Fix: Compatibility Issue with Python 3.9 in FSDP Worker for LLaMA
      Model**
      
      When running the LLaMA model in the FSDP worker, an ImportError occurs
      due to the use of the Unpack type from the typing module. This type is
      only available in Python 3.11 and later, but the current environment
      uses Python 3.9, which does not support it.
      
      **Error Details:**
      ```
      File "/project/Logic-RL-main/verl/models/transformers/llama.py", line 17, in <module>
      from typing import Optional, List, Union, Tuple, Unpack, Callable
      ImportError: cannot import name 'Unpack' from 'typing' (/opt/miniconda3/envs/verl/lib/python3.9/typing.py)
      ```
      **Solution:**
      To resolve this issue, I added conditional imports to handle different
      Python versions. For Python versions lower than 3.11, the code now uses
      a fallback or alternative approach to avoid relying on Unpack.
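
      A common pattern for such a conditional import (a sketch of the general
      approach, assuming `typing_extensions` is available; the exact code in
      this PR may differ) is:

      ```python
      import sys

      if sys.version_info >= (3, 11):
          # Unpack was added to the standard typing module in Python 3.11.
          from typing import Unpack
      else:
          # On older interpreters such as Python 3.9, fall back to typing_extensions.
          from typing_extensions import Unpack
      ```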
      
      Co-authored-by: Yu Feng <fengyufengyu@didiglobal.com>
      Yu Feng committed
  7. 13 Feb, 2025 1 commit
  8. 12 Feb, 2025 4 commits
  9. 11 Feb, 2025 2 commits
  10. 10 Feb, 2025 5 commits
  11. 09 Feb, 2025 7 commits
  12. 08 Feb, 2025 1 commit
    • [ckpt] feat: integrate checkpoint resume in RL ray trainer (#222) · 5a400bf2
      **Features:**
      - Save actor and critic checkpoint:
        - Model
        - Optimizer
        - lr_scheduler
        - rng_state
        - dataloader
      - A complete checkpoint means that the dataloader, actor, and critic (if
      any) states are properly saved
      - By default, we do not save the dataset itself but only store the
      dataloader (with sampler) state
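
      As a conceptual sketch only (illustrative; verl's actual checkpoint
      layout and helper names may differ), the pieces listed above could be
      bundled into a single file like this:

      ```python
      import torch

      def save_rl_checkpoint(path, model, optimizer, lr_scheduler, dataloader_state):
          # Bundle the states listed above into one checkpoint file.
          torch.save(
              {
                  "model": model.state_dict(),
                  "optimizer": optimizer.state_dict(),
                  "lr_scheduler": lr_scheduler.state_dict(),
                  "rng_state": torch.get_rng_state(),
                  # Only the dataloader/sampler state is stored, not the dataset itself.
                  "dataloader": dataloader_state,
              },
              path,
          )
      ```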
      
      **Usage:**
      - Supported resume modes: auto, disable, and resume_from_path
        - auto: veRL automatically checks for the latest checkpoint in
      `trainer.default_local_dir`
        - disable: veRL always trains from scratch
        - resume_from_path: when `resume_from_path=True` is set, the user only
      needs to set `resume_mode` to the checkpoint path that should be loaded
      
      **TODO:**
      - Support SFT resume in the next PR
      - Support uploader
      
      **Relevant issue:**
      - https://github.com/volcengine/verl/issues/76
      - https://github.com/volcengine/verl/issues/143
      Guangming Sheng committed