1. 27 Feb, 2025 2 commits
  2. 26 Feb, 2025 2 commits
  3. 25 Feb, 2025 4 commits
  4. 24 Feb, 2025 5 commits
  5. 23 Feb, 2025 2 commits
  6. 22 Feb, 2025 2 commits
  7. 21 Feb, 2025 2 commits
  8. 20 Feb, 2025 1 commit
  9. 19 Feb, 2025 7 commits
  10. 18 Feb, 2025 3 commits
  11. 17 Feb, 2025 3 commits
    • Fix wrong args description. (#294) · 0dfcb7f9
      1. Fix wrong notes description.
      2. Fix wrong code path.
      
      Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
      湛露先生 committed
    • [misc] feat: support offload parameter and optimizer during rollout (#284) · 9db52329
      - Fixed FSDP1 model offload.
      - With `actor_rollout_ref.actor.fsdp_config.param_offload=True` and
      `actor_rollout_ref.actor.fsdp_config.optimizer_offload=True`, GPU memory
      utilization can be increased to 0.9.
      - With actor, critic, and reference offload all enabled, there is only
      one model copy in GPU memory at a time, so we can further increase
      `micro_batch_size_per_gpu` or `max_token_per_gpu`.
      
      **Specifically:**
      - During rollout, only the rollout model and KV cache are in GPU memory.
      - During critic value computation, only the critic model stays in GPU
      memory, while its optimizer and other model states reside in CPU main
      memory.
      - During actor update, the actor model and optimizer are stored on the
      GPU, while the reference model, critic model, and critic optimizer are
      offloaded to CPU.
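      The offload flags above can be passed as Hydra-style overrides on the
      trainer command line. A minimal sketch, assuming the standard
      `verl.trainer.main_ppo` entrypoint; the `gpu_memory_utilization` flag
      path is an assumption, not confirmed by this commit:

```shell
# Sketch: enable actor parameter and optimizer offload (flags quoted in
# the commit above), then let the rollout engine claim more GPU memory.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.actor.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.9
```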
      Guangming Sheng committed
    • Enhancement: Support for `extra_info` in Reward Calculation (#266) · f0e5bdf0
      
      #### **Summary**  
      This update enhances the reward computation process by introducing an
      additional `extra_info` parameter. This allows users to pass in more
      contextual information when calculating rewards, improving flexibility
      for different datasets.
      
      #### **Changes Made**  
      - **Updated `_default_compute_score`** to accept an `extra_info`
      argument:
        ```python
        def _default_compute_score(data_source, solution_str, ground_truth,
                                   extra_info):
        ```
      - **Modified the reward manager (`naive.py`)** to pass `extra_info` from
      `data_item.non_tensor_batch` to `compute_score`:
        ```python
        extra_info = data_item.non_tensor_batch['extra_info']
        score = self.compute_score(
            data_source=data_source,
            solution_str=sequences_str,
            ground_truth=ground_truth,
            extra_info=extra_info,
        )
        ```
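      To illustrate how a scorer might consume this metadata, here is a toy,
      hypothetical score function (not verl's actual default); the
      `difficulty` key and the function body are purely illustrative:

```python
# Hypothetical scorer: exact-match reward, optionally weighted by a
# per-sample "difficulty" value carried in extra_info.
def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    score = 1.0 if solution_str.strip() == ground_truth.strip() else 0.0
    if extra_info and "difficulty" in extra_info:
        score *= extra_info["difficulty"]
    return score

print(compute_score("toy_ds", "42", "42", {"difficulty": 0.5}))  # 0.5
```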
        
      #### **Why This Change?**  
      - Some datasets require additional context beyond `data_source`,
      `solution_str`, and `ground_truth` for accurate reward computation.
      - The new `extra_info` field allows users to pass custom metadata,
      ideally in dictionary form, as specified in the [official
      documentation](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html).
      - This change maintains compatibility with existing dataset processing
      scripts, as they already include the `extra_info` field.
      
      #### **Impact**  
      - **Improved flexibility**: Users can now pass additional contextual
      information, making reward computation more adaptable to different
      datasets.
      - **Backward compatibility**: Since all example datasets already include
      `extra_info`, this update should integrate seamlessly.
      
      Taiwei Shi committed
  12. 16 Feb, 2025 1 commit
  13. 15 Feb, 2025 6 commits