1. 09 Feb, 2025 4 commits
  2. 08 Feb, 2025 3 commits
    • [ckpt] feat: integrate checkpoint resume in RL ray trainer (#222) · 5a400bf2
      **Features:**
      - Save actor and critic checkpoint:
        - Model
        - Optimizer
        - lr_scheduler
        - rng_state
        - dataloader
      - A checkpoint is complete only when the dataloader, actor, and critic
      (if any) states are all properly saved
      - By default, we do not save the dataset itself; we only store the
      dataloader (with sampler) state
      
      **Usage:**
      - Supported resume modes: auto, disable, and resume_from_path
         - auto: veRL automatically picks up the latest checkpoint from
      `trainer.default_local_dir`
         - disable: veRL always trains from scratch
         - resume_from_path: when setting `resume_from_path=True`, the user
      only needs to set `resume_mode` to the checkpoint path they want to
      load (see the sketch below)
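      
      A minimal usage sketch of the three modes. The config keys are inferred
      from this description and the checkpoint path is illustrative; verify
      both against the shipped trainer config:
      
      ```bash
      # FLAGS stands for the usual data/model/trainer overrides shown in the
      # full example commands elsewhere in this log.
      FLAGS="data.train_files=$HOME/data/gsm8k/train.parquet"  # ...plus the rest
      
      # auto: resume from the latest checkpoint under trainer.default_local_dir
      python3 -m verl.trainer.main_ppo $FLAGS trainer.resume_mode=auto
      
      # disable: always train from scratch
      python3 -m verl.trainer.main_ppo $FLAGS trainer.resume_mode=disable
      
      # resume_from_path: point resume_mode at an explicit checkpoint directory
      python3 -m verl.trainer.main_ppo $FLAGS \
          trainer.resume_from_path=True \
          trainer.resume_mode=$HOME/ckpts/global_step_100
      ```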
      
      **TODO:**
      - Support SFT resume in the next PR
      - Support uploader
      
      **Relevant issue:**
      - https://github.com/volcengine/verl/issues/76
      - https://github.com/volcengine/verl/issues/143
      Guangming Sheng committed
    • Fix typo tips in bash sft. (#226) · 62a065b9
      Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
      湛露先生 committed
    • Memory efficiency improvement to logprobs_from_logits_v2 (#220) · 4b516249
      The existing `logprobs_from_logits_v2` doesn't achieve the memory
      savings it claims, because `logsumexp` still allocates a
      `bs*seqlen*vocab` tensor internally to hold the element-wise `exp`.
      However, by applying `logsumexp` in a loop over the batch dimension, we
      can compute its outputs iteratively without that full-size temporary.
      
      Benchmarks show this uses significantly less memory to compute
      logprobs.
      
      A fix is provided, along with a separate memory-efficient approach for
      the bfloat16 case.
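      
      A minimal sketch of the looping idea, assuming a PyTorch implementation
      (function name hypothetical; the merged patch may differ in details):
      
      ```python
      import torch
      
      def logprobs_from_logits_looped(logits: torch.Tensor,
                                      labels: torch.Tensor) -> torch.Tensor:
          """Per-token log-probs without a full (bs, seqlen, vocab) exp buffer."""
          # Gather the logit of each label token: (bs, seqlen).
          logprobs_labels = torch.gather(logits, dim=-1,
                                         index=labels.unsqueeze(-1)).squeeze(-1)
          # Apply logsumexp one batch row at a time, so the internal exp()
          # temporary is only (seqlen, vocab) rather than (bs, seqlen, vocab).
          logsumexp_values = torch.stack(
              [torch.logsumexp(row, dim=-1) for row in logits])
          return logprobs_labels - logsumexp_values  # log-softmax at label positions
      ```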
      Tyler Romero committed
  3. 07 Feb, 2025 3 commits
    • [TRACKING] feat: Integrate SwanLab for experiment tracking with online/offline mode and local dashboard support (#218) · 958a3267
      
      ---
      
      ### Pull Request Description  
      
      This PR introduces **SwanLab**, a lightweight open-source experiment
      tracking tool, as a new logging option for the training framework. The
      integration provides both online and offline tracking capabilities,
      along with a local dashboard for visualizing results. Below is a
      detailed overview of the changes and usage instructions:
      
      ---
      
      #### **Key Features of SwanLab Integration**
      
      1. **Online and Offline Tracking**:
         - **Online Mode**: Track experiments remotely and store data on
      SwanLab's cloud platform.
         - **Offline Mode**: Use a local dashboard to visualize training logs
      without an internet connection.
      
      2. **Hardware Monitoring**:
         - Automatically tracks GPU usage, power consumption, temperature, and
      other hardware metrics.
         - Supports NVIDIA GPUs and Huawei Ascend NPUs.
      
      3. **Remote Access**:
         - View training progress remotely via the SwanLab web interface or
      mobile app.
      
      4. **Local Dashboard**:
         - Includes an open-source local dashboard for offline visualization of
      training logs.
      
      ---
      
      #### **Usage Instructions**
      
      ##### **Step 1: Set Up Online Tracking (Optional)**
      
      To use SwanLab's online tracking, log in to the [SwanLab
      website](https://swanlab.cn) and obtain your API key from the [Settings
      page](https://swanlab.cn/space/~/settings). Then, authenticate using the
      following command:
      
      ```bash
      swanlab login
      ```
      
      If you prefer offline mode, skip this step.
      
      ---
      
      ##### **Step 2: Configure SwanLab as the Logger**
      
      To enable SwanLab as the experiment tracker, add
      `trainer.logger=['swanlab']` to your training command. For example,
      using the [Post-train a LLM using PPO with GSM8K
      dataset](https://verl.readthedocs.io/en/latest/start/quickstart.html)
      workflow:
      
      ```bash
      PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
       data.train_files=$HOME/data/gsm8k/train.parquet \
       data.val_files=$HOME/data/gsm8k/test.parquet \
       data.train_batch_size=256 \
       data.val_batch_size=1312 \
       data.max_prompt_length=512 \
       data.max_response_length=256 \
       actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
       actor_rollout_ref.actor.optim.lr=1e-6 \
       actor_rollout_ref.actor.ppo_mini_batch_size=64 \
       actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
       actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
       actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
       actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
       actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
       critic.optim.lr=1e-5 \
       critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
       critic.ppo_micro_batch_size_per_gpu=4 \
       algorithm.kl_ctrl.kl_coef=0.001 \
       trainer.logger=['console','swanlab'] \
       +trainer.val_before_train=False \
       trainer.default_hdfs_dir=null \
       trainer.n_gpus_per_node=1 \
       trainer.nnodes=1 \
       trainer.save_freq=10 \
       trainer.test_freq=10 \
       trainer.total_epochs=15 2>&1 | tee verl_demo.log
      ```
      
      If you are not logged in, you will be prompted to choose a tracking
      mode:
      
      1. **Cloud Mode**: Upload logs to SwanLab's cloud platform.
      2. **Cloud-Only Mode**: Upload logs to the cloud but do not save them
      locally.
      3. **Local Mode**: Save logs locally for offline tracking.
      
      <img width="1325" alt="select"
      src="https://github.com/user-attachments/assets/5c55fc45-79a9-4673-ae4e-ea9d0623dd29"
      />
      
      Alternatively, you can configure SwanLab using environment variables:
      
      ```bash
      export SWANLAB_API_KEY=<your_api_key>          # Set API key for online tracking
      export SWANLAB_LOG_DIR=<local_log_path>        # Set local log directory
      export SWANLAB_MODE=<mode>                    # Set tracking mode: cloud (default), cloud-only, local, or disabled
      ```
      
      ---
      
      ##### **Step 3: View Training Logs**
      
      After logging in, you will see a confirmation message:
      
      <img width="1415" alt="track"
      src="https://github.com/user-attachments/assets/87c4ff2f-c8c4-4e7a-a41e-21afa935cb56"
      />
      
      - **Online Tracking**: View logs on the [SwanLab
      website](https://swanlab.cn).
      
      <img width="1900" alt="remote"
      src="https://github.com/user-attachments/assets/5b44b9f3-948f-4f93-9873-572bce56daf7"
      />
      
      For more details, refer to the [SwanLab Cloud
      Documentation](https://docs.swanlab.cn/guide_cloud/experiment_track/view-result.html).
      
      - **Offline Tracking**: Use the local dashboard to visualize logs:
      
        ```bash
        swanlab watch
        ```
      
      For advanced configurations, such as setting a custom port, refer to the
      [Offline Dashboard
      Documentation](https://docs.swanlab.cn/guide_cloud/self_host/offline-board.html)
      and [CLI
      Documentation](https://docs.swanlab.cn/api/cli-swanlab-watch.html#%E8%AE%BE%E7%BD%AEip%E5%92%8C%E7%AB%AF%E5%8F%A3%E5%8F%B7).
      
      ---
      
      #### **Impact**
      
      - Provides a lightweight, flexible, and user-friendly experiment
      tracking solution.
      - Supports both online and offline use cases, making it suitable for
      environments with restricted internet access.
      - Enhances hardware monitoring capabilities for better resource
      utilization.
      
      ---
      
      This PR is ready for review. Feedback and suggestions are welcome!
      Shaohon Chen committed
    • [rollout]: fix incorrect response_attention_mask in vLLM rollout (#213) · 3140cc2f
      This PR addresses issue https://github.com/volcengine/verl/issues/212.
      
      The changes include:
      - Read `eos_token_id` from `generation_config` to ensure alignment with
      vLLM.
      - Modify the `get_eos_mask` function to accept both int and list types
      for the `eos_token` parameter (see the sketch below).
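      
      A minimal sketch of the int-or-list handling (the signature and mask
      semantics are assumed; the actual implementation may differ):
      
      ```python
      import torch
      
      def get_eos_mask(response_ids: torch.Tensor, eos_token, dtype=torch.int64):
          """Mask that is 1 up to and including the first EOS token, 0 after.
      
          eos_token may be a single id (e.g. 2) or a list of ids (e.g. [1, 2]).
          """
          if isinstance(eos_token, int):
              eos_token = [eos_token]
          # Mark every position holding any of the EOS ids.
          is_eos = torch.zeros_like(response_ids, dtype=torch.bool)
          for tok in eos_token:
              is_eos |= response_ids.eq(tok)
          # Positions strictly after the first EOS have a positive exclusive cumsum.
          after_first_eos = torch.cumsum(is_eos.long(), dim=-1) - is_eos.long()
          return after_first_eos.eq(0).to(dtype)
      ```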
      Kinman Lei committed
    • [misc] feat: add ckpt manager in utils (#216) · 27484a7b
      - Support FSDPCheckpointManager
      - Support hdfs_io import if installed
      - Add CI for FSDPCheckpointManager
      
      TODO:
      - Will integrate it in the next PR
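      
      A hypothetical usage sketch: the module path, constructor, and method
      names here are assumptions based on this description, not the confirmed
      API:
      
      ```python
      # Hypothetical sketch: verify all names against verl/utils before use.
      from verl.utils.checkpoint.fsdp_checkpoint_manager import FSDPCheckpointManager
      
      ckpt_manager = FSDPCheckpointManager(model=model,
                                           optimizer=optimizer,
                                           lr_scheduler=lr_scheduler,
                                           tokenizer=tokenizer)
      ckpt_manager.save_checkpoint(local_path="ckpts/global_step_10")
      ckpt_manager.load_checkpoint(path="ckpts/global_step_10")
      ```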
      Guangming Sheng committed
  4. 06 Feb, 2025 3 commits
  5. 05 Feb, 2025 7 commits
  6. 04 Feb, 2025 3 commits
  7. 03 Feb, 2025 4 commits
  8. 02 Feb, 2025 1 commit
  9. 01 Feb, 2025 2 commits
  10. 31 Jan, 2025 4 commits
  11. 30 Jan, 2025 6 commits
    • docs: add split placement to readme · b6068eca
      HL committed
    • Allow users to pass in custom compute_score function (#162) · ce862ce8
      This is a follow-up to https://github.com/volcengine/verl/issues/151
      
      ## Motivation
      
      Currently, in order to add a custom score function you need to fork verl
      and update the `_select_rm_score_fn` to define your logic. This makes it
      harder to use verl as part of a larger application while staying up to
      date with upstream improvements in verl.
      
      It would be convenient to allow end users to directly pass in a reward
      function they wish to use, without requiring them to clone/fork verl to
      do so.
      
      ## Design
      
      In this PR I slightly modify `main_ppo.py` to allow users to import a
      new function `run_ppo`. `run_ppo` behaves very similarly to the existing
      `main`, with the important addition of a new `compute_score` argument.
      This argument, if passed in, is used to compute the score of every
      generation. This is the change that allows users to plug in a custom
      reward function without forking verl.
      
      The `compute_score` function is similar in shape to the existing
      `compute_score` functions for gsm8k and math. However, I have added a
      new `data_source` parameter so that the user can compute the score
      differently, if desired, depending on the task.
      
      ## Example Usage
      
      This is a sample script showing how you can use the new functionality. I
      have tested that this works.
      
      ```python
      from verl.trainer.main_ppo import run_ppo
      from omegaconf import OmegaConf
      
      
      def custom_compute_score(data_source, solution_str, ground_truth):
          """Dummy compute_score function that rewards the model for
          generations of exactly 20 characters :)
          """
          # Negate the distance from 20 so that closer to 20 scores higher.
          return -abs(len(solution_str) - 20)
      
      
      config = OmegaConf.load("vendor/verl/verl/trainer/config/ppo_trainer.yaml")
      
      # Update config as needed
      config.data.train_files = "path/to/train.parquet"
      config.data.val_files = "path/to/test.parquet"
      # ...
      
      run_ppo(config, custom_compute_score)
      ```
      
      ## Breaking changes
      
      There are no breaking changes in this PR. It is still possible to call
      `python -m verl.trainer.main_ppo ...` as before (although if you want to
      pass in a custom compute_score you will need to use the new method
      described above).
      
      ## Possible future work
      
      It would be great to move to [structured
      configs](https://omegaconf.readthedocs.io/en/2.1_branch/structured_config.html)
      as well, since they'd allow us to have type-safe, autocompletable
      configurations from Python. I thought about adding those changes here as
      well but they would be much more extensive and I'm not sure whether
      there's interest from the project.
      Kyle Corbitt committed
    • fix: typo (#166) · e7fd415a
      Franz Srambical committed
    • [Liger-kernel] Add an option to use `_apply_liger_kernel_to_instance()` to load model (#133) · dd418779
      ## Summary
      
      This PR makes it possible to use Liger Kernel's
      `_apply_liger_kernel_to_instance` to initialize an FSDP worker model.
      
      ## Main Changes
      
      1. Added an option to use
      `liger_kernel.transformers.AutoLigerKernelForCausalLM` to load a model
      from pretrained weights, instead of the default
      `transformers.AutoModelForCausalLM` (see the sketch below)
      2. Added a test case using the configuration file
      `tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh`
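      
      A minimal sketch of the two loading paths, assuming the `liger-kernel`
      package is installed (the model id is illustrative):
      
      ```python
      from transformers import AutoModelForCausalLM
      
      # Path 1: patch an already-instantiated model in place, which suits FSDP
      # workers that construct the model before applying the kernels.
      from liger_kernel.transformers.monkey_patch import _apply_liger_kernel_to_instance
      
      model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
      _apply_liger_kernel_to_instance(model=model)  # swaps in fused Liger ops
      
      # Path 2: load with the Liger kernels applied from the start.
      from liger_kernel.transformers import AutoLigerKernelForCausalLM
      
      model = AutoLigerKernelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
      ```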
      
      ## Related Issue
      
      #96 
      
      ## TODO
      
      #97 optimize the memory usage when computing entropy & log_probs
      
      https://github.com/volcengine/verl/blob/6d96fda3d47f057caaa8f494ca7804181903e911/verl/workers/actor/dp_actor.py#L94-L106
      
      ---------
      
      Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
      Hongpeng Guo committed
    • [nit] Explicitly show logits dimension in `dp_worker.py::forward_micro_batch` (#164) · df03aa6d
      The logits are of shape `(bsz, response_length, vocab_size)`. This PR
      doesn't change any code execution; it just shows the logits shape
      explicitly, making the code easier for readers to understand.
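      
      For illustration, a shape annotation of the kind this PR adds (sizes
      and code are hypothetical, not the actual diff):
      
      ```python
      import torch
      
      bsz, response_length, vocab_size = 2, 8, 32  # illustrative sizes
      # One row of vocabulary scores per response position per sequence.
      logits = torch.randn(bsz, response_length, vocab_size)  # (bsz, response_length, vocab_size)
      ```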
      
      Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
      Hongpeng Guo committed