1. 09 Feb, 2025 1 commit
    • implement REINFORCE++ algorithm (#228) · bdb50ac3
      We have implemented the REINFORCE++ algorithm.
      
      To use it, specify the parameter
      `algorithm.adv_estimator=reinforce_plus_plus`.
      
      Preliminary performance evaluations were conducted within the
      [Unakar/Logic-RL](https://github.com/Unakar/Logic-RL) project, a
      reproduction of DeepSeek R1 Zero on the 2K Tiny Logic Puzzle Dataset.
      Results indicate that our REINFORCE++ implementation exhibits
      performance and training stability comparable to, or potentially
      exceeding, that of PPO and GRPO.
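
For orientation, the estimator selected by `algorithm.adv_estimator=reinforce_plus_plus` can be pictured roughly as follows. This is a minimal pure-Python sketch, assuming that REINFORCE++ replaces the critic with discounted token-level returns normalized over the whole batch. The function name `reinforce_pp_advantages` and its arguments are hypothetical and are not taken from this commit.

```python
import math

def reinforce_pp_advantages(rewards, mask, gamma=1.0, eps=1e-8):
    """Hypothetical helper: rewards and mask are [batch][seq_len] lists.

    mask[i][t] is 1 for a real response token and 0 for padding.
    """
    returns = []
    for reward_row, mask_row in zip(rewards, mask):
        running = 0.0
        row = [0.0] * len(reward_row)
        # Accumulate the discounted return-to-go from the last token backwards.
        for t in reversed(range(len(reward_row))):
            if not mask_row[t]:
                running = 0.0  # padding carries no return
                continue
            running = reward_row[t] + gamma * running
            row[t] = running
        returns.append(row)

    # Normalize over every valid token in the batch (no critic involved).
    valid = [r for row, m in zip(returns, mask) for r, keep in zip(row, m) if keep]
    mean = sum(valid) / len(valid)
    std = math.sqrt(sum((r - mean) ** 2 for r in valid) / len(valid))
    return [
        [(r - mean) / (std + eps) if keep else 0.0 for r, keep in zip(row, m)]
        for row, m in zip(returns, mask)
    ]

advantages = reinforce_pp_advantages(
    rewards=[[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
    mask=[[1, 1, 1], [1, 1, 0]],
)
```

In this toy batch the rewarded response receives positive advantages and the unrewarded one negative advantages, while the padded position stays at zero.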
      
      Related issue: #68
      4332001876 committed
  2. 08 Feb, 2025 3 commits
    • [ckpt] feat: integrate checkpoint resume in RL ray trainer (#222) · 5a400bf2
      **Features:**
      - Save actor and critic checkpoint:
        - Model
        - Optimizer
        - lr_scheduler
        - rng_state
        - dataloader
      - A checkpoint is complete only when the dataloader, actor, and critic (if
      any) states have all been saved
      - By default, the dataset itself is not saved; only the dataloader (with
      sampler) state is stored
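
To make the `rng_state` and dataloader items concrete, here is a minimal standard-library sketch of saving and restoring a sampler's position and random state so that a resumed run yields the same remaining samples. The `ShuffledSampler` class and its `state_dict`/`load_state_dict` methods are hypothetical stand-ins, not the classes used by this PR.

```python
import pickle
import random

class ShuffledSampler:
    """Hypothetical sampler yielding shuffled indices, with resumable state."""

    def __init__(self, size, seed=0):
        self.rng = random.Random(seed)
        self.order = list(range(size))
        self.rng.shuffle(self.order)
        self.position = 0

    def next_index(self):
        index = self.order[self.position]
        self.position += 1
        return index

    def state_dict(self):
        # Only the sampler state is stored, never the dataset itself.
        return {"order": self.order, "position": self.position,
                "rng_state": self.rng.getstate()}

    def load_state_dict(self, state):
        self.order = list(state["order"])
        self.position = state["position"]
        self.rng.setstate(state["rng_state"])

sampler = ShuffledSampler(size=8, seed=123)
consumed = [sampler.next_index() for _ in range(3)]
blob = pickle.dumps(sampler.state_dict())        # "save the checkpoint"
expected_rest = [sampler.next_index() for _ in range(5)]

resumed = ShuffledSampler(size=8, seed=999)      # fresh object, different seed
resumed.load_state_dict(pickle.loads(blob))      # "resume"
resumed_rest = [resumed.next_index() for _ in range(5)]
```

The resumed sampler continues from the saved position instead of reshuffling, which is why the dataset does not need to be stored in the checkpoint.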
      
      **Usage:**
      - Supported resume modes: auto, disable, and resume_from_path
        - auto: veRL automatically looks for the latest checkpoint in
        `trainer.default_local_dir`
        - disable: veRL always trains from scratch
        - resume_from_path: when `resume_from_path`=True is set, the user only
        needs to set resume_mode to the path of the checkpoint to load
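
The three modes can be sketched as a small dispatch function. This is an illustration only: `resolve_resume`, `find_latest_checkpoint`, the `checkpoint_path` argument, and the `global_step_<N>` directory naming are assumptions made for the example, not the trainer's actual code or configuration keys.

```python
import os
import re
import tempfile

def find_latest_checkpoint(local_dir):
    """Return the sub-directory with the highest step number, or None."""
    best_step, best_path = -1, None
    if not os.path.isdir(local_dir):
        return None
    for name in os.listdir(local_dir):
        match = re.fullmatch(r"global_step_(\d+)", name)  # assumed naming scheme
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            best_path = os.path.join(local_dir, name)
    return best_path

def resolve_resume(mode, default_local_dir, checkpoint_path=None):
    """Return the checkpoint directory to load, or None to train from scratch."""
    if mode == "disable":
        return None
    if mode == "auto":
        return find_latest_checkpoint(default_local_dir)
    if mode == "resume_from_path":
        if not checkpoint_path or not os.path.isdir(checkpoint_path):
            raise FileNotFoundError(f"checkpoint not found: {checkpoint_path}")
        return checkpoint_path
    raise ValueError(f"unknown resume mode: {mode}")

with tempfile.TemporaryDirectory() as root:
    for step in (10, 20, 5):
        os.makedirs(os.path.join(root, f"global_step_{step}"))
    latest = os.path.basename(resolve_resume("auto", root))
    scratch = resolve_resume("disable", root)
```

Comparing step numbers as integers rather than strings avoids picking `global_step_5` over `global_step_20`.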
      
      **TODO:**
      - Support SFT resume in the next PR
      - Support uploader
      
      **Relevant issue:**
      - https://github.com/volcengine/verl/issues/76
      - https://github.com/volcengine/verl/issues/143
      Guangming Sheng committed
    • Fix typo tips in bash sft. (#226) · 62a065b9
      Fix typo tips in bash sft.
      
      Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
      湛露先生 committed
    • Memory efficiency improvement to logprobs_from_logits_v2 (#220) · 4b516249
      The existing `logprobs_from_logits_v2` doesn't achieve the memory savings
      it claims, because `logsumexp` still allocates a `bs*seqlen*vocab` tensor
      internally to hold the element-wise application of `exp`. Applying
      `logsumexp` inside a loop instead lets us compute the outputs
      iteratively.
      
      Benchmarks show this uses significantly less memory to compute logprobs.
      
      A fix is provided, along with a separate memory-efficient approach for
      the bfloat16 case.
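
The arithmetic behind the fix can be sketched without any tensor library. The snippet below is a pure-Python illustration of the identity `log p(label) = logit[label] - logsumexp(logits)` evaluated one row at a time; the names `logsumexp` and `logprobs_from_logits` here are local to the example, and the real function operates on tensors rather than lists.

```python
import math

def logsumexp(row):
    """Numerically stable log(sum(exp(x))) for one vocabulary-sized row."""
    peak = max(row)
    return peak + math.log(sum(math.exp(x - peak) for x in row))

def logprobs_from_logits(logits, labels):
    """logits: [num_tokens][vocab] lists, labels: [num_tokens] ints."""
    # log p(label) = logit[label] - logsumexp(logits); handling one row per
    # iteration means the exponentials of only that row are needed at a time.
    return [row[label] - logsumexp(row) for row, label in zip(logits, labels)]

logprobs = logprobs_from_logits(
    logits=[[0.0, 0.0], [1.0, 2.0, 3.0]],
    labels=[0, 2],
)
```

Subtracting the row maximum before exponentiating keeps the computation stable for large logits, which matters independently of the memory concern.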
      Tyler Romero committed
  3. 07 Feb, 2025 3 commits
    • [TRACKING] feat: Integrate SwanLab for experiment tracking with online/offline mode and local dashboard support (#218) · 958a3267
      
      ---
      
      ### Pull Request Description  
      
      This PR introduces **SwanLab**, a lightweight open-source experiment
      tracking tool, as a new logging option for the training framework. The
      integration provides both online and offline tracking capabilities,
      along with a local dashboard for visualizing results. Below is a
      detailed overview of the changes and usage instructions:
      
      ---
      
      #### **Key Features of SwanLab Integration**
      
      1. **Online and Offline Tracking**:
         - **Online Mode**: Track experiments remotely and store data on
           SwanLab's cloud platform.
         - **Offline Mode**: Use a local dashboard to visualize training logs
           without an internet connection.
      
      2. **Hardware Monitoring**:
         - Automatically tracks GPU usage, power consumption, temperature, and
           other hardware metrics.
         - Supports NVIDIA GPUs and Huawei Ascend NPUs.
      
      3. **Remote Access**:
         - View training progress remotely via the SwanLab web interface or
           mobile app.
      
      4. **Local Dashboard**:
         - Includes an open-source local dashboard for offline visualization of
           training logs.
      
      ---
      
      #### **Usage Instructions**
      
      ##### **Step 1: Set Up Online Tracking (Optional)**
      
      To use SwanLab's online tracking, log in to the [SwanLab
      website](https://swanlab.cn) and obtain your API key from the [Settings
      page](https://swanlab.cn/space/~/settings). Then, authenticate using the
      following command:
      
      ```bash
      swanlab login
      ```
      
      If you prefer offline mode, skip this step.
      
      ---
      
      ##### **Step 2: Configure SwanLab as the Logger**
      
      To enable SwanLab as the experiment tracker, add
      `trainer.logger=['swanlab']` to your training command. For example,
      using the [Post-train a LLM using PPO with GSM8K
      dataset](https://verl.readthedocs.io/en/latest/start/quickstart.html)
      workflow:
      
      ```bash
      PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
       data.train_files=$HOME/data/gsm8k/train.parquet \
       data.val_files=$HOME/data/gsm8k/test.parquet \
       data.train_batch_size=256 \
       data.val_batch_size=1312 \
       data.max_prompt_length=512 \
       data.max_response_length=256 \
       actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
       actor_rollout_ref.actor.optim.lr=1e-6 \
       actor_rollout_ref.actor.ppo_mini_batch_size=64 \
       actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
       actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
       actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
       actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
       actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
       critic.optim.lr=1e-5 \
       critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
       critic.ppo_micro_batch_size_per_gpu=4 \
       algorithm.kl_ctrl.kl_coef=0.001 \
       trainer.logger=['console','swanlab'] \
       +trainer.val_before_train=False \
       trainer.default_hdfs_dir=null \
       trainer.n_gpus_per_node=1 \
       trainer.nnodes=1 \
       trainer.save_freq=10 \
       trainer.test_freq=10 \
       trainer.total_epochs=15 2>&1 | tee verl_demo.log
      ```
      
      If you are not logged in, you will be prompted to choose a tracking
      mode:
      
      1. **Cloud Mode**: Upload logs to SwanLab's cloud platform.
      2. **Cloud-Only Mode**: Upload logs to the cloud but do not save them
      locally.
      3. **Local Mode**: Save logs locally for offline tracking.
      
      <img width="1325" alt="select"
      src="https://github.com/user-attachments/assets/5c55fc45-79a9-4673-ae4e-ea9d0623dd29"
      />
      
      Alternatively, you can configure SwanLab using environment variables:
      
      ```bash
      export SWANLAB_API_KEY=<your_api_key>          # Set API key for online tracking
      export SWANLAB_LOG_DIR=<local_log_path>        # Set local log directory
      export SWANLAB_MODE=<mode>                    # Set tracking mode: cloud (default), cloud-only, local, or disabled
      ```
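
When a run is launched from Python rather than from a shell, the same variables can be set programmatically before training starts. The helper below is a hypothetical sketch: only the three variable names and the four mode values come from the text above, and `configure_swanlab_env` is not part of SwanLab or veRL.

```python
import os

# Mode values as listed in the comment on SWANLAB_MODE above.
ALLOWED_MODES = {"cloud", "cloud-only", "local", "disabled"}

def configure_swanlab_env(mode, log_dir=None, api_key=None, env=None):
    """Hypothetical helper: write the SwanLab variables into `env`."""
    env = os.environ if env is None else env
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported SWANLAB_MODE: {mode!r}")
    env["SWANLAB_MODE"] = mode
    if log_dir is not None:
        env["SWANLAB_LOG_DIR"] = log_dir
    if api_key is not None:
        env["SWANLAB_API_KEY"] = api_key
    return env

# Passing a plain dict keeps this example from touching the real environment.
settings = configure_swanlab_env("local", log_dir="./swanlog", env={})
```

Validating the mode up front surfaces a typo immediately instead of during the first logging call.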
      
      ---
      
      ##### **Step 3: View Training Logs**
      
      After logging in, you will see a confirmation message:
      
      <img width="1415" alt="track"
      src="https://github.com/user-attachments/assets/87c4ff2f-c8c4-4e7a-a41e-21afa935cb56"
      />
      
      - **Online Tracking**: View logs on the [SwanLab
      website](https://swanlab.cn).
      
      <img width="1900" alt="remote"
      src="https://github.com/user-attachments/assets/5b44b9f3-948f-4f93-9873-572bce56daf7"
      />
      
      For more details, refer to the [SwanLab Cloud
      Documentation](https://docs.swanlab.cn/guide_cloud/experiment_track/view-result.html).
      
      - **Offline Tracking**: Use the local dashboard to visualize logs:
      
        ```bash
        swanlab watch
        ```
      
      For advanced configurations, such as setting a custom port, refer to the
      [Offline Dashboard
      Documentation](https://docs.swanlab.cn/guide_cloud/self_host/offline-board.html)
      and [CLI
      Documentation](https://docs.swanlab.cn/api/cli-swanlab-watch.html#%E8%AE%BE%E7%BD%AEip%E5%92%8C%E7%AB%AF%E5%8F%A3%E5%8F%B7).
      
      ---
      
      #### **Impact**
      
      - Provides a lightweight, flexible, and user-friendly experiment
      tracking solution.
      - Supports both online and offline use cases, making it suitable for
      environments with restricted internet access.
      - Enhances hardware monitoring capabilities for better resource
      utilization.
      
      ---
      
      This PR is ready for review. Feedback and suggestions are welcome!
      Shaohon Chen committed
    • [rollout]: fix incorrect response_attention_mask in vLLM rollout (#213) · 3140cc2f
      This PR addresses issue https://github.com/volcengine/verl/issues/212.
      
      The changes include:
      - Read `eos_token_id` from `generation_config` to ensure alignment with
      vLLM.
      - Modify the `get_eos_mask` function to accept both int and list types
      for the `eos_token` parameter.
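
The second change can be illustrated with a list-based sketch. It assumes the mask keeps every response token up to and including the first EOS token and zeroes everything after it. `get_eos_mask_sketch` is a hypothetical pure-Python stand-in, not the tensor implementation touched by this PR.

```python
def get_eos_mask_sketch(response_ids, eos_token):
    """response_ids: [batch][seq_len] token ids; eos_token: int or list of ints."""
    eos_tokens = {eos_token} if isinstance(eos_token, int) else set(eos_token)
    masks = []
    for row in response_ids:
        mask, seen_eos = [], False
        for token in row:
            # Keep every token up to and including the first EOS, drop the rest.
            mask.append(0 if seen_eos else 1)
            if token in eos_tokens:
                seen_eos = True
        masks.append(mask)
    return masks

single = get_eos_mask_sketch([[5, 6, 2, 0, 0]], eos_token=2)
multiple = get_eos_mask_sketch([[5, 7, 2, 9]], eos_token=[2, 7])
```

Normalizing a single id into a collection first lets one code path serve models whose generation config lists several EOS ids.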
      Kinman Lei committed
    • [misc] feat: add ckpt manager in utils (#216) · 27484a7b
      - Support FSDPCheckpointManager
      - Support hdfs_io import if installed
      - Add CI for FSDPCheckpointManager
      
      TODO:
      - Integrate the checkpoint manager into the trainer in the next PR
      Guangming Sheng committed
  4. 06 Feb, 2025 3 commits
  5. 05 Feb, 2025 7 commits
  6. 04 Feb, 2025 3 commits
  7. 03 Feb, 2025 4 commits
  8. 02 Feb, 2025 1 commit
  9. 01 Feb, 2025 2 commits
  10. 31 Jan, 2025 4 commits
  11. 30 Jan, 2025 8 commits
  12. 29 Jan, 2025 1 commit