Commits · bdb50ac333fad7315eee2d009cc98013ce0c1e8a · ZhangXiaoyun / verl

09 Feb, 2025 1 commit

implement REINFORCE++ algorithm (#228) · bdb50ac3

We have implemented the REINFORCE++ algorithm.

To use it, specify the parameter
`algorithm.adv_estimator=reinforce_plus_plus`.
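
For orientation, the sketch below shows the kind of advantage computation commonly associated with REINFORCE++: per-token discounted returns, normalized over all valid tokens in the batch. It is an assumption based on the general description of the algorithm, not a copy of this commit's code; the name `reinforce_pp_advantages` and the list-based inputs are hypothetical.

```python
import math


def reinforce_pp_advantages(rewards, mask, gamma=1.0, eps=1e-8):
    """Hypothetical sketch: discounted returns whitened over the batch.

    rewards, mask: lists of equal-length per-token lists; mask is 1 for
    response tokens and 0 for padding (padding rewards assumed to be 0).
    """
    returns = []
    for seq in rewards:
        running, out = 0.0, [0.0] * len(seq)
        # Walk backwards so each token sees the discounted future reward.
        for t in reversed(range(len(seq))):
            running = seq[t] + gamma * running
            out[t] = running
        returns.append(out)

    # Normalize with the mean/std of valid (unmasked) tokens in the whole batch.
    valid = [g for gs, ms in zip(returns, mask) for g, m in zip(gs, ms) if m]
    mean = sum(valid) / len(valid)
    std = math.sqrt(sum((g - mean) ** 2 for g in valid) / len(valid))
    return [[(g - mean) / (std + eps) * m for g, m in zip(gs, ms)]
            for gs, ms in zip(returns, mask)]


# One rewarded response and one unrewarded response of three tokens each.
adv = reinforce_pp_advantages([[0, 0, 1], [0, 0, 0]], [[1, 1, 1], [1, 1, 1]])
```

Because the normalization is done over the batch rather than per prompt group, no critic and no group of samples per prompt is needed in this sketch.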

Preliminary performance evaluations were conducted within the
[Unakar/Logic-RL](https://github.com/Unakar/Logic-RL) project, a
reproduction of DeepSeek R1 Zero on the 2K Tiny Logic Puzzle Dataset.
Results indicate that our REINFORCE++ implementation exhibits
performance and training stability comparable to, or potentially
exceeding, that of PPO and GRPO.

Related issue: #68

committed Feb 09, 2025


08 Feb, 2025 3 commits

[ckpt] feat: integrate checkpoint resume in RL ray trainer (#222) · 5a400bf2

**Features:**
- Save actor and critic checkpoint:
  - Model
  - Optimizer
  - lr_scheduler
  - rng_state
  - dataloader
- A checkpoint is complete only when the dataloader, actor and critic (if
any) states have all been saved
- By default, the dataset itself is not saved; only the dataloader
(with sampler) state is stored

**Usage:**
- Support resume mode: auto, disable and resume_from_path
  - auto: veRL automatically loads the latest checkpoint from
    `trainer.default_local_dir`
  - disable: veRL always trains from scratch
  - resume_from_path: when `resume_from_path` is set to True, the user only
    needs to set `resume_mode` to the path of the checkpoint to load
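
The three resume modes can be pictured with the following minimal sketch. It is not the trainer's code: the function name `resolve_resume_path` and the `global_step_<N>` directory naming are assumptions made for illustration.

```python
import os
import re


def resolve_resume_path(resume_mode, resume_from_path, default_local_dir):
    """Return the checkpoint directory to load, or None to train from scratch."""
    if resume_mode == "disable":
        return None
    if resume_mode == "auto":
        # Assumption: checkpoints live in sub-directories named global_step_<N>.
        steps = []
        if os.path.isdir(default_local_dir):
            for name in os.listdir(default_local_dir):
                match = re.fullmatch(r"global_step_(\d+)", name)
                if match:
                    steps.append((int(match.group(1)), name))
        if not steps:
            return None
        # Pick the directory with the highest step number.
        return os.path.join(default_local_dir, max(steps)[1])
    # Otherwise resume_mode itself carries the checkpoint path.
    if not resume_from_path:
        raise ValueError("set resume_from_path=True to resume from an explicit path")
    return resume_mode
```

Returning `None` for both "disable" and "auto with no checkpoint found" keeps the caller's logic simple: anything other than `None` is a directory to load.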

**TODO:**
- Support SFT resume in the next PR
- Support uploader

**Relevant issue:**
- https://github.com/volcengine/verl/issues/76
- https://github.com/volcengine/verl/issues/143

committed Feb 08, 2025


Fix typo tips in bash sft. (#226) · 62a065b9
```
Fix typo tips in bash sft.

Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
```
湛露先生 committed Feb 08, 2025

Memory efficiency improvement to logprobs_from_logits_v2 (#220) · 4b516249

The existing `logprobs_from_logits_v2` doesn't achieve the memory savings it
claims, because `logsumexp` still allocates a `bs*seqlen*vocab` tensor
internally to hold the element-wise application of `exp`. By calling
`logsumexp` in a loop instead, we can compute its outputs iteratively.

Benchmarks show this uses significantly less memory to compute logprobs.

This PR provides the fix, as well as a separate memory-efficient approach
for the bfloat16 case.
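
The idea can be illustrated in plain Python. The sketch below is not the tensor implementation from the PR; it only shows the identity `log_softmax(x)[label] = x[label] - logsumexp(x)` evaluated one row at a time, so that only a single vocabulary-sized row is processed per step. The function names are hypothetical.

```python
import math


def logsumexp(row):
    """Numerically stable log(sum(exp(x))) for one vocabulary-sized row."""
    peak = max(row)
    return peak + math.log(sum(math.exp(x - peak) for x in row))


def logprobs_from_logits_rowwise(logits, labels):
    """Log-probability of each label, handling one row of logits at a time.

    logits: list of rows (one per token), each of length vocab_size.
    labels: list of target token ids, one per row.
    """
    out = []
    for row, label in zip(logits, labels):
        # log_softmax(row)[label] == row[label] - logsumexp(row)
        out.append(row[label] - logsumexp(row))
    return out


lp = logprobs_from_logits_rowwise([[0.0, 0.0], [2.0, 0.0]], [0, 0])
```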

committed Feb 08, 2025


07 Feb, 2025 3 commits

[TRACKING] feat: Integrate SwanLab for experiment tracking with online/offline mode and local dashboard support (#218) · 958a3267

---

### Pull Request Description  

This PR introduces **SwanLab**, a lightweight open-source experiment
tracking tool, as a new logging option for the training framework. The
integration provides both online and offline tracking capabilities,
along with a local dashboard for visualizing results. Below is a
detailed overview of the changes and usage instructions:

---

#### **Key Features of SwanLab Integration**

1. **Online and Offline Tracking**:
   - **Online Mode**: Track experiments remotely and store data on
     SwanLab's cloud platform.
   - **Offline Mode**: Use a local dashboard to visualize training logs
     without an internet connection.

2. **Hardware Monitoring**:
   - Automatically tracks GPU usage, power consumption, temperature, and
     other hardware metrics.
   - Supports NVIDIA GPUs and Huawei Ascend NPUs.

3. **Remote Access**:
   - View training progress remotely via the SwanLab web interface or
     mobile app.

4. **Local Dashboard**:
   - Includes an open-source local dashboard for offline visualization of
     training logs.

---

#### **Usage Instructions**

##### **Step 1: Set Up Online Tracking (Optional)**

To use SwanLab's online tracking, log in to the [SwanLab
website](https://swanlab.cn) and obtain your API key from the [Settings
page](https://swanlab.cn/space/~/settings). Then, authenticate using the
following command:

```bash
swanlab login
```

If you prefer offline mode, skip this step.

---

##### **Step 2: Configure SwanLab as the Logger**

To enable SwanLab as the experiment tracker, add
`trainer.logger=['swanlab']` to your training command. For example,
using the [Post-train a LLM using PPO with GSM8K
dataset](https://verl.readthedocs.io/en/latest/start/quickstart.html)
workflow:

```bash
PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
 data.train_files=$HOME/data/gsm8k/train.parquet \
 data.val_files=$HOME/data/gsm8k/test.parquet \
 data.train_batch_size=256 \
 data.val_batch_size=1312 \
 data.max_prompt_length=512 \
 data.max_response_length=256 \
 actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 actor_rollout_ref.actor.optim.lr=1e-6 \
 actor_rollout_ref.actor.ppo_mini_batch_size=64 \
 actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
 actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
 actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
 actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
 actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
 critic.optim.lr=1e-5 \
 critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 critic.ppo_micro_batch_size_per_gpu=4 \
 algorithm.kl_ctrl.kl_coef=0.001 \
 trainer.logger=['console','swanlab'] \
 +trainer.val_before_train=False \
 trainer.default_hdfs_dir=null \
 trainer.n_gpus_per_node=1 \
 trainer.nnodes=1 \
 trainer.save_freq=10 \
 trainer.test_freq=10 \
 trainer.total_epochs=15 2>&1 | tee verl_demo.log
```

If you are not logged in, you will be prompted to choose a tracking
mode:

1. **Cloud Mode**: Upload logs to SwanLab's cloud platform.
2. **Cloud-Only Mode**: Upload logs to the cloud but do not save them
locally.
3. **Local Mode**: Save logs locally for offline tracking.

<img width="1325" alt="select"
src="https://github.com/user-attachments/assets/5c55fc45-79a9-4673-ae4e-ea9d0623dd29"
/>

Alternatively, you can configure SwanLab using environment variables:

```bash
export SWANLAB_API_KEY=<your_api_key>          # Set API key for online tracking
export SWANLAB_LOG_DIR=<local_log_path>        # Set local log directory
export SWANLAB_MODE=<mode>                    # Set tracking mode: cloud (default), cloud-only, local, or disabled
```
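
The same variables can also be set from Python before the trainer starts. The helper below is a hypothetical sketch, not part of SwanLab or verl; it only writes the three environment variables named in the shell snippet and assumes the four modes listed in its comment are the complete set.

```python
import os

# Modes named in the SWANLAB_MODE comment of the shell snippet (assumed complete).
SWANLAB_MODES = {"cloud", "cloud-only", "local", "disabled"}


def configure_swanlab_env(mode="cloud", api_key=None, log_dir=None, env=None):
    """Hypothetical helper: write the SwanLab variables into an environment mapping."""
    if mode not in SWANLAB_MODES:
        raise ValueError(f"unknown SWANLAB_MODE: {mode!r}")
    # Default to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    env["SWANLAB_MODE"] = mode
    if api_key is not None:
        env["SWANLAB_API_KEY"] = api_key
    if log_dir is not None:
        env["SWANLAB_LOG_DIR"] = log_dir
    return env


# Example: offline tracking with logs under ./swanlog (the path is illustrative).
settings = configure_swanlab_env(mode="local", log_dir="./swanlog", env={})
```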

---

##### **Step 3: View Training Logs**

After logging in, you will see a confirmation message:

<img width="1415" alt="track"
src="https://github.com/user-attachments/assets/87c4ff2f-c8c4-4e7a-a41e-21afa935cb56"
/>

- **Online Tracking**: View logs on the [SwanLab
website](https://swanlab.cn).

<img width="1900" alt="remote"
src="https://github.com/user-attachments/assets/5b44b9f3-948f-4f93-9873-572bce56daf7"
/>

For more details, refer to the [SwanLab Cloud
Documentation](https://docs.swanlab.cn/guide_cloud/experiment_track/view-result.html).

- **Offline Tracking**: Use the local dashboard to visualize logs:

  ```bash
  swanlab watch
  ```

For advanced configurations, such as setting a custom port, refer to the
[Offline Dashboard
Documentation](https://docs.swanlab.cn/guide_cloud/self_host/offline-board.html)
and [CLI
Documentation](https://docs.swanlab.cn/api/cli-swanlab-watch.html#%E8%AE%BE%E7%BD%AEip%E5%92%8C%E7%AB%AF%E5%8F%A3%E5%8F%B7).

---

#### **Impact**

- Provides a lightweight, flexible, and user-friendly experiment
tracking solution.
- Supports both online and offline use cases, making it suitable for
environments with restricted internet access.
- Enhances hardware monitoring capabilities for better resource
utilization.

---

This PR is ready for review. Feedback and suggestions are welcome!

committed Feb 08, 2025


[rollout]: fix incorrect response_attention_mask in vLLM rollout (#213) · 3140cc2f

This PR addresses issue https://github.com/volcengine/verl/issues/212.

The changes include:
- Read `eos_token_id` from `generation_config` to ensure alignment with vLLM.
- Modify the `get_eos_mask` function to accept both int and list types
for the `eos_token` parameter.
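
A minimal sketch of an EOS mask that accepts either a single id or a list of ids is shown below. It works on plain Python lists rather than tensors, the name `eos_mask_sketch` is hypothetical, and keeping the first EOS token inside the mask is an assumption about the intended behavior.

```python
def eos_mask_sketch(response_ids, eos_token):
    """Hypothetical stand-in for get_eos_mask on plain Python lists.

    response_ids: list of token-id lists (one per response).
    eos_token: a single id or a list of ids that all end a response.
    Returns 1 for tokens up to and including the first EOS, 0 afterwards.
    """
    eos_ids = {eos_token} if isinstance(eos_token, int) else set(eos_token)
    masks = []
    for seq in response_ids:
        mask, finished = [], False
        for tok in seq:
            mask.append(0 if finished else 1)
            if tok in eos_ids:
                finished = True
        masks.append(mask)
    return masks


responses = [[5, 7, 2, 0, 0], [5, 9, 3, 2, 0]]
single = eos_mask_sketch(responses, eos_token=2)
multi = eos_mask_sketch(responses, eos_token=[2, 3])
```

With a single EOS id, the second response is only cut at token `2`; with the list `[2, 3]` it is cut earlier, at token `3`.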

committed Feb 08, 2025


[misc] feat: add ckpt manager in utils (#216) · 27484a7b

- Support FSDPCheckpointManager
- Support hdfs_io import if installed
- Add CI for FSDPCheckpointManager

TODO:
- Will integrate in the next PR

committed Feb 07, 2025


06 Feb, 2025 3 commits
- Create scorecard.yml for security scan (#215) · ee23b7bb
```
Install the scorecard workflow
```
  Willem Jiang committed Feb 06, 2025
- example: switch the default model ckpt for Megatron, add wandb logs (#210) · ced8ecbf
```
Use a general-purpose LLM for the math task instead of a code LLM.

---------

Co-authored-by: Your Name <you@example.com>
```
  HL committed Feb 05, 2025
- docs: simple use name verl. add meetup info · 22d56a8b
  HL committed Feb 05, 2025
  
05 Feb, 2025 7 commits
- [misc] fix: load and offload in compute log prob (#208) · 6872dbef
```
- As titled
- Relevant: https://github.com/volcengine/verl/issues/181
```
  Guangming Sheng committed Feb 05, 2025
- [CI] feat: make the digit completion ci use fsdp's ppo_config (#206) · 89ba48e7
```
- As titled
```
  Guangming Sheng committed Feb 05, 2025
- [misc] refactor: refactor config, add grpo ci (#204) · ac9d2467
```
- Move config to a class method of `RayPPOTrainer`
- Fix config problem when adv_estimator=grpo
- Add GRPO e2e CI
```
  Chi Zhang committed Feb 05, 2025
- Update README.md (#203) · ce956ccb
  Chi Zhang committed Feb 05, 2025
  
- assert to make sure flash-attn>=2.4.3 (#202) · 49913bdc
```
https://github.com/volcengine/verl/pull/182

Add an assert statement to make sure flash-attn>=2.4.3, where
cross_entropy_loss returns Tuple[losses, z_losses]🤯
```
  be betterest committed Feb 05, 2025
- fix critic save error (#199) · 4669ab93
```
This PR is similar to PR https://github.com/volcengine/verl/pull/174 but
fixes the critic save error.

I moved the old PR to this one because it contained some redundant commits.
```
  Wei Xiong committed Feb 05, 2025
- docs: fixed link in gsm8k_example.rst (#200) · ddfc04e4
```
sry missed this last one, should be it 

cc @vermouth1992

Co-authored-by: Jayson Francis <jaysonfrancis@users.noreply.github.com>
```
  jaysonfrancis committed Feb 04, 2025
04 Feb, 2025 3 commits

Call wandb.finish when the tracker is destructed if wandb is in use (#191) · 483fa8ac

Runs always show "crashed" in my wandb dashboard, despite finishing
successfully. "Crashed" indicates that wandb did not finish sending the
"success" signal to the server, so the server believes the client was
terminated unexpectedly. Furthermore, the wandb log is incomplete (the last
lines are missing).

This PR adds a call to `wandb.finish` when the Tracker is destructed
(usually when `trainer.fit` has finished) so that the signals are sent to
the server and a data sync is performed.
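
The pattern can be sketched as follows. The classes are hypothetical stand-ins: `FakeBackend` plays the role of wandb so the example runs without it, and `TrackerSketch` is not verl's tracker, only an illustration of finalizing a backend in `__del__`.

```python
class FakeBackend:
    """Stand-in for a logging backend such as wandb (hypothetical)."""

    def __init__(self):
        self.finished = False

    def log(self, data, step):
        pass

    def finish(self):
        self.finished = True


class TrackerSketch:
    """Hypothetical tracker that finalizes its backend when destructed."""

    def __init__(self, backend):
        self.backend = backend

    def log(self, data, step):
        self.backend.log(data, step)

    def __del__(self):
        # Flush pending data and signal a clean exit to the server.
        self.backend.finish()


backend = FakeBackend()
tracker = TrackerSketch(backend)
tracker.log({"loss": 0.1}, step=1)
del tracker  # in CPython the refcount hits zero here, so __del__ runs
```

Relying on `__del__` is convenient but depends on when the interpreter releases the object; an explicit `finish()` call at the end of training would be the more deterministic alternative.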

Without this change:
<img width="526" alt="image"
src="https://github.com/user-attachments/assets/869da24e-c5b8-415c-b15a-bb79c49f96ce"
/>

With this change:
<img width="548" alt="image"
src="https://github.com/user-attachments/assets/16f0a40d-ea3b-48ed-93a4-f40ee01cb7c6"
/>

committed Feb 04, 2025


Remove unused variable in gsm8k preprocessing code (#193) · 4d420fe5
Neil Chowdhury committed Feb 04, 2025

fix link in quickstart.rst (#192) · 686a9c3b
```
Co-authored-by: Jayson Francis <jaysonfrancis@users.noreply.github.com>
```
jaysonfrancis committed Feb 04, 2025

03 Feb, 2025 4 commits
- docs: Add LigerKernel performance tuning documentation (#178) · 3fe77fa7
```
This PR adds documentation for the LigerKernel option in a new
performance tuning section, addressing the comment from
volcengine/verl#173.

Changes:
- Created new performance tuning section in docs
- Documented LigerKernel option for SFT
- Added performance tuning section to documentation index

Related to volcengine/verl#173

---------

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: HL <linhaibin.eric@gmail.com>
```
  Xingyao Wang committed Feb 02, 2025
- docs: update ray_trainer.rst (#187) · 13762f43
```
runnning -> running
```
  Ikko Eltociear Ashimine committed Feb 03, 2025
- megatron: fix config error and add compute log prob interface (#186) · 818e4de2
  HL committed Feb 02, 2025
  
- docs: update readme with links to examples · fbc8fe82
  HL committed Feb 02, 2025
  
02 Feb, 2025 1 commit
- exchange the mini_batch_size calculation logic (#183) · 6e003c0b
  Chujie Zheng committed Feb 02, 2025
  
01 Feb, 2025 2 commits
- data: fix the math dataset source (#175) · 677e120a
```
since 'lighteval/MATH' is no longer available on huggingface.
```
  HL committed Feb 01, 2025
- [misc] fix: grpo kl loss should be add when do minimization (#179) · a65c9157
```
- As titled
```
  Guangming Sheng committed Feb 01, 2025
31 Jan, 2025 4 commits
- docs: add twitter link to readme · 38ac5255
  HL committed Jan 31, 2025
  
- [feat, SFT] Support LigerKernel for SFT (#173) · 25fc194a
  Xingyao Wang committed Feb 01, 2025
  
- validate `use_remove_padding` when applying sequence parallelism (#153) · fb3793ab
  Chujie Zheng committed Feb 01, 2025
  
- fix bug：fix checkpoint save with existing dirs (#174) · 679798cd
```
![image](https://github.com/user-attachments/assets/f0bae990-a4ef-49da-aa1e-58894b41db5f)

---------

Co-authored-by: HL <linhaibin.eric@gmail.com>
```
  dignfei committed Jan 30, 2025
30 Jan, 2025 8 commits

docs: add split placement to readme · b6068eca
HL committed Jan 30, 2025


Allow users to pass in custom compute_score function (#162) · ce862ce8

This is a follow-up to https://github.com/volcengine/verl/issues/151

## Motivation

Currently, in order to add a custom score function you need to fork verl
and update the `_select_rm_score_fn` to define your logic. This makes it
harder to use verl as part of a larger application while staying up to
date with upstream improvements in verl.

It would be convenient to allow end users to directly pass in a reward
function they wish to use, without requiring them to clone/fork verl to
do so.

## Design

In this PR I slightly modify `main_ppo.py` to allow users to import a
new function `run_ppo`. `run_ppo` behaves very similarly to the existing
`main`, with the important addition of a new `compute_score` argument.
This argument, if passed in, is used to compute the score of every
generation. This is the change that allows users to supply their own
reward function without forking verl.
The `compute_score` function is similar in shape to the existing
`compute_score` on gsm8k and math. However, I have added a new
`data_source` parameter so that the user can compute the score
differently if desired depending on the task shape.

## Example Usage

This is a sample script showing how you can use the new functionality. I
have tested that this works.

```python
from verl.trainer.main_ppo import run_ppo
from omegaconf import OmegaConf


def custom_compute_score(data_source, solution_str, ground_truth):
    """Dummy compute_score function that rewards the model for generations of exactly 20 characters :)
    """
    # Negate the distance so that a length of exactly 20 scores highest (0).
    return -abs(len(solution_str) - 20)


config = OmegaConf.load("vendor/verl/verl/trainer/config/ppo_trainer.yaml")

# Update config as needed
config.data.train_files = "path/to/train.parquet"
config.data.val_files = "path/to/test.parquet"
# ...

run_ppo(config, custom_compute_score)
```
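
The `data_source` parameter makes it possible to apply a different scoring rule per task. The following is a minimal sketch of that idea; the function name and the two data source names are made up for illustration and do not correspond to datasets known to verl.

```python
def dispatching_compute_score(data_source, solution_str, ground_truth):
    """Hypothetical scorer that picks a rule based on data_source."""
    if data_source == "exact_match_task":  # made-up source name
        return 1.0 if solution_str.strip() == ground_truth.strip() else 0.0
    if data_source == "contains_task":  # made-up source name
        return 1.0 if ground_truth in solution_str else 0.0
    raise NotImplementedError(f"no scoring rule for {data_source!r}")
```

Such a function has the same signature as `custom_compute_score`, so it could be passed to `run_ppo` in the same position.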

## Breaking changes

There are no breaking changes in this PR. It is still possible to call
`python -m verl.trainer.main_ppo ...` as before (although if you want to
pass in a custom compute_score you will need to use the new method
described above).

## Possible future work

It would be great to move to [structured
configs](https://omegaconf.readthedocs.io/en/2.1_branch/structured_config.html)
as well since they'd allow us to have typesafe, autocompletable
configurations from Python. I thought about adding those changes here as
well but they would be much more extensive and I'm not sure whether
there's interest from the project.

committed Jan 30, 2025


fix: typo (explanation) (#167) · 41b7c583
Franz Srambical committed Jan 30, 2025

fix: typo (#166) · e7fd415a
Franz Srambical committed Jan 30, 2025


[Liger-kernel] Add an option to use `_apply_liger_kernel_to_instance()` to load model (#133) · dd418779

## Summary

This PR makes it possible to use Liger Kernel's
`_apply_liger_kernel_to_instance` to initialize an FSDP worker model.

## Main Changes

1. Added an option to use
`liger_kernel.transformers.AutoLigerKernelForCausalLM` to load a
pretrained model, instead of the default
`transformers.AutoModelForCausalLM`
2. Added a test case using configuration file
`tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh`

## Related Issue

#96 

## TODO

#97 optimize the memory usage when computing entropy & log_probs

https://github.com/volcengine/verl/blob/6d96fda3d47f057caaa8f494ca7804181903e911/verl/workers/actor/dp_actor.py#L94-L106

---------

Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>

committed Jan 30, 2025


[nit] Explicitly show logits dimension in `dp_worker.py::forward_micro_batch` (#164) · df03aa6d

The logits have shape `(bsz, response_length, vocab_size)`. This PR
doesn't change any code execution; it only shows the logits shape
explicitly so that the code is easier for readers to understand.

Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>

committed Jan 30, 2025


Update README.md (#165) · 8f6e5f5b
```
Add contribution guide
```
Chi Zhang committed Jan 30, 2025
[misc] fix: fix ray requirement (#163) · 29935aed
Chi Zhang committed Jan 30, 2025


29 Jan, 2025 1 commit
- fix: redundant non_zero_mask (#152) · f9784cdf
```
`token_level_rewards == (token_level_rewards * non_zero_mask)`
```
  Franz Srambical committed Jan 29, 2025
  f9784cdf Browse Files
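
The identity in the last commit message can be checked with a small plain-Python example. It assumes that `non_zero_mask` marks the positions where the reward is non-zero; the reward values are made up.

```python
token_level_rewards = [[0.0, 0.0, 1.5], [0.0, -0.5, 0.0]]

# Assumed definition: 1.0 where the reward is non-zero and 0.0 elsewhere.
non_zero_mask = [[float(r != 0) for r in row] for row in token_level_rewards]

masked = [[r * m for r, m in zip(row, mask_row)]
          for row, mask_row in zip(token_level_rewards, non_zero_mask)]

# Zero entries stay zero and non-zero entries are multiplied by 1,
# so the masked rewards equal the originals and the mask is redundant.
assert masked == token_level_rewards
```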