05 Feb, 2025 4 commits
04 Feb, 2025 3 commits
03 Feb, 2025 4 commits
02 Feb, 2025 1 commit
01 Feb, 2025 2 commits
31 Jan, 2025 4 commits
30 Jan, 2025 8 commits
29 Jan, 2025 3 commits
28 Jan, 2025 1 commit
27 Jan, 2025 10 commits
    • [perf] docs: fix typo · 54603cbd
      HL committed
    • docs: add news for doubao-1.5-pro · b2c6ff7a
      HL committed
    • Update README.md (#146) · 12b0b59e
      - Add link to performance tuning
      Chi Zhang committed
    • [misc] fix: gradient accumulation in seq balance and modify default vllm log level (#141) · 695bdbb0
      - The previous gradient accumulation value was computed from micro_batch_size, which is wrong when using dynamic_bsz.
      - Fix the CI script to avoid overlooking this issue.
      - Change the vLLM stats log default (`disable_log_stats`) to True to disable the log.
      - We now check that `self.config.actor.ppo_mini_batch_size % self.config.actor.ppo_micro_batch_size_per_gpu == 0` holds after normalization in fsdp_workers instead of in dp_actor and dp_critic (see the sketch below).
      Guangming Sheng committed
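      A minimal sketch of the check described in the commit above (the function and variable names are hypothetical, not the repo's actual code), assuming the global mini batch size is first normalized by the DP world size:
      ```python
      def grad_accum_steps(ppo_mini_batch_size: int,
                           ppo_micro_batch_size_per_gpu: int,
                           dp_world_size: int) -> int:
          # Normalize the global (single-controller) mini batch size per DP rank.
          assert ppo_mini_batch_size % dp_world_size == 0
          local_mini_bsz = ppo_mini_batch_size // dp_world_size
          # Check divisibility after normalization (as the commit does in
          # fsdp_workers) instead of deriving accumulation from the raw
          # micro_batch_size, which breaks when dynamic_bsz is used.
          assert local_mini_bsz % ppo_micro_batch_size_per_gpu == 0
          return local_mini_bsz // ppo_micro_batch_size_per_gpu

      # e.g. global mini batch 256, 4 per GPU, 8 DP ranks -> 8 accumulation steps
      print(grad_accum_steps(256, 4, 8))
      ```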
    • [SFT] Support context parallelism for SFT (#132) · 077173f2
      # Add Sequence Parallelism and Padding Removal to SFT Trainer
      
      This PR adds sequence parallelism (SP) and padding removal optimizations
      to the SFT trainer, which can help improve training efficiency for large
      language models.
      
      ## Key Changes
      
      ### Core Features
      1. **Sequence Parallelism**: Added support for sequence parallelism
      through the Ulysses framework
         - Configurable via `ulysses_sequence_parallel_size` parameter
         - Properly handles data distribution across SP ranks
         - Maintains consistent loss computation across distributed setup
      
      2. **Padding Removal**: Added support for efficient handling of
      variable-length sequences
         - Enabled via the `use_remove_padding` flag (requires SP to be enabled; see the sketch after this list)
         - Uses flash-attention's padding removal utilities
         - Handles proper re-padding and loss computation
      
      3. **Training Improvements**:
         - Added label smoothing support to loss computation
         - Added progress bar with epoch information
         - Added RoPE scaling configuration support
         - Improved error messages for batch size validation
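
      The interaction between the two flags above can be summarized with a small validation sketch (the helper is hypothetical, and "SP enabled" is assumed to mean a parallel size greater than 1):
      ```python
      def validate_sft_flags(ulysses_sequence_parallel_size: int,
                             use_remove_padding: bool) -> None:
          # Padding removal is only supported together with sequence parallelism.
          if use_remove_padding and ulysses_sequence_parallel_size <= 1:
              raise ValueError(
                  "use_remove_padding requires ulysses_sequence_parallel_size > 1")

      validate_sft_flags(ulysses_sequence_parallel_size=2, use_remove_padding=True)  # ok
      ```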
      
      ### Testing
      - Added comprehensive test suite (`test_trainer.py`) to verify:
        - Forward pass consistency between original and SP+rmpad implementations
        - Loss computation correctness across distributed setup
        - Proper handling of micro-batches
      
      ### Example Usage
      Added example script `examples/sft/gsm8k/run_qwen_05_sp2.sh`
      demonstrating how to use the new features with the Qwen2.5-0.5B model.
      
      ## Implementation Details
      - Uses device mesh for proper distributed training setup
      - Handles data distribution, ensuring the same sequences within SP groups but different sequences across DP groups (see the sketch after this list)
      - Carefully manages backward pass timing with gradient checkpointing
      - Maintains compatibility with existing FSDP features
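
      To make the data distribution point concrete, a minimal sketch (the helper below is hypothetical, not the repo's code): ranks in the same SP group map to the same data shard, while different DP groups map to different shards.
      ```python
      def dp_shard_index(global_rank: int, sp_size: int) -> int:
          # With ranks laid out as (dp, sp) on the device mesh, the sp_size ranks
          # in one SP group share a DP index and therefore read the same shard.
          return global_rank // sp_size

      world_size, sp_size = 8, 2
      for rank in range(world_size):
          print(f"rank {rank} -> data shard {dp_shard_index(rank, sp_size)}")
      # ranks 0,1 -> shard 0; ranks 2,3 -> shard 1; ... (4 DP groups x 2 SP ranks)
      ```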
      
      ## Testing Instructions
      1. Run the example script with sequence parallelism:
      ```bash
      bash examples/sft/gsm8k/run_qwen_05_sp2.sh <nproc_per_node> <save_path>
      ```
      
      2. Run the test suite:
      ```bash
      bash tests/sft/run_sft_sp_loss_match.sh
      ```
      
      
      ^^ This PR description was generated by [OpenHands](https://github.com/All-Hands-AI/OpenHands)
      
      ---------
      
      Co-authored-by: Jiayi Pan <i@jiayipan.me>
      Co-authored-by: openhands <openhands@all-hands.dev>
      Xingyao Wang committed
    • [VLLM] Set max_num_batched_tokens for vllm rollout (#140) · c99df03f
      We set `max_num_batched_tokens` in the `.rollout` config, but it wasn't actually being passed to vLLM, causing potential under-utilization of the GPUs.
      
      This PR:
      
      - properly pass `max_num_batched_tokens` from the config to vLLM (see the sketch below)
      - set `disable_log_stats` to False so that vLLM performance information is displayed (to spot issues)
      Xingyao Wang committed
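      A minimal sketch of the wiring this commit describes (the surrounding function and config object are hypothetical, and the exact set of engine arguments depends on the installed vLLM version):
      ```python
      from vllm import LLM

      def build_rollout_engine(model_path, rollout_config):
          # Forward the rollout settings to the vLLM engine instead of dropping them.
          return LLM(
              model=model_path,
              max_num_batched_tokens=rollout_config.max_num_batched_tokens,
              disable_log_stats=False,  # keep vLLM performance stats visible
          )
      ```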
    • [BREAKING][misc] feat: change micro_batch_size to micro_batch_size_per_gpu (#136) · f2a76acd
      ## Summary
      
      This PR renames all `micro_batch_size` parameters to `micro_batch_size_per_gpu`.
      
      **The core logic of setting batch sizes:**
      - **All algorithmic metrics** (train batch size, ppo mini batch size) are global (from the single-controller perspective) and are normalized in each Worker.
      - **All performance-related parameters** (micro batch size, max token length in dynamic batch size) are local and represent per-GPU (i.e., per-Worker) data sizes; see the sketch below.
      
      ## Main Changes
      
      1. Update the scripts and configs, and remove the normalization for micro_bsz
      2. Fix CI for SFT
      Guangming Sheng committed
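      A minimal sketch of the convention described above (names are hypothetical): algorithmic batch sizes are global and get normalized per worker, while per-GPU performance parameters are left untouched.
      ```python
      def per_worker_sizes(train_batch_size: int, ppo_mini_batch_size: int,
                           ppo_micro_batch_size_per_gpu: int, dp_world_size: int):
          # Global algorithmic sizes are divided across DP ranks inside each Worker.
          return {
              "train_batch_size_per_worker": train_batch_size // dp_world_size,
              "ppo_mini_batch_size_per_worker": ppo_mini_batch_size // dp_world_size,
              # Already a per-GPU value, so no normalization is applied.
              "ppo_micro_batch_size_per_gpu": ppo_micro_batch_size_per_gpu,
          }

      print(per_worker_sizes(1024, 256, 4, 8))
      # -> {'train_batch_size_per_worker': 128, 'ppo_mini_batch_size_per_worker': 32,
      #     'ppo_micro_batch_size_per_gpu': 4}
      ```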
    • docs: add reference for tiny-zero · c17e6c62
      HL committed