- 07 Feb, 2025 2 commits
This PR addresses issue https://github.com/volcengine/verl/issues/212. The changes include:
- Read eos_token_id from generation_config to ensure alignment with vLLM
- Modified the get_eos_mask function to accept both int and list types for the eos_token parameter
Kinman Lei committed
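The `get_eos_mask` change could look roughly like the following. This is a hypothetical reimplementation (the function name comes from the PR; the body and defaults are assumptions, not verl's exact code):

```python
from typing import List, Union

import torch


def get_eos_mask(response_ids: torch.Tensor,
                 eos_token: Union[int, List[int]] = 2,
                 dtype: torch.dtype = torch.int64) -> torch.Tensor:
    """Mask that is 1 up to and including the first EOS token, 0 after.

    Accepts a single EOS id or a list of ids, since generation_config
    may define several EOS tokens.
    """
    if isinstance(eos_token, int):
        eos_token = [eos_token]
    # mark every position holding any of the EOS ids
    is_eos = torch.zeros_like(response_ids, dtype=torch.bool)
    for tok in eos_token:
        is_eos |= response_ids.eq(tok)
    # positions strictly after the first EOS have a running EOS count > 0
    mask = (torch.cumsum(is_eos.long(), dim=-1) - is_eos.long()).eq(0)
    return mask.to(dtype)
```

With a single id, everything through the first EOS stays unmasked; with a list, the first occurrence of any listed id ends the response.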
- Support FSDPCheckpointManager
- Support hdfs_io import if installed
- Add CI for FSDPCheckpointManager

TODO:
- Will integrate in the next PR
Guangming Sheng committed
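The "hdfs_io import if installed" part can be sketched as an optional-dependency import with a local fallback. Names here are illustrative, and `hdfs_io.copy` is an assumed client API, not verl's confirmed interface:

```python
import shutil

try:
    import hdfs_io  # optional dependency; only used when installed
except ImportError:
    hdfs_io = None


def copy_checkpoint(src: str, dst: str) -> None:
    """Copy a checkpoint file, using the HDFS client only when it is
    both installed and actually needed for the destination path."""
    if hdfs_io is not None and dst.startswith("hdfs://"):
        hdfs_io.copy(src, dst)  # assumed client API
    else:
        shutil.copy(src, dst)   # plain local copy otherwise
```

This keeps HDFS strictly optional: environments without the client still work for local paths.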
- 06 Feb, 2025 3 commits
Install the scorecard workflow
Willem Jiang committed
Use the general-purpose LLM for the math task instead of the code LLM.

---------

Co-authored-by: Your Name <you@example.com>
HL committed
HL committed
- 05 Feb, 2025 7 commits
- As titled
- Relevant: https://github.com/volcengine/verl/issues/181
Guangming Sheng committed
- As titled
Guangming Sheng committed
- Move config to a class method of `RayPPOTrainer`
- Fix config problem when adv_estimator=grpo
- Add GRPO e2e CI
Chi Zhang committed
Chi Zhang committed
https://github.com/volcengine/verl/pull/182 Add an assert statement to make sure flash-attn>=2.4.3, where cross_entropy_loss returns Tuple[losses, z_losses] 🤯
be betterest committed
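A minimal version guard of this kind can be sketched without flash-attn installed. The helper below is illustrative (not the PR's actual code) and compares dotted version strings numerically rather than lexically:

```python
def assert_min_version(installed: str, required: str) -> None:
    """Assert installed >= required, comparing numerically so that
    e.g. '2.10.0' correctly satisfies '>= 2.4.3'."""
    def parse(v: str):
        # split '2.4.3' into the tuple (2, 4, 3) for elementwise comparison
        return tuple(int(part) for part in v.split("."))
    assert parse(installed) >= parse(required), (
        f"version {installed} found, but >= {required} is required")


# the PR's guard would then look roughly like (names assumed):
# import flash_attn
# assert_min_version(flash_attn.__version__, "2.4.3")
```

Note this simple parser does not handle suffixes like `2.4.3.post1`; a real implementation would use `packaging.version.Version`.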
This PR is similar to PR https://github.com/volcengine/verl/pull/174 but fixes the critic save error. I moved the old PR to this one due to some redundant commits.
Wei Xiong committed
Sorry, missed this last one; this should be it. cc @vermouth1992

Co-authored-by: Jayson Francis <jaysonfrancis@users.noreply.github.com>
jaysonfrancis committed
- 04 Feb, 2025 3 commits
Runs always show "crashed" on my wandb despite finishing successfully. "Crashed" indicates that wandb did not finish sending the "success" signal to the server, so the server believes the client was terminated unexpectedly. Furthermore, the wandb log is incomplete (the last lines are missing). This PR adds a call to `wandb.finish` when the Tracker is destructed (often when `trainer.fit` finishes) so that signals are sent to the server and a data sync is performed.

Without this change: <img width="526" alt="image" src="https://github.com/user-attachments/assets/869da24e-c5b8-415c-b15a-bb79c49f96ce" />

With this change: <img width="548" alt="image" src="https://github.com/user-attachments/assets/16f0a40d-ea3b-48ed-93a4-f40ee01cb7c6" />
Long(Tony) Lian committed
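The destructor-based flush described in this commit can be sketched as follows. Class and attribute names are assumptions; `backend` stands in for the wandb run object, and only a `finish()` method is assumed:

```python
class Tracker:
    """Sketch of the fix: flush the logging backend when the tracker is
    garbage-collected, so the final 'success' signal and remaining log
    lines reach the server before the process exits."""

    def __init__(self, backend):
        self.backend = backend  # stands in for the wandb run/module

    def __del__(self):
        try:
            # mirrors wandb.finish(): sync buffered data, mark run finished
            self.backend.finish()
        except Exception:
            pass  # never raise from a destructor
```

Swallowing exceptions in `__del__` matters here: destructors can run during interpreter shutdown, when raising would only produce noisy, unactionable warnings.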
Neil Chowdhury committed
Co-authored-by: Jayson Francis <jaysonfrancis@users.noreply.github.com>
jaysonfrancis committed
- 03 Feb, 2025 4 commits
This PR adds documentation for the LigerKernel option in a new performance tuning section, addressing the comment from volcengine/verl#173.

Changes:
- Created a new performance tuning section in the docs
- Documented the LigerKernel option for SFT
- Added the performance tuning section to the documentation index

Related to volcengine/verl#173

---------

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: HL <linhaibin.eric@gmail.com>
Xingyao Wang committed
runnning -> running
Ikko Eltociear Ashimine committed
HL committed
HL committed
- 02 Feb, 2025 1 commit
Chujie Zheng committed
- 01 Feb, 2025 2 commits
Since 'lighteval/MATH' is no longer available on Hugging Face.
HL committed
- As titled
Guangming Sheng committed
- 31 Jan, 2025 4 commits
HL committed
Xingyao Wang committed
Chujie Zheng committed
 --------- Co-authored-by: HL <linhaibin.eric@gmail.com>
dignfei committed
- 30 Jan, 2025 8 commits
HL committed
This is a follow-up to https://github.com/volcengine/verl/issues/151

## Motivation
Currently, in order to add a custom score function you need to fork verl and update `_select_rm_score_fn` to define your logic. This makes it harder to use verl as part of a larger application while staying up to date with upstream improvements in verl. It would be convenient to allow end users to directly pass in a reward function they wish to use, without requiring them to clone/fork verl to do so.

## Design
In this PR I slightly modify `main_ppo.py` to allow users to import a new function, `run_ppo`. `run_ppo` behaves very similarly to the existing `main`, with the important addition of a new `compute_score` argument. This argument, if passed in, is used to compute the score of every generation; this is the change that allows custom scoring. The `compute_score` function is similar in shape to the existing `compute_score` on gsm8k and math. However, I have added a new `data_source` parameter so that the user can compute the score differently, if desired, depending on the task shape.

## Example Usage
This is a sample script showing how you can use the new functionality. I have tested that this works.

```python
from omegaconf import OmegaConf

from verl.trainer.main_ppo import run_ppo


def custom_compute_score(data_source, solution_str, ground_truth):
    """Dummy compute_score function that rewards the model for generations
    of exactly 20 characters :)"""
    # negate the distance so that exactly 20 characters scores highest
    return -abs(len(solution_str) - 20)


config = OmegaConf.load("vendor/verl/verl/trainer/config/ppo_trainer.yaml")

# Update config as needed
config.data.train_files = "path/to/train.parquet"
config.data.val_files = "path/to/test.parquet"
# ...

run_ppo(config, custom_compute_score)
```

## Breaking Changes
There are no breaking changes in this PR. It is still possible to call `python -m verl.trainer.main_ppo ...` as before (although if you want to pass in a custom compute_score, you will need to use the new method described above).

## Possible Future Work
It would be great to move to [structured configs](https://omegaconf.readthedocs.io/en/2.1_branch/structured_config.html) as well, since they'd allow us to have type-safe, autocompletable configurations from Python. I thought about adding those changes here too, but they would be much more extensive and I'm not sure whether there's interest from the project.
Kyle Corbitt committed
Franz Srambical committed
Franz Srambical committed
## Summary
This PR enables using Liger Kernel's `_apply_liger_kernel_to_instance` to initialize an FSDP worker model.

## Main Changes
1. Added an option to use `liger_kernel.transformers.AutoLigerKernelForCausalLM` to load a model from pretrained, instead of the default `transformers.AutoModelForCausalLM`
2. Added a test case using the configuration file `tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh`

## Related Issue
#96

## TODO
#97: optimize the memory usage when computing entropy & log_probs https://github.com/volcengine/verl/blob/6d96fda3d47f057caaa8f494ca7804181903e911/verl/workers/actor/dp_actor.py#L94-L106

---------

Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Hongpeng Guo committed
The logits are of shape `(bsz, response_length, vocab_size)`. This PR doesn't change any code execution, but it explicitly shows the logits shape, making the code easier for readers to understand. Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Hongpeng Guo committed
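As a quick illustration of the shape bookkeeping this commit documents (illustrative numbers and variable names, not verl code): logits carry a vocabulary dimension, and gathering each sampled token's log-probability drops it.

```python
import torch

# logits come out of the model with shape (bsz, response_length, vocab_size)
bsz, response_length, vocab_size = 2, 4, 10
logits = torch.randn(bsz, response_length, vocab_size)
response_ids = torch.randint(0, vocab_size, (bsz, response_length))

log_probs = torch.log_softmax(logits, dim=-1)   # (bsz, response_length, vocab_size)
# pick out the log-prob of each sampled token, dropping the vocab dimension
token_log_probs = log_probs.gather(
    -1, response_ids.unsqueeze(-1)).squeeze(-1)  # (bsz, response_length)
```

Annotating these shapes inline is cheap and saves readers from re-deriving them at every `gather`/`squeeze`.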
Add contribution guide
Chi Zhang committed -
Chi Zhang committed
- 29 Jan, 2025 3 commits
`token_level_rewards == (token_level_rewards * non_zero_mask)`
Franz Srambical committed
HL committed
HL committed
- 28 Jan, 2025 1 commit
- As titled
- Solved: #149

Waiting for testing from @chujiezheng

---------

Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
Guangming Sheng committed
- 27 Jan, 2025 2 commits
Guangming Sheng committed
HL committed